
4—
Design and Evaluation

Copyright © National Academy of Sciences. All rights reserved.

Designing any sort of computer-mediated device that ordinary people can use effectively and pleasantly in everyday life has proven surprisingly difficult. The evidence for this observation comes from the myriad problems cited earlier in this report and at the workshop organized by the steering committee; from systematic empirical studies cited in this chapter; from frequent complaints by ordinary people required to use the currently most common public-oriented application, telephone-based voice response menu systems; and from more sophisticated users of the World Wide Web, whose complexities and frustrations have led as many to abandon the on-line life as to join it. (Consideration of the experiences and needs of people without specific special needs, referred to here as "ordinary" people, is an important complement to the discussion of those with special needs (see Chapter 2) in developing ideas for research to support interfaces that work for more, if not most, of the population.)

It is, of course, possible that the greater power, utility, and desirability of computer-based functions as compared to traditional mass-market technologies (e.g., television, telephony) mean that greater difficulty of use is inevitable, worth a high price in human effort and inconvenience, and solvable only by increased education, with its concomitant risk of leaving out those with insufficient time, resources, or ability. However, an alternative view is that it should be possible to use the power of the new technologies not only to do more and better things but also to do most of them at least as easily, or more so. Much of the burden of introducing new information technologies to the public can be removed or relieved by better design of the functions and interfaces with which most people will deal. The steering committee assumes that it is often or usually possible to design more widely useful functions and to make them easier to use through design activities specifically aimed at these goals.

Proof that this opportunity exists is readily available, beginning with popular knowledge of such consumer devices as cars and television sets, which were very complex initially but became, from the user's perspective, less so through sequences of adjustments over time. The Handbook of Human-Computer Interaction (Helander, 1988) contains many examples of prohibitively difficult systems made very much easier and more effective by redesign, and many more recent examples are reviewed by Nielsen (1993) and Landauer (1995). Some of these successes are reviewed in more detail later in this chapter. To set the stage, one is mentioned here that involves comparatively simple store-and-forward technology (as opposed to more complex multimedia, hypermedia, or collaboration support), a case that has particular relevance to much of the expected use in the every-citizen interface (ECI) environment. Gould et al. (1987a) designed an electronic message system for the 1984 Olympics in Los Angeles. The system was to be used by athletes, coaches, families, and members of the press from all corners of the globe. The original design was done by a very experienced team at IBM's Thomas J. Watson Research Center. When first tested in mock-up with representatives of its intended user population, it was virtually impossible to operate effectively. By the time an extensive program of iterative user testing and redesign was finished, more than 250 changes in the interface, the system-user dialogue, and the functionality were found to be necessary or advantageous.
The final system was widely used without any special training by an extremely diverse population.

Another example comes from the digital libraries context and relates to the Cypress on-line database of some 13,000 color images and associated metadata from the Film Library of the California Department of Water Resources (Van House, 1996). Iterative usability testing led to improvements for two groups of users: a group from inside the film library and a more diverse and less expert group of outsiders. Both direct user suggestions and ideas based on observing users' difficulties gave rise to design changes that were implemented incrementally.

A central research challenge lies in better design and evaluation for ordinary use by ordinary users and, more basically, in how to accomplish these goals. The future is not out there to be discovered: it has to be invented and designed. The scientific challenge is to understand much better than we do now (1) why computer use is difficult when it is, (2)
how to design and ensure a design for easier and more effective use, and (3) how to teach effectively both school children and those past school age to take advantage of what there is to use (a complex topic outside the scope of this report). Available research and expert opinion point to at least three reasons why many computer-mediated tools (including, especially, communications systems) are currently difficult or ineffective for use by a large part of the population: (1) the complexity and power of computer-mediated tools, (2) an emphasis on users with unusual abilities, and (3) the sophistication of designers and their discipline.

The Problem

Complexity and Power of Tools

Computer-mediated tools, as compared with traditional technologies, can be extremely powerful and complex, doing a vast array of different things with enormous speed. Of course, this is their advantage and appeal, but it is also their temptation. It means that a communications facility such as e-mail can be designed not only to let a user send an asynchronous text message to another subscriber but also to send multiple messages, create mailing lists, respond automatically, forward, save, retrieve, edit, cut and paste, attach files, create vacation messages, send faxes, and so on. If the design is not handled extremely well, users will have to learn to negotiate this vast array of options: to know about them and how to operate them if they want to use them, at least how to ignore them if they do not, and always, somehow, to choose whether and what. The situation can become analogous to providing the cockpit control panel of an airliner for use by its passengers to turn on their reading lights.
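One common way to tame such option overload, echoed later in this chapter in the advice to "hide" advanced features behind a short menu, is progressive disclosure: expose a handful of core commands by default and reveal the long tail only on request. A minimal sketch (the MailClient class and its command names are hypothetical, invented here for illustration):

```python
# Illustrative sketch of progressive disclosure for a hypothetical
# mail client: novices see a short menu; experts opt into the rest.

class MailClient:
    CORE = ["compose", "reply", "delete"]                     # what most users need
    ADVANCED = ["mailing_lists", "auto_respond", "vacation",  # the long tail
                "attachments", "filters", "fax_gateway"]

    def __init__(self):
        self.show_advanced = False

    def menu(self):
        """Return only the commands the current user has asked to see."""
        commands = list(self.CORE)
        if self.show_advanced:
            commands += self.ADVANCED
        else:
            commands.append("more...")  # single entry point to everything else
        return commands


client = MailClient()
print(client.menu())          # novice view: three commands plus "more..."
client.show_advanced = True
print(client.menu())          # expert view: all nine commands
```

Hiding rather than removing features preserves the expert's options while the novice sees something closer to a reading-light switch than a cockpit panel.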
The consequences in computing range from the proliferation of features in software products, to observations that most amateur spreadsheets contain serious errors and that employee hand-holding costs as much as hardware for business personal computer users, to additional but seldom-used features on standard computer keyboards.1 The concept of multimodal interfaces that would accommodate alternative approaches to input and/or output, discussed in Chapters 2 and 3, will introduce considerable complexity into the technology development process without adding any new functional features.

Great power and complexity also bring the opportunity to make very costly errors. Pressing the wrong key on an ordinary telephone touch-tone pad leads at worst to a wrong number. With a computer-mediated system it can, and often does, lead to hours of lost work or to inadvertently sending, for example, a "take me off this mailing list" message to 300
people, many of whom also wish they were not on it. Laboratory studies have found repeatedly that the majority of user time spent with popular applications such as word processors (which will be incorporated into many ECI applications) or spreadsheets is occupied with recovery from errors (see, for example, Card et al., 1983). This is one of two reasons why computer-mediated activities (see below for the second) create very much more variability in task completion times than do traditional technologies (Egan, 1988). Contemporary discussions in the business and personal press about the "futz factor" (the extra time and effort needed to adjust various aspects of a computer-based system) attest to continuing problems resulting from increased complexity and power. The irony is that in some cases (e.g., early cellular phones, personal computer software), a significant amount of complexity appears to derive from software, and sometimes hardware, added with the intention of "enhancing" usability.2

Emphasis on Users with Unusual Abilities

Computer-mediated tools emphasize individual differences in ability more than do traditional technologies. Egan (1988) reviewed a large number of studies of individual differences in the time taken and errors made in using common computer applications. In every case in which comparisons could be made, the variability among different people was much greater when they used computers rather than precomputer approaches to doing the same sorts of operations. An approximate summary of the data from these studies is that while most traditional tasks, such as operating a conventional cash register, calculating a sum (manually), or running around the block, will take about twice as long for the slowest of 20 randomly chosen people as for the fastest, in computer-mediated tasks the range is never that small; typically it is around 4 or 5 to 1 and may be as high as 20 to 1, even among well-trained and experienced users such as professional programmers. In several instances a good portion of the greater between-individual differences in computerized tasks has been traced to measurable differences in cognitive abilities. In the aggregate, workshop participants commented, such differences contribute to observations about the concentration of computer use among teenage males; they also contribute to reports in the business press about the frustrations of "information overload."3 Egan (1988) and Landauer (1995) reviewed studies in which measures of spatial memory, logical reasoning, and verbal fluency, as well as age and interest in mathematics and things mechanical, show greater than two-to-one differences between the highest and lowest quarters of the sampled potential user populations (see Figures 4.1 and 4.2 for examples). The participants in the studies illustrated were mostly noncareer middle-class
suburban women with little or no computer experience, fairly representative of the average citizens one might expect to be future network users, although not of their range.

How significant a problem is this? One guess comes from studies of the efficiency gains expected for computer applications to common business tasks. From the sparse available data, Landauer (1995) estimated that computer augmentation speeds work by 30 percent on average (with large variations). Combining this with the individual-difference estimates and a normal probability distribution suggests that about a third of the population would usually be better off without computer help as now provided, because they do not possess the basic abilities prerequisite to its effective use. This is without consideration of the part of the population ordinarily designated as disabled or disadvantaged.

While education and training can usually reduce individual differences, there are two reasons why computer-mediated tasks may be less susceptible to this solution. One is the aforementioned vastly greater complexity usually offered: the much larger variety of different functions available and of alternative means for achieving the same effects (e.g., five or more ways to cut and paste in most recent text editors). This variety often means that it can take longer to acquire high skill, akin on a smaller scale to the greater difficulty of learning to fly a jetliner than to drive a car.
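The "about a third" figure can be loosely reconstructed from the numbers quoted above, under illustrative assumptions that are ours rather than Landauer's published method: treat each person's benefit from computerization as lognormal, center it on the 30 percent average speedup, and calibrate the spread so that the slowest of 20 random users is 4, 5, or 20 times slower than the fastest (3.73 is approximately the expected range, in standard deviations, of 20 normal draws):

```python
# Back-of-the-envelope reconstruction, not Landauer's actual calculation.
from math import erf, log, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

MU = log(1.30)            # mean log-benefit: a 30 percent average speedup
EXPECTED_RANGE_20 = 3.73  # approx. expected range of 20 N(0,1) draws

for ratio in (4, 5, 20):
    sigma = log(ratio) / EXPECTED_RANGE_20  # spread matching a ratio:1 range
    hurt = normal_cdf(-MU / sigma)          # P(log-benefit < 0)
    print(f"{ratio}:1 spread -> {hurt:.0%} of users gain nothing or lose")
```

Across these spreads the share of users who would be better off without the computerized version comes out between roughly 24 and 37 percent, bracketing the one-third estimate in the text.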
It also means, often, that some users will find better ways to operate their system than will others, not because there are large differences in which method serves which person best (such "treatment-aptitude interactions," despite widespread folk belief in their existence, have virtually never been found in carefully controlled studies) but merely through chance variations in which operations users learn first, make habitual, and thus allow to become dominant over other possibilities that it thereafter takes excessive time to find and retrain for.

The second reason that computer training is less helpful than training for earlier technologies is the much more rapid and challenging change in the technology itself. The basic automobile, typewriter, and telephone have not changed significantly from the user's perspective in almost half a century, and changes from their very beginnings have been few, slow, relatively minor, and learnable without help (nonuse of extra features, such as the clocks on video cassette recorders or cruise control on cars, does not tend to be associated with an inability to use these devices for their essential functions). By contrast, every new model of a personal computing software package, even from the same manufacturer, comes with many new features and functions, new menu arrangements with new labels, and a large instruction book (and built-in help system). Such enhancements can affect even basic tasks. And every few years another new computer-based technology is offered. Thus, there is simply not the time available to consider yearlong high school courses for each computer
technology every citizen might want to use (this year e-mail, next year the World Wide Web) as there was for typewriters and accounting machines, or 7-year apprenticeships, as there were for steam shovels and looms; the systems would be obsolete and gone before expertise was gained. The result is that high-functionality computational systems are never completely learned, nor is their power fully exploited, and the primary learning strategy is based on learning on demand (Fischer, 1991). The challenge is to design so as to exploit the potential power for ease of learning and use as well as for increased functionality. Discontent with proliferating features contributed to mid-1990s experiments with so-called network computers, which have fewer features than conventional personal computers, as well as to periodic articles in the business press about the persistently high costs of owning and using personal computers.4

Several members of the steering committee and reviewers of a draft of this report wondered whether the low efficiency gains and large individual differences found in studies in the 1980s may have been overcome by technological advances in the 1990s. Although market statistics attest to growing use of information technologies, the sparse empirical evaluations of these issues in earlier periods appear to have become no more common in the past 5 years.
having inferior user interfaces: by supporting highly desired functions, reaching the market before their competition, and becoming de facto standards. Workshop participants from industry and report reviewers emphasized the commercial dependence on marketplace Darwinism, noting that vendors seem to find fielding their best guesses in products more cost effective than added precommercialization testing.

Some went further to suggest that the World Wide Web has provided a mechanism for harnessing market cycles, noting that some vendors are using Web sites for beta testing of products and for eliciting feedback from those users (mostly sophisticated and eager "early adopters" unrepresentative of the average citizen) who opt to try the products. The constant release of beta versions of software over the Web represents a limited kind of software evaluation and user involvement on a massive scale; some of these releases are now reviewed in trade publications, sometimes even on the basis of modest empirical tests, and some vendors dedicate usability and other design experts to some releases. Several companies are using this mechanism for iterative design. However, work by Hartson et al. (1996) suggests that methods for using "the Web as a usability lab" effectively, while promising, are in their infancy and face a number of problems that will be resolved only by considerable research. For example, this approach will require significant innovation in system instrumentation and user sampling techniques because, as outlined later, the untutored opinions of programmers and other power users are usually of little value for detecting the functionality and usability problems that matter for ordinary people (Nielsen, 1993). Tracking such efforts in broad-based user involvement and assessing their effectiveness might provide a productive starting place for research on large-scale participatory design and evaluation methods.
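The "system instrumentation" such research would build on can start from something as simple as timestamped client-side event logging, from which measures like the share of effort lost to error recovery (a recurring finding cited earlier) can be computed after the fact. A sketch, with hypothetical event names and class; a real remote study would also need consent, user sampling, and upload machinery:

```python
# Minimal client-side usability instrumentation: record timestamped
# interaction events, then derive crude aggregate measures from them.
import time

class SessionLog:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.events = []                     # list of (timestamp, kind)

    def record(self, kind):
        self.events.append((self.clock(), kind))

    def error_recovery_share(self):
        """Fraction of logged events spent undoing mistakes -- a crude
        proxy for the time-lost-to-error-recovery measures cited above."""
        if not self.events:
            return 0.0
        errors = sum(1 for _, kind in self.events if kind in ("undo", "error"))
        return errors / len(self.events)


log = SessionLog()
for kind in ("open", "type", "undo", "type", "error", "save"):
    log.record(kind)
print(f"{log.error_recovery_share():.0%} of events were error recovery")
```

Aggregated over many opted-in sessions, even logs this simple would give designers behavioral evidence to replace the untutored opinions the text warns against.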
Sophistication of Computer Hardware and Software Designers and Their Discipline

Most of the people involved in the design and implementation of functions and interfaces for computer applications are themselves sophisticated computer users. Feature requests and inventions come primarily from experienced users and are supplemented and implemented by programmers. The situation is unlike that in other consumer-oriented technologies in two important respects. As noted earlier, computers offer and usually provide a larger range of functions and controls and therefore almost always greater complexity in the choices and actions required of the user. Hence, expertise with a computer technology can often play a much greater role in its use. Computer technology started as an aid for a highly technical portion of the population: scientists, engineers, and
mathematicians, most of whom were capable of designing at least some software themselves, and often did.5 To a considerable extent, computer designers have designed for their own use and for that of people like them; the design of computer applications is still primarily in the hands of programmers and other software specialists, albeit now as leaders of large teams abetted by marketers, physical designers, and managers. Perspectives of other kinds of people (the differently abled, those with low levels of income and education, those resistant to technology merely for its own sake, and so on) have been represented most commonly by proxy or surrogate, if at all.

Not only are software specialists typically more experienced with the technology, but they are also, in general, quite different from the average user in the characteristics and abilities currently needed to deal effectively with computers: youth, mechanical and mathematical interests, good spatial memory, verbal fluency, and logical ability. They also tend to be less socially and pragmatically oriented in personality (Tognazzini, 1992). Although they may attempt to incorporate models of user behavior, behavioral scientists at the workshop noted that in practice designers tend to assume model users, that is, users whose behavior poses fewer problems than is actually experienced. As a result, it is extremely difficult for today's computer-based systems designers to have good intuitions about what will and will not be easy and gratifying for all citizens. This situation is illustrated by a press account of a Microsoft consumer product team's visits to five families for 3 hours each, which reported surprise about, and better understanding of, presumably ordinary households (Pitta, 1995).

The rise of the World Wide Web and experimentation with it by a widening range of people provide many illustrations of the challenge to designers. In a recent e-mail discussion on the topic, it was mentioned that ordinary citizens might experience difficulties in finding e-mail and Web addresses, to which a well-meaning expert replied that there were three uniform resource locators (URLs) on the Web that could be searched and that at least one of them would usually locate a person's address. It seems unlikely that this procedure would be very appealing or effective for most citizens who are not already frequent and accomplished users; the suggestion is consistent with an expert interface rather than an every-citizen interface. True, the availability of these searchable databases means that the possibility of eventually providing a good directory service for every citizen exists, but the necessary next steps have yet to be taken. The anecdote suggests that this may be a larger task than is obvious, since the difference between usually and always locating a person's address may affect how broad a segment of the population finds the service desirable and how much the Internet can contribute to a truly national information infrastructure (NII). The trends toward supporting do-it-yourself
activity that extends to assembly and customization of software systems from components and modules by users appear likely, in the near term, to exacerbate the challenge of serving more of the population. (One of the steering committee members has been told by a major manufacturer that fewer than 10 percent of office workers ever change the factory configuration of their ergonomically adjustable chairs.) In short, evidence discussed at the workshop indicates that an organized design and development process ensuring that the needs and abilities of potential average-citizen users are well taken into account has not yet become standard practice in software to nearly the extent that it has in the manufacture of most other mass-market products. Workshop discussions among technical experts and social scientists knowledgeable about specific population segments attested to the diversity of needs, reactions, and other qualities within the population, as well as the uneven appreciation of that diversity.

The Possibility of Easier-to-Use, More Effective Systems

There is ample evidence that computer systems with highly useful functions can be designed and built for easy, pleasant, and effective use by every citizen. Figures 4.1 and 4.2 give two examples. In both cases a function that could be used at an adequate level by only a minority of people was redesigned so that everyone could use it well. Moreover, improving usability for the less capable users did not penalize the more capable. These cases and others like them show that attending to the needs of novice users can often be accomplished without undesirable tradeoffs for expert users.
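The redesign pattern behind both figures, replacing a formal query syntax that demands above-average abilities with forgiving free-form input, can be illustrated in a few lines. This toy ranker is ours and makes no claim about any deployed system's actual algorithm:

```python
# Toy contrast between strict Boolean retrieval and forgiving
# free-form ranking over a handful of invented documents.

DOCS = {
    "d1": "reservoir water levels in northern california",
    "d2": "film library metadata for water resources images",
    "d3": "olympic village message system design",
}

def free_form(query):
    """Rank every document by simple word overlap; always returns the
    best available matches, even for imperfect queries."""
    q = set(query.lower().split())
    scored = [(len(q & set(text.split())), doc) for doc, text in DOCS.items()]
    return [doc for score, doc in sorted(scored, reverse=True) if score > 0]

def boolean_and(query):
    """Strict AND semantics: every query token must appear, so one
    wrong or extra term empties the result set."""
    q = set(query.lower().split())
    return [doc for doc, text in DOCS.items() if q <= set(text.split())]

print(free_form("water images"))        # ranked best-first
print(boolean_and("water AND images"))  # the literal token "AND" matches nothing
```

A strict Boolean parser returns nothing when the user types the connective literally or misjudges one term; a ranked free-form match degrades gracefully instead, which is what helps the less fluent users in cases like Figure 4.2.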
Indeed, it is commonly the case that redesigns that help occasional users are even more helpful for frequent users; for example, effective free-form queries such as those provided by Excite and Latent Semantic Indexing will allow both novices and the most sophisticated systems analysts to search the Web more easily and effectively, and with fewer frustrating, time-consuming errors, than are common with Structured Query Language (SQL) or Boolean search formats.

Two additional examples of success in improving usability through redesign are instructive. In one case, an e-mail system was redesigned for simple text message interchange, e-mail's most popular use. The system was always on (no log-on was required), like a telephone, and had a screen that said "To," a simple backspace-and-retype editor, a button labeled "Send," and a printer that printed only when a message arrived. A group of elderly women (a segment of the population shown by data and demographics to be especially technophobic and less likely to succeed at computing) learned the system after about 30 minutes of training
and used it eagerly. (The same e-mail system was preferred by several high-level executives of a telecommunications research company, all technical Ph.D.s who had easy access to a much more powerful system.) By contrast, today's typical e-mail systems are usually introduced to business employees in full-day training classes.

FIGURE 4.1 Using a standard relational query language (SQL) to make simple database searches after a half-day of training required greater-than-average logic-expressing abilities. Based on user studies, a new query language was invented that everyone finds easy to use and effective. SOURCE: Landauer (1995).

The second example involves hypertext. In the majority of experiments evaluating how well people can find information in the same large book, manual, or encyclopedia using traditional print and on-line hypertext versions, people did significantly better with paper (see, for example, Gould and Grischkowsky, 1984; Gould et al., 1987b6). But in a few cases, people using hypertext systems have greatly outperformed those using the old technology (see Landauer, 1995, for a review). The difference has been attributed to the design of the hypertext system, and especially to the methods by which the design was done.

When conflict between ensuring usability for relative novices and providing power for the highly trained is unavoidable, perhaps because
of entrenched development and marketing techniques or the inherent sophistication of some applications, two complementary approaches are possible.

FIGURE 4.2 When people had to compose queries in a natural language to find information, only those with above-average verbal fluency succeeded. When, instead, people could submit examples of documents they wanted, everyone did well. SOURCE: Landauer (1995).

One is to provide differing levels of functionality for different users. Several usability specialists at the workshop reported routinely advising designers to provide as many functions, features, and options as will be useful, feasible, and in demand by experts, but to "hide" them from users who want only basic functions, for example, by retaining the simplicity of short menus that emphasize only the best general functions and offer the option of selecting an "advanced functions" button for access to special features. The second approach is, of course, to increase the sophistication of users through education, training, and access to good guides and manuals (e.g., "training wheels" and "minimal manual" techniques, and scaffolded and staged advancement). Future computing functions of use to many citizens may well require fundamental understanding of concepts and operations that are not now taught in school-iterative
Page 143 (e.g., HomeNet (Kraut et al., 1996); Blacksburg (Va.) Electronic Village project, (http://duke.bev.net)), are helping to address the need for both realism and control in evaluating social applications. However, as the size of user groups increases-as reference to "every citizen" suggests-some participants were not certain about how well any of these approaches would scale up. (These concerns surface above in discussions of social-interest applications of the NII.) Several workshop participants noted that once one moves beyond a focus on personal computers as the access device and considers all manner of devices-telephones, television remote controls, and so on, as well as embedded systems-the problems and opportunities add up to a very large set. Inherent Unpredictability of Use For reasons both practical and theoretical, predicting the performance of social applications in real-world use on the basis of prior research is inherently difficult. Practically speaking, cut-and-try or design-and-fix methods-those most likely to yield accurate results-are least likely to be employed for social applications because of the time frame and scope of uses they entail, as suggested above. In theory as well, groupware uses are hard to anticipate because they are embedded in a social system that exerts effects quite independently of the technology. For instance, the social system of work had a significant influence on the automatic scheduling applications studied by Grudin (1988; see above). Managers, who most often called meetings, were most likely to have secretaries who kept their on-line calendars up to date and handled "workarounds" by phone when others had not put their schedules on-line; so managers benefited from the application but experienced none of its burdens. Lacking secretaries, professional users experienced all of the burdens but few of the benefits and soon gave up on it (Grudin, 1988). 
There had been several task analysis studies of the problems and promise of schedulers, and many had predicted just the problems Grudin cites, but they were ignored by, or unknown to, the proponent designers. As Markus and Connolly (1990) have pointed out, managers sometimes solve these kinds of problems simply by mandating the use of an application. In turn, however, clever professionals respond by gaming the system so that what appears in the on-line calendar is what is most convenient or most socially desirable, regardless of the actual status of the individual's time commitments. Similar results have been reported for use of shared databases by Patriotta (1996). These outcomes, reflecting interventions by the social system, are even more removed from expectations based on untested designer intuitions. It should be emphasized that

Page 144

social "reinventions" of technology are not necessarily negative; on the contrary, the research literature provides a great many instances of user-based improvements (e.g., Bikson, 1996; Orlikowski, 1996). The point, rather, is that unpredictability inevitably characterizes the use of groupware because of the reciprocal adaptations of the technology and the social context in which it is situated.

Implementation as a Critical Success Factor

Implementation, construed as the complex series of decisions and actions by which a new technology is incorporated into an existing context of use, assumes critical importance as a success factor for groupware, given the reciprocal influence of social and technical sources of effect cited above. During implementation, the new technology must be adapted to work in particular user settings even as users must learn new behaviors and change old ones to take advantage of it. Help features of the system and user training, as well as modifications of the application and changes in users' behavior, affect the course of implementation.

At the workshop, Sara Kiesler cautioned that experiences related to the performance of specific tasks (e.g., by telephone operators) will not necessarily generalize to the larger NII. Specific tasks tend to be tightly delimited, and the jobs of the performers in typical studies depend on their use of the system; in the NII, in contrast, there is a huge variety of tasks, a huge variety of users, and the users have more choice in what they do and how. Walter Feurzeig, of BBN, argued that it is nevertheless difficult to consider user interfaces independent of specific activities. Sara Czaja, for example, drew from her work in medical trauma settings to emphasize that real experience in real contexts is necessary to understand interface needs at, for example, physician workstations.
Current research on work group computing corroborates the conclusion that the effectiveness of the implementation process itself has a substantial impact on the usability and usefulness of social applications somewhat independently of their design features (Mankin et al., 1996; see also the literature reviews in Bikson and Eveland, 1990). The vital role of implementation also emerges as a salient factor in the life of new civic networks, according to their administrators (see Anderson et al., 1995). Nonetheless, evaluation efforts frequently target technology design as it bears on specific functions, leaving implementation processes and related features (e.g., help screens, on-line tutorials, user manuals) out of account in attempting to predict use. Further, although it is clear that many desirable changes in social technologies cannot be anticipated before their deployment in specific user settings, these applications are not usually designed with a view toward ease of modification either by end

Page 145

users or by service providers who maintain end-user systems. On the contrary, desires on the part of end users or those who provide information technology assistance are usually regarded with suspicion by designers and developers (Ciborra, 1992). Given the significant variation in uses, users, and user contexts represented by everyday citizens, along with serious questions about how their NII-based interactions can be supported, such implementation issues merit considerable attention.

Directions for Improvement

For reasons like those reviewed here, it is manifest that systems intended for use by communicating social groups -- including large populations -- raise many kinds of questions that individual applications do not. The design and evaluation techniques appropriate for individual applications need to be extended or supplemented with approaches more suitable for the envisioned NII environment. While there is not a large body of empirical work on which to draw for this purpose, research on computer-supported cooperative work and technologies for collaboration yields suggestive directions for improvement. Some promising approaches are summarized below.

Involve Representative Users in Substantive Design and Evaluation Activity Early and Often

Participatory design is difficult to arrange, as noted above, and so is more likely to be slighted. The goal is to understand how interfaces to connected communities may prove more than skin deep: how they may affect how we locate and remain aware of one another and find shared information, as well as how we understand, enact, and track our roles in group activities, recover from errors, merge our work with others, and so on. An illustrative example comes from an exploration of how new technologies could assist wildlife habitat development by the U.S. Forest Service.
To support wildlife habitat protection, forest service teams needed an interface to varied databases (e.g., about soil, vegetation, water quality, forest wildlife) that would permit different experts literally to overlay their views of a geographic territory on a shared map, create and manipulate jointly devised scenarios, and observe the results. The design of such an application required the participation of users with specialized domain expertise from its inception to its evaluation in field trials. NII-based applications envisioned for ordinary use (see Chapter 2) are no less complex and are similarly likely to require participatory design with representative users; offering lifelong learning, continuing education, or targeted

Page 146

training, for instance, or delivering selected health services on-line, are cases in point. In these and other social applications, methods for design and evaluation that discover and fix problems before they are widely promulgated are especially important. Many workshop participants believe these needs are particularly acute in areas -- such as education and health care -- that are now being eagerly promoted and anticipated for NII applications. One obvious approach is to conduct field trials with smaller than universal, but still representative, population samples; this procedure is as yet seldom followed. Often, as workshop participants noted, experts -- both system designers and such specialists as speech or occupational therapists -- may play the role of representative users; sometimes a think-aloud approach is used in which users comment on their experiences as they use a system. A related question is simply how to design and evaluate with the full range of the population in view, rather than drawing on the educated middle-class citizens who have constituted the potential or actual computer user samples typically studied in the past.

Expand the Repertoire of Research Methods to Be More Inclusive and Innovative

There is a pressing need for social-psychological, sociological, and organizational research into how innovation, development, and implementation processes should be arranged and managed so that the goal of every-citizen utility is effectively pursued. Issues like those raised above clearly require techniques for research with large populations, for instance, by survey methods or perhaps sampled observations; as yet there is little experience in the use of these techniques for design and evaluation of large networked social applications.
In discussing the prospects for instrumenting various systems, an interesting opportunity broached by participants was to use the Internet itself to conduct experiments and surveys, to record usage data (in anonymized ways) stratified by user categories and applications, and to assess the properties of emerging social networks (for examples, see Eveland et al., 1994; Huberman, 1996; Eveland and Bikson, 1987; Dubrovsky et al., 1991; Finholt et al., 1991; and Kraut et al., 1996). Practical issues may relate to protection of user privacy and to the nature of actual user populations (e.g., early adopters of the Internet may not be representative). Thus, consideration of how to get back good information is itself a research issue. Trials and assessments of the suitability of these and other design-and-evaluation techniques for large and widely varying populations would be very worthwhile.
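As one hedged illustration of instrumenting a networked system in the way just described, the sketch below records usage events in anonymized form, stratified by user category, by replacing raw identifiers with a salted one-way hash. The salting scheme and all names (user IDs, categories, actions) are illustrative assumptions, not a design drawn from the report.

```python
import hashlib

# Hedged sketch: log usage "in anonymized ways," stratified by user
# category. The salted-hash scheme and every identifier below are
# hypothetical examples, not a prescribed design.
def pseudonym(user_id: str, salt: str) -> str:
    """One-way, salted hash so raw identifiers never enter the log."""
    return hashlib.sha256((salt + ":" + user_id).encode()).hexdigest()[:12]

def log_event(log: dict, user_id: str, category: str, action: str, salt: str) -> None:
    """File the action under (pseudonym, category) for later stratified analysis."""
    log.setdefault((pseudonym(user_id, salt), category), []).append(action)

usage_log: dict = {}
log_event(usage_log, "alice@example.org", "novice", "opened-help", salt="trial-1")
log_event(usage_log, "alice@example.org", "novice", "submitted-query", salt="trial-1")
```

Within one trial the same user always maps to the same pseudonym, so per-user trajectories remain analyzable; changing the salt between studies prevents linking records across them, one simple answer to the privacy concern the text raises.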

Page 147

Consider Ways to Minimize the Separation of Design and Evaluation from Implementation and Use

On the one hand, new computer-based technologies continue to emerge in the market at an incredibly rapid pace, and this trend will only be accelerated by population-wide access to the NII. On the other hand, recommendations to use methods for research with representative population samples to ensure the usefulness and usability of social applications before their implementation and use seemingly entail a much more leisurely pace of innovation. This dilemma suggests that it might be worthwhile to reconceptualize as concurrent or overlapping processes the traditional linear sequence from design, iterative trials, and redesign to implementation, use, and inevitable user "reinvention." This suggestion draws, in part, on the concurrent engineering model; by bringing together the designing and building stages of technology development, that model reduced the total time involved while enabling designers and engineers to learn more from one another in the course of coproduction. It also builds on rapid prototyping approaches that draw no sharp boundaries between prototype trials with representative users, field pilot projects, and early-stage implementation processes (e.g., Seybold, 1994; Mankin et al., 1996). Finally, it takes into account the infeasibility of "getting it right the first time" as a guiding principle for NII applications. As virtually every study of communicating social applications has shown, these technologies are invariably modified in use in ways that respond to user contexts, changes in skills or task demands, and changes in the suite of applications with which they must be integrated. That is, the application should not be viewed as "finished" or static just because it has left the developer's world (Bikson, 1996).
New back-end technologies (e.g., client-server architectures, middleware) make it possible to keep the infrastructure or platform in place while delivering, updating, and supporting new tools and applications in user environments over a network. This is the principle behind new efforts to conduct product beta-tests via the Web, as noted earlier. Given the desirability of involving greater numbers of representative users in application design and evaluation as well as field trials and implementation, and given the capability of networked systems to enable both the provision of usable prototypes and the collection of user feedback, it would be desirable to explore options for leaving applications intentionally underdesigned, to be adaptively developed as they are implemented in contexts of use (see Box 4.2).
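What an "intentionally underdesigned," network-updated application might look like can be sketched very simply: the client ships with conservative defaults and merges in server-delivered settings that decide which tools are enabled, so new capabilities can be rolled out to field-trial users without redeploying the client. All tool and configuration names below are hypothetical; the report does not prescribe any particular mechanism.

```python
# Hedged sketch of an "underdesigned" client whose feature set is
# adaptively developed in the field. Tool names are hypothetical.
DEFAULT_TOOLS = {"basic_editor": True}

def enabled_tools(remote_config: dict) -> dict:
    """Merge server-supplied tool flags over the local defaults."""
    tools = dict(DEFAULT_TOOLS)          # copy, never mutate the defaults
    tools.update(remote_config.get("tools", {}))
    return tools

# A later server-side update enables a new collaborative tool for a trial:
trial_config = {"tools": {"shared_map_overlay": True}}
```

If the server is unreachable, the client simply falls back to its defaults; the same server-side channel is a natural place to collect the user feedback that adaptive development requires.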

Page 148

BOX 4.2 Toward Informed Participation

Technology that genuinely supports informed participation will be inherently democratic and adaptable. It will allow us to take advantage of our social diversity and not force us to conform to the limits of our limited foresight. The philosophical model for understanding knowledge acquisition and the communication of information holds at least three primary lessons for anyone designing or deploying information systems for groups of people, as follows:

Focus more on relationships than things. Information technology can and should change relationships among people; that is where its chief value lies. Information technology that changes the nature of relationships can change the fundamental features of a given complex system.

Honor "emergent behavior." The new theories of complex adaptive systems hold that the adaptability of any system greatly depends on the "genetic variance" -- or pluralism of competing models -- within it. Therefore, information technology should allow the emergence of competing agents (or models or schema) and enhance their interrelationships.

Underdesign systems in order to let new truths emerge. It is a mistake to set forth some a priori notion of truth or to try to design in totality (which requires an infinite intelligence in any case). Rather, one should underdesign a system in order to assist the emergence of new ideas. The brilliant logic of an underdesigned information system is well illustrated by the constitutional and cultural principles espoused by Thomas Jefferson, one of the preeminent information architects of all time.

SOURCE: Brown et al. (1994).

Consider the Prospect of Research-based Principles for Design

Regardless of the perspective taken, the bottom line is that what we know now about evaluation and design methods is not good enough to meet the challenges presented by every-citizen applications in an NII context.
Although there are good methods and techniques available for evaluating ideas and systems for individuals at all stages of development and providing tests of usability and guidance for design, none of the workshop participants thought that evaluation methodology was a solved problem. Although a few comparative studies have been made of some of the different methods in use -- user testing, heuristic evaluation, cognitive walkthroughs, scenario analysis, ordinary and video ethnography -- these studies have not reached any unequivocal conclusions; indeed, there is active controversy about their relative advantages. This is an area in

Page 149

which more, and more systematic, research would almost certainly have great impact. Some of the current evaluation methods are orders of magnitude more expensive in terms of time and money than others, often prohibiting their use and often inhibiting the use of any evaluation, yet we do not know for sure whether they reliably produce better, or even different, information or result in better or different products. Such research should, of course, also be aimed at finding better methods. In particular, research is needed on what kinds of evaluation give not just summative quality estimates but also useful formative guidance that leads to better design.

These kinds of problems and uncertainties about evaluation techniques and methods lead naturally to reawakened interest in the prospect of research-based principles for design. It has often been hoped by the scientists and technologists involved, and perhaps even more often by their managers, that the design of useful and usable interfaces could be based on theory, engineering principles, and models rather than sheer cut-and-try and creativity. There have been some modest successes along this line. As mentioned earlier, there are models of the perceptual-motor processes involved in operating an interactive device that can predict the times required with useful accuracy. So far, these have had their greatest utility in the design of computer-based work tools where large numbers of people will do the same operations large numbers of times, so that small savings in time will add up to large savings in money. In addition, there are some models and means of analyzing and simulating the cognitive operations of users of complex computer-based systems that are often capable of yielding important insights for design or redesign (e.g., Kieras and Polson, 1985; Kitajima and Polson, 1996; Olson and Olson, 1995; Carroll, 1990). And there are a dozen or so basic principles from experimental, perceptual, and cognitive psychology that can be put to work on occasion by insightful experts.

However, for everyday guidance about the design of everyday interfaces and functions for every citizen, current science and engineering theory are of little help. One reason is that both the human and the potential computer-based agents involved, and especially their combination, are extremely complex dynamic systems of the sort that are not often reducible to practical closed-form models. They appear to be more like the phenomenon of turbulence that plagues airframe design or the chaos that confronts weather prediction than they are like the design of circuits; they are matters in which test-and-try is unavoidable. It is often mystifying to usability professionals that testing is resisted as strongly as it is and that calls for doing principle-based design are so frequent in this arena, when practitioners and managers concerned with other complex dynamic systems (even electronic circuits and software) can easily see the need for and strongly support empirical methods.
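The best-known of the perceptual-motor models mentioned above is the keystroke-level model of Card, Moran, and Newell, which predicts skilled execution time by summing standard operator times. A minimal sketch follows; the operator values are the commonly cited averages, and the example task is illustrative rather than taken from this report.

```python
# Illustrative keystroke-level model (KLM) calculator. Operator times
# are the commonly cited averages; the example sequence is hypothetical.
KLM_SECONDS = {
    "K": 0.20,  # press a key (average skilled typist)
    "B": 0.10,  # press or release a mouse button
    "P": 1.10,  # point with the mouse to a target on screen
    "H": 0.40,  # "home" the hands between keyboard and mouse
    "M": 1.35,  # mental preparation for an action
}

def predict_seconds(sequence: str) -> float:
    """Predict execution time by summing operator times, e.g. 'MHPB'."""
    return round(sum(KLM_SECONDS[op] for op in sequence), 2)

# Selecting a menu item: think, reach for the mouse, point, click.
menu_pick = predict_seconds("MHPB")  # M + H + P + B = 1.35 + 0.40 + 1.10 + 0.10
```

Such models earn their keep exactly where the text says: when thousands of operators repeat the same transaction, shaving half a second from a three-second sequence, a million repetitions a year, saves on the order of 140 hours of work time.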

Page 150

This hope of avoiding test-and-fix methods is astonishingly persistent. For example, there is a myth in circulation that the Macintosh interface, which for certain basic functions has demonstrated large usability advantages over its predecessors, was accomplished without user testing. The truth could not be more different. At Apple Computer, the Macintosh interface was developed originally for the Lisa computer, building, in turn, on the highly structured design and testing process for the Xerox Star system. During its development, it was subjected to an exemplary application of formative evaluation involving nearly daily user testing and redesign. Moreover, the graphical user interface (GUI) components of the Macintosh interface can be and have been combined in ways that do not produce superior results, while some old-style command-based applications that have been iteratively designed are just as usable as the comparable Macintosh-style GUI applications (see Landauer, 1995, for a review and examples).

While research on both the fundamental science of human abilities and performance and the engineering principles for better usability certainly could be highly worthwhile in the long run if adequately pressed, progress to date has been slow, and a principle-based approach probably cannot be counted on to underwrite the design of effective every-citizen interfaces in the near term. On the other hand, many of the scientists who have worked on these problems believe that attempting to understand the issues involved in the interaction of people with computer-based intellectual tools and with one another through these tools offers an excellent laboratory for studying human cognition. The problems posed, and the nature of the response of the world to what a human does, can be controlled much better in this environment than, say, in a classroom, and yet are much more realistically complex than in the traditional psychological laboratory.
Moreover, the end-result test -- making interactions among and between humans and computers go better -- requires not just piecemeal modeling but also complete understanding, an especially useful criterion in studying human cognition and communication that can take so many new forms and functions. Thus, more support (of which there is currently very little) of basic human-computer interaction research, especially at the level of the cognitive and social processes involved, could be quite valuable as science.

Progress

In concluding this discussion, the steering committee notes that some technologists, economists, and others have expressed the belief that problems of usefulness and usability are sufficiently solved by market competition and that, in particular, most earlier problems with user productivity

Page 151

have been overcome. There is indeed some anecdotal evidence that large software producers are paying more attention to these matters, and with good effect. For example, a report from Microsoft (Sullivan, 1996) describes iterative user-interface design efforts for Windows 95 that followed prescriptions for interface development suggested by recent research (e.g., Nielsen, 1993; Landauer, 1995; Sawyer et al., 1996). Consistent with what prior research has found, user test results showed a gain of approximately 50 percent in user task performance efficiency as a direct result of usability engineering activities.

Several lessons can be taken from this and recent, similar reports. First is the encouraging sign that assessment-driven design is being applied to significant projects and that it is working. A more cautionary lesson, however, is the authors' report of how narrowly the Microsoft project escaped neglecting assessment on several occasions, and how serious the consequences would have been. In moving the interface design beyond that of the immensely popular Windows 3.1 and 3.11, the team reported, it had originally believed that, because the previous interface was so well evolved and so successful in the marketplace, only small evolutionary changes based on known flaws, user complaints, and bug reports would be needed. However, early direct user tests and observations "surprised" the team into a realization that many critical problems could be solved only by a complete redesign and that many opportunities existed for significant innovative improvements that market response had not suggested. By the time the product was delivered, hundreds of flaws deemed worth remedying had been found and several provably important innovations had been incorporated. Throughout the development, the team continued to be surprised both by how poorly features and functions previously thought good actually performed and by how poorly newly proposed fixes often turned out on actual test.
The point here is that the prior interface from the same source, the most "advanced" Windows project, was still, in the mid-1990s, very far from optimized, and there was still room for dramatic improvement based on explicit assessment-driven usability engineering. The fact that computer hardware has become much faster and more capacious -- and software commensurately larger and more highly featured -- does not in the least ensure that the usefulness and usability of applications have improved; indeed, the effect is often the opposite. Thus, it seems certain that there will continue to be opportunities for major improvements in the design of interfaces for some time to come, especially in the many new and so far very sparsely evaluated mass network-based applications for social activities. Meanwhile, another complementary question needs to be answered. Windows 95 got the evaluation attention it needed, but no one knows

Page 152

how many other products are or are not profiting from formative evaluation. One bit of suggestive evidence comes from informal analysis of the same publication in which the Windows 95 results were reported, the Proceedings of the 1996 ACM Conference on Human Factors in Computing Systems, CHI96. This is the major organ in which work on interface development and research is first published. Of the 67 articles in the 1996 proceedings, over half describe newly developed or modified interface designs, yet only one of every six reports any kind of serious user testing or observation. This small proportion is not significantly different from the numbers reported for relevant publications in the 1980s (Nielsen and Levy, 1993). Thus, it appears that progress toward better interfaces still has plenty of scope for greater application of this well-established methodology. Also of interest, about one-sixth of the papers at CHI96 were directed toward network interface applications, and another sixth were about research on general interface components that might be used in the future -- the kind of science research toward principled design that many workshop participants thought should be better encouraged.

As mentioned above, it could be hypothesized that greatly increased beta testing made possible by World Wide Web dissemination of software has reduced the need for explicit evaluation. There may be some truth in this hypothesis, in that many (but far from all) of the flaws and remedies discovered in usability engineering efforts come from trial user comments.
On the other hand, as mentioned, World Wide Web beta testing is suspect as a usability design methodology because it gets information primarily from relatively expert, relatively heavy early-adopter users -- those willing and able to try faulty versions of unproved things (the average untested application interface has 40 flaws, according to Nielsen and Levy, 1993) -- people who are certainly unrepresentative of the target audience of this report. In addition, the Web has produced an explosion of new software that is often the result of extremely rushed, frequently amateurish design efforts. Indeed, some usability experts think that much current Web-based software, and most home pages, have reintroduced long-recognized, serious design flaws (e.g., untyped hypertext links, missing escape and backout capabilities, and lengthy processes and downloads about which users are not warned) and that Web dissemination may have promulgated and institutionalized more avoidable problems than it has fixed. Requiring the public to weed out the technology's flaws through involuntary subjection to a welter of bad applications does not seem a desirable strategy for rapidly bringing every citizen happily on-line. Research is needed to determine whether, in fact rather than impression, recent trends in software development, such as World Wide Web beta testing and increasing speed of development cycles, are making things better or worse.

Page 153

Notes

1. According to Cynthia Crossen in the Wall Street Journal (1996, pp. B1, B11): "Not even computer industry executives can explain the illogic of the modern keyboard ... a device jerry-built from technology as old as 1867 and as new as this year. Because there has never been an overarching plan or design, [it] defies common sense. Its terminology is inscrutable (alt, ctrl, esc, home), and the simplest tasks require memorizing keystroke combinations that have no intuitive basis."

2. Today's elegant cellular phone interfaces emerged after a period of what some observers deem excessive feature creep. See Virzi et al. (1996).

3. A Reuters business information survey of 1,300 managers reported complaints about stress associated with an excess of information, fostered by information technology (King, 1996, p. 4).

4. See, for example, Munk (1996). She reports estimates that 27 percent ($3,510) of the $13,000 annual cost of a networked personal computer goes for providing technical support to the user, and writes, "There's a Parkinson's Law in effect here: computer software grows to fill the expanded hardware. This is not to say that all the new software isn't useful; it often is. But not everybody needs it. For mundane uses, the older software may, paradoxically, be more efficient" (p. 280).

5. In addition to instances of software for scientific and engineering applications, current popular examples, such as the World Wide Web and assorted approaches to electronic publishing, derived from efforts of technical users to design systems to meet their own needs.

6. Gould et al. (1987b) note that equivalent reading speed for screens and for paper depends on high-resolution antialiased fonts, an element of output display (see Chapter 3).

7. A meaningful approach to computer literacy, including essential concepts and skills, is the focus of an anticipated Computer Science and Telecommunications Board project.

8. The Telecommunications and Information Infrastructure Access Program (TIIAP), run by the National Telecommunications and Information Administration, funds diverse public-interest (including government services-related, educational, library, and other) information infrastructure projects that would form a natural platform for evaluation if funding were sufficient. See O'Hara (1996, p. 6).

9. For independent innovations, "early adopters" were regarded as having a competitive advantage over those still using older technologies; for interdependent innovations, early adopters do not achieve full benefits from the new technology until the late adopters come on board (Rogers, 1983).