HUMAN FACTORS METHODS IN RESEARCH AND PRODUCT DESIGN

ANALYSIS: GATHERING IDEAS

The ideas behind products typically arise from three major sources: from the redesign of an existing product, from an identified need in the marketplace, and from a new technological capability that provides a useful new function to users. Information about the success of existing products can be obtained either by asking their users for their opinions and uses of the systems or by gathering unobtrusive data about their use. Information about a new product can come from reports of needs from potential users.

Reports from Users

Questionnaires and interviews are the most common methods for gathering information about the success of a product or the needs for new functions or a new product. Both questionnaires and interviews are good methods for eliciting information about how a person goes about his or her work, what aids or tools he or she uses or desires, what kind of knowledge or training is required to do the work, what difficulties he or she reports about the work, where the work originates and where it goes, what interactions are necessary with other people to do the work, and how the user thinks the work process could be improved. Questionnaires are more rigid in format than interviews, since interviews can go where the interviewee leads, often uncovering unanticipated new information. The principal disadvantage of interviews, however, is that they are time-consuming; only one person can be interrogated at a time. By aggregating information from
a number of interviewees or questionnaires, one can construct a general picture of users' needs and construct some tentative system concepts for helping the users do their work (Kelley and Chapanis, 1982; Rosson, 1983).

Diaries provide a similar form of informal data gathering and are used to uncover the needs and capabilities of the potential users of a new product. Data about work can be gathered in detail over a long period of time, especially about how much time particular kinds of activities take and their sequential dependencies. Because a shorter time elapses between the occurrence of an event and its report, diaries give a more accurate record of actual activity than retrospective reports in questionnaires and interviews (Mantel and Haskell, 1983).

A common marketing technique for gathering information about existing or potential users' needs is the focus group. Instead of interviewing a single user at a time, groups of users who are either similarly trained or who share common goals are first told about some potential capabilities of a system, then asked to discuss how they might find uses for these capabilities. Occasionally active brainstorming from these sessions generates very good ideas. The same kind of method is used to collect opinions about an existing product and to ask for suggestions for improvements. Often designers will gather expert users of a system and ask their opinion about how to improve the system or how to design a new, computer-based tool for aiding their work (Al-Awar et al., 1981). The advantage of such methods is that the participants stimulate each other's thoughts, uncovering ideas or suggestions they may not have thought of individually. That is also their disadvantage: a participant's true opinions can be swayed by group pressure.

Inferring Needs from Natural Observation

One of the main drawbacks of the methods listed above is that they rely on users' perceptions of their needs and capabilities. Sometimes new products meet needs unforeseen by their users; sometimes users, either consciously or unconsciously, distort their daily work activities and feelings about existing working conditions. In such cases, it may be better to collect information, not by asking users, but by watching their behavior and inferring their needs and capabilities from their activities.
Two methods are often used to collect information about users' behavior in natural work settings. In the case of activity analysis, an observer watches and records certain behaviors of the workers. The data may be collected by direct observation or by analyzing video or film recordings. Individual samples of categorized activities are aggregated into activity frequency tables, graphs, or state transition diagrams. Such performance analyses are particularly useful in assessing the changes made in work by comparing activity before and after a new system or design change is implemented (Hartley et al., 1977; Hoecker and Pew, 1980).

Logging and metering techniques involve observations of what a user does with a system, but the measurement is embedded directly in the software. These procedures can include a simple time-stamped record of every interaction that a user makes with the computer, or a complete hard-copy representation of a sequence of particular display frames. Powerful logging and metering software can also categorize certain recognizable events and summarize their times. For example, one could summarize such events as time to complete a task, user and/or system response time, and frequencies and types of errors. Logging and metering procedures are typically embedded in the operational software. Where access to such software is limited, one can connect a second computer in tandem to the first and direct data about the user's activities to it, in essence providing a "passive tap." In this way, logging does not interfere with system response times, and information about the user inputs and the system responses can be recorded in detail for future use (see Whiteside et al., 1982; Goodwin, 1982).
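As a minimal illustration of the logging-and-metering approach described above, the sketch below records time-stamped interaction events and summarizes time on task and error counts. The event categories and summary measures are assumptions chosen for the example, not a prescription from the literature cited.

```python
# A minimal, illustrative interaction log: time-stamped events with simple
# categorization and summary, as described in the text above.
import time
from collections import Counter

class InteractionLog:
    def __init__(self):
        self.events = []  # (timestamp, category, detail)

    def record(self, category, detail=""):
        # e.g., category in {"keystroke", "command", "system_response", "error"}
        self.events.append((time.time(), category, detail))

    def summarize(self):
        counts = Counter(cat for _, cat, _ in self.events)
        elapsed = self.events[-1][0] - self.events[0][0] if self.events else 0.0
        return {"time_on_task_s": round(elapsed, 2),
                "event_counts": dict(counts),
                "errors": counts.get("error", 0)}

log = InteractionLog()
log.record("command", "open file")
log.record("error", "unknown command")
log.record("system_response", "prompt redisplayed")
print(log.summarize())
```

In practice such a log would be written to stable storage, or routed to a second, tandem machine as the text describes, so that recording does not slow the system being measured.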
DESIGN: THE INITIAL DESIGN

Designers go through two stages in constructing an initial design, either implicitly, driven by intuition or experience, or explicitly, using some or all of the detailed tools described below. First, the designers decide what the user is going to do, conducting an informal or formal task analysis. Second, they specify what the interface will look like and what the dialog will consist of. There are a variety of methods that apply to this stage, where designers use informal or formal guidelines, consult end users, or have theory-based judgments to draw on.

Determining What the User Needs to Do

The most common form of analyzing the user's activities is called a task analysis. Task analysis is the process of analyzing the functional requirements of a system to ascertain and describe the tasks that people perform. It focuses both on how the system fits within the global task the user is trying to perform (e.g., prepare a report of a projected budget) and on what the user has to do to use the system (e.g., access the application program, access the data files, etc.). Task analysis has two major aspects: the first specifies and describes the tasks, and the second, and more important, analyzes the specified tasks to determine such system or environmental characteristics as the number of people needed, the skills and knowledge they should have, and the training necessary. The first step involves decomposing tasks into their constituent subtasks and annotating each subtask for its essential elements and their interdependencies. The second step involves examining the actual tasks and interdependencies, assessing how difficult each is, what knowledge is required, where the information resides, etc. Results of task analyses are used not only in writing functional specifications for a particular application, but also for assigning work to groups of workers, arranging equipment in an efficient configuration, determining task demands on people, and developing operating procedures and training manuals (see Bullen and Bennett, 1983; Bullen et al., 1982).
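As a minimal sketch of the first step of a task analysis, the fragment below represents a small task hierarchy in which each subtask is annotated with the knowledge it requires, where its information resides, and its dependencies. The task, the annotation fields, and the dependency labels are hypothetical examples, not drawn from the report.

```python
# Hypothetical task decomposition: each subtask is annotated with the
# knowledge it requires, where its information resides, and what it depends on.
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    knowledge_required: str
    information_source: str
    depends_on: list = field(default_factory=list)

prepare_budget_report = [
    Subtask("access application program", "log-in procedure", "operations manual"),
    Subtask("access data files", "file naming conventions", "departmental records",
            depends_on=["access application program"]),
    Subtask("enter projected figures", "budget categories", "planning memos",
            depends_on=["access data files"]),
    Subtask("format and print report", "report template", "style guide",
            depends_on=["enter projected figures"]),
]

# The second step of the analysis would examine these annotations to judge
# difficulty, required skills, and staffing or training implications.
for s in prepare_budget_report:
    print(f"{s.name}: needs {s.knowledge_required}; depends on {s.depends_on}")
```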
Specifying the Initial Design

An initial system or interface design is constructed next. With the global tasks the user has to perform specified as above, the designer groups the subtasks according to logical function from the perspective of the user, tempered by system/hardware constraints. The actual interface or system details then come from three sources: design guidelines or principles, intuitions of the designer sometimes aided by intuitions of the users themselves, and theory-based judgments.

In generating an initial design, the designer can consult existing design guidelines for general prescriptions of how to specify particular components of the interface. For example, if the interface has a menu, the guideline may prescribe that the alternatives be listed in order of frequency of use or clustered according to functional similarity, rather than displayed alphabetically or randomly. Current design guidelines (e.g., Woodson and Conover, 1966; Van Cott and Kinkade, 1972) include prescriptions about such topics as the readability of type fonts, the brightness levels of display screens, keyboards designed to fit hand shape and function, and rules for making abbreviations and symbols (see also Schneiderman, 1982; Smith, 1982). Current guidelines, however, are more concerned with perceptual and performance characteristics than with the cognitive properties of the interaction. Thus, they would prescribe appropriate type fonts, but not what words these fonts should express to the user to suggest the appropriate analogy for performing the task on the system.

There are several major caveats in the use of design guidelines: the prescriptions or recommendations they contain may have been derived from situations or research not applicable to the system being designed; new or unaccounted-for variables may interact in unanticipated ways; and current guidelines do not always publish the source of the recommendation, whether it was generated by a controlled laboratory study or derived from the collected wisdom of experience. Guidelines have to be applied with care.

Though design guidelines have their flaws, they are very useful in placing a particular new design in a setting of conventional wisdom. Often the designer, skilled in interacting with systems and cognizant of the end tasks that are being supported in this design, cannot foresee the difficulties the new user will have with the system. Design guidelines provide suggestions to the designer that will in many cases be better than those based solely on intuition. (For a recent version of guidelines, see Smith, 1984.)
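As a concrete illustration of the menu-ordering guideline mentioned above, the short sketch below lists menu alternatives by observed frequency of use rather than alphabetically. The command names and frequency counts are invented for the example.

```python
# Hypothetical illustration of one guideline above: list menu alternatives
# by observed frequency of use rather than alphabetically.
usage_counts = {"Print": 412, "Archive": 35, "Edit": 890, "Delete": 120, "Search": 640}

alphabetical = sorted(usage_counts)                                      # Archive, Delete, Edit, ...
by_frequency = sorted(usage_counts, key=usage_counts.get, reverse=True)  # Edit, Search, Print, ...

print("Alphabetical menu:     ", alphabetical)
print("Frequency-ordered menu:", by_frequency)
```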
The skills and knowledge of users themselves can be used to advantage by incorporating users in the design team. Users can provide some critical insights about how they think of the task and thus the system (e.g., what kinds of information should be accessible when, what the screens should look like to mimic the original, noncomputer version of the task, what commands ought to be called). They know the procedures and terminology and, with proper support, can contribute to the design and layout of forms and menus as well as act as critics of the design. Gould and Lewis (1985) and Miller and Pew (1981) provide examples of the involvement of users in the design process. Other ways in which the sophisticated user can be involved in the design of software systems can be found below in the section on prototype testing with users.

A third source of information for the original design specification is psychological theory. Theory-based judgments can constrain aspects of a design or suggest promising areas of investigation. For example, theories of color contrast can provide insight into the appropriateness of certain combinations used in screen highlighting or predict the readability of a new monochrome display color. Because Fitts' Law accounted for movement time both for placing a cursor in a desired position with a mouse and for placing the appropriate finger on a desired key location, two conclusions followed: the invention of faster pointing devices was unlikely to increase performance, and the design of keyboards with larger peripheral key caps would increase the accuracy of keying (Card et al., 1978; Card et al., 1980b).
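For reference, Fitts' Law in its classic form is shown below, where D is the distance to the target, W is the target width, and a and b are empirically fitted constants for a particular device. This is the standard textbook formulation rather than an equation reproduced from the studies cited above.

```latex
% Fitts' Law (classic form): movement time MT grows with the index of
% difficulty ID = log2(2D / W); a and b are device-specific empirical constants.
MT = a + b \log_2\!\left(\frac{2D}{W}\right)
```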
Part of the difficulty in constructing a design and analyzing its usability has to do with how the interface is specified. Verbal descriptions of how a system works are particularly unsuited for conveying the flow of an interaction and the choices the user has at each point. Several specification languages or formats have been explored recently, not only to serve as a way of conveying to those who actually build or code the system what it will do, but also as a way of concretely specifying the system in order to analyze its usability.

One way to specify the interaction is to use an interactive tool kit called a human-computer dialog management system. This system guides the definition of the interaction language that describes the actions of the user and the system and the screen formats displayed at each moment. Hartson et al. (1984), Jacob (1983), and Wasserman (1982) provide good examples of this kind of interface definition.* A second format for displaying what the system does at each state is a state transition diagram, recently used as a description of a system's workings in Kieras and Polson (1983).

*This is also a system that allows rapid embodiment of the functioning of a new, developing system and thus is a tool for rapid prototyping.
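The sketch below suggests how a state transition specification of a small dialog might be written down in code rather than drawn as a diagram: each entry maps a state and a user action to the next state and the screen to display. The states, actions, and screens are invented for illustration and do not come from any of the systems cited above.

```python
# Hypothetical state transition specification of a small dialog:
# (current state, user action) -> (next state, screen to display).
TRANSITIONS = {
    ("main_menu", "open"):   ("file_list", "show file list screen"),
    ("file_list", "select"): ("editing",   "show document screen"),
    ("editing",   "save"):   ("editing",   "show 'saved' message"),
    ("editing",   "quit"):   ("main_menu", "show main menu screen"),
}

def step(state, action):
    # Undefined transitions surface as potential usability gaps in the design.
    return TRANSITIONS.get((state, action), (state, "show error message"))

state = "main_menu"
for action in ["open", "select", "save", "quit"]:
    state, screen = step(state, action)
    print(f"{action!r:10} -> state={state!r:12} {screen}")
```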
DESIGN: FORMAL ANALYSIS OF THE INITIAL DESIGN

Once an initial design is specified, even if it is a partial design, it can be subjected to several kinds of scrutiny. The goal in this analysis stage is to make the initial design as good as possible before it is made into the prototype for user testing. Three methods aid in this process: structured walk-throughs, decomposition, and task-theoretic analytic models.

Structured walk-throughs involve construction of tasks that a user carries out on a simulated system. The user tries out the system by going through the task, step by step, screen by screen, command by command. This can be done with the design as specified in a number of different formats, using an experimental simulation of a prototype or even with the experimenter presenting paper-and-pencil figures of the screens, menus, and commands in the appropriate sequence. The technique helps to identify confusing, unclear, or incomplete instructions; illogical or inefficient operations; unnatural or difficult procedures; and procedural steps that may have been overlooked because they were implicitly rather than explicitly defined. Gould et al. (1983), Ramsey (1974), Ramsey et al. (1979), and Weinberg and Friedman (1984) provide examples of the use of structured walk-throughs.

A second kind of formal analysis, called decomposition, is proposed in Reitman et al. (1985). In this analysis, the major components of the design are separated and analyzed for their impact on cognition. The picture displayed on the screen, for example, is assessed for how it helps or hinders the user's ability to perceive meaningful relationships or the system model. The commands are assessed for their load on long-term memory, how easy they are to remember, and how confusable they are with each other. For each component, a second design alternative is constructed to fit within the general guidelines of usability. Then, through discussion and debate, the design team decides which alternative of each component is the better design. This method encourages careful scrutiny of the proposed design and often encourages designers to specify better interfaces before the first prototype is built.

The third kind of formal technique invokes task-theoretic analytic models. These models provide representations and analyses that assess, for example, which parts of a metaphor aid performance and which do not (Douglas and Moran, 1983) and how big the user's short-term memory load is at each step of the interaction (Kieras and Polson, 1985). Prime examples of these techniques include metaphor analysis (Carroll and Thomas, 1982; Carroll and Mack, 1982), assessment of mental models (deKleer and Brown, 1983; deKleer and Brown, in press; and others in Gentner and Stevens, 1983), development of production rule systems that represent the user's knowledge of the task (Kieras and Polson, 1985), object/action analysis (called "external/internal task mapping" by Moran, 1983), the GOMS model (Card et al., 1980b, 1983), and formal grammar notation systems (Reisner, 1981a, 1984; Blesser and Foley, 1982).

These task-analytic models are very useful tools. However, none of them yet encompasses all of the cognitive aspects of the interaction; each focuses on one or more important aspects. These methods require training to use and often take a long time. However, they all have the advantage of being based on sound theories of human behavior and can provide important analyses of usability before any coding of software or running of subjects is contemplated. There is a trade-off, then, between time spent in analysis and time spent testing users in the laboratory or the field. The hope embodied in this approach is that as the science of user-interface design grows, analytic tools will improve to the point of making the actual user testing of designed systems merely a last, short check of a good, finished design.
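To suggest the flavor of such analytic models, the sketch below computes a keystroke-level time estimate for a short command sequence, in the spirit of the GOMS family of models cited above. The operator set and the operator times are rough illustrative values, not the published constants; a real analysis would use empirically fitted parameters.

```python
# Illustrative keystroke-level estimate in the spirit of the GOMS family of
# models. Operator times below are rough placeholder values in seconds.
OPERATOR_TIME = {
    "K": 0.2,   # press a key
    "P": 1.1,   # point with a mouse
    "H": 0.4,   # move hand between keyboard and mouse
    "M": 1.35,  # mentally prepare for the next step
}

def estimate_time(sequence):
    """Sum operator times for a sequence such as 'M K K K K' (think, then type)."""
    return sum(OPERATOR_TIME[op] for op in sequence.split())

# e.g., think, type a four-letter command plus RETURN,
# then move to the mouse and point at a menu item, then click.
command_sequence = "M K K K K K H P K"
print(f"Estimated execution time: {estimate_time(command_sequence):.2f} s")
```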
DESIGN: BUILDING A PROTOTYPE

Three methods provide simulations or quick versions of significant aspects of a new system so it can be tried by actual users. The methods are called facading, the Wizard of Oz technique, and rapid prototyping.

Facading is the technique of quickly and inexpensively building a simulation of the external appearance (i.e., the "facade") of a system's interface. Its advantages are that it is quick and relatively easy; the target system's underlying complexity and/or final computational capability is "finessed." To be maximally beneficial, the facade must embody some level of the functional capability of the final target system. It does not just generate a series of static snapshots of the system but rather includes the control structure, flow, or connectivity of the final system. Hanau and Lenorovitz (1980) and Lenorovitz and Ramsey (1977) provide good examples of the use of this technique.

A variant of the facading technique is the Wizard of Oz technique. Instead of having the computer embody the simulated system, hidden human operators intercept user commands and provide output back to the user. Often the technique is used to test a new interface language: the hidden human operator intercepts the new commands, translates them into the real system commands, and, after receiving output from the real computer system, retranslates it back to the tested end user (see Gould et al., 1983; Gould and Boies, 1978; Ford, 1981; Kelley, 1983; Wixon et al., 1983).

Rapid or fast prototyping are terms applied to the more formalized building of a prototype in a hurry. The speed of building a running system depends mainly on the underlying supporting software, which makes the specific prototype programmable from existing modules. Ideally, the prototype programming language separates elements of the dialog from the actual implementation software. For example, the designer can specify the placement of the command input line or the menu choices variously without having to program new modules to execute these different input formats. One of these, the "dialog management system," is under development by Hartson and his colleagues (Hartson et al., 1984; Yunten and Hartson, 1984); another system is described in Wasserman (1982) and Wasserman and Shewmake (1982). Another project that uses rapid prototyping methods is reported in Hayes et al. (1981).

DESIGN: PROTOTYPE TESTING WITH USERS

When a prototype of some form has been built, actual users are then brought in to use the system and report their opinions about it. These tests can vary greatly in how well controlled their designs are and how representative the set of tested users is of the final population of users. Moreover, users are asked to perform several kinds of tasks, some testing the normal, frequent tasks that regular users will be expected to perform, others testing those subtasks thought to be especially difficult
either for the system (e.g., those producing long system response times) or for the user (e.g., the longest sequence of commands for a particular type of task). Prototype tests differ in what kinds of data are taken from the user--times and errors, thinking-aloud protocols, or attitudes.

Experimental Designs

Field tests to evaluate systems are fashioned after laboratory tests common in the academic field of experimental psychology. In general, they require the comparison of at least two systems, systems that differ in only one component or variable. Measures are designed to reflect the performance attributable to the effects of that variable, and subjects are chosen to be representative of the population of end users. Of particular importance are various techniques for controlling irrelevant variables. For example, one must ensure that measures of intelligence of the test subjects do not differ across the conditions, affecting the results in addition to the effects of the independent variables.

Often the rules of good experimental design are violated in the interest of proceeding quickly. Subjects who are different from the end users but more available may be tested; comparisons may be made between two systems that differ on more than one variable; measures may be taken that are less sensitive than those that will directly test why performance on one system is better or worse than another; occasionally only one system is tested and performance on it is measured against some predetermined standard (e.g., a 10-minute rule for time to learn a system). The closer the test is to good experimental design, the more quickly the findings can advance knowledge about the important aspects of the human-computer interface. However, as is often the case in development, the goal is not ultimate knowledge but rather global assessment of the adequacy of a particular interface or system. A compromise design procedure is described in Reign et al. (1984). The use of experimental design is found in Ledgard et al. (1981), Reisner et al. (1975), Reisner (1977, 1981b), and Williges and Williges (1982).
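A minimal sketch of the kind of comparison described above: two groups of representative users perform the same task on two versions of a system that differ in a single component, and their completion times are compared. The data and the use of Welch's t-statistic are illustrative assumptions, not a procedure taken from the studies cited.

```python
# Illustrative comparison of task-completion times (minutes) for two interface
# variants that differ in one component. Data are invented for the example.
from statistics import mean, variance
from math import sqrt

times_a = [12.1, 10.4, 14.0, 11.2, 13.5, 12.8]   # variant A subjects
times_b = [ 9.8,  8.9, 10.7,  9.1, 11.0, 10.2]   # variant B subjects

def welch_t(x, y):
    # Welch's t-statistic for two independent samples with unequal variances.
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

print(f"Mean A = {mean(times_a):.1f} min, mean B = {mean(times_b):.1f} min")
print(f"Welch t = {welch_t(times_a, times_b):.2f}")
```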
One variant on controlled experimental evaluation that has been found useful in the development of interfaces is called quasi-experimental design. These designs involve capturing data at several time intervals, typically of durations measured in weeks or months. Sometime during the data-capturing intervals, a change or a modification of a system is introduced; the data being captured are expected to reflect the impact of this change. Some of these quasi-experimental designs allow for comparisons with a control group. These designs are hard to control, since the investigator must typically take existing groups of users, giving one the change and the other no change. Inherent differences in existing groups are a major worry in evaluating the results. A complete description of this technique can be found in Cook and Campbell (1979); Roltum (1982) and Rice (1982) provide good examples of this method.

Selection of Tasks to Perform

There are two reasons one has users try out a prototype system: to identify points of difficulty for the user so that those points can be redesigned, and to measure standard use of the system, so that later changes in hardware can be assessed or so that those concerned with staffing a large operation of users can determine how many people will be needed. For the first purpose, tasks are selected that stress the system and the user, generally called critical incidents. For the second purpose, tasks are selected to estimate basic characteristics of the system's use, called benchmark tests.

In terms of critical incidents, the goal is to set up situations or tasks that have been shown historically to tax the user and/or the system and are sufficiently important that they can make the difference between success and failure on task or system performance. One might, for example, require the user to access items distant from what is being presented on the current screen, or to perform a long command sequence, to determine the load this part of the design places on the user's ability to imagine the stored information's underlying structure or on the mnemonic characteristics and grammatical rules implied by the command sequences. The goal is to set up situations in which the data will tell the designers something about the limits of human or system performance. These tasks are illustrated in the work of Al-Awar et al. (1981), Kelley and Chapanis (1982), and Flanagan (1954).
In benchmark tests, the goals are quite different. The designer wants to measure the likely performance times and errors expected in normal use. The tasks are not designed to tax the system or the user, but rather to be representative of the kinds of frequent tasks the system will normally support. Typically, tasks are constructed to measure the expected amount of time it takes a new user to learn a system, the amount of time it takes the user to perform a set of predefined tasks, and the amount of time it takes the system to respond to a user's request. A good study that illustrates the use of this method is the evaluation of eight text editors by Roberts and Moran (1983). A study of database interfaces using benchmarks was done by Mantei and Cattell (1982).
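The sketch below suggests how the benchmark measures just listed might be tabulated from recorded sessions. The session records and all numbers are invented for illustration.

```python
# Illustrative tabulation of the benchmark measures described above, from
# hypothetical session records (all values invented for the example).
sessions = [
    {"user": "S1", "learning_min": 42, "task_min": 11.5, "response_s": 1.9, "errors": 3},
    {"user": "S2", "learning_min": 55, "task_min": 13.0, "response_s": 2.1, "errors": 5},
    {"user": "S3", "learning_min": 38, "task_min": 10.2, "response_s": 1.8, "errors": 2},
]

def average(key):
    return sum(s[key] for s in sessions) / len(sessions)

print(f"Mean time to learn the system : {average('learning_min'):.1f} min")
print(f"Mean time on benchmark tasks  : {average('task_min'):.1f} min")
print(f"Mean system response time     : {average('response_s'):.1f} s")
print(f"Mean errors per session       : {average('errors'):.1f}")
```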
Kinds of Data Collected

There are four major kinds of data collected in tests of systems: the time it takes to perform a task, the frequency and kinds of errors, the goals and intentions of the users, and the attitude of the user.

The amount of time a task takes (either how long an entire task takes or how long each successive keystroke takes) reflects the time it takes the user to perceive inputs, categorize and plan appropriate actions, and execute proper responses. Error frequencies and types reflect the difficulties users have with these processes and often point to the cause of the error (whether the error response is similar to one in a similar plan, was generated from confusion with a similar screen, has a label that sounds the same as another, etc.). A simple analysis of users' times and errors is found in Reisner et al. (1975) and Reisner (1977). A comprehensive analysis of users' times is found in Card et al. (1980b, 1983). Other uses of times and errors can be found in Boies (1974), Rosson (1984), Sheppard and Kruesi (1981), and Thomas and Gould (1975).

A more thorough, complicated kind of data to collect during evaluation involves the user's thinking aloud while performing the task. Typically the user is video- and sound-recorded while he or she is performing the tasks. The recording captures what is said and done, what is displayed on the screen, what sections of the documentation are being examined, what parts of the task instructions the user is reviewing, etc. The most complete protocols ask the subjects to verbalize their intentions, what their goals are, and what current plans they have about reaching their goals. Other behavior is directly observable; thoughts and plans typically are not. This method has been used by Mack et al. (1983), Carroll and Mack (1982), and Card et al. (1980a) in their studies of skilled text editing. More complete descriptions of the technique and its advantages and disadvantages can be found in Lewis (1982), Olson et al. (1984), and Ericsson and Simon (1980).

A third kind of data collected in evaluation sessions is the users' opinions about the system's ease of use and functionality. A common instrument used to scale users' global attitudes about the system is the evaluation component of Osgood et al.'s (1957) Semantic Differential (see Good, 1982, for an example of its use). Questionnaires and interviews also tap users' reactions to particular components of the system. One problem with users' reports, however, is that they are typically distorted by their experience with other, similar systems. Or a user may have difficulty separating components of the system; for example, a user who has a very difficult time using a system may report that he or she likes it a great deal, recognizing how much easier it is to perform the task on a computer compared with previous manual methods.

Redesign

Typically, as the prototype of the original design is tested, errors are found and revisions suggested. The methods appropriate to the initial design are appropriate also at the stage of redesign. This part of the design process iterates through "fixing" and "testing" until either an acceptable level of performance is reached or the deadline for developing the system is reached.

IMPLEMENTATION: MONITORING USE

Just as data were collected in the original conception and analysis phase of product development, data are collected on the system as implemented. At this stage, activity analyses, diaries, logging and metering, and questionnaires and interviews are all appropriate methods for assessing whether the product as designed is
performing as predicted in the final environment. If problems are found in the field, either small corrections are made in the code (e.g., what a command is called is easy to change in the code but can have an enormous impact on the ease of use), or a redesign is called for, sending the product design process back to prototype development or fully back to the top of the cycle.