of the system that fits what the users have to do to operate the system. Descriptive models are those held by the researcher to approximate what the user does know; prescriptive models are those held by the designer to approximate what the user should know. The concern of this report, however, is the representation that the user has of how a computer system works. Furthermore, since a mental model may be only one way of describing the knowledge that a user has about a system, this report is broadened to include all of what a user knows about using a particular piece of software, including how to use it and how it works.

What users know differs in several important dimensions. It differs according to the sophistication of the user. For example, a user who is a programmer might have a very different understanding of a piece of software than a person with no programming experience. Also, multiple mental models or several representations at different levels of abstraction might coexist within the same individual. For example, a person who both designed and later used a system might develop two somewhat compartmentalized understandings of the system. Analogous distinctions arise if we consider different task environments. For example, the representation elicited for routine skilled behavior might differ substantively from that elicited when a person tries to recover from an error or otherwise solve problems (e.g., Rasmussen, 1983).

Because understanding what the user knows has practical importance for designing software and its training, and because it has theoretical importance in understanding people as they generally perform complex cognitive tasks, this report considers only the representations the users have when using software: representations of the task being performed, the user-system interface, and the system architecture.

TYPES OF REPRESENTATIONS OF USERS' KNOWLEDGE

There are three basic types of representations that have been formulated to characterize what a user of software knows. The most elementary is a simple sequence of overt actions that fit a particular situation. The second is a more complex and general characterization, the knowledge of methods. This kind of representation of the user's behavior incorporates general goals, the subgoals associated with them, a set of methods that could be brought to bear to accomplish the subgoals, and, finally, sequences of operators for those methods. Both of these conceptualizations are task-oriented in that they contain no theory of how the software or system works or what the user's actions do internally to produce the results. The third, the mental model,1 is knowledge of how the system works, what its components are, how they are related, what the internal processes are, and how they affect the components. It is this conceptualization that allows the user not only to construct actions for novel tasks but also to explain why a particular action produces the results it does.

1 This is a subset of the knowledge Rouse and Morris (1986) call mental models. We would include knowledge that helps the user to explain the function and states of the system and to predict its future behavior. We would not include descriptions of its purpose and form, information that seems shallow and unhelpful in a performance context.

Simple Sequences

Users often have no knowledge of the underlying system or even general rules for getting things done. Novices, in particular, resort to a learning method that borders on rote memorization. They learn sequences of actions that will get the system to do common types of tasks. For example, in using the operating system on the Michigan Terminal System to print the contents of a text file with the laser printer, many users merely memorize the nearly nonsense strings:

    $RUN *textform scards=pc:fw.macros+file spunch=-x
        'run a program called "*textform" with input from a master file of parameters plus the input file; send the output to a temporary file called "-x"'
    $RUN *pagepr scards=-x par=onesided
        'run a program called "*pagepr" with input from the temporary file "-x" so that the output is printed on only one side of each page'

where the only free parameter to be entered is the name of the file after the "+" in the first "scards" designation. Similarly, some word processors require the user to memorize short, common command sequences to accomplish certain repetitive actions, such as "XME" to exit and "XBA" to enact the printing sequence.
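
To make the structure of such rote knowledge concrete, the memorized sequence above can be viewed as a fixed template with a single free slot. The following sketch is purely illustrative (the report itself contains no code, and the names in the fragment are hypothetical):

    # A rote command sequence as a fill-in-the-slot template: everything
    # is memorized verbatim except the one free parameter, the file name.
    # (Illustrative reconstruction; names are hypothetical.)
    MTS_PRINT_SEQUENCE = [
        '$RUN *textform scards=pc:fw.macros+{file} spunch=-x',
        '$RUN *pagepr scards=-x par=onesided',
    ]

    def recall_sequence(file_name):
        """Fill the single free slot and return the literal commands."""
        return [step.format(file=file_name) for step in MTS_PRINT_SEQUENCE]

    print(recall_sequence('myreport'))
    # ['$RUN *textform scards=pc:fw.macros+myreport spunch=-x',
    #  '$RUN *pagepr scards=-x par=onesided']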

A good clue as to how often users rely on these simple sequences is to note the cheat sheets that they keep available when they are using software, or the notes made and often stuck to the side of the cathode-ray tube to remind the user of some commands that are commonly used but difficult to remember.

Young (1983) described one way in which users think about a calculator, as simple sequences or sets of task-action pairs. A task includes something the user wishes to accomplish (e.g., an arithmetic calculation or formula evaluation), which is associated with an action, or what the user must do in order to accomplish the task (e.g., key presses on a calculator). This knowledge is in the form of paired associates, and like the sequences to print a file described above, it has simple slots that indicate the free parameters the user must designate to fit the current situation.

A second description of simple sequences of actions is the keystroke model (Card et al., 1980a,b, 1983; Embley et al., 1978). The analyses in the keystroke models contain notations that describe what sequences of actions users make in invoking simple commands: the keystrokes, mouse movements, and so on. In the Card et al. (1980a,b, 1983) keystroke analysis, the analyst assumes that the user needs time to make each act in producing the command: a time to make a keystroke, a time to point with a mouse, a time to move the hands from the keyboard to the mouse or back, and a time to mentally prepare each command and its parameters. The analysis assumes that users must retrieve each command sequence from their memory, incurring a pause for mental preparation, and then execute the components of the command, pausing for additional mental preparation times before each command word, each parameter, and each delimiter (such as pressing a parenthesis, return, or other type of operator). For example, a command sequence for using a line-oriented editor to search a file for an error and fix it:

    s /f "errorstring"
        'search the whole file for an error'
    a 16 "oldstring" "newstring"
        'alter line 16 so that the old string is replaced with the new string'

would include mental preparations before each line and before each parameter, such as "/f" and "16," and the strings to be searched for and replaced. Analysis proceeds by attaching a constant time for each keystroke, movement, or mental preparation, affording a prediction of how long the formulation and execution of each command would normally require.
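
The arithmetic behind such a prediction is simple to sketch. In the fragment below the operator times are placeholders rather than the calibrated values published by Card et al.; the point is only how a duration estimate is assembled from keystrokes and mental preparations:

    # Keystroke-level sketch: each physical or mental operator carries a
    # constant time, and a command's predicted duration is their sum.
    # The times here are illustrative placeholders, not published values.
    OPERATOR_TIMES = {
        'K': 0.2,   # press one key
        'P': 1.1,   # point with the mouse
        'H': 0.4,   # move hands between keyboard and mouse
        'M': 1.35,  # mental preparation before a command word or parameter
    }

    def predict_time(operators):
        """Sum the constant times for an operator string such as 'MKK'."""
        return sum(OPERATOR_TIMES[op] for op in operators)

    # 'a 16 "old" "new"': a mental preparation before the command word and
    # before each of the three parameters, plus one K per character typed.
    sequence = 'M' + 'KK' + 'M' + 'KKK' + 'M' + 'KKKKKK' + 'M' + 'KKKKKK'
    print('predicted duration: %.2f s' % predict_time(sequence))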

In the same spirit, Reisner (1984) assumes that the user needs a fixed amount of time to make each individual act in producing a command. Instead of one mental preparation time, however, Reisner (1984) posits specific mental acts (e.g., retrieving from long-term memory, calculating a number, copying a number), each of which takes a different length of time. The analyst assumes (or knows from prior experimentation) how the various parameters are related (e.g., the time to calculate a number will be greater than the time to copy that number from a display) without specifying each time exactly. Simple algebra is then used to predict which of various whole design alternatives, or which of various user methods, will require the shortest time to perform.

These analyses of simple sequences serve to facilitate both comparison of existing software packages for the one that will require the shortest time to perform and the design and development of new system languages.
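
Reisner's style of comparison can also be sketched without fixing exact times. In the fragment below each design's execution is a bag of mental and physical operators; cancelling the operators the designs share leaves a comparison that needs only the ordinal assumption that calculating takes longer than copying. The designs and operator counts are invented for illustration:

    # Reisner-style comparison sketch: represent each design's execution
    # as a multiset of operators and cancel the common terms; only an
    # ordinal assumption (calculating > copying) is needed to compare.
    from collections import Counter

    design_a = Counter({'keystroke': 6, 'retrieve': 1, 'calculate': 1})
    design_b = Counter({'keystroke': 6, 'retrieve': 1, 'copy': 1})

    only_a = design_a - design_b   # operators A needs beyond B
    only_b = design_b - design_a   # operators B needs beyond A

    # Assumed ordering: any 'calculate' takes longer than any 'copy'.
    if only_a == Counter({'calculate': 1}) and only_b == Counter({'copy': 1}):
        print('design B predicted faster: copying beats calculating')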

Methods and Ways to Choose Among Them

Users not only elicit simple sequences to fit simple situations by rote; they sometimes also choose among various possible general methods that fit a particular situation.2

2 These methods are similar to the procedures remembered and used in the stage of "deciding and testing actions" in supervisory control tasks, described by Sheridan et al. (1986).

A number of investigators have studied the organization of more general actions as a function of task goals in the domain of programming. A general finding is that skilled programmers recognize aspects of particular situations and select general actions appropriate to them. For example, individual statements or sets of lines of code in a program are "chunked" into higher-order task-relevant structures. Skilled programmers can recall at a glance more lines of code than novice programmers (Adelson, 1981; McKeithen et al., 1981; Shneiderman, 1980). This is consistent with prior studies of expertise and the organization of memory (Chase and Simon, 1973; Egan and Schwartz, 1979; Reitman, 1976). These studies suggest that in the skilled programmer's knowledge base there is a mapping between chunks of actions or methods (that often go together) and general task features, so that the actions will be recalled and used at appropriate times in the future. These chunks reflect a developed, deeper understanding of routine programs, which is useful to a programmer writing programs. Similarly, Ehrlich and Soloway (1984) have shown that skilled programmers tend to employ patterns of actions, called plans, consisting of routinely occurring sequences of programming statements.

Furthermore, by examining the structure of recall protocols, McKeithen et al. (1981) determined that skilled programmers organize their vocabulary of programming statements more stereotypically than do novice programmers. It appears that with expertise, the users' understanding converges to a similar set of representations of concepts in the programming language. Data base designers reveal mental organizations that become increasingly homogeneous with greater expertise (Smelcer, 1986).

A more complete theory about what the user knows about how to accomplish a particular task is the GOMS model (Card et al., 1983). GOMS is an acronym that stands for the elements of what the user knows: the goals, the operators, the methods, and the selection rules. In the GOMS model, the user has a certain goal to accomplish (such as editing a manuscript that has been marked up). The user recognizes that this large goal can be broken into a set of subgoals (such as finding each editing mark and making the requisite changes). Subgoals are broken down into smaller and smaller subgoals until they match a basic set of methods, that is, sequences of operations that satisfy a small subgoal. The GOMS model states that users have some rules by which they choose the method that will fit the current situation. For example, users may know that there are several methods that can be used to find the first place in the manuscript to be edited: using the search function with a distinguishing string to be found, using the page-forward key until the target page is found visually, or using the cursor key to find the specific target location visually. People will choose whether to use the search, page-forward, or cursor key method depending on how far away the next editing target is assumed to be. Each of these methods is made up of certain operators, key presses, and hand motions, as specified in the keystroke model described above in the discussion of simple sequences.
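
The selection-rule idea in this editing example can be sketched directly. The distance thresholds and method contents below are invented for illustration; GOMS itself specifies only that some such rules exist:

    # GOMS sketch: a goal decomposes into subgoals until a method (a
    # sequence of operators) applies; a selection rule picks among the
    # competing methods. Thresholds and operator lists are illustrative.
    METHODS = {
        'cursor-key':   ['M', 'K'],            # prep, then repeated arrow keys
        'page-forward': ['M', 'K'],            # prep, then repeated page key
        'search':       ['M', 'K', 'K', 'K'],  # prep, type pattern, press enter
    }

    def select_method(distance_in_lines):
        """Selection rule for the subgoal 'find the next editing target'."""
        if distance_in_lines <= 5:
            return 'cursor-key'      # target is visibly close
        if distance_in_lines <= 60:
            return 'page-forward'    # a few screens away
        return 'search'              # far away: searching is cheaper

    goal, subgoal = 'edit marked-up manuscript', 'find the next editing target'
    method = select_method(distance_in_lines=120)
    print('%s -> %s -> %s %s' % (goal, subgoal, method, METHODS[method]))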

A number of empirical studies have shown that the predictions of GOMS and the keystroke model are reasonably accurate, and that sometimes one can even use the same time parameters across applications. Card et al. (1983) showed that their parameters for keystrokes and mental processing time were similar across text processors, operating systems, and graphics packages. Olson and Nilsen (1987) extended the analysis to show that the basic parameters applied well to spreadsheet software. However, additional time parameters were required. One was to account for the time it took users to scan the screen (for example, to find on the screen the coordinates of a particular value in a spreadsheet). A second time parameter was required to account for the time it takes the user to choose between methods: the more methods to choose from, the longer the pause before executing a simple sequence in a command.

Command grammars use a different analytic representation, but are analyzing the same kinds of mental events. The command language grammar (CLG) (Moran, 1981) and Backus normal form (BNF) (Reisner, 1981, 1984) have been used to describe the organization of sequences of actions that fulfill goals. These grammars are sets of rules that show the different ways in which an "alphabet" of actions can be formed to produce acceptable sentences that are understandable to a system or a device. For example, Reisner (1981, 1984) treats user actions that are acceptable to the system as a language. She describes the structure of this language as a BNF grammar. Figure 1 shows a sample of what in this formalism are called rewrite rules. At the higher levels are the user's task goals and the possible methods that can achieve the goal. This is presumably a representation of the components of plans the user has ready to evoke to fill an overall task goal. Below these are the varieties of action sequences that can be elicited in a method. The top several lines of Figure 1 are similar to the goals/subgoals and methods of the GOMS analysis; the lower levels are similar to the keystroke model sequences. Compared to GOMS, this representation more compactly shows the alternative ways to accomplish a task or to enact a series of keystrokes; GOMS requires a new method for each alternative.

    Use Dn                             ::= Identify first line + enter Dn command + press ENTER
    Identify first line                ::= Get first line on screen + Move cursor to first line
    Get first line on screen           ::= Use "locate" strategy | use scroll strategy
    Use "locate" strategy              ::= Move cursor to command input field + type "locate" command + press ENTER
    Move cursor to command input field ::= use cursor keys | press PFCURSOR | null
    Type locate command                ::= Type "locate" keyword + type line number
    Type "locate" keyword              ::= L+O+C | L | L+O+C+A+T+E
    Type line number                   ::= Type number

FIGURE 1 A command grammar representation of actions necessary to edit a line using a word processor. Rewrite rules applied to this domain are compact definitions of the many acceptable ways to get something done in a particular command language. One reads these rules from left to right; the left-hand terms are made up of the elements listed on the right-hand side. Elements connected by a "+" are executed in sequence; elements connected by a "|" represent alternative ways of invoking the same goal. For example, "Use Dn" consists of identifying the first line, then entering the "Dn" command, and then pressing enter. Typing the locate keyword, however, includes typing "LOC," "L," or "LOCATE." Source: Reisner (1984:53).

While various methods (represented as sentences from such a grammar) can be compared to see which takes less time, a grammatical representation is less adequate than GOMS in that it lacks any way to represent how a user selects the method appropriate for the current situation. The language format of grammars, however, allows the use of standard sentence complexity measures to predict some aspects of user behavior: the more rules, the longer it takes a user to learn; the greater the sentence (sequence) complexity, the longer the pauses between keystrokes; the more terminal symbols in the language, the harder the language is to learn. These predictions have not been fully tested, and there is some suggestion in the literature about language understanding that these measures do not adequately predict how difficult it is to understand sentences (Fodor et al., 1974; Miller, 1962). The formalism, however, allows a number of intriguing predictive possibilities for understanding and recalling command languages. See Reisner (1983) for a discussion of the potential value of such grammars.
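
A rule set like Figure 1's can be held as data and used to enumerate every action sequence ("sentence") it licenses, which is what underlies the complexity counts just mentioned. The encoding below is ours and covers only the bottom rules of the figure:

    # A fragment of Figure 1 as a grammar: each nonterminal maps to its
    # alternative expansions, and a '+'-sequence becomes a tuple.
    # (Our encoding of Reisner's rules, for illustration only.)
    from itertools import product

    GRAMMAR = {
        'type locate command': [('type locate keyword', 'type line number')],
        'type locate keyword': [('L', 'O', 'C'), ('L',),
                                ('L', 'O', 'C', 'A', 'T', 'E')],
        'type line number':    [('<number>',)],
    }

    def sentences(symbol):
        """Yield every terminal sequence derivable from a symbol."""
        if symbol not in GRAMMAR:            # a terminal stands for itself
            yield (symbol,)
            return
        for expansion in GRAMMAR[symbol]:
            alternatives = [list(sentences(part)) for part in expansion]
            for combo in product(*alternatives):
                yield tuple(t for seq in combo for t in seq)

    for s in sentences('type locate command'):
        print(''.join(s))
    # prints: LOC<number>, L<number>, LOCATE<number>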

Mental Models

In its most generic application, the term mental model could be applied to any set of mental events, but few if any would reserve such a broad meaning for the word model. Somewhat narrower in meaning, the term could be used for any thought process in which there are defined inputs and outputs to a believable process which operates on the inputs to produce outputs. In this sense, one could have a mental model of one's own behavior ("If I do this, then that will happen"), another person's behavior, the input-output characteristics of any software process run on a computer, or any information process mediated by people or machines. It could be a series of paired associates by which the user predicts, through a causal chain, outputs of a process given its inputs.

Given these general possibilities for the term mental model, it is most commonly used to refer to a representation (in the head) of a physical system or software being run on a computer, with some plausible cascade of causal associations connecting the input to the output. Accordingly, the user's mental model of a system is here defined as a rich and elaborate structure, reflecting the user's understanding of what the system contains, how it works, and why it works that way. It can be conceived as knowledge about the system sufficient to permit the user to mentally try out actions before choosing one to execute. A key feature of a mental model is that it can be "run" with trial, exploratory inputs and observed for its resultant behavior (Sheridan et al., 1986). Mental models are used during learning (such as using an analogy to begin to understand how the system works), in problem solving (such as in trying to extricate oneself from an error or performing a novel task), and when the user is reflecting on or attempting to rationalize or explain the system's behavior.

Users are typically described as using a mechanistic model; that is, the user is assumed to have a conceptual "machine" whose simulated function matches the actual target machine in some way.3 Three general kinds of models are called surrogates (Young, 1983), metaphors (Carroll and Thomas, 1982), and glass boxes (DuBoulay et al., 1981). A fourth kind of model, the network model, is a composite, blending the features of surrogates and glass boxes.

3 This may be more due to the fact that researchers are good at describing mechanistic models than to the fact that it is the only kind of model people have. In fact, exploration of other representations is an important research need.
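
What it means for a model to be "run" can be made concrete with a toy simulation: the user tries an action on the internal model, inspects the predicted state, and only then acts on the real system. The modeled editor rule below is invented purely for illustration:

    # A mental model as something that can be "run": trial inputs go in,
    # a predicted state comes out, and the real action is taken only if
    # the prediction looks right. The editor rule here is invented.
    class MentalModelOfEditor:
        """User's belief: 'delete-line' removes the current line."""
        def __init__(self, lines, cursor):
            self.lines, self.cursor = list(lines), cursor

        def run(self, action):
            """Mentally try an action and return the predicted state."""
            predicted = MentalModelOfEditor(self.lines, self.cursor)
            if action == 'delete-line':
                del predicted.lines[predicted.cursor]
            return predicted

    model = MentalModelOfEditor(['intro', 'typo here', 'summary'], cursor=1)
    predicted = model.run('delete-line')           # tried out "in the head"
    if 'typo here' not in predicted.lines:         # prediction acceptable,
        print('now execute delete-line for real')  # so commit to the action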

Surrogates

A surrogate is a conceptual analysis that perfectly mimics the target system's input/output behavior but that does not assume that the way in which output is produced in the surrogate is the same process as that in the target system. It is a system that behaves the same, but is not assumed to be isomorphic in its internal workings. Thus, while the surrogate always provides the right answer (the one that the target system would have generated), it offers no means of illuminating the real underlying causal basis for the answer. It is a good, complete analogy that may allow the user to construct appropriate behavior in a novel situation, but it does not help the user explain why the system behaves the way it does. Young (1983) noted that it is very difficult to construct an adequate surrogate, even for a fairly simple system like a hand-held calculator. This raises the question of whether people ever hold surrogates in their minds, even for simple devices.

Metaphor Models

A metaphor model is a direct comparison between the target system and some other system already known to the user. A common example, referred to widely in the literature, is the metaphor that "a text editor is a typewriter." Many investigators have observed that new users spontaneously refer to this typewriter metaphor during early learning about text processors (Bott, 1979; Carroll and Thomas, 1982; Douglas and Moran, 1983; Mack et al., 1983). The explanations people offer for system behavior are often couched in the vocabulary of the metaphor. Furthermore, the extent to which knowledge in the metaphor source domain matches the target domain correlates with performance. That is, the task-action pairs that fit both the metaphor source and the target system are easy to learn; those that do not are often learned last or remain constant sources of error. For example, learners have less trouble learning how to use character keys than the backspace and carriage return keys; the latter typically operate differently in text processors than they do in typewriters.
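
This match/mismatch claim can be sketched as a comparison of two task-action tables, one for the metaphor source and one for the target system: shared pairs predict easy learning, differing pairs predict persistent errors. The pairs below are illustrative only:

    # Metaphor-overlap sketch: task-action pairs shared by the metaphor
    # source (typewriter) and target (text editor) predict easy learning;
    # mismatching pairs predict lingering errors. Pairs are illustrative.
    TYPEWRITER = {
        'type a character':   'press its key',
        'fix last character': 'backspace, then strike over it',
        'end a line':         'press the carriage return',
    }
    TEXT_EDITOR = {
        'type a character':   'press its key',
        'fix last character': 'backspace erases the character',
        'end a line':         'text wraps automatically',
    }

    matches    = [t for t in TYPEWRITER if TYPEWRITER[t] == TEXT_EDITOR.get(t)]
    mismatches = [t for t in TYPEWRITER if TYPEWRITER[t] != TEXT_EDITOR.get(t)]
    print('predicted easy to learn:', matches)
    print('predicted error-prone:  ', mismatches)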

Unlike surrogates, metaphor models are easy to construct or learn, and they provide explanations of why the system behaves as it does. However, metaphors vary greatly in accuracy. For example, "the interface is a desktop" seems less accurate than "values are put into storage locations."

One difficulty with using metaphors in analyzing users' behavior with computers is that it is difficult to find out what the users' metaphors are. As Young (1983) put it, a metaphor analysis exchanges the problem of describing what the user knows about the target system for the problem of describing what the user knows about the metaphor source. For example, users have to know enough about pipelines for the metaphor "a flow chart is a pipeline" to be useful. In addition, metaphors that map one domain perfectly into another are rare. Consequently, metaphors can sometimes be misleading as well as helpful. The hydrodynamic metaphor for electric current, for example, is only good for a limited subset of phenomena, and is misleading for many others. Similarly, the typewriter metaphor for a word processor helps with some actions (like using the backspace key), but interferes with the learning of others (like the return key) (Douglas and Moran, 1983).

Glass Box Models

Glass box models lie between metaphors and surrogates. They are surrogates in that they are perfect mimics of the target system. But they are metaphors in that they offer some semantic interpretation for the internal components. For example, Mayer (1976) discusses a glass box mimic for a BASIC-like programming language. This glass box is not simply a surrogate, because its components are presented via metaphors (input as a ticket window, storage as a file cabinet). It can be run to perfectly predict outputs from inputs, but it can also be interpreted via these metaphors. Yet it is also not a simple metaphor; it is a composite metaphor (Carroll and Thomas, 1982; Rumelhart and Norman, 1981). It does not merely exchange the target system for a metaphor source in toto; it uses aspects of several metaphors to provide the surrogate behavior.

Glass boxes have been used primarily in a prescriptive context rather than in a descriptive one. Mayer's (1976) glass box is not a mental structure that was discovered; it is a mental structure that was taught to the user (e.g., subjects were instructed to think of input as a ticket window). Studies of prescriptive conceptual models tell us something about what kinds of models are useful, and about models that people could generate. They can also validate prescriptive models that help users of complex systems when it is hard for the user to deduce an adequate representation merely from experience.

was taught to the user (e.g., subjects were instructed to think of in- put as a ticket window). Studies of prescriptive conceptual models tell us something about what kinds of models are useful, and about models that people could generate. On the other hand, they can validate prescriptive models that help users of complex systems when it is hard for the user to deduce an adequate representation merely from experience. Network Representations of the System Network representations contain the states a system can be in and the actions the user can take that change the system to another state (Miller, 1985~. One particular type of network rep- resentation, the generalized transition network (GTN), contains detailed descriptions of what the system does (Kieras and Poison, 1983~. GTN's are state transition diagrams that represent the vis- ible states of the system (i.e., the display on the screen) as nodes, and the actions the user can take at each state (the commands or menu choices) as arcs. The connected nodes and arcs form a net- work that shows the sequence of states that follow user actions at each point in the software interaction. GTN's and other network diagrams are often used as tools in system development, to give the designer a picture to refer to in order to keep track of what can be done at every state in the transaction. Figure 2 illustrates a-portion of on-e of these networks for the actions that can be taken when a user enters a system and loads the word processing application. Networks can also be used to describe what the user knows about the system (Olson, 1987~. Olson (1987) suggests that GTNs be used to represent users' knowledge of system states and allow- able actions; these can be compared to the GTN of the actual target system to measure the user's level of learning or under- standing. Examination of the parts of the real GTN that are missing in the user's representation could indicated areas in which learning or remembering certain functions is difficult. The GTN is like a surrogate representation in that it does not give an underlying explanation about why the elements are related in the way that they are nor how the internal system com- ponents behave. Nor is there any indication of the purpose these actions fill toward a user's goal. It does, however, represent what 15

FIGURE 2 A generalized transition network (GTN) representation of part of the task of editing a document. Circles represent states or tasks, arcs represent the connections between states, and labels on the arcs represent the actions the user takes. Source: Kieras and Polson (1983:104).
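
A GTN of this kind translates directly into a table from (visible state, user action) to next state, which can be traced mechanically. The states and actions below are simplified stand-ins loosely patterned on Figure 2:

    # GTN sketch: visible system states are nodes and user actions are
    # arcs. State and action names are simplified stand-ins for Figure 2.
    GTN = {
        'prompt':        {'type-revise-command': 'revise',
                          'illegal-action':      'error-message'},
        'revise':        {'enter-input':         'input-field',
                          'cancel':              'prompt'},
        'input-field':   {'add-character':       'input-field',
                          'press-enter':         'prompt',
                          'cancel':              'revise'},
        'error-message': {'acknowledge':         'prompt'},
    }

    def next_state(state, action):
        """Follow one arc; an action with no arc leaves the state unchanged."""
        return GTN.get(state, {}).get(action, state)

    # Trace a short interaction through the network.
    state = 'prompt'
    for action in ['type-revise-command', 'enter-input',
                   'add-character', 'press-enter']:
        state = next_state(state, action)
        print('%-20s -> %s' % (action, state))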

A GTN displays the simple response that can be expected from the system given each action the user takes. And, importantly to the user, knowledge of these actions and their consequences can be useful when the user must solve problems, either when an error has just occurred or when a novel goal has arisen and the user needs to decide on an appropriate sequence of actions.

Comparisons

It is useful to consider the relation between sequence/method representations and mental models. People undoubtedly have both kinds of knowledge when they use computing systems. But research on these two approaches is largely complementary in that the kinds of questions addressed about one kind of representation have been different from those about the other. Briefly, the sequence/method representations are more analytic in that they can predict behavior (except errors) in some detail. Although the sequence/method approach has not typically dealt with predicting user errors, attempts have been made to show how user learning takes place. The mental models approach, on the other hand, accounts for errors as well as accurate behavior in novel and standard situations, but does not predict the details of behavior well nor how the models are learned.

Sequence/method representations, because they are composed of goal-action pairs, by their very nature predict how knowledge is used. To date they have represented only how to accomplish routine tasks (in which all the goal-subgoal and subgoal-action relations have been worked out) but have little or nothing to say about how knowledge is used in nonroutine tasks, such as in recovering from an error or behaving in an entirely unfamiliar situation. They do not have much generality in their conditions. And there is no posited mechanism for problem solving when a new situation fits several general condition-action pairs. Without this mechanism, these analyses cannot account for errors.

Some attempt has been made to account for how sequence/method representations are learned. Lewis (1986) provides an account of how users might acquire goal-action knowledge after they watch another person use the system. Through several simple heuristics that link actions to probable causes, the user begins to build a reasonable set of rules. The acquisition of rules is detailed by Lewis (1986), but the further learning in fine-tuning those rules is not covered.

Kieras and Bovair (1986) do not explain original learning per se, but have shown that learning a new system is speeded up if the user is familiar with another system that has many of the same rules in common. Neither of these approaches addresses the continued learning that goes on as the user acquires or discovers new strategies for efficiency.

Research on mental models, on the other hand, has not concentrated on the details of how a user uses a mental model nor how it is acquired. Douglas and Moran (1983) have produced the most detailed analysis of the behavior of a user who has a mental model. They examined the analogy of "a text processor is a typewriter" by noting the typewriter condition-action sequences that matched and mismatched those in the new system. Those condition-action pairs that matched were learned easily and quickly, and those that did not match produced continued errors and pauses. Other researchers have attempted to make the analysis of the behavior of the user who has a mental model more specific and revealing (Foley and Williges, 1982; Moran, 1983; Payne and Green, 1983). What is missing from these analyses, however, is how users use their mental models to come up with a set of appropriate actions. There are likely to be some very interesting cognitive actions going on in the pause between the presentation of the problem (e.g., the feedback from the screen after an error) and the choice of the next action.

Most of the empirical work on the effectiveness of mental models and the predictive power of sequence/method analyses has been at a gross behavioral level. The studies of experts' chunking of information (Chase and Simon, 1973, for example) are almost completely empirical; they focus entirely on the acquisition of the condition part of a condition-action pair and offer little basis for theory. The grammatical approaches often hold a key assumption: that the fewer actions there are per task, the cognitively simpler the task. Recent work has raised questions about the accuracy of this assumption (Olson and Nilsen, 1988; Rosson, 1983). There are occasions when a task has few actions, but the planning and calculating necessary to make those actions is difficult.

Moran has described a number of connections and contrasts between sequence/method and mental model approaches. The GOMS analysis (a methods analysis) and CLG (a blend of method and mental model) sprang from common theoretical roots. Indeed,