of the system that fits what the users have to do to operate the
system. Descriptive models are those held by the researcher to
approximate what the user does know; prescriptive models are
those held by the designer to approximate what the user should
know.
The concern of this report, however, is the representation that
the user has of how a computer system works. Furthermore, since
a mental model may be only one way of describing the knowledge
that a user has about a system, this report is broadened to include
all of what a user knows about using a particular piece of software,
including how to use it and how it works.
What users know differs in several important dimensions. It
differs according to the sophistication of the user. For example, a
user who is a programmer might have a very different understand-
ing of a piece of software than a person with no programming ex-
perience. Also, multiple mental models or several representations
at different levels of abstraction might coexist within the same
individual. For example, a person who both designed and later
used a system might develop two somewhat compartmentalized
understandings of the system. Analogous distinctions arise if we
consider different task environments. For example, the representa-
tion elicited for routine skilled behavior might differ substantively
from that elicited when a person tries to recover from an error or
otherwise solve problems (e.g., Rasmussen, 1983).
Because understanding what the user knows has practical
importance for designing software and its training, and because
it has theoretical importance in understanding people as they
generally perform complex cognitive tasks, this report considers
only the representations the users have when using software:
representations of the task being performed, the user-system
interface, and the system architecture.
TYPES OF REPRESENTATIONS OF
USERS' KNOWLEDGE
There are three basic types of representations that have been
formulated to characterize what a user of software knows. The
most elementary is a simple sequence of overt actions that fit a
particular situation. The second is a more complex and general
characterization, the knowledge of methods. This kind of
representation of the user's behavior incorporates general goals, the
subgoals associated with them, a set of methods that could be brought
to bear to accomplish the subgoals, and, finally, sequences of
operators for those methods. Both of these conceptualizations are
task-oriented in that they contain no theory of how the software or
system works or what the user's actions do internally to produce
the results.
The third, the mental model,1 is knowledge of how the system
works, what its components are, how they are related, what the
internal processes are, and how they affect the components. It is
this conceptualization that allows the user not only to construct
actions for novel tasks but also to explain why a particular action
produces the results it does.
Simple Sequences
Users often have no knowledge of the underlying system or
even general rules for getting things done. Novices, in particular,
resort to a learning method that borders on rote memorization.
They learn sequences of actions that will get the system to do
common types of tasks. For example, in using the operating system
on the Michigan Terminal System to print the contents of a text
file with the laser printer, many users merely memorize the nearly
nonsense strings:
$RUN *textform scards = pc:fw.macros + file spunch = -x
'run a program called "textform" with input from a master file
of parameters plus the input file, send the output to a temporary
file called "-x"'
$RUN *pagepr scards = -x par = onesided
'run a program called "pagepr" with input from the temporary
file "-x" so that the output is printed on only one side of each
page'
where the only free parameter to be entered is the name of the
file after the "+" in the first "scards" designation. Similarly,
some word processors require the user to memorize short, common
command sequences to accomplish certain repetitive actions, such
as " XME" to exit, and " XBA" to enact the printing
sequence. A good clue as to how often users rely on these simple
sequences is to note the cheat sheets that they keep available when
they are using software, or the notes made and often stuck to the
side of the cathode-ray tube to remind the user of some commands
that are commonly used but difficult to remember.
1 This is a subset of the knowledge Rouse and Morris (1986) call mental
models. We would include knowledge that helps the user to explain the
function and states of the system and to predict its future behavior. We
would not include descriptions of its purpose and form, information that
seems shallow and unhelpful in a performance context.
Young (1983) described one way in which users think about
a calculator, as simple sequences or sets of task-action pairs. A
task includes something the user wishes to accomplish (e.g., an
arithmetic calculation or formula evaluation), which is associated
with an action, or what the user must do in order to accomplish
the task (e.g., key presses on a calculator). This knowledge is
in the form of paired associates, and like the sequences to print
a file described above, it has simple slots that indicate the free
parameters the user must designate to fit the current situation.
A second description of simple sequences of actions is the
keystroke model (Card et al., 1980a,b, 1983; Embley et al., 1978).
The analyses in the keystroke models contain notations that
describe what sequences of actions users make in invoking simple
commands: the keystrokes, mouse movements, and so on. In the
keystroke analysis of Card et al. (1980a,b, 1983), the analyst assumes that
the user needs time to make each act in producing the command:
a time to make a keystroke, a time to point with a mouse, a time
to move the hands from the keyboard to the mouse or back, and a
time to mentally prepare each command and its parameters. The
analysis assumes that users must retrieve each command sequence
from their memory, incurring a pause for mental preparation, and
then execute the components of the command, pausing for addi-
tional mental preparation times before each command word, each
parameter, and each delimiter (such as pressing a parenthesis,
return, or other type of operator). For example, a command se-
quence for using a line-oriented editor to search a file for an error
and fix it:
s /f "errorstring"
'search the whole file for an error'
a 16 "oldstring" "newstring"
'alter line 16 so that the old string is replaced with the new
string'
would include mental preparations before each line and before each
parameter, such as "/f" and "16," and the strings to be searched
for and replaced. Analysis proceeds by attaching a constant time
for each keystroke, movement, or mental preparation, affording
a prediction of how long the formulation and execution of each
command would normally require.
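The additive logic of this kind of analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator times below are placeholder values chosen for the example rather than parameters reported in this chapter, and the encoding rule (one mental preparation before each field, one keystroke per character, one keystroke for the space or RETURN that ends the field) is a hypothetical simplification of the analysis described above.

```python
# Illustrative operator times in seconds. These are assumed values for the
# sketch, not parameters reported in this chapter.
OPERATOR_TIMES = {
    "K": 0.20,  # press one key
    "P": 1.10,  # point with a mouse
    "H": 0.40,  # move the hands between keyboard and mouse
    "M": 1.35,  # mentally prepare a command word or parameter
}


def encode_command(fields):
    """Hypothetical encoding rule: one M before each field, one K per
    character, and one K for the space or RETURN that ends the field."""
    operators = []
    for field in fields:
        operators.append("M")
        operators.extend("K" * (len(field) + 1))
    return operators


def predict_time(operators, times=OPERATOR_TIMES):
    """Sum a constant time for every operator in the sequence."""
    return sum(times[op] for op in operators)


# The two editor commands from the example above.
search = encode_command(["s", "/f", '"errorstring"'])
alter = encode_command(["a", "16", '"oldstring"', '"newstring"'])
total = predict_time(search) + predict_time(alter)
print(f"predicted time for both commands: {total:.2f} s")
```

Changing the table of operator times, or the encoding rule, is enough to compare alternative command designs on predicted execution time.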
In the same spirit, Reisner (1984) assumes that the user needs
a fixed amount of time to make each individual act in producing
a command. Instead of one mental preparation time, however,
Reisner (1984) posits specific mental acts (e.g., retrieving from
long-term memory, calculating a number, copying a number), each
of which takes a different length of time. The analyst assumes (or
knows from prior experimentation) how the various parameters
are related (e.g., the time to calculate a number will be greater
than the time to copy that number from a display) without
specifying each time exactly. Simple algebra is then used to predict
which of various whole design alternatives, or which of various user
methods, will require the shortest time to perform.
These analyses of simple sequences facilitate both the
comparison of existing software packages, to find the one that will
require the shortest time to perform, and the design and development
of new system languages.
Methods and Ways to Choose Among Them
Users not only elicit simple sequences to fit simple situations
by rote; they sometimes also choose among various possible general
methods that fit a particular situation.2
A number of investigators have studied the organization of
more general actions as a function of task goals in the domain
of programming. A general finding is that skilled programmers
recognize aspects of particular situations and select general ac-
tions appropriate to them. For example, individual statements
or sets of lines of code in a program are "chunked" into higher-
order task-relevant structures. Skilled programmers can recall at a
glance more lines of code than novice programmers (Adelson, 1981;
McKeithen et al., 1981; Shneiderman, 1980). This is consistent
with prior studies of expertise and the organization of memory
(Chase and Simon, 1973; Egan and Schwartz, 1979; Reitman,
1976). These studies suggest that in the skilled programmer's
knowledge base there is a mapping between chunks of actions or
methods (that often go together) and general task features, so that
the actions will be recalled and used at appropriate times in the
future. These chunks reflect a developed, deeper understanding
of routine programs, which are useful to a programmer writing
programs. Similarly, Ehrlich and Soloway (1984) have shown that
skilled programmers tend to employ patterns of actions, called
plans, consisting of routinely occurring sequences of programming
statements.
2 These methods are similar to the procedures remembered and used
in the stage of "deciding and testing actions" in supervisory control tasks,
described by Sheridan et al. (1986).
Furthermore, by examining the structure of recall protocols,
McKeithen et al. (1981) determined that skilled programmers or-
ganize their vocabulary of programming statements more stereo-
typically than do novice programmers. It appears that with ex-
pertise, the users' understanding converges to a similar set of
representations of concepts in the programming language. Data
base designers reveal mental organizations that become
increasingly homogeneous with greater expertise (Smelcer, 1986).
A more complete theory about what the user knows about
how to accomplish a particular task is the GOMS model (Card et
al., 1983). GOMS is an acronym that stands for the elements of
what the user knows: the goals, the operators, the methods, and
selection rules. In the GOMS model, the user has a certain goal
to accomplish (such as editing a manuscript that has been marked
up). The user recognizes that this large goal can be broken into
a set of subgoals (such as finding each editing mark and making
the requisite changes). Subgoals are broken down into smaller and
smaller subgoals until they match a basic set of methods, that is,
sequences of operations that satisfy a small subgoal.
The GOMS model states that users have some rules by which
they choose the method that will fit the current situation. For
example, users may know that there are several methods that can
be used to find the first place in the manuscript to be edited: using
the search function with a distinguishing string to be found, using
the page-forward key until the target page is found visually, or
using the cursor key to find the specific target location visually.
People will choose whether to use the search, page-forward, or
cursor key method depending on how far away the next editing
target is assumed to be. Each of these methods is made up of
certain operators, key presses, and hand motions, as specified in
the keystroke model described above in the discussion of simple
sequences.
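A minimal sketch of the GOMS elements for the manuscript-editing example might look as follows. The method names, operator descriptions, and distance thresholds are all assumptions made for illustration; the text says only that the choice depends on how far away the next editing target is assumed to be.

```python
# Hypothetical methods for the subgoal "locate the next edit". Each method
# is a sequence of operators in the keystroke-model sense.
METHODS = {
    "search": ["type search command", "type distinguishing string", "press RETURN"],
    "page-forward": ["press PAGE-FORWARD until the target page is visible"],
    "cursor": ["press cursor keys until the target location is reached"],
}


def select_method(lines_to_target, lines_per_page=20):
    """Selection rule: pick a method from the estimated distance to the
    next editing target. The thresholds are assumed values."""
    if lines_to_target <= lines_per_page:
        return "cursor"
    if lines_to_target <= 3 * lines_per_page:
        return "page-forward"
    return "search"


def accomplish(goal, distances_to_edits):
    """Break the goal into one locate subgoal per edit and attach the
    method chosen by the selection rule, with its operators."""
    plan = []
    for number, distance in enumerate(distances_to_edits, start=1):
        method = select_method(distance)
        plan.append((f"{goal}: locate edit {number}", method, METHODS[method]))
    return plan


for subgoal, method, operators in accomplish("edit manuscript", [5, 45, 300]):
    print(subgoal, "->", method, operators)
```

Keeping the selection rule separate from the method table mirrors the GOMS distinction between knowing several methods and knowing when each one applies.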
A number of empirical studies have shown that the predictions
of GOMS and the keystroke model are reasonably accurate, and
that sometimes one can even use the same time parameters across
applications. Card et al. (1983) showed that their parameters for
keystrokes and mental processing time were similar across text
processors, operating systems, and graphics packages. Olson and
Nilsen (1987) extended the analysis to show that the basic param-
eters applied well to spreadsheet software. However, additional
time parameters were required. One was to account for the time
it took users to scan the screen (for example, to find on the screen
the coordinates of a particular value in a spreadsheet). A second
time parameter was required to account for the time it takes the
user to choose between methods: the more methods to choose
from, the longer the pause before executing a simple sequence in
a command.
Command grammars use a different analytic representation,
but analyze the same kinds of mental events. The command
language grammar (CLG) (Moran, 1981) and Backus normal form
(BNF) (Reisner, 1981, 1984) have been used to describe the
organization of sequences of actions that fulfill goals. These grammars
are sets of rules that show the different ways in which an
"alphabet" of actions can be formed to produce acceptable "sentences"
that are understandable to a system or a device.
For example, Reisner (1981, 1984) treats user actions that
are acceptable to the system as a language. She describes the
structure of this language as a BNF grammar. Figure 1 shows a
sample of what in this formalism are called rewrite rules. At the
higher levels are the user's task goals and the possible methods that
can achieve the goal. This is presumably a representation of the
components of plans the user has ready to evoke to fill an overall
task goal. Below these are the varieties of action sequences that
can be elicited in a method. The top several lines of Figure 1 are
similar to the goals/subgoals and methods of the GOMS analysis;
the lower levels are similar to the keystroke model sequences.
Compared to GOMS, this representation more compactly
shows the alternative ways to accomplish a task or to enact a
series of keystrokes; GOMS requires a new method for each
alternative. While various methods (represented as sentences from
such a grammar) can be compared to see which takes less time, a
grammatical representation is less adequate than GOMS in that
it lacks any way to represent how a user selects the method
appropriate for the current situation.
Use Dn --> Identify first line + enter Dn command + press ENTER
Identify first line --> Get first line on screen + move cursor to first line
Get first line on screen --> Use "locate" strategy | use scroll strategy
Use "locate" strategy --> Move cursor to command input field + type "locate" command + press ENTER
Move cursor to command input field --> use cursor keys | press PFCURSOR | null
Type "locate" command --> Type "locate" keyword + type line number
Type "locate" keyword --> L+O+C | L | L+O+C+A+T+E
Type line number --> Type number
FIGURE 1 A command grammar representation of actions necessary to
edit a line using a word processor. Rewrite rules applied to this domain are
compact definitions of the many acceptable ways to get something done in a
particular command language. One reads these rules from left to right; the
left-hand terms are made up of the elements listed on the right-hand side.
Elements connected by a "+" are executed in sequence; elements connected
by a "|" represent alternative ways of invoking the same goal. For example,
"Use Dn" consists of identifying the first line, then entering the "Dn"
command, and then pressing enter. Typing the locate keyword, however,
includes typing "LOC," "L," or "LOCATE." Source: Reisner (1984:53).
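The rewrite rules of Figure 1 can be treated as data and expanded mechanically. The sketch below is one reading of the figure, not Reisner's own notation: the rule names are normalized, "null" is represented as an empty alternative, and any symbol without a rule of its own (such as "use scroll strategy") is assumed to be a terminal action.

```python
from itertools import product

# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of symbols performed in order. Names are normalized from Figure 1.
GRAMMAR = {
    "Use Dn": [["Identify first line", "enter Dn command", "press ENTER"]],
    "Identify first line": [["Get first line on screen", "move cursor to first line"]],
    "Get first line on screen": [["Use locate strategy"], ["use scroll strategy"]],
    "Use locate strategy": [
        ["Move cursor to command input field", "Type locate command", "press ENTER"]
    ],
    "Move cursor to command input field": [["use cursor keys"], ["press PFCURSOR"], []],
    "Type locate command": [["Type locate keyword", "Type line number"]],
    "Type locate keyword": [["L", "O", "C"], ["L"], ["L", "O", "C", "A", "T", "E"]],
    "Type line number": [["type number"]],
}


def expand(symbol):
    """Return every terminal action sequence ("sentence") derivable from symbol."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # assumed terminal: an action the user performs directly
    sentences = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        for combination in product(*parts):
            sentences.append([action for part in combination for action in part])
    return sentences


sentences = expand("Use Dn")
print(len(GRAMMAR), "rules generate", len(sentences), "action sequences")
print("shortest:", min(sentences, key=len))
```

Enumerating the sentences makes the compactness claim concrete: a handful of rules stands for every acceptable way of carrying out the task, each of which GOMS would list as a separate method. Nothing in the grammar says which sentence a user will pick.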
The language format of grammars, however, allows the use of
standard sentence complexity measures to predict some aspects
of user behavior: the more rules, the longer it takes a user to
learn; the greater the sentence (sequence) complexity, the longer
the pauses between keystrokes; the more terminal symbols in the
language, the harder the language is to learn. These predictions
have not been fully tested, and there is some suggestion in the
literature about language understanding that these measures do
not adequately predict how difficult it is to understand sentences
(Fodor et al., 1974; Miller, 1962). The formalism, however, allows a
number of intriguing predictive possibilities for understanding and
recalling command languages. See Reisner (1983) for a discussion
of the potential value of such grammars.
Mental Models
In its most generic application, the term mental model could
be applied to any set of mental events, but few if any would
discern such meaning for the word model. Somewhat narrower in
meaning, the term could be used for any thought process in which
there are defined inputs and outputs to a believable process which
operates on the inputs to produce outputs. In this sense, one
could have a mental model of one's own behavior ("If I do this,
then that will happen"), another person's behavior, the
input-output characteristics of any software process run on a computer,
or any information process mediated by people or machines. It
could be a series of paired associates by which the user predicts,
through a causal chain, outputs of a process given its inputs.
Given these general possibilities for the term mental model, it
is most commonly used to refer to a representation (in the head)
of a physical system or software being run on a computer, with
some plausible cascade of causal associations connecting the input
to the output. Accordingly, the user's mental model of a system is
here defined as a rich and elaborate structure, reflecting the user's
understanding of what the system contains, how it works, and why
it works that way. It can be conceived as knowledge about the
system sufficient to permit the user to mentally try out actions
before choosing one to execute. A key feature of a mental model
is that it can be "run" with trial, exploratory inputs and observed
for its resultant behavior (Sheridan et al., 1986).
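The idea of "running" a model with trial inputs can be illustrated with a toy simulation. Everything in this sketch is hypothetical: the believed behavior of the editor, the action names, and the two candidate plans are invented for the example, and no claim is made that users represent systems in this form.

```python
def believed_editor(state, action):
    """A user's assumed beliefs about how a simple editor responds.
    state is (text, cursor position); action is a tuple such as
    ("type", "h"), ("backspace",) or ("left",)."""
    text, cursor = state
    kind = action[0]
    if kind == "type":
        return (text[:cursor] + action[1] + text[cursor:], cursor + 1)
    if kind == "backspace":
        if cursor == 0:
            return state
        return (text[:cursor - 1] + text[cursor:], cursor - 1)
    if kind == "left":
        return (text, max(0, cursor - 1))
    raise ValueError(f"unknown action: {action!r}")


def simulate(model, state, plan):
    """Run the model on a sequence of trial actions without touching the system."""
    for action in plan:
        state = model(state, action)
    return state


def choose_plan(model, state, goal_text, candidate_plans):
    """Return the first candidate whose simulated result matches the goal text."""
    for plan in candidate_plans:
        if simulate(model, state, plan)[0] == goal_text:
            return plan
    return None


start = ("teh", 3)  # the word "teh" with the cursor at the end
retype = [("backspace",), ("backspace",), ("type", "h"), ("type", "e")]
insert_only = [("left",), ("type", "h")]
print(choose_plan(believed_editor, start, "the", [insert_only, retype]))
```

The plan is selected by simulated outcome alone, which is the sense in which such knowledge lets a user construct actions for a novel task.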
Mental models are used during learning (such as using an
analogy to begin to understand how the system works), in problem
solving (such as in trying to extricate oneself from an error or
performing a novel task), and when the user is reflecting on or
attempting to rationalize or explain the system's behavior.
Users are typically described as using a mechanistic model;
that is, the user is assumed to have a conceptual "machine" whose
simulated function matches the actual target machine in some
way.3 Three general kinds of models are called surrogates (Young,
1983), metaphors (Carroll and Thomas, 1982), and glass boxes
(DuBoulay et al., 1981). A fourth kind of model, the network
model, is a composite, blending the features of surrogates and
glass boxes.
3 This may be more due to the fact that researchers are good at
describing mechanistic models than to the fact that it is the only kind
of model people have. In fact, exploration of other representations is an
important research need.
Surrogates
A surrogate is a conceptual analysis that perfectly mimics the
target system's input/output behavior and that does not assume
that the way in which output is produced in the surrogate is the
same process as that in the target system. It is a system that
behaves the same, but is not assumed to be isomorphic in its inter-
nal workings. Thus, while the surrogate always provides the right
answer (the one that the target system would have generated), it
offers no means of illuminating the real underlying causal basis
for the answer. It is a good, complete analogy that may allow the
user to construct appropriate behavior in a novel situation, but it
does not help the user explain why the system behaves the way
it does.
Young (1983) noted that it is very difficult to construct an
adequate surrogate, even for a fairly simple system like a hand-
held calculator. This raises the question of whether people ever
hold surrogates in their minds, even for simple devices.
Metaphor Models
A metaphor model is a direct comparison between the target
system and some other system already known to the user. A
common example, referred to widely in the literature, is the metaphor
that "a text editor is a typewriter." Many investigators have
observed that new users spontaneously refer to this typewriter
metaphor during early learning about text processors (Bott, 1979;
Carroll and Thomas, 1982; Douglas and Moran, 1983; Mack et al.,
1983). The explanations people offer for system behavior are often
couched in the vocabulary of the metaphor. Furthermore, the ex-
tent to which knowledge in the metaphor source domain matches
the target domain correlates with performance. That is, the task-
action pairs that fit both the metaphor source and the target
system are easy to learn; those that do not are often learned last
or remain constant sources of error. For example, learners have
less trouble learning how to use character keys than the backspace
and carriage return keys; the latter typically operate differently in
text processors than they do in typewriters.
Unlike surrogates, metaphor models are easy to construct or
learn, and they provide explanations of why the system behaves
as it does. However, metaphors vary greatly in accuracy. For
example, "the interface is a desktop" seems less accurate than
"values are put into storage locations."
One difficulty with using metaphors in analyzing users'
behavior with computers is that it is hard to find out what the users'
metaphors are. As Young (1983) put it, a metaphor analysis ex-
changes the problem of describing what the user knows about
the target system for the problem of describing what the user
knows about the metaphor source. For example, users have to
know enough about pipelines for the metaphor "a flow chart is
a pipeline" to be useful. In addition, metaphors that map one
domain perfectly into another are rare. Consequently, metaphors
can sometimes be misleading as well as helpful. The hydrody-
namic metaphor for electric current, for example, is only good for
a limited subset of phenomena, and is misleading for many others.
Similarly, the typewriter metaphor for a word processor helps with
some actions (like using the backspace key), but interferes with
the learning of others (like the return key) (Douglas and Moran,
1983).
Glass Box Models
Glass box models lie between metaphors and surrogates. They
are surrogates in that they are perfect mimics of the target sys-
tem. But they are metaphors in that they offer some semantic
interpretation for the internal components. For example, Mayer
(1976) discusses a glass box mimic for a BASIC-like program-
ming language. This glass box is not simply a surrogate, because
its components are presented via metaphors (input as a ticket
window, storage as a file cabinet). It can be run to perfectly
predict outputs from inputs, but it can also be interpreted via
these metaphors. Yet it is also not a simple metaphor; it is a
composite metaphor (Carroll and Thomas, 1982; Rumelhart and
Norman, 1981~. It does not merely exchange the target system for
a metaphor source in toto; it uses aspects of several metaphors to
provide the surrogate behavior.
Glass boxes have been used primarily in a prescriptive context
rather than in a descriptive one. Mayer's (1976) glass box is not a
mental structure that was discovered; it is a mental structure that
was taught to the user (e.g., subjects were instructed to think of in-
put as a ticket window). Studies of prescriptive conceptual models
tell us something about what kinds of models are useful, and about
models that people could generate. On the other hand, they can
validate prescriptive models that help users of complex systems
when it is hard for the user to deduce an adequate representation
merely from experience.
Network Representations of the System
Network representations contain the states a system can be
in and the actions the user can take that change the system to
another state (Miller, 1985). One particular type of network
representation, the generalized transition network (GTN), contains
detailed descriptions of what the system does (Kieras and Polson,
1983). GTNs are state transition diagrams that represent the
visible states of the system (i.e., the display on the screen) as nodes,
and the actions the user can take at each state (the commands or
menu choices) as arcs. The connected nodes and arcs form a
network that shows the sequence of states that follow user actions at
each point in the software interaction. GTNs and other network
diagrams are often used as tools in system development, to give
the designer a picture to refer to in order to keep track of what
can be done at every state in the transaction. Figure 2 illustrates
a portion of one of these networks for the actions that can be
taken when a user enters a system and loads the word processing
application.
Networks can also be used to describe what the user knows
about the system (Olson, 1987). Olson (1987) suggests that GTNs
be used to represent users' knowledge of system states and allow-
able actions; these can be compared to the GTN of the actual
target system to measure the user's level of learning or under-
standing. Examination of the parts of the real GTN that are
missing in the user's representation could indicate areas in which
learning or remembering certain functions is difficult.
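The comparison of a user's network with the system's network can be sketched with plain dictionaries. The states, actions, and the particular gaps below are invented for illustration and are not taken from Figure 2 or from Olson (1987); the coverage score is likewise an assumed measure, one of many that could be defined.

```python
# Hypothetical fragment of the real system's GTN:
# (visible state, user action) -> next visible state.
SYSTEM_GTN = {
    ("main menu", "select word processing"): "document prompt",
    ("document prompt", "type document name"): "editing",
    ("document prompt", "CANCL"): "main menu",
    ("editing", "REVISE"): "revise prompt",
    ("revise prompt", "CANCL"): "editing",
}

# Hypothetical network elicited from a user: one arc is absent and one
# leads to the wrong state.
USER_GTN = {
    ("main menu", "select word processing"): "document prompt",
    ("document prompt", "type document name"): "editing",
    ("editing", "REVISE"): "revise prompt",
    ("revise prompt", "CANCL"): "main menu",
}


def missing_arcs(system, user):
    """Arcs of the real network that the user's network lacks entirely."""
    return sorted(arc for arc in system if arc not in user)


def mistaken_arcs(system, user):
    """Arcs the user knows about but expects to lead to a different state."""
    return sorted(arc for arc in user if arc in system and user[arc] != system[arc])


def coverage(system, user):
    """Fraction of real arcs that the user represents correctly."""
    correct = sum(1 for arc in system if user.get(arc) == system[arc])
    return correct / len(system)


print("missing:", missing_arcs(SYSTEM_GTN, USER_GTN))
print("mistaken:", mistaken_arcs(SYSTEM_GTN, USER_GTN))
print(f"coverage: {coverage(SYSTEM_GTN, USER_GTN):.0%}")
```

Separating missing arcs from mistaken ones distinguishes functions the user has never learned from functions the user has learned incorrectly.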
The GTN is like a surrogate representation in that it does
not give an underlying explanation about why the elements are
related in the way that they are nor how the internal system
components behave. Nor is there any indication of the purpose these
actions fill toward a user's goal. It does, however, represent what
the user knows about how the system works in simple
stimulus-response terms. A GTN displays the simple response that can be
expected from the system given each action the user takes. And,
importantly to the user, knowledge of these actions and their
consequences can be useful when the user must solve problems, either
when an error has just occurred or when a novel goal has arisen
and the user needs to decide on an appropriate sequence of actions.
FIGURE 2 A generalized transition network (GTN) representation of part
of the task of editing a document. Circles represent states or tasks, arcs
represent the connections between states, and labels to the arcs represent
the actions the user takes. Source: Kieras and Polson (1983:104).
Comparisons
It is useful to consider the relation between sequence/method
representations and mental models. People undoubtedly have both
kinds of knowledge when they use computing systems. But re-
search on these two approaches is largely complementary in that
the kinds of questions addressed about one kind of representa-
tion have been different from those about the other. Briefly, the
sequence/method representations are more analytic in that they
can predict behavior (except errors) in some detail. Although the
sequence/method approach has not typically dealt with predicting
user errors, attempts have been made to show how user learning
takes place. The mental models approach, on the other hand,
accounts for errors as well as accurate behavior in novel and stan-
dard situations, but does not predict the details of behavior well
nor how the models are learned.
Sequence/method representations, because they are composed
of goal-action pairs, by their very nature predict how knowledge
is used. To date they have represented only how to accomplish
routine tasks (in which all the goal-subgoal and subgoal-action
relations have been worked out) but have little or nothing to
say about how knowledge is used in nonroutine tasks, such as
in recovering from an error or behaving in an entirely unfamiliar
situation. They do not have much generality in their conditions.
And, there is no posited mechanism for problem solving when a
new situation fits several general condition-action pairs. Without
this mechanism, these analyses cannot account for errors.
Some attempt has been made to account for how sequence/method
representations are learned. Lewis (1986) provides an account of
how users might acquire goal-action knowledge after they watch
another person use the system. Through several simple heuristics
that link actions to probable causes, the user begins to build a
reasonable set of rules. The acquisition of rules is detailed by
Lewis (1986), but the further learning in fine-tuning those rules
is not covered. Kieras and Bovair (1986) do not explain original
learning per se, but have shown that learning a new system is
speeded up if the user is familiar with another system that has
many of the same rules in common. Neither of these approaches
addresses the continued learning that goes on as the user acquires
or discovers new strategies for efficiency.
Research on mental models, on the other hand, has not
concentrated on the details of how a user uses a mental model nor how
it is acquired. Douglas and Moran (1983) have produced the most
detailed analysis of the behavior of a user who has a mental model.
They examined the analogy of "a text processor is a typewriter"
by noting the typewriter condition-action sequences that matched
and mismatched those in the new system. Those condition-action
pairs that matched were learned easily and quickly, and those
that did not match produced continued errors and pauses. Other
researchers have attempted to make the analysis of the behavior
of the user who has a mental model more specific and revealing
(Foley and Williges, 1982; Moran, 1983; Payne and Green, 1983).
What is missing from these analyses, however, is how users use
their mental models to come up with a set of appropriate actions.
There are likely to be some very interesting cognitive actions going
on in the pause between the presentation of the problem (e.g., the
feedback from the screen after an error) and the choice of the next
action.
Most of the empirical work on the effectiveness of mental
models and the predictive power of sequence/method analyses has
been at a gross behavioral level. The studies of experts' chunking
of information (Chase and Simon, 1973, for example) are almost
completely empirical; they focus entirely on the acquisition of the
condition part of a condition-action pair and offer little basis for
theory. The grammatical approaches often hold a key assumption:
that the fewer actions there are per task, the cognitively simpler
the task. Recent work has raised questions about the accuracy of
this assumption (Olson and Nilsen, 1988; Rosson, 1983). There
are occasions when a task has a few actions, but the planning and
calculating necessary to make those actions is difficult.
Moran has described a number of connections and contrasts
between sequence/method and mental model approaches. The
GOMS analysis (a methods analysis) and CLG (a blend of method
and mental model) sprang from common theoretical roots. Indeed,