HUMAN FACTORS METHODS IN RESEARCH AND PRODUCT DESIGN
ANALYSIS: GATHERING IDEAS
The ideas behind products typically arise from three
major sources: from the redesign of an existing product,
from an identified need in the marketplace, and from a
new technological capability that provides a useful new
function to users. Information about the success of
existing products can be obtained either by asking their
users for their opinions and uses of the systems or by
gathering unobtrusive data about their use. Information
about a new product can come from reports of needs from
potential users.
Reports from Users
Questionnaires and interviews are the most common
methods for gathering information about the success of a
product or the needs for new functions or a new product.
Both questionnaires and interviews are good methods for
eliciting information about how a person goes about his
or her work, what aids or tools he or she uses or desires,
what kind of knowledge or training is required to do the
work, what difficulties he or she reports about the work,
where the work originates and where it goes, what inter-
actions are necessary with other people to do the work,
and how the user thinks the work process could be
improved. Questionnaires are more rigid in format than
interviews, since interviews can go where the interviewee
leads, often uncovering unanticipated new information.
The principal disadvantage of interviews, however, is
that they are time-consuming; only one person can be
interrogated at a time. By aggregating information from
a number of interviewees or questionnaires, one can
construct a general picture of users' needs and construct
some tentative system concepts for helping the users do
their work (Kelley and Chapanis, 1982; Rosson, 1983).
Diaries provide a similar form of informal data
gathering and are used to uncover the needs and capabil-
ities of the potential users of a new product. Data
about work can be gathered in detail over a long period
of time, especially about how much time particular kinds
of activities take and their sequential dependencies.
Because a shorter time elapses between the occurrence of
an event and its report, diaries give a more accurate
record of actual activity than retrospective reports in
questionnaires and interviews (Mantei and Haskell, 1983).
A common marketing technique for gathering information
about existing or potential users' needs is the focus
group. Instead of interviewing a single user at a time,
groups of users who are either similarly trained or who
share common goals are first told about some potential
capabilities of a system, then asked to discuss how they
might find uses for these capabilities. Occasionally
active brainstorming from these sessions generates very
good ideas. The same kind of method is used to collect
opinions about an existing product and to ask for
suggestions for improvements. Often designers will gather
expert users of a system and ask their opinion about how
to improve the system or how to design a new,
computer-based tool for aiding their work (Al-Awar et al., 1981).
The advantage of such methods is that the participants
stimulate each other's thoughts, uncovering ideas or
suggestions they may not have thought of individually.
That is also its disadvantage: a participant's true
opinions can be swayed by group pressure.
Inferring Needs from Natural Observation
One of the main drawbacks of the methods listed above
is that they rely on users' perceptions of their needs
and capabilities. Sometimes new products meet needs
unforeseen by their users; sometimes users, either
consciously or unconsciously, distort their daily work
activities and feelings about existing working conditions.
In such cases, it may be better to collect information,
not by asking users, but by watching their behavior and
inferring their needs and capabilities from their
activities.
Two methods are often used to collect information
about users' behavior in natural work settings. In the
case of activity analysis, an observer watches and
records certain behaviors of the workers. The data may
be collected by direct observation or by analyzing video
or film recordings. Individual samples of categorized
activities are aggregated into activity frequency tables,
graphs, or state transition diagrams. Such performance
analyses are particularly useful in assessing the changes
made in work by comparing activity before and after a new
system or design change is implemented (Hartley et al.,
1977; Hoecker and Pew, 1980).
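The aggregation step of an activity analysis can be sketched in a few lines. The following is a minimal illustration only, assuming the observer has already coded each sampled behavior into a category; the category names and the sample sequence are hypothetical and are not data from the studies cited.

```python
from collections import Counter

# Hypothetical coded observations, in the order they were sampled.
observations = ["read", "type", "type", "phone", "read", "type", "file", "read"]

def activity_frequencies(samples):
    """Count how often each activity category was observed."""
    return Counter(samples)

def transition_counts(samples):
    """Count how often one activity was immediately followed by another."""
    return Counter(zip(samples, samples[1:]))

frequencies = activity_frequencies(observations)
transitions = transition_counts(observations)

# Activity frequency table
for activity, count in frequencies.most_common():
    print(f"{activity:6s} {count}")

# Raw material for a state transition diagram
for (src, dst), count in sorted(transitions.items()):
    print(f"{src} -> {dst}: {count}")
```

Running the same two functions on observations taken before and after a design change gives tables that can be compared directly.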
Logging and metering techniques involve observations
of what a user does with a system, but the measurement is
embedded directly into the software. These procedures
can include a simple record with a time-stamp of every
interaction that a user makes with the computer, or they
can involve a complete hard copy representation of a
sequence of particular display frames. Powerful logging
and metering software can also categorize certain
recognizable events and summarize their times. For
example, one could summarize such events as time to
complete a task, user and/or system response time, and
frequencies and types of errors.
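A time-stamped log and its summary might look like the sketch below. The event kinds, field names, and session data are assumptions made for illustration; no particular logging package is implied.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class LogEvent:
    timestamp: float  # seconds since the session started
    kind: str         # "task_start", "task_end", "command", or "error"
    detail: str = ""

def summarize(log):
    """Return the task completion time and the error counts by type."""
    start = next(e.timestamp for e in log if e.kind == "task_start")
    end = next(e.timestamp for e in log if e.kind == "task_end")
    errors = Counter(e.detail for e in log if e.kind == "error")
    return {"task_time": end - start, "errors": errors}

# Hypothetical session log for one user and one task.
session = [
    LogEvent(0.0, "task_start", "edit-letter"),
    LogEvent(2.5, "command", "open"),
    LogEvent(6.0, "error", "unknown-command"),
    LogEvent(9.0, "command", "save"),
    LogEvent(12.5, "error", "unknown-command"),
    LogEvent(15.0, "task_end", "edit-letter"),
]

summary = summarize(session)
print(summary["task_time"], dict(summary["errors"]))
```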
Logging and metering procedures are typically embedded
in the operational software. Where there are limits to
the access to such software, one can connect a second
computer in tandem to the first and direct data about the
user's activities to it, in essence providing a "passive
tap." In this way, logging does not interfere with system
response times, and information about the user inputs and
the system responses can be recorded in detail for future
use (see Whiteside et al., 1982; Goodwin, 1982).
DESIGN: THE INITIAL DESIGN
Designers go through two stages in constructing an
initial design, either implicitly, driven by intuition or
experience, or explicitly, using some or all of the
detailed tools described below. First, the designers
decide what the user is going to do, conducting an
informal or formal task analysis. Second, they specify
what the interface will look like and what the dialog
will consist of. There are a variety of methods that
apply to this stage, where designers use informal or
formal guidelines, consult end users, or have some
theory-based judgments to draw on.
Determining What the User Needs to Do
The most common form of analyzing the user's activities
is called a task analysis. Task analysis is the process
of analyzing the functional requirements of a system to
ascertain and describe the tasks that people perform. It
focuses both on how the system fits within the global task
the user is trying to perform (e.g., prepare a report of
a projected budget) and what the user has to do to use
the system (e.g., access the application program, access
the data files, etc.).
Task analysis has two major aspects: the first
specifies and describes the tasks, and the second, and
more important, analyzes the specified tasks to determine
such system or environmental characteristics as the
number of people needed, the skills and knowledge they
should have, and the training necessary. The first step
involves decomposition of tasks into their constituent
subtasks and annotating each subtask for its essential
elements and their interdependencies. The second step
involves examination of the actual tasks and interdepen-
dencies, assessing how difficult each is, what knowledge
is required, where the information resides, etc. Results
of task analyses are used not only in writing functional
specifications for a particular application, but also for
assigning work to groups of workers, arranging equipment
in an efficient configuration, determining task demands
on people, and developing operating procedures and train-
ing manuals (see Bullen and Bennett, 1983; Bullen et al.,
1982).
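The two aspects of a task analysis, decomposition and annotation followed by examination of the annotated tasks, can be represented with a small tree structure. This is a sketch under stated assumptions: the `Task` class and its fields are hypothetical, the task names come from the budget-report example in the text, and the knowledge annotations are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    knowledge: list = field(default_factory=list)   # knowledge the task requires
    depends_on: list = field(default_factory=list)  # names of prerequisite subtasks
    subtasks: list = field(default_factory=list)

    def leaves(self):
        """Yield the lowest-level subtasks of this decomposition."""
        if not self.subtasks:
            yield self
        for sub in self.subtasks:
            yield from sub.leaves()

    def required_knowledge(self):
        """Collect the knowledge needed across the whole task."""
        needed = set(self.knowledge)
        for sub in self.subtasks:
            needed |= sub.required_knowledge()
        return needed

# Step 1: decompose the task and annotate each subtask (annotations are hypothetical).
report = Task(
    "prepare a report of a projected budget",
    subtasks=[
        Task("access the application program", knowledge=["login procedure"]),
        Task(
            "access the data files",
            knowledge=["file names"],
            depends_on=["access the application program"],
        ),
    ],
)

# Step 2: examine the annotated tasks.
print([t.name for t in report.leaves()])
print(sorted(report.required_knowledge()))
```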
Specifying the Initial Design
An initial system or interface design is constructed
next. With the global tasks the user has to perform
specified as above, the designer groups the subtasks
according to logical function from the perspective of the
user but tempered by system/hardware constraints. Then
the actual interface or system details come from three
sources: design guidelines or principles, intuitions of
the designer sometimes aided by intuitions of the users
themselves, and theory-based judgments.
In generating an initial design, the designer can
address existing design guidelines for general prescrip-
tions of how to specify particular components of the
interface. For example, if the interface has a menu, the
guideline may prescribe that the alternatives should be
listed by order of frequency of use or clustered
according to functional similarity, rather than displayed
alphabetically or randomly. Current design guidelines
(e.g., Woodson and Conover, 1966; Van Cott and Kinkade,
1972) include prescriptions about such topics as the
readability of type fonts, the brightness levels of
display screens, keyboards designed to fit hand shape and
function, and rules for making abbreviations and symbols
(see also Schneiderman, 1982; Smith, 1982).
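The menu guideline mentioned above can be illustrated with a short sketch. The menu labels, functional groups, and usage counts are hypothetical; the two functions simply show the two orderings the guideline names.

```python
from itertools import groupby

# Hypothetical menu items: (label, functional group, uses per week)
items = [
    ("Print", "output", 40),
    ("Open", "file", 120),
    ("Save", "file", 95),
    ("Export", "output", 5),
    ("Delete", "file", 10),
]

def by_frequency(menu_items):
    """List the alternatives from most to least frequently used."""
    ordered = sorted(menu_items, key=lambda item: -item[2])
    return [label for label, _, _ in ordered]

def by_function(menu_items):
    """Cluster the alternatives by functional group, most used first in each."""
    ordered = sorted(menu_items, key=lambda item: (item[1], -item[2]))
    return {
        group: [label for label, _, _ in members]
        for group, members in groupby(ordered, key=lambda item: item[1])
    }

print(by_frequency(items))
print(by_function(items))
```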
Current guidelines, however, are more concerned with
perceptual and performance characteristics than with the
cognitive properties of the interaction. Thus, they
would prescribe appropriate type fonts, but not what
words these fonts should express to the user to suggest
the appropriate analogy for performing the task on the
system. There are several major caveats in the use of
design guidelines: the prescriptions or recommendations
contained may have been derived from situations or
research not applicable to the system being designed; new
or unaccounted for variables may interact in unanticipated
ways; and current guidelines do not always publish the
source of the recommendation, whether it was generated by
a controlled laboratory study or derived from the col-
lected wisdom of experience. Guidelines have to be
applied with care.
Though design guidelines have their flaws, they are
very useful in placing a particular new design in a
setting of conventional wisdom. Often the designer,
skilled in interacting with systems and cognizant of the
end tasks that are being supported in this design, cannot
foresee the difficulties the new user will have with the
system. Design guidelines provide suggestions to the
designer that will in many cases be better than those
based solely on intuition. (For a recent version of
guidelines, see Smith, 1984.)
The skills and knowledge of users themselves can be
used to advantage by incorporating users in the design
team. Users can provide some critical insights about how
they think of the task and thus the system (e.g., what
kinds of information should be accessible when, what the
screens should look like to mimic the original,
noncomputer version of the task, what commands ought to
be called). They know the procedures and terminology
and, with proper support, can contribute to the design
and layout of forms and menus as well as act as critics
of the design. Gould and Lewis (1985) and Miller and Pew
(1981) provide examples of the involvement of users in
the design process. Other ways in which the sophisticated
user can be involved in the design of software systems
can be found below in the section on prototype testing
with users.
A third source of information about the original design
specification is psychological theories. Theory-based
judgments can constrain aspects of a design or suggest
promising areas of investigation. For example, theories
of color contrast can provide insight into the appro-
priateness of certain combinations used in screen high-
lighting or predict the readability of a new monochrome
display color. Because Fitts's Law accounted for movement
time for placing a cursor in a desired position with a
mouse and for placing the appropriate finger on a desired
key location, two conclusions follow: the invention of
faster pointing devices was unlikely to increase perfor-
mance and the design of keyboards with larger peripheral
key caps would increase the accuracy of keying (Card et
al., 1978; Card et al., 1980b).
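For readers unfamiliar with Fitts's Law, the sketch below uses one commonly cited form, MT = a + b log2(D/W + 1), where D is the distance to the target and W is its width. This form and the constants `a` and `b` are assumptions made for illustration; the studies cited may have used a different variant, and the default constants here are placeholders rather than fitted values.

```python
import math

def movement_time(distance, width, a=0.1, b=0.1):
    """Predicted movement time in seconds under one form of Fitts's Law.

    a and b are device-dependent constants normally fitted from data;
    the defaults are placeholders for illustration only.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be non-negative and width positive")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# Same reach, two hypothetical key-cap widths (arbitrary but equal units).
small_key = movement_time(distance=80, width=10)
large_key = movement_time(distance=80, width=20)
print(small_key > large_key)
```

Under this form a wider target lowers the index of difficulty, which is the property behind the suggestion that keys far from the home position benefit from larger caps.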
Part of the difficulty in constructing a design and
analyzing its usability has to do with how the interface
is specified. Verbal descriptions of how a system works
are particularly unsuited for conveying the flow of an
interaction and the choices the user has at each point.
Several specification languages or formats have been
explored recently not only to serve as a way of conveying
to those who actually build or code the system what it
will do but also as a way of concretely specifying the
system to analyze its usability.
One way to specify the interaction is to use an inter-
active tool kit called a human-computer dialog management
system. This system guides the definition of the inter-
action language that describes the actions of the user
and the system and the screen formats displayed at each
moment. Hartson et al. (1984), Jacob (1983), and
Wasserman (1982) provide good examples of this kind of
interface definition.* A second format for displaying
what the system does at each state is a state transition
diagram, recently used as a description of a system's
workings in Kieras and Polson (1983).
*This is also a system that allows rapid embodiment of
the functioning of a new, developing system and thus is a
tool for rapid prototyping.
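A state transition description of a dialog can be written down as a table that maps each state and user action to the next state. The states and actions below are hypothetical and are not taken from any of the systems cited; the sketch only shows how such a specification makes the choices available at each point explicit.

```python
# Hypothetical dialog: each state maps the available user actions to the next state.
TRANSITIONS = {
    "main_menu": {"edit": "editing", "quit": "done"},
    "editing": {"save": "main_menu", "cancel": "main_menu"},
    "done": {},
}

def run_dialog(actions, start="main_menu"):
    """Follow a sequence of user actions and return the states visited."""
    state = start
    visited = [state]
    for action in actions:
        choices = TRANSITIONS[state]
        if action not in choices:
            raise ValueError(f"{action!r} is not available in state {state!r}")
        state = choices[action]
        visited.append(state)
    return visited

print(run_dialog(["edit", "save", "quit"]))
```

Because the table lists every legal action per state, it can be inspected for dead ends or missing exits before any prototype is built.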
DESIGN: FORMAL ANALYSIS OF THE INITIAL DESIGN
Once an initial design is specified, even if it is a
partial design, it can be subjected to several kinds of
scrutiny. The goal in this analysis stage is to make the
initial design as good as possible before it is made into
the prototype for user testing. Three methods aid in
this process: structured walk-throughs, decomposition,
and task-theoretic analytic models.
Structured walk-throughs involve construction of
tasks that a user carries out on a simulated system. The
user tries out the system by going through the task, step
by step, screen by screen, command by command. This can
be done with the design as specified in a number of
different formats, using an experimental simulation of a
prototype or even with the experimenter presenting paper
and pencil figures of the screens, menus, and commands in
the appropriate sequence. The technique helps to identify
confusing, unclear, or incomplete instructions, illogical
or inefficient operations, unnatural or difficult proce-
dures, and procedural steps that may have been overlooked
because they were implicitly rather than explicitly
defined. Gould et al. (1983), Ramsey (1974), Ramsey et
al. (1979), and Weinberg and Friedman (1984) provide
examples of the use of structured walk-throughs.
A second kind of formal analysis, called decomposition,
is proposed in Reitman et al. (1985). In this analysis,
the major components of the design are separated and
analyzed for their impact on cognition. The picture
displayed on the screen, for example, is assessed for how
it helps or hinders the user's ability to perceive mean-
ingful relationships or the system model. The commands
are assessed for their load on long-term memory, how easy
they are to remember, and how confusable they are among
each other. For each component, a second design alterna-
tive is constructed to fit within the general guidelines
of usability. Then, through discussion and debate, the
design team decides which alternative of each component
is the better design. This method encourages careful
scrutiny of the proposed design and often encourages
designers to specify better interfaces before the first
prototype is built.
The third kind of formal technique invokes task-
theoretic analytic models. These models provide
representations and analyses that assess, for example,
which parts of a metaphor aid performance and which do
not (Douglas and Moran, 1983) and how big the user's
short-term memory load is at each step of the interaction
(Kieras and Polson, 1985). Prime examples of these tech-
niques include metaphor analysis (Carroll and Thomas,
1982; Carroll and Mack, 1982), assessment of mental
models (deKleer and Brown, 1983; deKleer and Brown, in
press; and others in Gentner and Stevens, 1983), develop-
ment of production rule systems that represent the user's
knowledge of the task (Kieras and Polson, 1985), object/
action analysis (called "external/internal task mapping"
by Moran, 1983), the GOMS model (Card et al., 1980b;
1983), and formal grammar notation systems (Reisner,
1981a, 1984; Blesser and Foley, 1982).
These task analytic models are very useful tools.
However, none of them yet encompasses all of the cogni-
tive aspects of the interaction; each focuses on one or
more important aspects. These methods require training
to use and often take a long time. However, they all
have the advantage of being based on sound theories of
human behavior and can provide important analysis of
usability before any coding of software or running of
subjects is contemplated. There is a trade-off, then,
between time spent in analysis and time spent testing
users in the laboratory or the field. The hope embodied
in this approach is that as the science of user-interface
design grows, analytic tools will improve to the point of
making the actual user testing of designed systems merely
a last, short check of a good, finished design.
DESIGN: BUILDING A PROTOTYPE
Three methods provide simulations or quick versions of
significant aspects of a new system so it can be tried by
actual users. The methods are called facading, the
Wizard of Oz technique, and rapid prototyping.
Facading is the technique of quickly and inexpen-
sively building a simulation of the external appearance
(i.e., the "facade") of a system's interface. Its advan-
tages are that it is quick and relatively easy; the target
system's underlying complexity and/or final computational
capability is "finessed." To be maximally beneficial,
the facade must embody some level of the functional
capability of the final target system. It does not just
generate a series of static snapshots of the system but
rather includes the control structure, flow, or connectiv-
ity of the final system. Hanau and Lenorovitz (1980) and
Lenorovitz and Ramsey (1977) provide good examples of the
use of this technique.
A variant of the facading technique is the Wizard of
Oz technique. Instead of having the computer embody the
simulated system, hidden human operators intercept user
commands and provide output back to the user. Often the
technique is used to test a new interface language: the
hidden human operator intercepts the new commands, trans-
lates them into the real system commands, and, after
receiving output from the real computer system, retrans-
lates it back to the tested end-user (see Gould et al.,
1983; Gould and Boies, 1978; Ford, 1981; Kelley, 1983;
Wixon et al., 1983).
Rapid or fast prototyping are terms applied to the
more formalized building of a prototype in a hurry. The
speed of building a running system depends mainly on the
underlying supporting software, which makes the specific
prototype programmable from existing modules. Ideally,
the prototype programming language separates elements of
the dialog from the actual implementation software. For
example, the designer can specify the placement of the
command input line or the menu choices variously without
having to program new modules to execute these different
input formats. One such tool, the dialog management
system, is under development by Hartson and his
colleagues (Hartson et al., 1984; Yunten and Hartson,
1984); another system is described in Wasserman (1982)
and Wasserman and Shewmake (1982). Another project that
uses rapid prototyping methods is reported in Hayes et
al. (1981).
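The separation of dialog elements from implementation code can be sketched as follows. The specification format and the `render_screen` function are hypothetical and do not describe any of the systems cited; the point is only that moving the command line or changing the menu requires editing data, not writing new modules.

```python
# Hypothetical dialog specification, kept apart from the application code.
dialog_spec = {
    "command_line": "bottom",  # "top" or "bottom"
    "menu": ["Open", "Save", "Quit"],
}

def render_screen(spec, body_lines):
    """Lay out a text screen according to the dialog specification."""
    menu = "  ".join(f"{i}) {label}" for i, label in enumerate(spec["menu"], 1))
    prompt = "Command: "
    lines = [menu] + list(body_lines)
    if spec["command_line"] == "top":
        return [prompt] + lines
    return lines + [prompt]

# Try an alternative placement by changing only the specification.
alternative = {**dialog_spec, "command_line": "top"}

for line in render_screen(dialog_spec, ["(document text)"]):
    print(line)
```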
DESIGN: PROTOTYPE TESTING WITH USERS
When a prototype of some form has been built, actual
users are then brought in to use the system and report
their opinions about it. These tests can vary greatly in
how well controlled their designs are and how representa-
tive the set of tested users is of the final population
of users. Moreover, users are asked to perform several
kinds of tasks, some testing the normal, frequent tasks
that regular users will be expected to perform, others
testing those subtasks thought to be especially difficult
either for the system (e.g., those producing long system
response times) or for the user (e.g., the longest
sequence of commands for a particular type of task).
Prototype tests differ in what kinds of data are taken
from the user--times and errors, thinking aloud protocols,
or attitudes.
Experimental Designs
Field tests to evaluate systems are fashioned after
laboratory tests common in the academic field of experi-
mental psychology. In general, they require the compari-
son of at least two systems, systems that differ in only
one component or variable. Measures are designed to
reflect the performance attributable to the effects of
that variable, and subjects are chosen to be representa-
tive of the population of end users. Of particular
importance are various techniques for controlling irrelevant
variables. For example, one must ensure that measures of
intelligence of the test subjects do not differ across
the two conditions, since such a difference would affect
the results in addition to the effects of the independent
variables.
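One standard way to keep subject characteristics from differing systematically between conditions is random assignment. The text does not name a specific technique, so the sketch below is offered as one common choice; the subject labels and condition names are hypothetical.

```python
import random

def assign_conditions(subjects, conditions=("system_A", "system_B"), seed=None):
    """Randomly assign subjects to conditions in (nearly) equal numbers."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups

# Eight hypothetical subjects split between two systems.
subjects = [f"S{n}" for n in range(1, 9)]
groups = assign_conditions(subjects, seed=1)
print({condition: len(members) for condition, members in groups.items()})
```

Random assignment does not guarantee equal groups in a small sample, so measured characteristics should still be compared across conditions after assignment.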
Often the rules of good experimental design are
violated in the interest of proceeding quickly. Subjects
who are different from the end users but more available
may be tested; comparisons may be made between two systems
that differ on more than one variable; measures may be
taken that are less sensitive than those that will
directly test why performance on one system is better or
worse than another; occasionally only one system is
tested and performance on it is measured against some
predetermined standard (e.g., a 10-minute rule for time
to learn a system). The closer the test is to good
experimental design, the more quickly the findings can
advance knowledge about the important aspects of the
human-computer interface. However, as is often the case
in development, the goal is not ultimate knowledge but
rather global assessment of the adequacy of a particular
interface or system. A compromise design procedure is
described in Reign et al. (1984). The use of experi-
mental design is found in Ledgard et al. (1981), Reisner
et al. (1975), Reisner (1977, 1981b), and Williges and
Williges (1982).
One variant from controlled experimental evaluation
that has been found useful in the development of inter-
faces is called quasi-experimental design. These
designs involve capturing data at several time intervals
typically of durations measured in weeks or months.
Sometime during the data capturing intervals, a change or
a modification of a system is introduced; the data being
captured are expected to reflect the impact of this
change. Some of these quasi-experimental designs allow
for comparisons with a control group. These designs are
hard to control, since the investigator must typically
take existing groups of users, giving one the change and
the other no change. Inherent differences in existing
groups are a major worry in evaluating the results. A
complete description of this technique can be found in
Cook and Campbell (1979); Roltum (1982) and Rice (1982)
provide good examples of this method.
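When a control group is available, one simple way to summarize such data is to subtract the change observed in the control group from the change observed in the group that received the modification (often called a difference-in-differences estimate). This is named here as one possible analysis, not as the method of the studies cited, and the weekly figures are hypothetical.

```python
from statistics import mean

def change_estimate(treated_before, treated_after, control_before, control_after):
    """Change in the treated group beyond the change seen in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical weekly mean task times in minutes, before and after the change.
effect = change_estimate(
    treated_before=[12, 12, 12],
    treated_after=[9, 9, 9],
    control_before=[11, 11, 11],
    control_after=[10, 10, 10],
)
print(effect)
```

The estimate is only as trustworthy as the assumption that the two existing groups would otherwise have changed in the same way.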
Selection of Tasks to Perform
There are two reasons one has users try out a prototype
system: to identify points of difficulty for the user so
that those points can be redesigned and to measure stan-
dard use of the system, so that later changes in hardware
can be assessed or so those concerned with the staffing
of a large operation of users can determine how many
people will be needed. For the first purpose, tasks are
selected that stress the system and the user, generally
called critical incidents. For the second purpose, tasks
are selected to estimate basic characteristics of the
system's use, called benchmark tests.
In terms of critical incidents, the goal is to set
up situations or tasks that have been shown historically
to tax the user and/or the system and are sufficiently
important that they can make the difference between
success or failure on task or system performance. One
might, for example, require the user to access items
distant from what is being presented on the current
screen or to perform a long command sequence, to deter-
mine the loads of this part of the design on the user's
ability to imagine the stored information's underlying
structure or the mnemonic characteristics and grammatical
rules implied by the command sequences. The goal is to
set up situations in which the data will tell the
designers something about the limits of human or system
performance. These tasks are illustrated in the work of
Al-Awar et al. (1981), Kelley and Chapanis (1982), and
Flanagan (1954).
In benchmark tests, the goals are quite different.
The designer wants to measure the likely performance
times and errors expected in normal use. The tasks are
not designed to tax the system or the user, but rather to
be representative of the kinds of frequent tasks the
system will normally support. Typically, tasks are
constructed to measure the expected amount of time it
takes a new user to learn a system, the amount of time it
takes the user to perform a set of predefined tasks, and
the amount of time it takes the system to respond to a
user's request. A good study that illustrates the use of
this method is that of the evaluation of eight text
editors by Roberts and Moran (1983). A study of data-
base interfaces using benchmarks was done by Mantei and
Cattell (1982).
Kinds of Data Collected
There are four major kinds of data collected in tests
of systems: the time it takes to perform a task, the
frequency and kinds of errors, the goals and intentions
of the users, and the attitude of the user.
The amount of time a task takes (either how long an
entire task takes or how long each successive keystroke
takes) reflects the time it takes the user to perceive
inputs, categorize and plan appropriate actions, and
execute proper responses. Error frequencies and types
reflect the difficulties users have with these processes
and often point to the cause of the error (whether the
error response is similar to one in a similar plan, was
generated from confusion with a similar screen, has a
label that sounds the same as another, etc.). A simple
analysis of users' times and errors is found in Reisner
et al. (1975) and Reisner (1977). A comprehensive
analysis of users' times is found in Card et al. (1980b,
1983). Other uses of times and errors can be found in
Boies (1974), Rosson (1984), Sheppard and Kruesi (1981),
and Thomas and Gould (1975).
A more thorough, complicated kind of data to collect
during evaluation involves the user's thinking aloud
while performing the task. Typically the user is video-
and sound-recorded while he or she is performing the
tasks. The recording captures what is said and done,
what is displayed on the screen, what sections of the
documentation are being examined, what parts of the task
instructions the user is reviewing, etc. The most
complete protocols ask the subjects to verbalize their
intentions, what their goals are, and what current plans
they have about reaching their goals. Other behavior is
directly observable; thoughts and plans typically are
not. This method has been used by Mack et al. (1983),
Carroll and Mack (1982), and Card et al. (1980a) in their
studies of skilled text editing. More complete descrip-
tions of the technique and its advantages and disadvan-
tages can be found in Lewis (1982), Olson et al. (1984),
and Ericsson and Simon (1980).
A third kind of data collected in evaluation sessions
is the users' opinions about the system's ease of use
and functionality. A common instrument used to scale
users' global attitudes about the system is the evalua-
tion component of Osgood et al.'s (1957) Semantic
Differential (see Good, 1982, for an example of its
use). Questionnaires and interviews also tap users'
reactions to particular components of the system. One
problem with users' reports, however, is that they are
typically distorted by their experience with other,
similar systems. Or a user may have difficulty separating
components of the system; for example, a user who
has a very difficult time using a system may report that
he or she likes it a great deal, recognizing how much
easier it is to perform the task on a computer compared
with previous manual methods.
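Scoring an attitude scale of the semantic differential type can be sketched as follows. The adjective pairs, the 7-point range, and the averaging rule are assumptions chosen for illustration and should not be read as the published instrument.

```python
from statistics import mean

# Hypothetical evaluative adjective pairs, each rated on a 7-point scale
# where 1 is the negative pole and 7 is the positive pole.
EVALUATIVE_PAIRS = ["bad-good", "unpleasant-pleasant", "worthless-valuable"]

def evaluation_score(ratings):
    """Average one respondent's ratings across the evaluative scales."""
    for pair in EVALUATIVE_PAIRS:
        if pair not in ratings:
            raise ValueError(f"missing rating for {pair}")
        if not 1 <= ratings[pair] <= 7:
            raise ValueError(f"rating for {pair} must be between 1 and 7")
    return mean(ratings[pair] for pair in EVALUATIVE_PAIRS)

respondent = {"bad-good": 6, "unpleasant-pleasant": 5, "worthless-valuable": 7}
score = evaluation_score(respondent)
print(score)
```

A single global score of this kind is subject to the distortions described above, so it is best read alongside the time, error, and protocol data.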
Redesign
Typically as the prototype of the original design is
tested, errors are found and revisions suggested. The
methods appropriate to the initial design are appropriate
also at the stage of redesign. This part of the design
process iterates through "fixing" and "testing" until
either an acceptable level of performance is reached or
the deadline for developing the system is reached.
IMPLEMENTATION: MONITORING
Just as data were collected in the original conception
and analysis phase of product development, data are col-
lected on the system as implemented. At this stage,
activity analyses, diaries, logging and metering, and
questionnaires and interviews are all appropriate methods
for assessing whether the product as designed is performing
as predicted in the final environment. If problems
are found in the field, either small corrections are made
in the code (e.g., changing what a command is called is
easy to do in the code but can have an enormous
impact on the ease of use), or a redesign is called for,
sending the product design process back to prototype
development or fully back to the top of the cycle.