Defining Requirements and Design
Inadequate user requirements are a major contributor to project failure. The most recent CHAOS report by the Standish Group (2006), which analyzes the reasons for technology project failure in the United States, found that only 34 percent of projects were successful; 15 percent completely failed and 51 percent were only partially successful.
Five of the eight most frequently cited causes of failure were related to poor user requirements:
13.1 percent, incomplete requirements
12.4 percent, lack of user involvement
10.6 percent, inadequate resources
9.9 percent, unrealistic user expectations
9.3 percent, lack of management support
8.7 percent, requirements keep changing
8.1 percent, inadequate planning
7.5 percent, system no longer needed
Among the main reasons for poor user requirements are (1) an inadequate understanding of the intended users and the context of use, and (2) vague usability requirements, such as “the system must be intuitive to use.”
Figure 7-2 shows how usability requirements relate to other system requirements. Usability requirements can be seen from two perspectives: characteristics designed into the product and the extent to which the product meets user needs (quality in use requirements).
There are two types of usability requirements. Usability as a product quality characteristic is primarily concerned with ease of use. ISO/IEC 9126-1 (International Organization for Standardization, 2001) defines usability in terms of understandability, learnability, operability, and attractiveness. There are numerous sources of guidance on designing user interface characteristics that achieve these objectives (see the section on guidelines and style guides under usability evaluation). While designing to conform to guidelines will generally improve an interface, usability guidelines are not sufficiently specific to constitute requirements that can be easily verified. Style guides are more precise and are valuable in achieving consistency across screen designs produced by different developers. A style guide tailored to project needs should form part of the detailed usability requirements.
At a more strategic level, usability is the extent to which the product
meets user needs. ISO 9241-11 (International Organization for Standardization, 1998) defines this as the extent to which a product is effective, efficient, and satisfying in a particular context of use. This high-level requirement is referred to in ISO software quality standards as “quality in use.” It is determined not only by the ease of use, but also by the extent to which the functional properties and other quality characteristics meet user needs in a specific context of use.
In these terms, usability requirements are very closely linked to the success of the product.
Effectiveness is a measure of how well users can perform the job accurately and completely.
Efficiency is a measure of how quickly a user can perform work and is generally measured as task time, which is critical for productivity.
Satisfaction is the degree to which users like the product: a subjective response that includes perceived ease of use and usefulness. Satisfaction is a success factor for any product with discretionary use and is essential for maintaining workforce motivation.
Uses of Methods
Measures of effectiveness, efficiency, and satisfaction provide a basis for specifying concrete usability requirements.
Measure the Usability of an Existing System
If in doubt, the figures for an existing comparable system can be used as the minimum requirements for the new system. Evaluate the usability of the current system when carrying out key tasks, to obtain a baseline for the current system. The measures to be taken would typically include
success rate (percentage of tasks in which all business objectives are met).
mean time taken for each task.
mean satisfaction score using a questionnaire.
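The baseline measures listed above can be computed directly from raw test-session records. The sketch below is illustrative only; the session field names and the satisfaction scale are assumptions, not prescribed by any standard.

```python
# Compute baseline usability measures from observed test sessions.
# Each session records one user attempting one task; field names and
# the satisfaction scale (here 1-7) are illustrative assumptions.

def baseline_measures(sessions):
    """Return success rate (%), mean task time (s), and mean satisfaction."""
    n = len(sessions)
    success_rate = 100.0 * sum(1 for s in sessions if s["success"]) / n
    mean_time = sum(s["time_sec"] for s in sessions) / n
    mean_satisfaction = sum(s["satisfaction"] for s in sessions) / n
    return success_rate, mean_time, mean_satisfaction

sessions = [
    {"success": True,  "time_sec": 140, "satisfaction": 5.5},
    {"success": True,  "time_sec": 185, "satisfaction": 4.0},
    {"success": False, "time_sec": 300, "satisfaction": 2.5},
    {"success": True,  "time_sec": 160, "satisfaction": 5.0},
]
rate, t, sat = baseline_measures(sessions)
print(f"success {rate:.0f}%  mean time {t:.0f}s  mean satisfaction {sat:.1f}")
```

In practice the success criterion would be defined per task (all business objectives met), and satisfaction would come from a standardized questionnaire rather than a single ad hoc rating.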
Specify Usability Requirements for the New System
Define the requirements for the new system, including the type of users, tasks, and working environment. Use the baseline usability results as a basis for establishing usability requirements. A simple requirement would be that when the same types of users carry out the same tasks, the success rate, task time, and user satisfaction should be at least as good as for the current system.
It is useful to establish a range of values, such as
the minimum to be achieved,
a realistic objective, and
the ideal objective (from a business or operational perspective).
It may also be appropriate to establish the usability objectives for learnability, for example, the duration of a course (or use of training materials) and the user performance and satisfaction expected both immediately after training and after a designated length of use.
It is also important to define any additional requirements for user performance and satisfaction related to users with disabilities (accessibility), critical business functions (safety), and use in different environments (universality).
Depending on the development environment, requirements may, for example, either be
iteratively elaborated as more information is obtained from usability activities, such as paper prototyping during development, or
agreed by all parties before development commences and subsequently modified only by mutual agreement.
Test Whether the Usability Requirements Have Been Achieved
Summative methods for measuring quality in use (see Chapter 8) can be used to evaluate whether the usability objectives have been achieved. If any of the measures fall below the minimum acceptable values, the potential risks associated with releasing the system before the usability has been improved should be assessed. The results can be used to prioritize future usability work in subsequent releases.
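A requirement expressed as minimum, realistic, and ideal values can be checked mechanically against summative test results. The following is a minimal sketch; the measure names and thresholds are assumptions, and in a real project they would be derived from the baseline study of the existing system.

```python
# Rate a summative measurement against a minimum / realistic / ideal range.
# Measure names and thresholds are illustrative assumptions.
# (For "lower is better" measures such as task time, the comparisons
# would be inverted.)

REQUIREMENTS = {
    # measure: (minimum acceptable, realistic objective, ideal objective)
    "success_rate_pct": (75, 85, 95),
    "satisfaction":     (4.0, 5.0, 6.0),
}

def rate(measure, value):
    minimum, realistic, ideal = REQUIREMENTS[measure]
    if value < minimum:
        return "below minimum: assess release risk"
    if value >= ideal:
        return "ideal objective met"
    if value >= realistic:
        return "realistic objective met"
    return "minimum met"

print(rate("success_rate_pct", 88))
print(rate("satisfaction", 3.2))
```

A measure falling in the "below minimum" band corresponds to the situation described above, in which the risks of releasing the system before usability has been improved must be assessed.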
The Common Industry Specification for Usability Requirements (Theofanos, 2006) provides a checklist and a format that can be used initially to support communication between the parties involved to obtain a better understanding of the usability requirements. When the requirements are more completely defined, it can be used as a formal specification of requirements. These requirements can subsequently be tested and verified.
The specification is in three parts:
The context of use: intended users, their goals and tasks, associated equipment, the physical and social environment in which the product will be used, and examples of scenarios of use. An incomplete understanding of the context of use is a frequent reason for partial or complete failure of a system when implemented. The context of use is composed of the characteristics of the users, their task, and the usage environment. There are several methods that can be used to obtain an adequate understanding of this type of information (see Chapter 6).
Usability measures: effectiveness, efficiency, and satisfaction measures for the main scenarios of use with target values when feasible.
The test method: the procedure to be used to test whether the usability requirements have been met and the context in which the measurements will be made. This provides a basis for testing and verification.
The context of use should always be specified. The importance of specifying criteria for usability measures (and an associated range of acceptable values) will depend on the potential risks and consequences of poor usability.
Communication Among Members of the Development Team
This information facilitates communication among the members of the development or supplier organization. It is important that all concerned groups in the supplier organization understand the usability requirements before design begins. Benefits include the following:
Reducing risk of product failure. Specifying performance and satisfaction criteria derived from existing or competitor systems greatly reduces the risk of releasing a product that is inferior to those systems.
Reducing the development effort. This information provides a mechanism for the various concerned groups in the customer’s organization to consider all of the requirements before design begins and reduces later redesign, recoding, and retesting. Review of the requirements specified can reveal misunderstandings and inconsistencies early in the development cycle, when these issues are easier to correct.
Providing a basis for controlling costs. Identifying usability requirements reduces the risk of unplanned rework later in the development process.
Tracking evolving requirements by providing a format to document usability requirements.
Communication Between Customers and Suppliers
A customer organization can specify usability requirements to accurately describe what is needed. In this scenario, the information helps supplier organizations understand what the customer wants and supports the proactive collaboration between a supplier and a customer.
Specification of Requirements
When the product requirements are a matter for agreement between the supplier and the customer, the customer organization can specify one or more of the following:
intended context of use,
user performance and satisfaction criteria, and
the test method for verifying whether the criteria have been met.
The Common Industry Specification for Usability Requirements provides a baseline against which compliance can be measured.
Contributions to System Design Phases
Usability requirements should be integrated with other systems engineering activities. For example, the ISO/IEC 15288 standard (International Organization for Standardization, 2002) for system life-cycle processes includes the user-centered activities in the stakeholder requirements definition process as shown in Box 7-1.
User-Centered Activities for Stakeholder Requirements
Strengths, Limitations, and Gaps
Establishing high-level usability requirements that can be tested provides the foundation for a mature approach to managing usability in the development process. But while procedures for establishing these requirements are relatively well established in standards, they are not widely applied or understood, and there is little guidance on how to establish more detailed user interface design requirements.
With most emphasis in industry on formative evaluation to improve usability, there is often a reluctance to invest in summative evaluation late in a project's development. Yet formal summative evaluation against established usability criteria is needed to obtain valid measures of whether the usability requirements have been achieved.
As much of systems development is carried out on a customer-supplier basis (even if the supplier is internal to the customer organization), it is for the customer to judge whether the investment in establishing and validating usability requirements is justified by the associated risk reduction.
Usability requirements can also provide significant benefits in clarifying user needs and providing explicit user-oriented goals for development, even if they cannot be exhaustively validated. If there are major usability problems, even the results from testing three to five participants would be likely to provide advance warning of a potential problem (for example, if none of the participants can complete the tasks, or if task times are twice as long as expected).
WORK DOMAIN ANALYSIS
Among the questions that arise when facing the design of a new system are the following: What functions will need to be accomplished? What will be automated, and what will be performed by people? If people will be involved, how many people will it take, and what will be their role? What information and controls should be made available, and how should they be presented to enhance performance? What training is required?
One approach to answering these questions is to start with a list of the tasks to be accomplished and perform task analyses to identify the sequence of actions entailed, the information and controls required to perform those actions, and the implications for number of people and training required. This approach works well when the tasks to be performed and conditions of use can be easily specified a priori (e.g., automated teller machines). However, in the case of highly complex systems (e.g., a process control
plant, a military command and control system) unanticipated situations and tasks inevitably arise.
Work domain analysis techniques have been developed to support analysis and design of these more complex systems, in which all possible tasks and situations cannot be defined a priori. Work domain analysis starts with a functional analysis of the work domain to derive the functions to be performed and the factors that can arise to complicate performance (Woods, 2003). The objective is to produce robust systems that enable humans to effectively operate in a variety of situations—both ones that have been anticipated by system designers and ones that are unforeseen (e.g., safely shutting down a process control plant with an unanticipated malfunction).
Work domain analysis methods grew out of an effort to design safer and more reliable nuclear power plants (Rasmussen, 1986; Rasmussen, Pejtersen, and Goodstein, 1994). Analysis of accidents revealed that operators in many cases were faced with situations that were not adequately supported by training, procedures, and displays because they had not been anticipated by the system designers. In those cases, operators had to compensate for inadequate information or resources in order to recover and control the system. This led Rasmussen and his colleagues to develop work domain analysis methods to support development of systems that are more resilient in the face of unanticipated situations.
A work domain analysis represents the goals, means, and constraints in a domain that define the boundaries within which people must reason and act. This provides the framework for identifying functions to be performed by humans (or machines) and the cognitive activities those entail. Displays can then be created to support those cognitive activities. The objective is to create displays and controls that support flexible adaptation by revealing domain goals, constraints, and affordances (i.e., allowing the users to “see” what needs to be done and what options are available for doing it).
A work domain analysis is usually conducted by creating an abstraction hierarchy according to the principles outlined by Rasmussen (1986). A multilevel goal-means representation is generated, with abstract system purposes at the top and concrete physical equipment that provides the specific means for achieving these system goals at the bottom. In many instances, the levels of the model include functional purpose (a description of system purposes); abstract function (a description of first principles and priorities); generalized function (a description of processes); physical function (a description of equipment capabilities); and physical form (a description of physical characteristics, such as size, shape, color, and location).
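The five-level goal-means structure just described can be captured in a simple data structure, with means-ends links connecting each node to the lower-level means that achieve it. The node names below are illustrative assumptions in the spirit of the power plant example, not the content of Figure 7-3.

```python
# A minimal abstraction hierarchy sketch: each node maps to the
# lower-level means available for achieving it. Node names are
# illustrative assumptions, not an actual plant model.

LEVELS = ["functional purpose", "abstract function",
          "generalized function", "physical function", "physical form"]

hierarchy = {
    "generate electricity": ["energy conversion"],      # purpose -> abstract function
    "energy conversion":    ["steam generation"],       # abstract -> process
    "steam generation":     ["boiler"],                 # process -> equipment
    "boiler":               ["steel vessel, 200 m^3"],  # equipment -> physical form
}

def means_for(node):
    """Return the lower-level means available for achieving a node."""
    return hierarchy.get(node, [])

def goal_chain(purpose):
    """Walk top-down from a system purpose to the physical form supporting it."""
    chain = [purpose]
    while means_for(chain[-1]):
        chain.append(means_for(chain[-1])[0])
    return chain

print(goal_chain("generate electricity"))
```

A real analysis would allow many-to-many links (one function served by several pieces of equipment, and vice versa); it is precisely those alternative means that support reasoning about unanticipated situations.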
Work domain analyses do not depend on a particular knowledge acquisition method. Any of the knowledge acquisition techniques covered in Chapter 6 can be used to inform a work domain analysis. In turn, the
results of the work domain analysis provide the foundation for further analyses to inform human-system integration.
There are a growing number of HSI approaches that are grounded in a work domain analysis. A prominent example is cognitive work analysis (Rasmussen, 1986; Rasmussen et al., 1994; Vicente, 1999) that uses work domain analysis as the foundation for deriving implications for system design and related aspects of human-system integration, including function allocation, display design, team and organization design, and knowledge and skill training requirements. Burns and Hajdukiewicz (2004) provide design principles and examples of creating novel visualizations and support systems based on a work domain analysis.
Applied cognitive work analysis provides a step-by-step approach for performing and linking the results of a work domain analysis to the development of visualizations and decision-aiding concepts (Elm et al., 2003). These include
using a functional abstraction network to capture domain characteristics that define the problem space confronting domain practitioners.
overlaying cognitive work requirements on the functional model as a way of identifying the cognitive demands/tasks/decisions that arise in the domain and require support.
identifying information/relationship requirements needed to support the cognitive work identified in the previous step.
specifying representation design requirements that define how the information/relationships should be represented to practitioner(s) to most effectively support the cognitive work.
developing presentation design concepts that provide physical embodiments of the representations specified in the previous step (e.g., rapid prototypes that embody the display concepts).
Each design step produces a design artifact that collectively forms a continuous design thread providing a traceable link from cognitive analysis to design.
Work-centered design (Eggleston, 2003; Eggleston et al., 2005) is another example of an HSI approach that relies on a work domain analysis. Key elements of work-centered design include (a) analysis and modeling of the demands of work, (b) design of displays/visualizations that reveal domain constraints and affordances, and (c) use of work-centered evaluations that probe the ability of the resultant design to support work across a representative range of work context and complexities.
The shared representation produced as output from a work domain analysis is typically a graphic representation of domain goals, means, and constraints. Figure 7-3 provides an example of a graphic work domain representation that was developed for a nuclear power plant design. The work domain representation specifies the primary goals of the plant (generate electricity and prevent radiation release), the major plant functions in support of those goals (Level 2 functions in the figure) and the plant processes available for performing the plant functions (Levels 3 and 4 in the figure). Level 4 specifies the major engineered control functions available for achieving plant goals. This is the level at which manual and automatic control actions can be specified to affect goal achievement.
While work domain analyses have often adopted Rasmussen’s abstraction hierarchy formalism, the results of a work domain analysis can take multiple forms. These include alternative network representations (e.g., Elm et al., 2003), prose descriptions of the characteristics of the work domain, and concept maps.
Uses of Methods
Work domain analyses complement more traditional task analysis approaches. Traditional task analyses model how tasks in a domain are performed or should be performed. Work domain analyses model the problem space in which reasoning and action can take place. The work domain representation provides the basis for deriving the information required to enable domain practitioners to understand and reason about the domain at different levels of abstraction, ranging from domain purposes (e.g., prevent radiation release) all the way down to the particular physical systems (e.g., pumps and valves) available for achieving the domain goals.
The output of a work domain analysis is used to inform further analyses that feed different elements of human-system integration. Table 7-1 provides a summary of the major elements of a cognitive work analysis that provide traceable links between the results of the work domain analysis and implications for system design, including function allocation decisions, team and organization design, design of physical and information systems including displays, personnel selection and training, development of procedures, specification of test cases to drive system evaluation, and conduct of human reliability analyses as part of risk-based analyses.
TABLE 7-1 Analytic Tools Involved in the Cognitive Work Analysis Methodology
Phases of Cognitive Work Analysis
Work domain analysis
Analyzes the purposes and physical context in which domain practitioners operate. This includes a description of domain goals, means available for achieving those goals, and constraints (e.g., physical constraints, sociopolitical constraints).
Control task analysis
Identifies what needs to be done in a work domain. This includes a description of the work situations that can arise and the work functions that need to be performed, independent of who (person or machine) will perform them or the detailed strategies to be used.
Strategies analysis
Analysis of strategies for making decisions and carrying out tasks, independent of who will carry them out.
Social organization and cooperation analysis
Focuses on who can carry out the work, how it can be distributed or shared, and how it can be coordinated. This includes allocation of work among individuals and/or machines, organization of individuals into teams and larger organizational units, and communication and coordination requirements.
Worker competencies analysis
Analysis of perceptual and cognitive requirements of workers (e.g., skills, knowledge, attitudes) to foster understanding and reduce workload.
Use of Work Domain Analysis in the Port Security Case Study
Work domain analysis has been an integral part of the port security HSI work described in Chapter 5. One recent application involved determining potential technology insertion points for cargo screening at seaports where containers move directly from ship to rail, without exiting through a truck gate. In order to evaluate this domain comprehensively, interviews were conducted with terminal operations managers, physical site maps were collected, and terminal operations walkthroughs were conducted. The information was synthesized into descriptions of current operations at each of the terminals and rail yards, with a focus on identifying common and contrasting operational practices, speed of operations, overall time requirements for ship servicing, dwell time of containers in storage stacks, labor and equipment requirements, potential radiation portal screening choke points, and issues related to the operational impact of screening at these locations. The findings were used to define screening concepts that would maximize threat detection while minimizing impact on commerce.
Other Example Applications
One of the strengths of work domain analysis methods is their ability to drive the design of novel visualizations tailored to the demands of the work (Burns and Hajdukiewicz, 2004). Successful applications range from process control (Roth et al., 2001; Jamieson and Vicente, 2001), to aircraft displays (Dinadis and Vicente, 1999), to medical device applications (Lin, Vicente, and Doyle, 2001), to military command and control (Martinez, Bennett, and Shattuck, 2001; Potter et al., 2002), to network management (Duez and Vicente, 2005; Burns et al., 2000), and to defense against cyber war (Gualtieri and Elm, 2002). In each case, the approach yielded novel decision support concepts that were fine-tuned to the cognitive work requirements of the domain and markedly different from traditional displays in the domain.
One example drawn from a process control application is a large wall-mounted group view display intended to enable power plant control room teams to maintain broad situation awareness of plant status. The goal was to increase the ability of operators to quickly assess plant state and effectively control the plant in both normal and abnormal conditions.
The content and organization of the group view display was based on a work domain analysis (see Figure 7-4). The group view display was organized around the major plant functions that need to be achieved to maintain safety and power generation goals, and the physical processes that support them. The objective was to enable operators to rapidly assess whether the major plant functions are being achieved and the state of active plant processes that are supporting those plant functions. In cases of plant disturbances, in which one or more of the plant goals are violated, a functional representation allows them to assess what alternative means are available for achieving the plant goals.
A formal evaluation study demonstrated that the functionally organized overview display was more effective and was preferred by operators over a more conventional overview display that utilized a physical plant mimic as the organizational scheme. Teams performed significantly better with the functionally organized overview display than the more conventional physical mimic display in identifying target events (24-percent improvement) and diagnosing plant disturbances (27-percent improvement) (Roth et al., 2001).
The results illustrate the value of work domain analysis in deriving the critical goals, means, and constraints in the domain that impact decision making and in generating novel displays that effectively communicate these factors to support individuals and teams.
Work domain analyses promote design of novel visualizations that enable practitioners to readily apprehend and assimilate domain information
required to support complex decisions (Burns and Hajdukiewicz, 2004). One recent example is a work-centered support system visualization that was developed to support dynamic mission replanning in a military airlift organization (Roth et al., 2006). A work domain analysis identified domain factors that enter into and complicate airlift mission planning decisions, including the need to match loads to currently available aircraft, obtain diplomatic clearance for landings in and flights over foreign nations, balance competing airlift demands, and conform to airfield and aircrew constraints. Although existing information systems included all the relevant data, operational personnel had to navigate across multiple tabular displays to extract and mentally collate the necessary information. The work domain analysis provided the basis for design of a novel timeline display that enables operational personnel to graphically “see” the relationships between mission plan elements and resource constraints (e.g., airfield operating hours, durations of diplomatic clearances, crew rest requirements) to detect and address violations. A formal evaluation comparing performance with the timeline display to performance with the legacy system established significant improvement in performance with the timeline display (Roth et al., 2006).
Contributions to System Design Phases
A work domain analysis is usually performed at several levels of detail, depending on the stage of system development and complexity of the system being analyzed. A work domain analysis is performed as a preliminary analysis to identify information needs, critical constraints, and information relationships that are necessary for successful action and problem management within the domain. As the design evolves, the work domain analysis can be deepened and used to inform display design, function identification and allocation decisions, team and organization design, as well as identification of knowledge and skills (e.g., accurate system mental models) that are needed to effectively support performance in the domain.
The application of work domain analysis throughout the HSI design cycle has been successfully illustrated by Neelam Naikar and her colleagues, who have been applying work domain analysis and cognitive work analysis methods to the design of a first-of-a-kind Australian AWACS-style air defense platform called the Airborne Early Warning and Control (Naikar and Sanderson, 1999, 2001; Naikar et al., 2003; Naikar and Saunders, 2003; Sanderson et al., 1999; Sanderson, 2003). Their work has demonstrated the usefulness of work domain analysis throughout the system design cycle, including:
Evaluation of alternative platform design proposals offered by different vendors.
Determination of the best crew composition for a new platform.
Definition of training and training simulator needs.
Assessment of risks associated with upgrading existing defense platforms.
Work domain analyses have been similarly successfully employed to provide early input into the HSI issues in a number of large-scale first-of-a-kind projects, including the design of a next-generation power plant (Roth et al., 2001), a next-generation U.S. Navy battleship (Bisantz et al., 2003; Burns, Bisantz, and Roth, 2004), and a next-generation Canadian frigate (Burns, Bryant, and Chalmers, 2000).
Strengths, Limitations, and Gaps
A primary strength of work domain analysis is in emphasizing the importance of uncovering and representing domain characteristics and constraints that impact cognitive and collaborative work, as well as in guiding the design of systems that are fine-tuned to supporting the work demands and enabling domain practitioners to respond adaptively to a broad range of situations. It complements traditional sequential task analysis approaches by providing an explicit shared representation of domain goals, characteristics, and constraints (Miller and Vicente, 1999; Bisantz et al., 2003).
An often-cited limitation of work domain analysis methods is that exhaustively mapping the characteristics and constraints of a domain can be resource-intensive. However, as multiple projects have shown, it is not necessary to perform an exhaustive domain analysis to reap the benefits (e.g., Bisantz et al., 2003). A work domain analysis can be performed at different levels of detail, depending on the complexity of the system being analyzed and the phase of analysis. A preliminary, high-level work domain analysis can be performed early in the HSI process to identify information needs, critical constraints, and information relationships that are necessary for successful action and problem management in the domain. As the design evolves, the work domain analysis can be elaborated.
A related strength of work domain analysis methods is that they encourage explicit links between analysis and design via intermediate design artifacts. As the design evolves, these artifacts can be expanded and modified to provide a traceable link between domain demands, cognitive and performance requirements, and system features intended to provide the requisite support.
One of the current gaps that limit the impact of work domain analysis methods is the paucity of computational tools to facilitate analysis and serve as a core living repository of domain knowledge that could be drawn on throughout the system life cycle. While there has been some progress on
tool development, such as the work domain analysis workbench (Skilton, Cameron, and Sanderson, 1998) and the cognitive systems engineering analysis tool, more comprehensive and robust tools are needed.
One of the most common issues that arise in complex system design is estimating whether the aggregate workload associated with the tasks assigned to system users will result in too much to do in the time available, leading to stress, unreliable performance, or, in some cases, system failure. Workload comes in different varieties and may be assessed from many different perspectives.
For tasks involving significant physical effort, physical workload is an ergonomic issue and, in sustained task performance, is usually measured in terms of oxygen consumption or heart rate. Prediction of physical workload depends on having measurement results from other related activities and conditions and estimating the differences between the known results and the postulated activity. Guidelines are available for assessing excessive physical workload.
Structurally, the human limbs and eyes can be directed to only one location at a time, and excessive workload can result from a requirement that they be directed to too many places for the time available or that they need to be in different places at the same time. Speech communication is similarly limited. Assessing this kind of structural interference requires estimates or measurements of the time required for the various activities required of the limbs, eyes, and voice in each task, laying them out in sequence, subject to temporal constraints, and evaluating the potential conflicts.
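Assessing structural interference of this kind amounts to laying the activities out on a timeline and flagging overlapping demands on the same single channel (eyes, a hand, voice). A minimal sketch, with an assumed task timeline:

```python
# Flag structural conflicts: two activities that overlap in time while
# demanding the same single-channel resource (eyes, hand, voice).
# The activity names, channels, and times are illustrative assumptions.

def conflicts(activities):
    """Return pairs of activity names that compete for one channel."""
    found = []
    for i, a in enumerate(activities):
        for b in activities[i + 1:]:
            same_channel = a["channel"] == b["channel"]
            overlap = a["start"] < b["end"] and b["start"] < a["end"]
            if same_channel and overlap:
                found.append((a["name"], b["name"]))
    return found

timeline = [
    {"name": "read gauge",   "channel": "eyes",  "start": 0, "end": 4},
    {"name": "scan display", "channel": "eyes",  "start": 3, "end": 6},
    {"name": "radio call",   "channel": "voice", "start": 2, "end": 5},
    {"name": "flip switch",  "channel": "hand",  "start": 1, "end": 2},
]
print(conflicts(timeline))
```

Each flagged pair indicates a point where the task sequence must be re-ordered, reallocated, or given more time.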
The most challenging evaluation is of mental workload. Humans can generally direct their attention to only one task or activity at a time. That is not to say that one cannot sometimes process, to some level of completeness, multiple streams of information, especially when they are coordinated or relate to the same task. There is a large literature on attention, attention management, and multitasking that is beyond the scope of this report (see, for example, Chaffin, Anderson, and Martin, 1999; Wickens and Hollands, 1999; and Charlton, 2002).
The distinctions among these types may become blurred. Thinking is often accompanied by visual exploration, and it is difficult to distinguish the structural constraint of where the eyes are looking from the mental load of reasoning about what is seen. Demanding physical effort may capture attention that could otherwise be directed to cognitive task performance.
Predicting mental workload has proved daunting, but there are some
modeling techniques that have been applied. Most depend on having the results of a detailed task analysis, requiring an understanding of the cognitive components of the task and estimates of the time that will be associated with each task element. McCracken and Aldrich (1984) defined the visual, auditory, cognitive, and perceptual-motor load associated with a collection of common elemental tasks, such as reading an instrument or operating a control. Then, after making corresponding estimates of the time required for each task element in context, they used task analysis results to bring together the elemental components into estimates of the aggregate loads as a function of time on each modality. This basic approach has also been used in a variety of modeling contexts, including network models and more detailed human performance simulations (Laughery and Corker, 1997).
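The aggregation step of this approach can be sketched as follows. The elemental tasks, the visual/auditory/cognitive/psychomotor (VACP) load values, and the overload threshold below are invented placeholders, not values from McCracken and Aldrich; only the overall structure (per-element loads summed per modality over the task timeline) reflects the method described above.

```python
# Illustrative per-element loads: (visual, auditory, cognitive, psychomotor)
ELEMENT_LOADS = {
    "read instrument": (5.9, 0.0, 1.2, 0.0),
    "operate control": (4.0, 0.0, 1.2, 2.6),
    "monitor radio":   (0.0, 4.3, 3.0, 0.0),
}

def aggregate_load(timeline):
    """Sum VACP loads over all elements active in each time slice."""
    out = {}
    for t, active in timeline.items():
        totals = [0.0] * 4
        for element in active:
            for i, load in enumerate(ELEMENT_LOADS[element]):
                totals[i] += load
        out[t] = dict(zip(("V", "A", "C", "P"), totals))
    return out

# Hypothetical task-analysis timeline: which elements are active when
timeline = {0: ["read instrument"],
            1: ["read instrument", "monitor radio"],
            2: ["operate control", "monitor radio"]}

for t, loads in aggregate_load(timeline).items():
    overloaded = [m for m, v in loads.items() if v > 7.0]  # assumed threshold
    print(t, loads, "overload on:" if overloaded else "", overloaded)
```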
When prediction is not possible or leads to uncertain results, it is necessary to undertake a study to estimate mental workload from actual measurements. There are fundamentally four kinds of measurements and analysis that have been used: (1) varying the task load corresponding to the range of expected task conditions (e.g., pace of input demand, such as air traffic load, or complexity of environment, such as urban versus rural road conditions) and evaluating the functional relation between task performance and task load; (2) introducing independent competing secondary tasks and measuring the quality of performance on the secondary task in the presence of the task under study; (3) asking the user to estimate perceived workload while performing the task or immediately afterward (i.e., subjective assessment, using tools such as the NASA TLX scales; Hart and Staveland, 1988); and (4) employing physiological measures, such as pupil diameter, eye-blink rate, evoked potential responses, or heart rate. There are numerous summary references that document these methods, such as Tsang and Wilson (1997) and Hancock and Desmond (2001).
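For the third kind of measurement, the NASA TLX combines six subscale ratings (0-100) into an overall workload score, weighting each subscale by the number of times it was selected in 15 pairwise comparisons. A minimal sketch of that computation, with illustrative ratings and weights:

```python
def tlx_overall(ratings, weights):
    """Weighted NASA-TLX score: sum of rating x weight over the six
    subscales, divided by the 15 pairwise comparisons."""
    assert sum(weights.values()) == 15
    return sum(ratings[k] * weights[k] for k in ratings) / 15.0

# Illustrative data for one participant
ratings = {"mental": 70, "physical": 20, "temporal": 60,
           "performance": 40, "effort": 65, "frustration": 35}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 1}

print(tlx_overall(ratings, weights))  # overall workload on the 0-100 scale
```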
Uses of Method
When individual tasks are time sensitive or when the system users are subjected to the demands of multitasking, excessive workload is one of the paramount issues that can degrade system performance. Whenever a new system is designed or revised, it is important to consider the impact of the design on user workload. Workload estimates are also needed in job design—the assembly of tasks into jobs. Workload is a key component in preparing estimates of needed manpower; when there is a mandate to reduce staff, workload estimates are the most important consideration. Ultimately, workload is reflected in the personnel requirements forecast. It is an important area for coordination across the HSI domains.
The primary shared representations are graphs of workload as a function of time or task progress and PERT charts (a network diagram in which milestones are linked by tasks) or Gantt charts (bar charts that illustrate a project schedule). These shared representations illustrate the timelines of activities, showing where overlaps occur, with highlights showing phases in which the workload exceeds limits. For descriptions of these tools, see Modell (1996).
However, in most cases the output of studies assessing workload is expressed in an experiment report. Whenever possible the estimated workload should be compared with acceptable limits.
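As a minimal illustration of comparing estimated workload against acceptable limits, the sketch below renders workload by task phase as a text bar chart and flags phases that exceed an assumed limit. The phase names, load values, and limit are all invustrated placeholders invented for the example.

```python
LIMIT = 0.8  # assumed acceptable fraction of capacity

# Hypothetical workload estimates by mission phase
phases = [("taxi", 0.3), ("takeoff", 0.9), ("cruise", 0.4), ("approach", 0.95)]

for name, load in phases:
    bar = "#" * int(load * 20)
    flag = "  <-- exceeds limit" if load > LIMIT else ""
    print(f"{name:10s} {bar:20s} {load:.2f}{flag}")
```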
Contributions to System Design Phases
In a typical system design, consideration of workload begins with the initial task analysis and context of use assessment. In early stages, the estimates will be largely qualitative. The aspects of the design that may be workload sensitive, or for which overload presents a substantial risk to task completion or safety, are identified. As the design matures, the workload estimates should become more quantitative, and confidence in the estimates will improve. When designs have reached the stage of completion at which a simulation of the task or of alternative task designs can be built, modeling studies or human-in-the-loop evaluations can be undertaken to estimate the workload of critical phases of the operation or critical elements of the system (see the section below on models and simulations). These studies will contribute to the manpower and personnel domains as well and should be coordinated with specialists in those areas. Measuring workload is also important during the summative test and evaluation stages of a project.
Strengths, Limitations, and Gaps
The definition, measurement, and prediction of workload, particularly mental workload, have been on the human factors research agenda for more than 30 years. Measurement protocols and modeling approaches are available. It is much harder to define acceptable limits, because these depend on the measures used and there is no standardization of the measures, at least for mental workload. Using them requires the expertise of human factors professionals. All of the methods provide only approximate answers until the full system design is complete and the workload of using the real system can be evaluated.
Objective measures are usually to be preferred, but they require more effort to instrument and apply to simulated or real task performance. Subjective methods have been shown to be reliable if standardized questionnaires are used. Users can report only their perceptions; however, under stressful conditions, perceived workload may be more important than objective workload.
There is a need for more collaboration among the specialists of the manpower, personnel, and human factors domains to ensure that the studies that are undertaken meet the requirements of all these stakeholders. Suitable shared representations are not well developed. Workload models can produce PERT chart–like representations that are useful for detailed analysis of operational concepts, but the output of most workload studies is simply an experiment report. New visualizations are required that are grounded in data but that present it in a form that allows all stakeholders to understand not only what the recommendations are, but also how they are supported by the data.
Participatory Design

The preceding sections of this chapter have emphasized design as conducted by professional designers and engineers. This section focuses on design as a hybrid activity (see, e.g., Muller, 2003) conducted by professionals and end-users together, as co-designers. Much of the background for these concepts was provided in the participatory analysis section of Chapter 6. We restrict the discussion here to design-related concepts within that more general framework.
The principal focus of participatory design has been twofold (Blomberg et al., 2003; Bødker et al., 2004; Greenbaum and Kyng, 1991; Kyng and Matthiassen, 1997; Muller, Haslwanter, and Dayton, 1997; Muller and Kuhn, 1993; Schuler and Namioka, 1993):
To present design options clearly and understandably to end-users and
To provide the means for end-users to make changes in those design options.
This overall philosophy means that end-users are more involved in design and development than is the case in conventional treatments, in which end-users tend to be consulted during requirements elicitation, and again during usability or acceptance testing. By contrast, participatory design typically involves iterative engagements with users as first-class participants at multiple, strategically chosen moments during the specification-design-evaluation processes. When appropriate, this approach supplements the knowledge of engineers and professional designers with the work domain
knowledge of the end-users themselves, for a better informed, more efficient development process that typically requires fewer iterations to achieve targeted levels of usability, user satisfaction, and user acceptance.
Participatory design work has focused on issues of theory, context, and practice (for a summary, see Levinger, 1998). In this report, we focus on six sets of practices that have been shown to provide sustained value in system development (for encyclopedic reviews of over 70 participatory practices, see Bødker et al., 2004; Muller, 2003; Muller, Haslwanter, and Dayton, 1997).
Methods and Shared Representations
The analysis phases of scenario-based methods are noted in the previous chapter (Carroll, 1995, 2000; Rosson and Carroll, 2002b, 2003). These activities continue in design. One of the strongest ways to describe a revised or new design is through a story of that design in use. Scenario-based design is built around such stories. Scenario-based design builds on the problem statement through the following steps:
A set of activity designs (literally, action-oriented scenarios of future use) are constructed and evaluated with end-users. The claims from the previous step (i.e., assertions of value to the end-users) can be used to structure the evaluation.
An information design is proposed, based on the approved activity designs. Each activity design becomes a reference model for the evaluation of each information design. The information design provides a more detailed perspective on the narrative of the activity design, and is itself a more refined scenario of future use. Again, the claims from the participatory analysis can be used to structure the evaluation.
A more detailed interaction design is developed, based on a refined and stabilized information design. Each interaction design is an even more refined and developed scenario of future use. The activity designs remain the reference models against which the interaction design is evaluated—again with the potential aid of the claims from the participatory analysis.
In these ways, scenario-based design produces a structured series of narratives, each focused on resolving particular questions. The scenarios remain intelligible and accessible to the end-users, who are encouraged to critique and modify them as needed.
Another powerful way to tell a story about future use is through enactment of that scenario using tangible materials, such as prototypes of the envisioned technology. If the technology has been completed, then this approach becomes a matter of formative or summative usability evaluation (see Chapter 8). However, in participatory design, the prototype is often left strategically incomplete to encourage and even to require users to contribute their ideas directly to the evolving concept.
One of the most powerful forms of strategic incompleteness is to make a nonfunctional prototype out of low-technology materials (Bødker et al., 1987; Ehn and Kyng, 1991; Muller, 1992; Muller et al., 1995). This approach has several advantages. First, it is easy to produce, and that means that it is easy to revise or abandon (an extreme version of the concept of “throwaway prototype”). Second, it is easy to modify in place—a form of user-initiated design.1 Third, modification of the low-tech representation requires no specialized tools other than domain knowledge. Thus, a low-tech representation becomes another means for leveling the playing field, encouraging end-users to make egalitarian contributions of their knowledge to complement the knowledge of software and design professionals.
Bødker et al. (1987) provided early demonstrations of the value of low-tech mock-ups (“cardboard computers”) in critique and redesign of new technologies for newspaper print shops. Muller (1992) provided an evolutionary view of paper-and-pencil materials and associated working practices in the design of user interfaces. Lafrenière (1996) showed a more macroscopic approach involving user-initiated construction of storyboard scenarios through the use of strategically incomplete storyboard frames (see also Muller, 2001). An integration of several of these approaches became a more formal description of proven “bifocal tools” for participatory analysis and design (Muller et al., 1995).
Low-tech representations have the additional advantage of being a form of literal requirements document. That is, the constructed form of the representation is a first approximation of the intended final design of the user interface. In the course of working with the low-tech representation, users and systems professionals usually enact or review one or more
scenarios of use. The sequence of events in this scenario (often captured in the form of a video recording—e.g., Muller, 1992; Muller et al., 1995) is a first approximation of the user experience and of the user-experienced information design and information architecture that must also be built. In these ways, the simple paper-and-pencil (or cardboard) materials can become powerful engines for explicating and enhancing designs.
The strategy of acting out a use scenario has been another tool of participatory design. Using the theoretical foundation of Boal’s theatre of the oppressed (Boal, 1992), participatory designers have staged dramas to elicit discussion of working practices and technology alternatives. The principal method in information technology (e.g., Ehn and Kyng, 1991; Ehn and Sjögren, 1991) has been Boal’s forum theatre, in which the designers present a skit with an undesirable outcome and challenge the end-users to modify the script, the props (i.e., the technology), or the setting, and then to reenact the drama, until the outcome is better. A secondary method in information technology (e.g., Brandt and Grunnet, 2000) has been the practice of “frozen images” or tableaux, in which the actors in a drama are asked to stop (“freeze”) while the audience asks each actor what her or his character was trying to achieve, what obstacles she or he faced, and how the situation or circumstances should be improved.
As video technology has become a consumer product, users have also become authors of videos to show current work problems and proposed solutions (Björgvinsson and Hillgren, 2004; Buur et al., 2000; Mørch et al., 2004). An explicit tie-in to scenario-based methods was made by Iacucci and Kuutti (2002) in their work on “performing scenarios” (see also Buur and Bødker, 2000).
Ethnography has figured prominently in the literature on participatory design (e.g., Blomberg et al., 1993, 2003; Mogensen and Trigg, 1992; Suchman, 1987, 2002; Suchman and Trigg, 1991; Trigg, 2000). The specific methods used by ethnographers in design activities tend to invoke other methods, previously described in the section on participatory design. For broader discussions of ethnography, see Chapter 6.
Preceding sections have described the use of stories and scenarios, low-technology representations, and user-produced documentaries as methods
and materials for participatory analysis. These and other methods have been integrated in the generative workshops of Sanders and colleagues (Sanders, 2000). Generative workshops consist of methods from market research (e.g., focus groups to elicit users’ comments), ethnography (observation of users engaged in work), and participatory design (construction of anticipated or desired future objects through low-technology prototyping). The goal of this conjoint “say-do-make” approach is to triangulate on important user needs, working practices, and innovations.
Contributions to the System Design Process
Each participatory design method produces its own characteristic shared representation and contribution; several of these were reviewed in the preceding chapter on analysis. Table 7-2 provides a summary of contributions and shared representations.
TABLE 7-2 Summary of Contributions and Shared Representations in Participatory Design

Design in use (actual use or future use)
    Role in system development process: ongoing opportunity to revisit opportunity analysis and context of use; consequences of designs for work practices
    Shared representation and use: layered design documents (activity design, information design, interaction design); stories, storyboards, narratives; artifacts created during the design process

Workshops, especially generative workshops
    Role in system development process: opportunity to revisit context of use and opportunity analysis; consequences of designs for work practices; early marketing insights
    Shared representation and use: artifacts created during the design process
In brief recapitulation, scenario-based methods may produce stories, storyboards, narratives, and use cases; the latter are particularly useful for systems engineering. These materials can become background or reference material for the more detailed work of designers and developers. Alternatively, a more detailed scenario can develop into use cases, which directly inform design and development on an event-by-event (or action-by-action) basis.
Low-technology representations provide first drafts of user interface designs and are suitable inputs to the work of professional designers; the information surrounding them is valuable to resolve questions that designers and implementers might have about why certain features are needed and for what purpose. In addition to the first draft approach, low-technology representations can become detailed design documents, ready for implementation into working hardware or software.
The theatrical methods are similar to the multimedia documentary methods in the preceding chapter. As with the narratives and explanations surrounding a low-technology representation, the additional information in a theatrical method may provide useful contextualization of design recommendations and implementation decisions.
The workshop methods are similar in outcome to the theatrical methods, with the difference that the workshop methods were designed by professional designers to be used by professional designers. Their outcomes are thus structured to be useful inputs to the next, more formalized design steps.
Strengths, Limitations, and Gaps
Strengths and weaknesses of participatory design are similar to those for participatory analysis, as discussed in Chapter 6. A principal strength of the participatory approaches is the collection and use of detailed, in-depth information from the users’ perspective. As discussed above, users have access to a different kind of knowledge from that of systems professionals, and the users’ knowledge can be very valuable for informing design with the realities of how the work gets done, as well as for defining new opportunities and understanding the context of use (Chapter 6). A second principal strength is the growing body of practices for combining the users’ knowledge with the knowledge of design and implementation professionals (and other professionals) through well-understood methodologies.
There are two principal weaknesses of the participatory approaches. The first is a matter of appearance. Participatory approaches involve knowledge holders who have historically been undervalued in systems development, and therefore practitioners of participatory design may be required to justify this “unusual” approach to more traditional practitioners and management. Similarly, the strategic informality of the participatory approaches may present an appearance problem—i.e., the use of the low-technology, narrative, and expressive media that are so necessary for full and effective communication across disciplinary boundaries.
The second principal weakness of the participatory approaches is that it is sometimes difficult to integrate their informal, open, “soft” outcomes with the kinds of precise knowledge that are typically required as inputs to downstream systems development activities. This problem is rapidly becoming a nonissue, through the integrative methodologies pioneered by Kensing and Madsen (1993), the integrations with formal methods proposed by Muller, Haslwanter, and Dayton (1997), and the development of a participatory information technology methodology (Bødker et al., 2004).
Contextual Design

In the participatory analysis section of Chapter 6, we summarized the contextual inquiry process, including the three activities of contextual inquiry, interpretation, and affinity analysis, as well as the construction of the five models characterized, respectively, in flow, sequence, physical, cultural, and artifact terms. Contextual inquiry can lead in turn to contextual design (Holtzblatt, 2003; Holtzblatt et al., 2004), which includes the following activities:
Visioning and storyboarding: Develop new concepts and concretize them in the form of stories (“visions”). Iteratively refine these concepts via storyboards.
User environment design: Develop an abstract version of the structure and function clustering of the system’s components and operations independently of the user interface and implementation (Holtzblatt, 2003, p. 943).
It is interesting to note that the stories and storyboards are accessible to end-users, whereas the larger components of the visioning and user environment design activities are explicitly intended to avoid issues of user interfaces or user experiences. Thus, while contextual inquiry involves a major component of user participation in analysis, much of the work of contextual design focuses more on the product team and its professional staff, returning to the users for a more traditional usability evaluation (see Chapter 8).
Contextual design has been designed to be well integrated into a flow of work beginning with contextual inquiry and proceeding into development.
The shared representations of contextual design (see above) are structured and sized for immediate uptake by systems engineers and professional designers.
Contributions to the System Design Process
Contextual design has been developed for effective transfer of knowledge from designers to other systems professionals. The form of storyboarding used in contextual design is intended for rapid uptake (as in use cases), and the structure and function clustering is one of the principal outcomes of a requirements analysis, to assist other systems professionals in making choices in function allocation.
Contextual design is intended to produce formal requirements and specifications. The vision statements and descriptions of current or future end-user work environments are inputs to those more formal documents.
Strengths, Limitations, and Gaps
As noted in the preceding chapter, the contextual inquiry and design methods involve more research time and more meeting time than some less formal methods, such as participatory design. We proposed in that chapter that there is a straightforward trade-off between the need for informal and open methods that maximize the contributions of end-users (with their own unique knowledge) versus more formal and closed methods that maximize the subsequent uptake by the development team.
Physical Ergonomics

Physical ergonomics is concerned with human anatomical, anthropometric, physiological, and biomechanical characteristics as they relate to physical activity.2 Complex and simple systems often require both cognitive and physical activities of the user or group of users. Clearly, it is best to design an ergonomically correct system in the early stages of system design (Kroemer, Kroemer, and Kroemer-Elbert, 2001), and ideally a formal institutionalized process for incorporating ergonomics into system design preexists. The steps in the overall ergonomic process are (1) organization of the process, (2) identifying the problem, (3) analyzing the problem, (4) developing a solution, (5) implementing the solution, and (6) evaluating the result (Kilbom and Petersson, 2006).

2 In August 2000, the International Ergonomics Association Council adopted an official definition of ergonomics (see http://www.iea.cc/browse.php?contID=what_is_ergonomics [accessed April 2007]).
In ergonomics, the philosophy behind the methods is one of prevention and designing the system to minimize risk factors. Without such a proactive, planned approach, the human cost can range from mild discomfort to cumulative trauma or injury and possibly even death. It is therefore a serious matter to consider the human user’s physical limitations and capabilities when designing systems. The major ergonomic considerations for healthy, safe, and efficient workplaces and environments are worker task position (reach, grasp, lines of sight, work heights, etc.), posture (seated and standing), clearances (access, movement space, activity space), machine control (visibility, control dimensions), force application (allowable forces), workstation layout (display and control positions and relationships), and physical environment (lighting, noise, climate, vibration, radiation, chemical, psychosocial, spatial, etc.) (Wilson, 1998). Anthropometric (and other) data for the ergonomic design of new systems can be found in several published military and civilian guidelines and standards. Human digital modeling is another excellent way to test design alternatives. In addition, controlled testing and laboratory experimentation (e.g., fitting and user testing) can be used to empirically optimize ergonomic design.
In physical ergonomics, concern for the user ranges from perceived discomfort to physical injury. Assessment methods can be used to identify prospective problems in existing systems or for evaluating alternatives in new systems. Using one class of physical ergonomics issues as an example, musculoskeletal injuries often begin with users experiencing discomfort (Hedge, 2005). Left untreated, these perceptions of discomfort can escalate into pain. Untreated pain can then result in musculoskeletal injury (e.g., tendonitis, tenosynovitis, carpal tunnel syndrome) (Hedge, 2005).
Finally, there has been an effort to automate the tools with which physical ergonomics is considered in the design process. Digital human models are ergonomic analysis and design tools that are intended to be used early in the product and system development process to improve the physical design of systems and workstations (see section on models and simulation).
Shared representations range from physiological tests, such as the measurement of systolic blood pressure, to subjective instruments, such as ratings of perceived exertion (Louhevaara et al., 1998). Physical ergonomics methods that focus on assessing discomfort center on self-report instruments. Such shared representations have the downside of subjectivity. In the assessment of posture, direct observation can be used, with shared representations taking the form of checklists and other data-acquisition and -reduction tools. Fatigue assessments, while attempting to be quantitative, do rely on subjective ratings, and thus shared representations take the form of the output of rating-based instruments. Finally, methods to assess physical risk also tend to rely on shared representations that are at least partially subjective—typically taking the form of checklists and rating scales. With respect to human digital modeling, an avatar or virtual human with specific population attributes is rendered as it dynamically performs tasks in a system. More simply, a dynamic simulation of the human-system interaction is rendered. More detail on the shared representations, including examples, follows in the context of methods.
Uses of Methods
Methods for Assessing Discomfort
In addition to discomfort serving as an early warning sign for injury, discomfort can in and of itself be costly in terms of affecting the quality or quantity of work performed (Hedge, 2005). Since discomfort is not directly assessed and must be perceived by the user, methods for assessment involve self-report instruments. One of the earliest methods to assess a user’s degree of musculoskeletal discomfort is a checklist instrument called PLIBEL (Hedge, 2005). This literature-derived instrument allows users to evaluate ergonomics hazards associated with five body regions (see Kemmlert, 1995). The assessment can be applied at the task or system level. In the context of system development within an HSI framework, PLIBEL can help identify specific bodily areas that require attention in design or redesign. For example, if excessive reaches or awkward postures are required by a newly designed jet cockpit “highway in the sky” display, PLIBEL will identify the physical regions of the body at risk.
Another group of discomfort instruments to consider in physical ergonomics assessment is that promoted by the National Institute for Occupational Safety and Health (NIOSH) (Hedge, 2005). Self-report measures of discomfort are widely accepted by the agency (see Sauter et al., 2005). Most of these instruments share the characteristic of combining body maps with questions and, like PLIBEL, attempt to identify particular body regions at risk.
Additional methods for assessing discomfort include the Dutch Musculoskeletal Survey, the Cornell Musculoskeletal Discomfort Survey, and the Nordic Musculoskeletal Questionnaire (Hedge, 2005).
Methods for Assessing Posture
Workplace posture is a function of the interaction of many factors, including workstation design, equipment design, and methods (Keyserling, 1998). As indicated by Hedge (2005), there are various reasons why self-report instruments are less desirable than unobtrusive observations of, for example, posture. Posture in a sense is a surrogate for musculoskeletal functioning. In system development, users in mock-ups or users in existing systems can be evaluated in real time or through recordings to assess postural risk. The Quick Exposure Checklist involves both observer and user assessments. Its exposure scores are derived (in percentages), and actions ranging from “acceptable” to “investigate and change immediately” are recommended (Li and Buckle, 1999, 2005). The Quick Exposure Checklist can therefore be applied to assessing risks associated with system tasks when evaluating an existing system for redesign or when testing a prototype of a new system.
A widely used method, called rapid upper limb assessment, provides a rating of musculoskeletal loads (McAtamney and Corlett, 1993, 2005). These ratings relate to the posture, force, and movement required by tasks. After postures are selected, they are scored using scoring forms, body part diagrams, and tables. The scores are converted to actions ranging from “acceptable” to “immediate changes required.” For tasks that relate to additional body parts, the rapid entire body assessment method can be used. Additional methods include the strain index, the Ovako working posture analysis system, and the portable ergonomics observation method (Hedge, 2005).
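The final step of a RULA-style assessment, converting a grand score into a recommended action, can be sketched as a simple lookup. The score bands below follow the commonly published RULA action levels; the scores fed in are illustrative.

```python
def rula_action(grand_score):
    """Map a RULA grand score (1-7) to an action level, using the
    commonly published bands."""
    if grand_score <= 2:
        return "acceptable"
    if grand_score <= 4:
        return "investigate further"
    if grand_score <= 6:
        return "investigate further and change soon"
    return "investigate and change immediately"

for score in (2, 4, 6, 7):  # illustrative grand scores
    print(score, "->", rula_action(score))
```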
Methods for Assessing Fatigue
The previously mentioned methods do not really address the measurement of work effort and fatigue. Methods that attempt to quantify effort and fatigue include the Borg Ratings of Perceived Exertion scale and the Muscle Fatigue Assessment method (Hedge, 2005).
Borg ratings increase linearly with oxygen consumption; the scale’s range of 6-20 was established for healthy, middle-aged people (Borg, 2005). The scale provides a measure of exertion intensity and thus yields quantitative data when evaluating a system, or a proposed system, that places physical demands on the user. One limitation is that, while quantitative, the scale relies on perceived exertion.
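The 6-20 range was chosen so that, for healthy middle-aged people, the rating multiplied by 10 roughly corresponds to heart rate in beats per minute. A minimal sketch of that rule of thumb:

```python
def approx_heart_rate(rpe):
    """Rough heart-rate estimate (beats/minute) from a Borg RPE rating.
    An approximation for healthy, middle-aged people only."""
    if not 6 <= rpe <= 20:
        raise ValueError("Borg RPE must be between 6 and 20")
    return rpe * 10

print(approx_heart_rate(13))  # a "somewhat hard" rating -> roughly 130 bpm
```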
Strategies for reducing risk can be pursued after defining the level of effort required. The Muscle Fatigue Assessment method works best when applied to production tasks having less than 12-15 repetitions/minute with the same muscle groups and is ideal for team evaluations of a task (Rodgers,
2005). Once tasks are identified, effort intensity levels are determined for each body part. Effort durations and frequencies are determined and a rating system is used to prioritize changes. After strategies are developed for reducing the predicted risk, tasks are rerated to determine the impact of the proposed changes (Rodgers, 2005). Although the technique is partially quantitative, it does rely on subjective input (Rodgers, 2005).
Methods for Assessing Injury Risk
A predictive method for determining back injury risks was developed by NIOSH, known as the NIOSH lifting equation (Hedge, 2005). While the lifting equation does not consider the dynamics of lifting, the lumbar motion monitor (LMM) attempts to account for more realistic task situations. The LMM is a patented triaxial electrogoniometer that is attached to the spine via a hip and shoulder harness (Marras and Allread, 2005). Using potentiometers, the LMM measures the position of the spine relative to the pelvis. Software provides descriptive information about trunk kinematics and, more important, the system determines whether a particular worker is at risk, whether a task is risky, or whether an entire job composed of several tasks is risky (Marras and Allread, 2005).
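The revised NIOSH lifting equation combines a 23-kg load constant with multipliers for horizontal and vertical hand location, travel distance, asymmetry, frequency, and coupling. The sketch below shows the metric form; the frequency and coupling multipliers come from the NIOSH tables and are passed in here as parameters defaulting to the ideal value of 1.0.

```python
def niosh_rwl(h_cm: float, v_cm: float, d_cm: float, a_deg: float,
              fm: float = 1.0, cm: float = 1.0) -> float:
    """Recommended weight limit (kg), metric form of the revised
    NIOSH lifting equation."""
    lc = 23.0                              # load constant, kg
    hm = 25.0 / max(h_cm, 25.0)            # horizontal multiplier
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)    # vertical multiplier
    dm = 0.82 + 4.5 / max(d_cm, 25.0)      # distance multiplier
    am = 1.0 - 0.0032 * a_deg              # asymmetry multiplier
    return lc * hm * vm * dm * am * fm * cm

def lifting_index(load_kg: float, rwl_kg: float) -> float:
    """LI > 1.0 indicates elevated risk of lifting-related low back pain."""
    return load_kg / rwl_kg
```

An ideal lift (hands 25 cm out and 75 cm high, 25 cm of travel, no twisting) yields the full 23-kg limit; any deviation from those conditions lowers it.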
The occupational repetitive action (OCRA) methods can be used as the basis for redesign decisions and as an evaluation tool for new designs (Hedge, 2005). The OCRA index is used for the redesign or analysis of workstations and tasks (Occhipinti and Colombini, 2005). The OCRA checklist is generally used for the screening of workstations with repetitive tasks. Both methods assess repetitiveness, force, awkward postures and movements, and lack of adequate recovery periods (Occhipinti and Colombini, 2005). The risk index is the result of a ratio between actual technical actions and the recommended actions.
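That ratio can be stated in one line. A minimal sketch follows, assuming the actual and recommended technical actions have already been counted per the OCRA procedure; the classification thresholds applied to the resulting index come from the OCRA literature and are not reproduced here.

```python
def ocra_index(actual_actions: float, recommended_actions: float) -> float:
    """OCRA index: technical actions actually performed relative to
    the actions recommended under the observed task conditions."""
    if recommended_actions <= 0:
        raise ValueError("recommended actions must be positive")
    return actual_actions / recommended_actions
```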
Human Digital Modeling
Digital human models are ergonomic analysis and design tools that are intended to be used early in the product and system development process to improve the physical design of systems and workstations (Chaffin, 2004, 2005). Software has been developed for human digital modeling (e.g., Jack, Safeworks, Ramsis, SAMMIE, UM 3DSSP, and SANTOS). Digital human models make it possible to test the capabilities and limitations of humans without the expense and possible risks associated with physical mock-ups. For example, the particular reach limitations or line-of-sight capabilities of a vehicular driver could be determined (see the section on models and simulation for further discussion).
Contributions to System Design Phases
As noted earlier, failure to account for the user’s physical limitations and capabilities when designing systems can result in decreased performance and productivity, discomfort, cumulative trauma or injury, or even death. Physical ergonomics is used to identify the physical regions of the body at risk and can help identify specific bodily areas that require attention in design or redesign. The methods can be applied at the task or system level.
To the extent that systems require physical activity, physical ergonomics methods are applicable for defining solutions—that is, to support identification and exploration of design alternatives. More specifically, semiautomated and automated systems will have human users or supervisory controllers operating in workplaces. Design of these workplaces is an iterative process, requiring assessment and support of human user physical needs (Chaffin, 1997).
Physical ergonomics methods can also be used in system evaluation and redesign to compare current and redesigned workstations and to justify funding to decision makers (McAtamney and Corlett, 2005). Physical ergonomics should also be considered in system cost-benefit analysis, since benefits can broadly impact performance (Hendrick, 1998).
Human digital modeling is used early in the product and system development process (construct invent/design) to evaluate proposed new system or workplace designs.
Strengths, Limitations, and Gaps
Attending to the user’s physical ergonomics needs improves the likelihood that the human-technology fit will promote better performance and well-being. A limitation of the physical ergonomics approach is that it focuses on “neck down” physiology. Complementary attention should be placed on “neck up” or cognitive ergonomics. As Vink, Koningsveld, and Molenbroek (2006) suggest, not only can physical ergonomics combat negative issues, but it can also positively impact productivity and comfort. This positive impact is maximized when users and management actively participate in the process. Another historical limitation is that ergonomics has tended to focus on a single user or operator. Some have estimated performance improvement through ergonomics approaches to be in the 10-20 percent range (Hendrick and Kleiner, 2001). With the advent of macroergonomics or systems ergonomics, groups or teams of users can now be considered, as well as broader contextual factors, leading to greater performance impact. This broader approach is consistent with the HSI framework. While much research has been conducted and much knowledge has been generated, there is still much to learn. Fundamental issues, such as the actual causes of low back pain, remain unresolved. Another gap that is slowly being filled is better integration among physical, cognitive, and macroergonomic approaches in order to consider total human-system integration.
In terms of newer methods, such as human digital modeling, current digital human models are generally static or are not fully dynamic, integrated models. Human motion databases and models are helping to convert existing, static digital human models to dynamic models. Additional research is needed on human motion and biomechanics to help achieve dynamic, complex system modeling.
Current digital human simulation systems are beginning to allow a user to interact with a digital character with full and accurate biomechanics, a complete muscular system, and subject to the laws of physics (Abdel-Malek et al., 2006). Results have been achieved in the areas of dynamic motion prediction, the modeling of clothing, the modeling of muscle activation and loading, and the modeling of human performance measures.
SITUATION AWARENESS

Situation awareness has become an important ingredient in the analysis of human-system performance, and therefore HSI specialists should include its measurement in the tool box of methods to bring to each new system development (Endsley, Bolte, and Jones, 2003). Tenney and Pew (2006) provide a recent review of the state of the art.
In everyday parlance, the term “situation awareness” means the up-to-the-minute cognizance or awareness required to move about, operate equipment, or maintain a system. The automobile driver requires situation awareness in order to safely operate a vehicle in a rapidly changing environment. The driver needs to understand the position of the vehicle in relation to the road and other traffic, the speed limit under which the vehicle is currently operating, the capabilities of the car itself, and any special circumstances, such as weather conditions, that may influence driver decision making. The driver uses the senses (eyes and ears, and perhaps nose and touch) to take in information and process it to build a conceptual model of the situation. The process of building up situation awareness is called situation assessment. Operational people in a variety of disciplines, ranging from military planners to hospital operating room staff, find the concept useful because for them it expresses an important but separable element of successful performance: being aware of and current with the circumstances surrounding their state of affairs.
Being involved in every aspect of the job leads to good situation
awareness but high workload. Introduction of automation reduces routine workload, but also reduces situation awareness, because it takes the user “out-of-the-loop.” Then, when a critical event occurs requiring an operator response, workload again becomes high and situation awareness is inadequate.
Automation also often introduces additional situation awareness requirements to manage the new systems. Human-system integration will be especially important for meeting the information management requirements that will accompany the next generation of automation.
Achieving situation awareness has therefore become a design criterion in addition to more traditional performance measures. However, measuring situation awareness requires more than an everyday understanding of the term. The most widely quoted definition of situation awareness was contributed by Endsley (1988), “Situation awareness is the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future” (p. 97). There is general agreement that the term refers to all the sensory, perceptual, and cognitive activity that prepares the user to make a decision, but it does not include the execution of a course of action once a decision is made.
Measuring Situation Awareness
A variety of methods are available for assessing situation awareness. In general they fall into four categories:

Direct experimental measures.

Measures derived from scenario manipulation.

Subjective rating scales.

Think-aloud protocols.
To apply direct experimental measures, the investigator or designer places a user in the context of the task under study, usually by means of exercising a scenario and simulating the operations under study. Then, at various points, the scenario is paused while the user is asked to answer a question about the status of different variables that are relevant to good situation awareness. The measure is the proportion of correct answers to the questions. The most well-known of these techniques is the Situation Awareness Global Assessment Technique (Endsley, 2000).
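The scoring itself is a simple proportion. A minimal sketch follows, assuming each query posed at each scenario freeze has already been graded correct or incorrect; the function name is illustrative.

```python
def sagat_score(graded_queries: list) -> float:
    """Proportion of situation-awareness queries answered correctly
    across all scenario freezes (each entry is True or False)."""
    if not graded_queries:
        raise ValueError("no graded queries supplied")
    return sum(graded_queries) / len(graded_queries)
```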
In order to derive a measure from scenario manipulation, the analyst specifically designs a scenario so that, at one or more points, the participants must make a decision that will reflect how successfully they have assessed the situation. For example, an aircrew is placed in an approach-
to-landing situation on the right-most of two parallel runways. Just as they are preparing to land, a second aircraft that was scheduled to land on the left runway suddenly veers over into the airspace appropriate to the right runway. The time it takes the right-runway aircrew to make a decision to go around is a measure of their situation awareness (Pew, 2000).
It is frequently difficult to arrange for collecting these kinds of objective data, and the investigator relies instead on asking users to assess their own situation awareness. There are formal scales for doing so, such as the Situation Awareness Rating Technique (Jones, 2000). Again, users are placed in the context of the task under study and then, either during or immediately after completing a trial, they are asked to rate their situation awareness on a predefined scale.
Think-aloud protocols are just what they sound like—users are asked to verbalize what they are thinking while they are working on a task. They are useful early in a system development process to obtain from users their interpretation of what aspects of a situation they are thinking about. They could be applied as soon as candidate stories or scenarios have been developed that reflect the way the system might work. They can help to define the information requirements and understanding required to accomplish the task.
Contributions to System Design Phases
As indicated, think-aloud protocols are useful early in system development, when they can help elaborate the conceptual structure in which the task will take place. The other methods are most useful when prototype user-interface designs are being considered. Here the data from testing can support the evaluation of the quality of the designs for achieving situation awareness. Evaluation can also be an important part of summative usability testing, because situation awareness is such an important part of the success of the application or mission. Data describing the results of situation awareness tests provide a useful shared representation with other stakeholders, because the concept, in its everyday meaning, is widely understood as important to good performance.
Strengths, Limitations, and Gaps
In contrast to overall system performance measures, situation awareness measures provide indices of the way the system is influencing human performance per se, and therefore they can provide clues to how to improve the design from the perspective of the user. These measures are more diagnostic, in the sense that they can suggest what is missing from the design or how understanding is inadequate. For some kinds of missions, achieving good situation awareness is the most important aspect of the design.

The main weakness of these measures is that they require an experimental or real system to be available for testing, which delays the assessment of situation awareness until later in the development process than would be desirable.
A significant gap is that one would like to be able to predict situation awareness before anything is built, but good predictive models to assess it are not available, although efforts are under way.
METHODS FOR MITIGATING FATIGUE
Systems manned by operators need to accommodate the inherent limitations imposed by human circadian rhythms and endurance capacities. This is evident in the establishment of hours-of-service regulations for various industries, particularly transportation, which limit the number of hours in specific periods of time that workers may stay on the job. The basis for these regulations is the need for sufficient time off to permit rest and recovery and, in particular, sufficient sleep. Numerous accident analyses implicate operator fatigue as a proximal cause, and some researchers suggest that certain times during the 24-hour period carry a higher risk of accidents regardless of fatigue level (Folkard et al., 1999).
The methods available to the human factors practitioner for defining shift schedule impacts and mitigating them are relatively few and are all based on several factors. These include the basic circadian rhythm, the amount of sleep obtained prior to shift initiation, and the amount of sleep obtained during off-work periods during the shift assignment (e.g., 1 week). It is important to address these issues in the design phase of systems in order to preclude adverse scheduling that may not be covered by hours-of-service regulation and to build in fatigue mitigation elements when schedule impacts cannot be avoided (such as a 24/7 operation or military sustained operations).
Most of the schedule assessment and fatigue mitigation methods come from the transportation sector (Sanquist and McCallum, 2004). The methods with the most general applicability to system design and operation include

alertness modeling,

trip planning,

strategic napping, and

sleep environment planning and design.
Each of these methods (sometimes referred to as fatigue countermeasures or alertness management) has been shown to have a beneficial impact on reducing or avoiding fatigue in the workplace. Given the potentially lethal impact of fatigue on the job, application of these principles and methods during system design is warranted. The principal risk these methods reduce is that of operations that lead to excessive fatigue and corresponding degradations in human performance. The following sections briefly describe each method or countermeasure in terms of its applicability in the system requirements and design phases of development.
Uses of Methods
Alertness models have been used by researchers for a number of years to predict the likely fatigue level that would result from various shift schedules and the corresponding opportunities for sleep (or lack thereof). Most are based on several key parameters, such as a circadian rhythm component, time of day, preceding amounts of sleep, and availability of a recovery sleep period (Dawson and Fletcher, 2001; Folkard et al., 1999). The models are encoded in specialized software packages that generally require some domain expertise to operate. The outputs consist of fatigue levels before, during, and after a shift, and these values can be used as a guide to schedule construction and assignment. For designers without access to the specific modeling tools, which change fairly rapidly since they are principally a research product, simple heuristics for scheduling and rotation are reasonable substitutes. These include such rules of thumb as providing sufficient time off to permit an 8-hour sleep period, which in practice means at least 10 hours. Similarly, start times prior to 7 am are more likely to be associated with fatigue than later start times.
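The two rules of thumb above can be captured as a simple screening check. This is a heuristic sketch only (the function name and flag wording are illustrative), not a substitute for a validated alertness model.

```python
def schedule_fatigue_flags(hours_off_before_shift: float,
                           shift_start_hour: int) -> list:
    """Flag schedule features associated with fatigue, using two
    scheduling rules of thumb: allow at least 10 hours off so an
    8-hour sleep period is possible, and avoid start times before
    7 a.m., which are more likely to be associated with fatigue."""
    flags = []
    if hours_off_before_shift < 10:
        flags.append("insufficient time off for an 8-hour sleep period")
    if shift_start_hour < 7:
        flags.append("early start time associated with elevated fatigue")
    return flags
```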
Trip planning is a method employed in variable ways by transport workers and is highly dependent on the transportation vector, such as air, road, or rail. Both schedulers and individual workers need to plan their trips to provide off-duty breaks of sufficient duration to obtain enough sleep. This applies in particular to workers who need to travel long distances to their work location, such as airline pilots commuting cross-country to start a long-distance flight assignment.
Strategic napping is a fatigue countermeasure that involves short sleep periods of 20-45 minutes duration to provide for recovery during a long-duration shift (Dinges et al., 1991). A number of studies indicate that strategic napping is associated with better job performance following the nap (e.g., landing an airplane). As a consequence, certain international air carriers have sanctioned napping during long-distance flights by one of the crew members, and airplane manufacturers are beginning to build long-haul
aircraft with sleeping quarters for crew members. Similarly, rail carriers are beginning to provide napping rooms in their crew turnaround locations.
Sleep environment planning and design (Zarcone, 2000) have entered into the schedule design process for certain air carriers and have also influenced how the carrier-preferred hotels design their facilities. They are now beginning to reserve certain blocks of rooms and floors for day sleepers and to implement other measures, such as blackout shades and additional soundproofing.
Alertness design methods have a common shared representation in specifying the impact on operator sleep. Alertness models predict how much sleep an operator will get on a certain shift schedule, or the likely fatigue level resulting from lack of sleep. Trip planning has a similar output: given a particular duty schedule, what are the opportunities for sleep, and how much sleep will be obtained?
Figure 7-5 illustrates a shared representation that is common among alertness designers: a graphic plotting time of day against alertness level. This is a common method of defining time periods that are likely to show fatigue effects that might manifest as accidents.
Contributions to System Design Phases
Alertness design methods contribute mainly to the architecting phase of the system development process. In any manned system, the elements of staffing, scheduling, and recovery are addressed during the more specific design phases, involving determining how many people will work, doing what tasks, and the nature of the operations (e.g., 24 hour, sustained, or other). Evaluating and designing for alertness management during the architecting phase can prevent unanticipated attrition due to excessive fatigue or more dire consequences, such as accidents.
Strengths, Limitations, and Gaps
The strength of alertness design methods is that they address a problem that is reasonably easy to solve if several key parameters are considered: number of staff, duty periods, recovery time necessary, etc. In their simplest implementation, the methods consist of heuristics designed to assess the adequacy of rest periods. More involved methods, such as nap period design or sleep environment planning, can enhance alertness during long assignments or unusual shift rotations.
A primary limitation of alertness design methods is that they are relatively unknown outside the fatigue research and transportation communities. A cultural tradition of overwork in some industries limits the willingness of system designers to consider the basic need for sleep. A further limitation is that fatigue mitigation and alertness management/design are not often considered part of the traditional suite of human factors methods, although they are an important part of the larger HSI domains. This is beginning to change, but the field tends to be dominated by methods with a more traditional task-oriented focus.
A large gap in alertness design methods is the availability of robust guidelines or processes for addressing fatigue issues in design. Whereas task analysis has many variants and practitioners can learn to apply it rather quickly, alertness design does require some knowledge of biological rhythms and performance effects, and various subtleties of sleep debt accrual and mitigation. The modeling tools developed to date require substantial knowledge in the area for proper application and interpretation, and a simple set of alertness design guidelines, applicable across a wide range of work activities, has yet to be developed.
SCENARIOS

Scenarios are stories that describe how activities and events unfold over time. They can depict either how the activities currently happen or how the activities might be imagined to happen in the future. Scenarios can be produced at a variety of levels of detail, abstraction, and scale. For example, they can tell a story about how a particular culture might change (big and broad scenario-based planning) or suggest how a new technology can influence a particular process (big and focused). Scenarios can also be produced to suggest how a particular user might interact at the button-push level on a handheld device (small and detailed).
Scenarios can be used by different system stakeholders in different ways. Some design representation methods emphasize the delivery of a formal, integrated concept, embodied in the form of a highly produced and convincing story. By contrast, some participatory analysis methods focus on storytelling to create a level playing field, so as to put users and technologists on a common footing, making scenarios useful as shared representations. In more formal terms, these techniques have been described and analyzed as scenario-based methods (Carroll, 1995, 2000; Rosson and Carroll, 2002, 2003).
Uses of Methods
In using scenarios, researchers play the character’s (or persona’s) actions forward and are able to share them among themselves and with potential users. The scenarios enable all the participants to critique the assumptions and implications that are made visible. For example, end-users have the potential to see how their work would change in the future. Alternatively, systems professionals may be able to see the implications of changed working practices for new technology opportunities. In this way, scenarios offer participants the opportunity to rewrite or co-construct them until the activities and processes they represent meet all the stakeholders’ needs.
Designers create a variety of different types of scenarios depending on the design challenge and where they are in the development life cycle. Scenarios are created to address the needs of specific stakeholders, their environments, and their technologies, or to address the specific problem at hand.
Another consideration in using scenarios is what medium should be used to generate and deliver the scenario. Scenarios can be represented purely verbally, with text and images (in which the images are similar to storyboards used in film and animation), or as fully featured video (see the
sections on low-technology representations and multimedia documentaries below). Usually the more informal the medium that is used for the scenarios, the more it invites the viewers to actively participate in modifying or changing them.
Scenarios are typically elicited through a sequence of steps:
Root concept: Beginning with a brief statement of the goal of the project, analysts elaborate a basic rationale for the project, list the crucial stakeholders, and provisionally state some high-level assumptions.
Ethnographic inquiry: Field observations, interviews, and artifact analyses are conducted as needed to understand users’ needs and opportunities.
Interpretation: Field observations are then organized and interpreted through a series of affinity analyses (using methods from contextual inquiry).
Problem scenarios and claims: The tentative requirements from the preceding steps are then organized into scenarios of future action that will need to be supported and specific claims (which capture essential or emergent themes and topics) about how the envisioned system will support those scenarios.
Scenario-based work often involves a preliminary coding scheme for the knowledge that is elicited. For example, Rosson and Carroll (2002) note that “Each scenario depicts actors, goals, supporting tools and other artifacts, and a sequence of thoughts, actions, and events, through which goals are achieved, transformed, obstructed, and/or abandoned.” These kinds of scenario-based codings had previously been shown to be useful in creating object-oriented designs (Rosson and Carroll, 1996). Under some circumstances, it is useful to provide materials for scenario construction that embody a particular “vocabulary” of events or actions, so that the resulting design has already been precoded into a target set of system components (Muller et al., 1995).
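Rosson and Carroll’s coding elements (actors, goals, supporting tools and artifacts, and a sequence of events through which goals are achieved, transformed, obstructed, or abandoned) map naturally onto a small data structure. The sketch below is hypothetical: the class and field names are illustrative, not part of any published coding scheme.

```python
from dataclasses import dataclass, field

@dataclass
class ScenarioEvent:
    actor: str
    action: str
    goal_outcome: str = ""   # e.g., "achieved", "transformed", "obstructed"

@dataclass
class Scenario:
    title: str
    actors: list
    goals: list
    artifacts: list                       # supporting tools and artifacts
    events: list = field(default_factory=list)
```

A precoded “vocabulary” of events or actions, as in Muller et al. (1995), would amount to restricting the `action` field to a fixed set tied to target system components.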
There are a variety of shared representations that can result from building scenarios. They are, in a conventional time-ordering:
Individual (episodic, vertical) stories from one informant at a time, describing that informant’s experience.
Composite stories that are constructed from the individuals’ stories; these are intended to be accurate summaries and compilations of the individuals’ stories, with a similar “epistemic status” of accuracy and fidelity.
Future-vision stories, which are of necessity fictitious. Typically these early-stage thinking-about-the-future stories tend to be connotative rather than denotative, evocative rather than definitive, open rather than closed, plural rather than singular, and are often deliberately incomplete.
Finally, there is the relatively formal requirements-related story, which provides a relatively detailed account of future use for designers and developers. These stories tend to be nearly the opposite of the future-vision stories: denotative, definitive, closed, singular (“we will build this”), and exhaustively complete.
Carroll suggests that scenarios are paradoxically concrete, but rough, tangible, and flexible, encouraging what-if thinking among all parties and pushing designers beyond the expected solutions (Carroll, 2000; see also Erickson, 1995, on the strategic use of roughness in representations). As a shared representation, scenarios permit the articulation of possibilities without undermining innovation, enabling the design team to focus on the systems in the given context. Scenarios can be simultaneously implicit and explicit.
Contributions to System Design Phases
Scenarios can be used in the very early stages of the design process to help explain possible system behavior. They are easy to share because they are in natural language—or in some other conventional representation, such as cartoon frames—and almost anyone can participate in their production (Carroll, 2000). Later on in the process, scenarios enable the collaborative team to begin to immediately synthesize findings from research into situated ideas for the future. They can also be used as an evaluative tool with system users—in other words, the stories and their associated pictures can be shown to end users and other stakeholders before anything is committed to code. According to Rosson and Carroll (2002), “The basic argument behind scenario-based methods is that descriptions of people using technology are essential in discussing and analyzing how the technology is (or could be) used to reshape their activities. A secondary advantage is that scenario descriptions can be created before a system is built and its impacts felt” (Rosson, Maass, and Kellogg, 1989; Weidenhaupt et al., 1998). Finally, later on in the process, scenarios can be translated into use cases or essential use cases and associated with system requirements (Preece, 2002). At this stage, they can also provide the specific task details to support human-in-the-loop simulation experiments and usability evaluation.
Strengths, Limitations, and Gaps
A primary strength of scenarios is that they are easy to make and revise—they are fast and cheap. Nothing has to be coded, and they can even describe system behavior in words alone. When they include storyboards or pictures of the activity, they can help end-users and other stakeholders envision, react to, and help shape possible future systems. The limitation of scenarios is that they can never be exhaustive—not every story can be told—so some functionality that could be critical to the design of the system could remain overlooked. Although several treatments have begun to analyze the space of stories (e.g., Carroll, 2000; Muller, 2007; Rosson and Carroll, 2002, 2003), a more extensive cataloging of scenario types, uses, and limitations could be developed to overcome these limitations.
PERSONAS

The personas (or “archetypes”) approach has become a very popular technique in applied design activities. Using personas (especially role- or segmentation-based archetypes) as actors in scenarios (see the preceding section) helps situate the technology in real-life settings. Ever since Alan Cooper described personas in the book The Inmates Are Running the Asylum, there has been a great deal of effort spent on understanding why and how personas are useful in the design process and on finding ways to extend the concept beyond their role in bringing scenarios to life (Cooper, 1999, 2004; Pruitt and Grudin, 2003). Personas are useful because they build on people’s expectations and natural abilities to anticipate and infer other people’s behavior from what is known about that person. Pruitt and Grudin suggest that good personas can be generative and help designers “play forward” or project what they know about a character into new situations. Pruitt, Grudin, and others have extended the use of personas throughout the design process—from prioritizing features, to usability and market research, to QA testing.
While scenarios are concise narrative descriptions, personas are descriptions of one or more people who are (or will be) using a product to achieve specific goals. A persona is specifically designed to become a shared representation of the users of the target system. They are designed to mediate communication about the potential users of the system. A persona is a model or description of a person, ideally based on observed behavior gathered by the up-front analysis team and defined through an intuitive and systematic synthesis of the analysis data. Good personas include descrip-
tions of the character’s activities, goals, skills with and without technology, influence on the business, attitudes, and communication strengths and weaknesses (Cooper, 1999; Pruitt and Grudin, 2003; Cooper and Reimann, 2004; Pruitt and Adlin, 2006). Personas become the actors in the stories of current or future use and interaction. In the commercial world, a variety of user types may be modeled. For example, an on-line banking design scenario might include both novice and expert users at different life stages (newlywed, college student, empty nester, etc.). In the military it might include representative enlisted personnel who might be assigned the task and an officer who would use the application in a supervisory role. Although there are significant differences in practice (Adlin et al., 2006), the persona descriptions can also focus more on roles, experience levels, and motivations for the interaction.
Teams are encouraged to make conceptual tests of their design decisions against the persona who represents the users (e.g., “What would Kim think about that feature?”); note, however, that the issue of representativeness is somewhat controversial (Muller et al., 2001). Task analyses can be based, in part or in whole, on the persona description (Redish and Wixon, 2003). Much as a method actor builds a character from concrete personal detail, a persona is usually described in considerable personal detail—i.e., with a name, a photograph, a job description, and a variety of personal data that can include pets, favorite foods, make and model of car, and so on, in order to support appropriate inference-making on the part of the persona user.
Contributions to the System Design Process
There is little formal research to support the creation of personas as a part of the systems design process. Less formal reports from practitioners, however, are overwhelmingly positive (e.g., Cooper, 2004; Pruitt and Adlin, 2006; Adlin et al., 2006). The principal contribution is within the design or development team—an effective persona can help a team to focus on the experiences and needs of its users, providing a valuable counterbalance to the more traditional concerns for system performance and efficiency. These practitioners have successfully extended the use of personas beyond scenarios, all the way up to executive product strategy meetings (Pruitt and Grudin, 2003).
Strengths, Limitations, and Gaps
Personas can play a powerful role in bringing user concerns to the forefront of the development process. When carefully and systematically constructed, they can be an effective shared representation for the development team. A persona is, ideally, created on the basis of information about the population of real users. However, the basis for selecting the relevant
attributes of a persona is not agreed on (e.g., Adlin et al., 2006; Grudin and Pruitt, 2002; Muller et al., 2001). A persona might be based on ethnographic research, but most ethnographic accounts are about individuals rather than about group parameters or generalized characteristics, and thus they are too specific to help with the construction of a representation that faithfully represents the relevant attributes of all users. Data from marketing research may be used to select the statistical characteristics of a persona (as suggested by Grudin and Pruitt, 2002, and as partially done by Sinha, 2003), but there remains a wide range of personal characteristics in the persona description whose source is unclear (Adlin et al., 2006).
In a systematic treatment, Pruitt and Adlin (2006) advocate the creation of a “persona-weighted feature matrix” to assist a team in the systematic consideration of multiple personas. These multiple personas might represent users with different responsibilities or even different market segments (Adlin et al., 2006). The feature matrix can then be used to rate the importance (or perhaps market share) of each type of user, and it can be used further for a high-level quantification of the impact of each feature decision on each class of users and thus on the likely overall success of the product.
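The arithmetic behind such a feature matrix can be sketched in a few lines. The personas, weights, and impact scores below are invented for illustration and are not drawn from any published matrix; this is one plausible way to compute the quantification Pruitt and Adlin describe, not their exact procedure.

```python
# Hypothetical persona-weighted feature matrix. Weights reflect each
# persona's importance (or market share); cell scores rate a feature's
# impact on that persona, from -2 (harmful) to +2 (essential).
personas = {"novice teller": 0.5, "expert teller": 0.3, "branch supervisor": 0.2}

features = {
    "guided setup wizard": {"novice teller": 2, "expert teller": -1, "branch supervisor": 0},
    "batch command entry": {"novice teller": -1, "expert teller": 2, "branch supervisor": 1},
}

def weighted_score(feature):
    """Overall impact of a feature, weighted by persona importance."""
    return sum(weight * features[feature][name] for name, weight in personas.items())

# Rank candidate features by their weighted impact across all personas.
ranked = sorted(features, key=weighted_score, reverse=True)
```

With these invented numbers the wizard scores 0.7 and batch entry 0.3, so the matrix would argue for prioritizing the wizard; changing the persona weights (say, for a different market segment) can reverse the ranking, which is exactly the trade-off discussion the matrix is meant to provoke.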
Prototyping is a method that can be used at any time during the design process. Prototyping helps teams answer questions and shape and define the attributes of a desired future state (Schrage, 1996). The word “prototype” refers to a number of different types of things that can be made to express, discuss, critique, or refine a concept, system, product, or plan (Beaudouin-Lafon and Mackay, 2003; Tscheligi et al., 1995). What unites the diverse meanings and types of prototypes is that any prototype is a temporary substitute for the real thing that eventually will be (or might be) implemented or constructed. Prototypes are made in different ways, of different materials, by different people, for different purposes (Houde and Hill, 1997). A pen held to one’s ear as a stand-in for a cell phone in a design session and the beta release of an application can both be considered prototypes. In the design process, prototypes help teams make the transition from an abstraction of what might be to a concrete notion of what something might be like.
Prototypes are used to reason and communicate—to persuade and argue what ought to be among collaborators (Houde and Hill, 1997; Rith and Dubberly, 2005). Prototyping is a process that brings a desired future to life or makes the design tangible (Wulff, Evenson, and Rheinfrank, 1990;
Coughlan and Prokopoff, 2004), but the role that a prototype plays in facilitating team activities is equally important. As Suchman (2004) notes, a prototype often serves as a means of enactment (i.e., for demonstrating and persuading) as well as the more conventional means of representation—a dynamic dance between the invention of needs and the technologies that support them. A prototype is intended to be a stand-in for the collaborating team’s ideas—not just a version of an eventual or target solution (Boland and Collopy, 2004). Schrage (1996) warns that organizations with a specification-driven culture may be prevented from prototyping innovatively. In these types of environments, the potential to enact the best possible futures may be stifled. In contrast, in organizations that successfully mediate meaning through prototypes, value is created, communicated, and shared (Schrage, 1994).
Uses of Methods
Prototypes can represent a number of dimensions in the system design. A horizontal prototype may be a reflection or enactment of all activities the system is intended to support at a very high level; a vertical prototype may address a subactivity in the design in complete detail in order to understand the implications of a particular implementation without the cost of prototyping the entire system.
Some prototypes are intended to be thrown away almost immediately, while others are more evolutionary—that is, they are designed to be continually updated throughout the design process. Architectural prototypes are produced in order to provide a representation of the performance and feasibility of a particular attribute of the supporting technology, while a requirements prototype may reflect what the system needs to do to support the activities of users without any implication of the technology that will be used to implement a solution. In some situations, wireframes or purely textual prototypes are used to elicit feedback from potential users of a system, while in other situations the representation may be purely visual and a reflection of the final form of an interface (Mannio and Nikula, 2001). The level of finish or degree of roughness of a prototype can also be a consideration (Erickson, 1995). A prototype can be as simple as a few lines on a napkin or as finished as a beta release.
Figure 7-6 shows some examples of different types of prototypes.
The shared representations produced as output from prototyping may take a variety of forms (Beaudouin-Lafon and Mackay, 2003; Houde and Hill, 1997). What makes the activity of prototyping, and the prototypes it produces, so powerful as shared representations is that they are tangible: they can easily be shared, and they foster communication among team members, stakeholders, and end-users.
Prototypes function to make explicit an aspect of form, fit, or functionality (Boland and Collopy, 2004). Form is the overall structure of the organization, environment, technology, or process. Form can shape interaction and set expectations. Fit describes the resonance (or lack) of the current embodiment to the overall endeavor’s objectives. Fit is often subjective but deeply meaningful on levels that are often beyond expression (Gladwell, 2005). Functionality describes whether the design works—that is, whether it is effective, appropriate to human use, and emotionally sustainable; has the potential to be taken up by the organization(s) that it responds to; and is situated in its context (Boland and Collopy, 2004).
Contributions to System Design Phases
Early in the life cycle, a prototype may simply be a placeholder for a real object or system: people use the prototype to show the work or actions that would take place around it (e.g., Buur et al., 2000). Later in the life cycle, a prototype may take the form of a nonfunctional description in concrete materials, such as a physical model of a device (e.g., Bødker et al., 1987, 1988) or a paper-and-pencil mock-up of a user interface (e.g., Ehn and Kyng, 1991; Muller, 2001; Muller et al., 1995); in some cases, the prototype is designed to be modifiable only by experts while in other cases the prototype is designed to be modifiable by anyone, including actual or potential users of the eventual product or system (e.g., Ehn and Kyng, 1991; Muller, 2001; Muller et al., 1994, 1995; Sanders, 2000). Still later in the life cycle, a prototype may be a faithful paper-and-pencil copy of a designed system, used for early user evaluations while the real system is being built (e.g., Snyder, 2003). Houde and Hill (1997) recommend that an integrated prototype be based on the construction of as many as three different types of prototypes (based on the role or function of the target system in people’s work, the look and feel of the system, and the implementation technology).
At later phases of the life cycle, prototypes may contain varying levels of functionality. In some cases, the functionality may be complete, but the implementation technology may provide flexibility (to try out new or alternative ideas) in preference to performance. In other cases, the functionality may provide surface fidelity, but with fictitious back-end architectures, data, and communications. In the Wizard of Oz style of prototype, the back-end is simulated by a person representing the behavior of the computer. Selecting the appropriate form of prototype depends on the development or communication problem to be solved (or both); see Beaudouin-Lafon and Mackay (2003) and Houde and Hill (1997) for details. At this level, prototypes can support experimentation with alternative designs or formative usability evaluation.
Strengths, Limitations, and Gaps
A primary strength of prototyping and the prototypes that result is the cohesion for the team. According to Kelley (2001), “Good prototypes don’t just communicate—they persuade.” When discussing ideas or determining direction, having a prototype with which to negotiate makes the process more effective, fosters innovation, and usually reduces development costs (Kelley, 2001).
The greatest feature of prototyping is also its biggest foible. Because
prototypes are so real, they make the experience of the product, application, or service so tangible that they can influence teams to fix too quickly on a potential solution. When that happens, people are usually taken in by the level of finish in the prototype and believe that all the qualities and features are set—rather than being open to change. In practice, it is very important to match the level of finish to the stage in the process to avoid early closure on an incomplete solution (Erickson, 1995).
The most frequent use of prototypes has occurred in hardware and software development. However, prototypes have also been used to explore organizational outcomes, and—significantly—to critique and redesign the technologies that might lead to different organizational outcomes. The best-known example is the UTOPIA research project, which dealt with new technologies and their implications for changes in working relations and power balances among two groups of skilled workers (e.g., Ehn and Kyng, 1991; see also Bødker et al., 1987, 1988). Extensions of these methods could be used to explore the interactions of new technologies and new working practices in a variety of home, commercial, and military settings.
Aside from the relatively informal demonstrations of Ehn and Kyng (1991), there is work to be done to understand how physical artifacts—the nonhuman components of the system—interact with the prototypes of the people side of the system. Is there a classification scheme to be developed, ranging from verbal or descriptive concepts and theories, to interactive role-playing, to computational models and simulation? Clearly new methods for visualizing interactions and activities must be developed.
Prototyping organizations (as well as technologies) will help produce and maintain better organizations—because troubling interactions can become visible and refinements can be made before the design is rolled out, saving time, effort, and misunderstanding. By having participants (preferably the intended end-users) contribute to the design—and thereby take some ownership of it—the likelihood of adoption and enactment of the goals of the design will be increased (Muller, 1992; Muller et al., 1994). Once the design is implemented, an organization will be better maintained because the prototyping can facilitate (1) changes to accommodate organizational strategy changes and (2) an ongoing capability for what-if scenario testing for unanticipated outcomes.
Prototyping training systems not only helps produce better trainees early on but also, as noted above, can enable better system designs that make fewer demands for intensive training later on. Prototyping training also provides an opportunity for early feedback from the target users to the system designers as well as the organizational designers, providing insightful opportunities for improvement to the developers, as well as potential for early buy-in by the end-users.
MODELS AND SIMULATIONS
Modeling and simulation have provided important methods and tools to support the system engineering process since the days of analog computers. Models and simulations represent a more formal step in human-system design. They can reduce the time and data gathering required for functional evaluation by screening alternatives and identifying the critical parameter ranges to test. They can also be used for decision making about specifications or the most promising design alternatives. Today the capabilities of computers to support virtual environments, multi-person video games, complex systems and their subsystems, and even human thinking processes make the potential range of application almost limitless.
The term “computer simulation,” or often just “simulation,” implies using a computer to mimic the behavior of some physical or conceptual system or environment. It can be used to make concrete the eventual real effects of alternative conditions and courses of action, or it can be used to support training. The term “model” is widely used for everything from fashion design mannequins and physical mock-ups to flow charts and block diagram abstractions. Because simulations are, by definition, abstractions of the real thing, they make use of models. With respect to human-system integration, the kinds of simulations and models of interest are quantitative, usually implemented on a computer, and represent one or more aspects of the characteristics, performance, or behavior of a system, a human, or a human-machine system combination. A simulation of a system also implies a representation of the environment in which it operates. There are many ways to express such models, ranging from closed form mathematical equations to high-fidelity human-system computer simulations.
Types and Uses of Models and Simulations
The Link Trainer is perhaps one of the earliest human-in-the-loop computer simulations. It was an approximate representation of the equations of motion of an airplane and used real aircraft instruments and controls in the cockpit mock-up so that a human could practice the skills of instrument flying. Later, when simulators began to represent the pilot’s visual field outside the aircraft, a small-scale physical mock-up of a section of terrain was created and a television camera “flew” over this terrain board to project an image of what the pilot would see. Today very sophisticated human-in-the-loop simulation continues to play an important role, both
in operator training, for everything from aircraft operation to physicians practicing medical procedures, and in system development, to evaluate the performance resulting from new technology, concepts of operation, or procedures. The military services, NASA, and the aerospace industry have used human-in-the-loop or mission simulation in research and during system development very successfully over the past 20 years.
For example, NASA has used simulation in its research role to support continued improvement and the reduction of human error in the National Airspace System. It has employed everything from single-crew-member part-task simulations to full-mission representations of the coordinated behavior of commercial aircraft crews and air traffic controllers in air operations. From 1986 to 2005 the rate of major commercial aircraft accidents per million miles flown in the United States was reduced from 0.401 to 0.103, or about 74 percent, while the volume of traffic increased nearly 100 percent (National Transportation Safety Board, 2006). Similarly, the U.S. Air Force has demonstrated the training value of full-mission human-in-the-loop simulation of air operations involving aircrew and forward air controllers (Schreiber and Bennett, 2006; Schreiber, Bennett, and Gehr, 2006).
Network Models of Human-System Performance
Simulations are usually associated with the representation of systems or subsystems, but there is now a large body of literature on simulation to represent the performance of a person-machine system. Typically, the model is built on the basis of a detailed task analysis, and each subtask is represented as a node in a network of nodes describing the completion of a higher level task. Each node represents, as statistical distributions, the time to complete the subtask and the probability that it will be completed successfully. Tasks can be aggregated into still higher levels of activities, goals, or missions. One can represent contingent branching structures among nodes, and the resulting models can become quite complex. Outcome performance measures are averaged from multiple (often 100-300) Monte Carlo executions of the model, each calculating the aggregate performance time and success probability of the activity or mission. The programming language Micro Saint is an example of a language specifically designed and widely used to support this kind of simulation. The best-known examples of this class of models are the IMPRINT series of models used by the U.S. Army to predict the performance of military systems (Booher and Minninger, 2003; Archer, Headley, and Allender, 2003). IMPRINT has been used to create significant redesigns in many systems, improving performance, saving millions of dollars, and reducing the risk of fielding systems not fit for their purpose.
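A toy version of such a task-network simulation can be written directly in Python. The task names, time distributions, and success probabilities below are invented, and this strictly serial three-node network is far simpler than a real Micro Saint or IMPRINT model, but it shows the Monte Carlo aggregation described above.

```python
import random
import statistics

# Each node: (name, mean completion time in s, s.d., probability of success).
# A failure at any node ends the mission, as in a simple serial network.
TASKS = [
    ("detect target", 2.0, 0.5, 0.98),
    ("classify target", 4.0, 1.5, 0.90),
    ("enter coordinates", 6.0, 2.0, 0.95),
]

def run_once(rng):
    """One Monte Carlo execution of the network."""
    elapsed = 0.0
    for name, mean, sd, p_success in TASKS:
        elapsed += max(0.0, rng.gauss(mean, sd))  # sampled subtask time
        if rng.random() > p_success:
            return elapsed, False  # mission fails at this node
    return elapsed, True

def monte_carlo(n=300, seed=1):
    """Aggregate mission time and success probability over n executions."""
    rng = random.Random(seed)
    times, successes = [], 0
    for _ in range(n):
        t, ok = run_once(rng)
        times.append(t)
        successes += ok
    return statistics.mean(times), successes / n

# Expected success probability is roughly .98 * .90 * .95, i.e., about 0.84.
mean_time, p_mission = monte_carlo()
```

Real tools add contingent branching, resource contention, and workload moderators on top of this basic sample-and-aggregate loop.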
There is also a long history of the use of models and simulations in psychology to represent aspects of human behavior or performance (or both). Psychologists use them to summarize what they know and to support theories. Some of these models have been shown to be useful for system design to estimate and predict performance or to derive performance measures indicative of human-system performance.
Signal Detection Theory
One such mathematical model is signal detection theory, which was originally developed to quantify the detection of signals in noisy radar returns (Peterson et al., 1954). It is applicable to a wide range of human-system decision problems, including medical diagnosis, weather forecasting, prediction of violent behavior, and air traffic control, and it has been shown to be a robust method for modeling these types of problems (Swets et al., 2000). Signal detection theory has been found to be useful because it provides separate measures of the sensitivity of the human-system combination to discriminate signal from noise distributions on one hand and the decision criterion (the location of the threshold at which people or machines respond with a signal-present/signal-absent decision) on the other. The principal value of applying signal detection theory is to develop metrics for human-system performance and to evaluate design trade-offs between detector sensitivity, base rates of the signals of interest, and overall predictive value of the system output. The method is best employed to model effectiveness of discrete decision processes supported by automated systems. It serves to reduce the risk of picking the wrong operating point for a decision process, resulting in too many false alarms or a nonoptimal number of successful detections.
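Under the standard equal-variance Gaussian model, the two measures the theory separates can be computed directly from hit and false-alarm rates. This is a minimal sketch; the operator's rates below are illustrative, not empirical.

```python
from statistics import NormalDist

def sdt_measures(hit_rate, false_alarm_rate):
    """d' (sensitivity) and c (decision criterion) under the
    equal-variance Gaussian signal detection model."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# An operator detecting 84% of true signals with a 16% false-alarm rate
# is moderately sensitive (d' near 2) and unbiased (c near 0).
d_prime, criterion = sdt_measures(0.84, 0.16)
```

Holding d' fixed while varying the hit and false-alarm rates traces out the operating points available to the system; choosing among them is the criterion-setting decision the text describes, driven by base rates and the costs of misses versus false alarms.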
Models Derived from Human Cognitive Operations
A second, quite different approach is GOMS (Card, Moran, and Newell, 1983). GOMS models represent, for a given task, the user’s Goals, Operators (a keystroke, memory retrieval, or mouse move), Methods (to reach a goal, such as using keystrokes or a menu to open a file), and Selection rules (to choose which method to use). These models can be applied as soon as there is an explicit design for a user interface, and they have been used to predict response times, learning times, workload, and to provide a measure of interface consistency and complexity (i.e., similar tasks should use similar methods and operators). These models are now being more widely applied, and there are tools available to support their use (Kieras, 1998; Nichols and Ritter, 1995; Williams, 2000). They provide a sharable representation of the tasks, how they are performed, and how long each
will take. GOMS models can support user interface hardware and software design in several ways. They can be used to confirm consistency in the interface, that a method is available for each user goal, that there are ways to recover from errors, and that there are fast methods for frequently occurring goals (Chipman and Kieras, 2004).
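The simplest member of the GOMS family, the keystroke-level model, predicts execution time by summing per-operator times. The sketch below uses commonly cited operator values; the two method encodings are one plausible analysis of a file-open task, not an authoritative one.

```python
# Commonly cited keystroke-level model operator times, in seconds
# (after Card, Moran, and Newell, 1983).
KLM = {
    "K": 0.28,  # press a key (average typist)
    "B": 0.10,  # press or release a mouse button
    "P": 1.10,  # point with a mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_time(operators):
    """Predicted execution time for a sequence of KLM operators."""
    return sum(KLM[op] for op in operators)

# Two methods for the goal "open a file"; a selection rule would choose
# between them based on, e.g., whether the hand is already on the mouse.
menu_method = ["H", "M", "P", "B", "P", "B"]   # mouse to File, then Open
shortcut_method = ["M", "K", "K"]              # Ctrl+O from the keyboard

t_menu = predict_time(menu_method)          # about 4.2 s
t_shortcut = predict_time(shortcut_method)  # about 1.9 s
```

The roughly two-fold predicted difference is the kind of result that supports the design checks listed above, for instance confirming that a fast method exists for a frequently occurring goal.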
The GOMS series of models had their most notable, documented application in Project Ernestine, predicting the performance of a new design for a telephone information operator’s workstation (Gray, John, and Atwood, 1993). In this case, a variant of GOMS predicted that performance with the new workstation design would be enough slower than with the existing workstation to increase operating costs by about $2.5 million per year. The new workstation was actually built and soon abandoned because the predictions were correct. As another example, preliminary studies suggest that a modeling approach could make cell phone menu use more efficient by reducing interaction time by 30 percent (St. Amant, Horton, and Ritter, 2004). If applied across all cell phones, this would save 28 years of user time per day. Gong and Kieras (1994) describe a GOMS analysis suggesting that a redesign of a commercial computer-aided design system would lead to a 40-percent reduction in performance time and a 46-percent reduction in learning time. These time savings were later validated with actual users. Thus, simple GOMS models can reduce the risk of subsequent operational inefficiencies quite early in the system development process.
Models can also provide quantitative evidence for change—they can be used to reject a design that does not perform well enough. Glen Osga (noted in Chipman and Kieras, 2004, pp. 9-10) did a GOMS analysis of a new launch system for the Tomahawk cruise missile system. The analysis predicted that the launch process with the new system would take too long. This was ignored and the system was built as designed. Indeed, the system failed its acceptance test and had to be redesigned. As Chipman and Kieras note, it was costly to ignore this analysis, which could have led to a better design.
Despite their usefulness, GOMS models have not been as widely used by human factors specialists or systems engineers in systems development, particularly in large systems. Although relatively straightforward, they are perceived to be too difficult and time-consuming to apply.
Digital Human Physical Simulations
A third class of models is anthropometric representations of the size, shape, range of motion, and biomechanics of the human body (see also the section on physical ergonomics). Digital human models have been created to predict how humans will fit into physical workspaces, as in ground,
aircraft, or space vehicles, or to assess operations under the constraints of encumbering protective clothing. Representative of these models are commercial offerings such as Jack (http://www.ugs.com/products/tecnomatix/human_performance/jack/) (Badler, Erignac, and Liu, 2002), Safeworks (http://www.motionanalysis.com/applications/industrial/virtualdesign/safeworks.html), and Ramsis (http://www.humansolutions.com/automotive_industry/ramsis_community/index_en.php). They are available as computer programs that represent the static physical dimensions of human bodies, and they are increasingly able to represent the dynamics and static stresses for ergonomic analyses (Chaffin, 2004). They are primarily used for checking that the range of motion and accessibility are feasible, consistent with safe ergonomic standards, and efficient. They typically contain an anthropometric database that enables them to perform these evaluations for a range of types and sizes of users.
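The core accommodation check these tools automate can be illustrated with a one-dimensional example. The stature distribution below is a rough approximation of U.S. adult male values, invented for illustration rather than taken from a design-quality anthropometric database, and real tools evaluate many correlated body dimensions and postures at once.

```python
from statistics import NormalDist

# Approximate adult male stature distribution, in cm (illustrative only).
stature = NormalDist(mu=175.5, sigma=6.8)

def accommodated_fraction(clearance_cm):
    """Fraction of this population that fits under an overhead clearance."""
    return stature.cdf(clearance_cm)

def clearance_for_percentile(p):
    """Clearance needed to accommodate the p-th fraction of stature."""
    return stature.inv_cdf(p)

# A hatch sized to the 95th-percentile stature needs about 186.7 cm;
# anything lower excludes part of the intended user population.
required = clearance_for_percentile(0.95)
```

Digital human models extend this kind of percentile reasoning to full-body posture, reach, and strength, which is why a single "average" mannequin is never sufficient for accommodation analysis.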
Dynamic anthropometric models are thus routinely used to reduce the risks of creating unusable or unsafe systems. The resulting models and analyses can be shared between designers and across design phases. Having a concrete computer mannequin that confirms the success or failure of accommodation at a workplace is a very useful shared representation. There is beginning to be interest in integrating these models with human behavior representations to integrate the physical and cognitive performance of tasks. MIDAS provided an early demonstration of this concept, and new developments are being introduced regularly (e.g., Carruth and Duffy, 2005).
Models that Mimic Human Cognitive and Perceptual-Motor Behavior
A fourth class, human performance and information processing models, simulates the sensory, perceptual, cognitive, and motor behavior of a human operator. They are referred to by some as integrated models of cognitive systems and by the military as human behavior representations. They interact with a system or a simulation and represent human behavior in enough detail to execute the required tasks in the simulation as a human would, mimicking the results of a human-in-the-loop simulation without the human.
Some of these models are based on ad hoc theories of human performance, such as the semiautonomous forces in military simulations like ModSAF and JSAF. Others are built on cognitive architectures that represent theories of human performance. Examples of cognitive architectures include COGNET/iGEN (Zachary, 2000), created specifically for engineering applications; Soar (Laird, Newell, and Rosenbloom, 1987), an artificial intelligence–based architecture used for modeling learning, interruptability, and problem solving; ACT-R (Anderson et al., 2004), used to model learning, memory effects, and accurate reaction time performance;
EPIC (Kieras, Wood, and Meyer, 1997), used to model the interaction between thinking, perception, and action; and D-OMAR (Deutsch, 1998), used to model teamwork. Available reviews note further examples that have been developed for specific purposes (Morrison, 2003; National Research Council, 1998; Ritter et al., 2003).
These human behavior representations are more detailed because they actually mimic the information processing activities that generate behavior. They require a substantial initial investment, and each new application requires additional effort to characterize the task content to be performed. However, once developed, they can be used, modified, and reused throughout the system development life cycle, including to support conceptual design, to evaluate early design prototypes, to exercise system interfaces, and to support the development of operational procedures. They offer the ability to make strong predictions about human behavior. Because they provide not only what the descriptive models provide, but also the details of the information processing, they can be used to support applications in which it is useful to have models stand in for users for such things as systems analyses, or in training games and synthetic environments as colleagues and opponents. Models in this class have been used extensively in research and demonstration, but they have not, as yet, been widely used in system design (Gluck and Pew, 2005).
In some cases, models of human performance are represented only implicitly in a design tool that takes account of human performance capacities and limitations in making design recommendations. Automatic web site testing software is an example of this. Guidelines and style guides that suggest good practice in interface design are increasingly being implemented in design tools and guideline testing tools. A review of these types of testing tools shows their ease of use and increasing range (Ivory and Hearst, 2001). For example, “Bobby” (http://www.watchfire.com/products/webxm/bobby.aspx) is one of many tools to test web sites. Bobby notes what parts of a web site are barriers to accessibility by people with disabilities and checks for compliance with existing accessibility guidelines (e.g., from Section 508 of the U.S. Rehabilitation Act). Bobby does this by checking objects on a web page in a recursive manner against these guidelines (e.g., that captions for pictures are also provided to support blind users, that fonts are large enough).
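The flavor of such a guideline check can be captured in a few lines with the standard-library HTML parser. This sketch implements only one rule (images must carry alt text, per WCAG and Section 508 guidance) and is not how Bobby itself works internally; the sample page is invented.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Flag <img> tags with no alt text -- a barrier for blind users
    whose text-to-speech software cannot describe the image."""
    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if not attr_map.get("alt"):
                self.problems.append(attr_map.get("src", "<no src>"))

page = ('<html><body>'
        '<img src="map.png">'
        '<img src="unit-logo.png" alt="Unit logo">'
        '</body></html>')

checker = AltTextChecker()
checker.feed(page)
# checker.problems now lists the offending images, here just "map.png".
```

Note that the check encodes an assumption about users (that some rely on text-to-speech), which is exactly the implicit user model discussed next.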
While the developers of these systems may not have thought specifically about developing a model of the user, the guidelines and tools make assumptions about users. For example, Bobby makes assumptions about the text-to-speech software used by blind users, as well as about the range of visual acuity of sighted users. The implementation often hides the details of these models, creating human performance models that are implicit with the shared representation being only the results of the test, not the assumptions
supporting the test. On one hand, to their credit, these tools represent methods of incorporating consideration of human characteristics into designs that are very easy to use. On the other hand, just as with using statistics programs without understanding the computations they implement, using these tools without understanding the limitations of their implicit user models and performance specifications creates risks of inappropriate application or overreliance on the results.
Contributions to System Design Phases
Human-system simulation can play an important role in system design across the development life cycle to reduce the development risk. Human-in-the-loop simulation is widely accepted and has been applied successfully in all of the life-cycle phases discussed below. In this section, we focus on applications of human-system modeling because this kind of modeling has been less widely applied and has the potential to make significant contributions. In research labs routinely and increasingly in applied settings, the use of explicit computer models representing human performance has been demonstrated for a variety of uses, including testing prototypes of systems and their interfaces; testing full interfaces to predict usage time and errors; providing surrogate users to act as colleagues in teamwork situations; and validating interfaces as meeting a standard for operator performance time, workload, and error probability. They can also be used to evaluate the ability to meet user requirements and the interface consistency in a common system or a system of systems. Further reviews on models in system design are available (e.g., Beevis, 1999; Howes, 1995; National Research Council, 1998; Vicente, 1999).
Exploration and Valuation
Human-system models can be useful in exploratory design because they range from back-of-the-envelope calculations to formal models that reflect, at a detailed level, the costs and benefits of alternative approaches to a new or revised system. In air traffic control, for example, models of traffic flow in the U.S. airspace could be modified to postulate the impact of introducing alternative forms of automation. Analysis and network models are particularly helpful at this stage because they are more flexible and can be applied earlier in the design process. In many cases, the model's impact in the elaboration phase may derive from design lessons learned from previous designs: such lessons help the designer choose better designs in what can be a very volatile design period.
An important contribution of a model, especially in the early development stages, is that the model’s development forces the analyst to think very deeply and concretely about the human performance requirements,
about the user-system interactions, and about the assumptions that must be made for a particular design to be successful. For example, a network model can help make explicit the tasks that must be supported, providing a way for development teams to see the breadth of applicability and potential requirements of a system.
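A task-network model of this kind can be sketched in a few lines. The following Python sketch runs a Monte Carlo simulation over a small, entirely hypothetical operator task network; the task names, mean times, and branch probabilities are invented for illustration, not drawn from any fielded system. Even at this scale, writing the model forces the tasks, their ordering, and their assumed durations to be made explicit.

```python
import random

# Hypothetical task network for a single operator transaction.
# Task names, mean times (seconds), and branch probabilities are
# illustrative assumptions only.
TASKS = {
    "read_alert": {"mean": 2.0, "next": [("diagnose", 1.0)]},
    "diagnose":   {"mean": 5.0, "next": [("respond", 0.8), ("escalate", 0.2)]},
    "respond":    {"mean": 3.0, "next": []},
    "escalate":   {"mean": 8.0, "next": [("respond", 1.0)]},
}

def simulate_once(start="read_alert"):
    """Walk the network once, sampling each task time exponentially."""
    t, task = 0.0, start
    while task:
        t += random.expovariate(1.0 / TASKS[task]["mean"])
        branches = TASKS[task]["next"]
        task = None
        if branches:
            r, acc = random.random(), 0.0
            for nxt, p in branches:
                acc += p
                if r <= acc:
                    task = nxt
                    break
    return t

random.seed(1)
times = [simulate_once() for _ in range(10_000)]
print(f"mean completion time: {sum(times) / len(times):.1f} s")
```

Changing a branch probability or a task's mean time and rerunning immediately shows the predicted effect on completion time, which is the kind of explicit, inspectable assumption the text describes.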
Architecting and Design
During the system's construction period, models help describe and display the critical features of human performance in the system. A human-system performance model can serve as a shared representation that supports envisioning the HSI implications of a design. Such models can help guide design, suggesting and documenting potential improvements. Most model types can be used to predict a variety of user performance measures with a proposed system. These measures, including time to use, time to learn, potential error types, and predicted error frequency, provide usability estimates before the system is built. The models do not themselves tell how to change the system, but they enable alternative designs to be compared. As designers incorporate the implications of a representation into their own thinking, the models also suggest ways to improve the design. In addition, experience with models reflecting multiple design alternatives provides a powerful way to help designers understand how the capacities and limitations of their users constrain system performance. Booher and Minninger (2003) provide numerous examples in which redesign was performed, sometimes with initial reluctance but with long-term payoff, based on model-based evaluations at this and later stages of design.
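As a concrete instance of predicting time-to-use before a system is built, the keystroke-level model (KLM) version of GOMS sums standard operator time estimates (Card, Moran, and Newell, 1983). The sketch below uses commonly cited KLM operator values; the two candidate dialog designs being compared are invented for illustration.

```python
# Keystroke-level model (KLM) sketch: predict expert execution time by
# summing standard operator estimates. Operator values follow the
# commonly cited KLM figures; the designs are hypothetical.
OPERATOR_TIME = {  # seconds
    "K": 0.20,  # keystroke (average skilled typist)
    "P": 1.10,  # point with mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_time(operators):
    """Predicted execution time for a sequence of KLM operators."""
    return sum(OPERATOR_TIME[op] for op in operators)

# Design A: think, type a 6-character code, press Enter.
design_a = ["M"] + ["K"] * 6 + ["K"]
# Design B: reach for mouse, think, point at a menu item, click, return.
design_b = ["H", "M", "P", "K", "H"]

print(f"design A: {klm_time(design_a):.2f} s")  # 2.75 s
print(f"design B: {klm_time(design_b):.2f} s")  # 3.45 s
```

A comparison like this is exactly the formative use described above: it does not say how to change the design, but it ranks alternatives before any code is written.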
In a previous section, the usefulness of prototypes was highlighted. Prototypes can be represented at many different levels of specificity. When the design has progressed to the point at which concrete prototype simulations can be developed, it can be very useful to exercise the simulation with a human behavior representation. The development of the human behavior representation itself will be illuminating because it will make the tasks and human performance concrete, but it will also be useful for exploring alternative operational concepts, refining the procedures of use, and identifying the user interface requirements. Again, the human-system simulation can serve as a very useful shared representation that brings the development team together.
Models can be very helpful in evaluating prototype system and user-interface designs; that is, a model of the user can be used to evaluate how well the interface presents information or provides functionality.
Refining and testing offer perhaps the canonical application of user models in system design. The same models applied earlier in the design process, or refined versions of them, can be reused to support system evaluation. A human model can exercise the interface and compute a variety of usability and system performance measures. While the system is still evolving, evaluation is formative: it supports refinement and improvement. In the later stages of test and evaluation, the evaluation is summative, providing estimates of how the system will perform in the field. Many examples of refining systems using models are now available (Booher and Minninger, 2003; Kieras, 2003; St. Amant, Freed, and Ritter, 2005).
Also, all types of models have been used to help create system documentation or for developing training materials. As the model specifies what knowledge is required to perform a task, the model’s knowledge can also serve as a set of information to include in training and operations documentation, either as a manual or within a help system.
The design of a complex system is never complete; it continually evolves. Human-system simulations can continue to guide this evolution as experience is gained from the system in the field. Potential changes can be tried out in the simulated world and compared with existing performance. This has frequently been done in the space program, in which engineers on the ground try out solutions in simulation to find the best one to communicate to the actual flight crew. It should be noted that simulations become less successful as complexity grows and when dealing with boundary conditions and anomalies.
Strengths, Limitations, and Gaps
Simulations, particularly human-in-the-loop simulations, and human-system models are especially valuable because they make concrete, explicit, and quantitative the role of users in task execution and their impact on the characteristics of the systems to be controlled. They provide concrete examples of how a system will operate: not only how the equipment will operate, but also what human-system performance will result. Another benefit of using models and simulations in design is the cumulative learning that occurs in the designer over a simulation-based design and evaluation process. When using a model or simulation to design an interface, the designer receives feedback about users, their behavior, and how they interact with systems. If that feedback was explicit and heeded, designers bring to their next design task a richer model of the user and the system, their joint behavior, and the roles users play. Having this knowledge in the designer's head supports the creative process and makes the knowledge easier to apply than through an external tool.
Ease of use. If models demand more time and effort than practitioners are willing to invest, one cannot expect them to be used to reduce risk during development. Full-mission human-in-the-loop simulation is costly and time-consuming to apply and should be used only when the potential risks and opportunities justify it. Part-task simulation is a less costly alternative in which only the elements that bear critically on the questions to be answered are simulated. Human-system models range widely in their scope and in the effort required to apply them. While the keystroke-level model version of GOMS can be taught fairly quickly, other modeling approaches appear to be more difficult to use than they should be and more difficult than practitioners currently are willing to use routinely. Even IMPRINT, a well-developed and popular collection of models, is considered too difficult for the average practitioner to use. This difficulty may be inherent in the tools; it may be due to inadequate instructional materials or to inadequacies in the tools and environments that support model development and use. It may also result from a lack of education or experience about how valuable the investment in models can be: that the investment is worth the cost in time and effort. Few people now note how expensive it is to design and test a computer chip, a bridge, or a ship, or bemoan the knowledge required to perform these tasks. Yet humans and their interactions are even more complex; designing for and with them requires expertise, time, and support. Further work is needed to improve the usability of the model development process and the ease of use of the resulting models.
In order for human-system models to be credible as shared representations, they must make their characteristics and predictions explicit in a way that can be understood by the range of stakeholders for whom they are relevant. People ask a range of questions about models, including what their structure is, how they "work," and why they did or did not take a particular action (Councill, Haynes, and Ritter, 2003). This problem is most acute for the more complex models, particularly the information-processing models. Models that are not understood risk being ignored or misapplied. Promoting the understanding of models will increase trust in what they say about where the system risks are. Future models will need to support explanations of their structure, their predictions, and the sources of those predictions.
How models are developed will shape how they are used in system design. Using models across the design process, from initial conception to test and evaluation, will require adapting their level of depth and completeness to each task. Right now, model developers at times still struggle to build user models even once, let alone to build them for reuse across designers and across design tasks.
There have been several efforts to make models more easily used. For human behavior representations, these include Amadeus (e.g., Young, Green, and Simon, 1989), Apex (Freed et al., 2003), CogTool (John et al., 2004), Herbal (Cohen, Ritter, and Haynes, 2005), and G2A (St. Amant, Freed, and Ritter, 2005). At their best, these tools have offered, in limited cases, a 3 to 100 times reduction in development time, demonstrating that progress can be made in ease of use.
While promising, these tools are not yet complete enough to support a wide range of designs or a wide range of interfaces, tasks, and analyses. For example, CogTool is useful and supports a full cycle of modeling, testing, and revising an interface; it cannot yet model problem solving or real-time interactive behavior, but it begins to illustrate what such an easy-to-use system would look like. Research programs sponsored by the U.K. Ministry of Defence ("Reducing the Cost of Acquiring Behaviours") and by the U.S. Office of Naval Research ("Affordable Human Behavior Modeling") aim to make models more affordable and are sources of further examples and systems in this area.
Integration. There are gaps in integrating user models within and across design phases, as well as in connecting them to the systems themselves. As models are used in more steps of the design process, they will serve as boundary objects, representing shared understanding about users' performance in the systems under evaluation: their goals, their capabilities to execute tasks, and their behavior. IMPRINT has often been used this way. Once models are widely used, there will be a need to integrate them so that designers and workers at each stage are talking about the same user-system characteristics. The models might usefully be elaborated together, for example, starting with a GOMS model and moving to a human behavior representation model to exercise an interface. This kind of graceful elaboration has been started by several groups (Lebiere et al., 2002; Ritter et al., 2005, 2006; Urbas and Leuchter, 2005) but is certainly not yet routine.
The models will also have to be more mutable, so that multiple views of their performance can serve participants in different stages of the design process. Some designers will need high-level views and summaries of behavior and of the knowledge users require to perform the task; others may need detailed time predictions and guidance on how those times can be improved.
It is especially valuable for models of users to interact with systems and their interfaces. Models that interact with systems directly are the easiest for designers to apply, the most general, and the easiest to validate. Eventually, model performance could serve as an acceptance test, and this may lead to new approaches, such as visual inspection of operational mock-ups rather than extensive testing. Currently, connecting models to system or interface simulations is not routine. The military has shown that the high-level architecture connection approach can be successful when the software supporting the models and systems to be connected is open and available for inspection and modification. However, much commercial software is proprietary and not available for modification to support model interaction (Ritter et al., 2000). In the long term, we think that having human behavior representation models interact directly with unmodified or instrumented interface software will become the dominant approach, one that can also include automatic testing with explicit models. Models that use SegMan represent steps toward this approach of automatically testing unmodified interfaces (Ritter et al., 2006; St. Amant, Horton, and Ritter, 2004).
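A minimal sketch of this idea: a surrogate user drives an interface purely through the events the interface already accepts, so the interface code itself needs no modification. The Dialog class and the surrogate's simple policy below are invented stand-ins for a real toolkit and a real user model, not any actual system.

```python
# Sketch of a surrogate user exercising an interface through its event
# API rather than through modified internals. Dialog and the policy in
# surrogate_user are hypothetical stand-ins for illustration.
class Dialog:
    """Toy two-field form that accepts events the way a GUI would."""
    def __init__(self):
        self.fields = {"user": "", "code": ""}
        self.focus = "user"
        self.submitted = False

    def send(self, event, payload=None):
        if event == "focus":
            self.focus = payload
        elif event == "key":
            self.fields[self.focus] += payload
        elif event == "submit":
            # Accept only if every field has been filled in.
            self.submitted = all(self.fields.values())

def surrogate_user(dialog, goals):
    """Drive the dialog as a simple user model would: one field at a time."""
    for field, text in goals.items():
        dialog.send("focus", field)
        for ch in text:
            dialog.send("key", ch)
    dialog.send("submit")
    return dialog.submitted

ok = surrogate_user(Dialog(), {"user": "ana", "code": "42"})
print("acceptance test passed:", ok)
```

Because the surrogate only emits events the interface already handles, the same loop could in principle serve as an automated acceptance test against the unmodified software, which is the direction the text describes.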
High-level languages. Currently, many models, particularly human behavior representation models, require detailed specifications. Creating these models for realistic tasks can be daunting; for example, there are at least 95 tasks to include in a university department web site design (Ritter, Freed, and Haskett, 2005). One way to reduce the risk that human behavior models go unused is to provide a high-level language similar to that used in network models. Interface designers will need a textual or graphical language for creating models at a higher level than most current human behavior representation languages; analysts will need libraries of tasks (or the ability to read models as if they were libraries) and easy ways to program new and modified tasks. More complete lists of requirements for this approach are available (e.g., Kieras et al., 1995; Ritter, Van Rooy, and St. Amant, 2002).
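One form such a high-level language might take is sketched below: each line of a textual task description names a user action, and a small compiler expands it into primitive operators. The verbs, the syntax, and the expansion rules here are purely hypothetical, intended only to show how a high-level description can shield the analyst from operator-level detail.

```python
# Sketch of a hypothetical high-level task language. Each line names a
# user action; the compiler expands it into primitive operators (here,
# KLM-style letters). Verbs and expansion rules are invented for
# illustration, not taken from any existing modeling language.
EXPANSIONS = {
    "type":  lambda arg: ["M"] + ["K"] * int(arg),  # think, then n keystrokes
    "click": lambda arg: ["H", "P", "K"],           # reach, point, press
    "check": lambda arg: ["M"],                     # visual verification
}

def compile_task(source):
    """Expand a high-level task description into primitive operators."""
    ops = []
    for line in source.strip().splitlines():
        verb, _, arg = line.strip().partition(" ")
        ops.extend(EXPANSIONS[verb](arg))
    return ops

task = """
type 6
check result
click Save
"""
print(compile_task(task))
```

The three-line description stands in for eleven primitive operators; a library of such verbs would let analysts assemble task models without writing operator sequences by hand, which is the reuse the text calls for.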
Cultural, team, and emotional models. Models of individual task performance have rarely included social knowledge about working in groups or about cultural differences. Users are increasingly affected by social processes, including culture and emotion. As the role of these effects on systems becomes better understood, models will need to be extended to include what is known about them as a further element of risk reduction. For a mundane but sometimes catastrophic example, consider the interpretation of switches in different cultures: in some cultures a switch is flipped up to turn it on, in others down. The design and implementation of safety-critical switches, such as aircraft circuit breakers or power plant controls, need to take account of these cultural differences.
Social knowledge, cultural knowledge, theories of emotions, and task knowledge have been developed by different communities: models of social processes will need to be adapted if they are to be incorporated in models of task execution (like human behavior representation models, Hudlicka, 2002). Understanding and applying this knowledge to design is of increasing interest as a result of a desire to improve the quality of models performance and an acknowledgment that cultural, team, and emotional effects influence each other and task performance. For example, there is a forthcoming National Academies study on organizational models (National Research Council, 2007) and there is also recent work on including social knowledge in models of human behavior representation (e.g., Sun, 2006).