10
Strategies for Research

The fundamental methodological challenge for human factors research in air traffic control is to find a cost-effective means of generating valid human factors information and design recommendations. Resources for human factors research programs are usually limited; as a result, early planning decisions must be made to determine which research topics can be included in a program and which cannot. An awareness of alternative research methods, including their strengths and weaknesses, can assist in determining which research topics are the most likely to produce valid conclusions with the resources available.

Two salient questions in the design and development of complex human-machine systems, including the modernization of the contemporary air traffic control system, are: "What methodologies will identify the variables that influence operator performance and thus yield the information required for system design?" and "What are the appropriate methodologies for evaluating existing or new systems?" Both questions revolve around two requirements: (1) the specification of behavioral or performance criteria and (2) the determination of valid measurement procedures. This chapter focuses primarily on the different means by which the relevant human factors variables are identified (Dennison and Gawron, 1995; Wickens, 1995a). Although cautions regarding evaluation methodology have been provided by several authors in a recent volume on verification and validation of complex systems (David, 1993; Hancock, 1993; Harwood, 1993; Hollnagel, 1993a; Jorna, 1993; Woods and Sarter, 1993), we also provide an overview of measurement issues in complex systems.

Throughout this chapter, references are made to the concepts of validity and validation. Both are integral to discussions of research and system development. Campbell and Stanley (Campbell, 1957, 1969; Campbell and Stanley, 1966) have distinguished between the concepts of internal validity and external validity. Internal validity means that the findings (e.g., the observed cause-and-effect relationships) of a particular investigation follow logically and unequivocally from the way the investigation was designed and conducted. An investigation would be internally valid (and would constitute a controlled study) if no contaminating factors (i.e., confounding of variables) undermined the conclusions. External validity refers to the generalizability of the findings of an internally valid study to other situations. Generalizability refers to the assumption that a finding will hold or apply in situations other than the one in which it was observed (Chapanis, 1988; Locke, 1986; Sherwood-Jones, cited in Taylor and MacLeod, 1994). Internal validity is a prerequisite for external validity or generalizability. Validation refers to the determination that a system design is appropriate for the intended purpose (i.e., that the system, when implemented, will provide the necessary functionality and will allow the articulated operational goals to be achieved, presumably in a safe and reliable manner).

This chapter describes the research methodologies available for collecting human factors data, as well as the relative strengths and weaknesses associated with each approach. We consider: human engineering databases and literature, analysis of controller responses, computer simulation and modeling, design prototyping, real-time simulation, and field studies. Each of these methodologies is reviewed; we then discuss how the different methodologies can be combined in the investigation of a particular topic (e.g., operational errors in air traffic control). The chapter ends by summarizing the human factors measurement issues associated with the design and evaluation of complex systems.

What we are describing is a series of strategies for research on air traffic control. Many of these strategies involve the collection of data from past (through accident and incident analysis), present, and projected users of the system. Other strategies, however, particularly those involving modeling and computer simulation, examine issues in the absence of new data, for example, when a model is run to predict trade-offs between the safety and efficiency of a new system innovation, such as free flight. Appropriate marshaling of the full arsenal of research strategies requires understanding both the strengths and weaknesses of each, as detailed in the sections that follow.

HUMAN ENGINEERING DATABASES AND LITERATURE

The application of human factors research to potential problems in air traffic control is sometimes prompted by the identification of generalizable human performance issues in other complex systems and at other times by direct analysis of the operational environment of air traffic control, including accident and incident analysis. Similarly, the relevant human factors literature and engineering databases may generalize across particular domains of human behavior but may also require tailoring for application in specific contexts. The breadth of engineering databases and the human engineering literature reflects the diversity of human factors needs, not only in aerospace but also in surface transportation, weapons, medical, and other types of systems.

Literature Sources

Researchers in aviation systems are likely to be aware of specialized periodicals such as the International Journal of Aviation Psychology and the proceedings of the biennial meetings of the International Symposium on Aviation Psychology and the International Conference on Experimental Analysis and Measurement of Situation Awareness (Garland and Endsley, 1996). The Proceedings of the Human Factors and Ergonomics Society also publishes several papers on aerospace systems, including air traffic control. The same Proceedings, together with the society's journal, Human Factors, publishes many papers on human performance issues (e.g., workload, models of human error, perceptual processes, decision making, shift work, workspace design) that are directly relevant to problem areas in air traffic control as well as to other human-machine systems.

One of the most comprehensive references for the human factors database is Boff and Lincoln's (1988) Engineering Data Compendium: Human Perception and Performance. Although it has the appearance of a design handbook, it can also be valuable as a resource for air traffic control research questions. Handbooks such as Boff et al. (1986), Handbook of Perception and Human Performance, and Salvendy (1987), Handbook of Human Factors (with Volume 2 in press), provide research reviews (and design implications) of the major human factors issues.

Several texts on aviation human factors are available, including Cardosi and Huntley (1993), Fuller et al. (1995), Hawkins (1993), Hopkin (1982a, 1995), Jensen (1989), Maurino et al. (1995), McDonald et al. (1994), O'Hare and Roscoe (1990), and Wiener and Nagel (1988). Texts by Cardosi and Murphy (1995) and by Hopkin are specifically addressed to air traffic control. Others, like the Jensen and the Wiener and Nagel texts, include chapters on air traffic control, and the research on many human performance issues in the cockpit is applicable to the operational controller as well. For example, the human factors issues in cockpit resource management described in Wiener et al. (1993), and their applicability to air traffic control environments, are relevant to our discussion of teamwork and communication.

A series of documents published by the International Civil Aviation Organization focuses on human factors issues in air traffic control (e.g., International Civil Aviation Organization, 1993, 1994), including workspace, automation, communication, navigation and surveillance, and management systems. Another collection of texts by Wise and his colleagues (Wise and Debons, 1987; Wise et al., 1991, 1993, 1994a, 1994b) provides a wide-ranging survey of human factors problems and needs in aviation, with a particular emphasis on air traffic control. These texts report on papers presented and discussed at working conferences on contemporary challenges to the development and certification of complex aviation systems.

Through the Civil Aeromedical Institute (CAMI) in Oklahoma City, the FAA Office of Aviation Medicine has compiled a listing of aviation medicine reports (Collins and Wayda, 1994). This list and the FAA Technical Center bibliography of human factors studies completed at the center over the last 35 years (Stein and Buckley, 1994) together provide another significant component of the aviation human factors database. More recently, the Armstrong Laboratory at Wright-Patterson Air Force Base has published a bibliography of 50 years of human engineering research in the Fitts Human Engineering Division (Green et al., 1995).

Transferability of Human Factors Data

Caution must be exercised before lessons and recommendations obtained from databases and literature are transferred to air traffic control system design. As an example, Sperandio's (1971) early description of how controllers use adaptive strategies to maintain performance in the face of increases in task load was one of the first indications that existing models of workload and performance could not be generalized for the direct prediction of controller behavior. Another indication is provided by the findings that operational errors tend to be associated with moderate or low workload (Stager and Hameluck, 1989, 1990; Rodgers, 1993). Some research has been done in developing cognitive models of controller processes (discussed below), but there are not yet normative models of controller behavior or controller performance by which to ascertain the validity of transferring findings from another task environment (within the broader aviation context) to air traffic control. Substantial work in completing cognitive task analyses for different types of air traffic control sector operations is required to help adapt the existing human factors literature to air traffic control needs.

At the same time, some literature would appear to be more or less directly applicable. For example, descriptive models of human error (discussed below in the section on combining sources of human factors data) may be directly applicable in the post hoc analysis of operational errors and in the design of more effective controller-system interfaces.

Limitations

The data and recommendations contained in the human engineering literature frequently have not been tailored for specific applications. Expert interpretation is often required to determine the applicability (particularly without further validation) of data to a specific research question. Although it is often possible for human factors specialists to extrapolate from the literature to a design application, whenever possible, usability testing (i.e., for user acceptability) should be conducted in a rapid-prototyping or other simulation environment (see below).

ANALYSIS OF CONTROLLER RESPONSES

In this section, we describe three sources of human factors information: (1) incident analysis, (2) reporting systems, and (3) subjective assessments and verbal reports, a collective expression for a number of independent methodologies that depend on the subjective responses or comments of controllers rather than on their performance within the system. Our emphasis in this section is on the working system and the user's experience (and performance) within the air traffic control system. Subjective assessments, however, are also an integral part of the rapid-prototyping process.

Incident Analysis

Because most accidents in highly redundant systems, such as those involved in aviation, have multiple causes, accident analysis is often ambiguous in revealing human factors causes (Diehl, 1991). Aviation accidents that are directly attributable to an air traffic control system error, such as the runway collision at Los Angeles International Airport (National Transportation Safety Board, 1991), are extremely rare. Incidents (referred to as operational errors), such as a loss of the required separation between aircraft, are more common but still relatively infrequent (Rodgers, 1993). The low frequency of incidents (versus the occurrence of errors that do not result in incidents) imposes particular constraints on the observation of precipitating conditions and on statistical inference. By definition, incidents are concerned with either system or operator error or with a procedural deficiency.

McCoy and Funk (1991) recently attempted to develop a taxonomy of operator errors, based on a model of human information processing, using NTSB aircraft accident reports. They found that the air traffic control system was a contributing or probable cause in 6 of the 38 accidents they reviewed for the 1985-1989 period. When the search was extended back to 1973, they found a total of 29 examples of air traffic control involvement. The errors were related to attention, memory, perception (i.e., the validity of the controller's world model), and response selection (including the issuing of a clearance, coordination, and a variety of other procedures). From an analysis of operational errors, Redding (1992) reported that failure to maintain adequate situation awareness was the likely cause of most errors. As a result of their own review, McCoy and Funk argued for the design of error-tolerant systems (see Wiener, 1987, 1989) while still trying to prevent errors.

Stager and Hameluck (1989, 1990) reported that, in their analysis of 301 Fact Finding Board reports, the occurrence of incidents was not related directly to rated workload. Operating irregularities were associated with conditions of moderate or low workload and intermediate traffic volume and complexity. Allowing for the fact that more than one cause could be assigned by a review board to the same incident, both attention and judgment errors were cited as causes in more than 60 percent of the cases examined. In a related study, at least half of all system errors were found to have "causal or contributory factors which are directly attributable to breakdowns in the information transfer process—usually in oral communications" (Canadian Aviation Safety Board, 1990:6). Similar findings were reported by Rodgers (1993) in an analysis of the operational error database for air route traffic control centers. In a second analysis, neither controller workload (number of aircraft being worked) nor air traffic complexity was found to be related to the severity of the operational errors.

Incident analysis is a post hoc process, and the data available for analysis have frequently been filtered through a conceptual system that is reflected in the classification structure of the database itself. What data are collected at the time of a given incident is determined largely by the questions posed during the gathering of evidence. Rodgers (1993) has indicated that it is necessary to be able to review the dynamics associated with the air traffic situation (and not just the error-related event itself) when examining operational errors (Rodgers and Duke, 1994). Consequently, an analysis of operating irregularities can sometimes provide insight into the patterning of the occurrence of incidents without clearly identifying the underlying causal factors involved (Stager, 1991b; Stager and Hameluck, 1989, 1990; Stager et al., 1989). Still, from a procedural perspective, it is important to identify controller or system errors that can affect system safety (Rodgers, 1993; Durso et al., 1995). Analyses that focus purely on the operational factors associated with an accident may also overlook higher-level management and organizational factors. Maurino et al. (1995) have recently tried to extend the scope of analysis beyond the individual to the system as a whole (see also Reason, 1990).
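
One analytic consequence of the multiple-cause coding noted above is worth making explicit: when a review board may assign several causal factors to a single incident, per-factor percentages are computed against the number of incidents and can legitimately sum to more than 100. A minimal tallying sketch in Python (the factor labels and counts are invented for illustration, not drawn from the Fact Finding Board data):

    from collections import Counter

    # Hypothetical incident records; each incident may carry several
    # causal factors, as in the review-board analyses described above.
    incidents = [
        {"id": 1, "factors": ["attention", "judgment"]},
        {"id": 2, "factors": ["communication"]},
        {"id": 3, "factors": ["attention"]},
        {"id": 4, "factors": ["judgment", "communication"]},
    ]

    counts = Counter(f for inc in incidents for f in inc["factors"])
    n = len(incidents)
    for factor, k in counts.most_common():
        # Percentages are per incident, so the column may total more than 100.
        print(f"{factor:>13}: cited in {k} of {n} incidents ({100 * k / n:.0f}%)")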

Reporting Systems

The aviation safety reporting system (ASRS), coordinated by NASA in the United States (Nagel, 1988; Reynard et al., 1986; Cushing, 1994; Rosenthal and Reynard, 1991; Wickens and McCloy, 1993), the confidential human incident reporting programme (CHIRP), operated under the auspices of the Civil Aviation Authority in the United Kingdom, and comparable focal points in a few other countries provide confidential and anonymous channels for reporting actual occurrences or potential sources of error in the interests of aviation safety. Both pilots and controllers are able to report situations in which there has been a breakdown in standard procedures or in which errors in behavior have been observed. They are guaranteed anonymity, provided that the error has not been previously reported by others and that a formal loss of separation did not result. The confidential reporting procedure allows appropriate follow-up action, including interviews, to be taken. This kind of reporting facility benefits safety by tapping evidence not otherwise available, complementing rather than replacing other, more traditional means of improving aviation safety.

The ASRS can sometimes provide a means of documenting not only a particular problem area in air traffic control operations (Monan, 1983) but also the impact of system changes, such as the introduction of the collision avoidance system, TCAS II (Mellone and Frank, 1993). The investigation of controller errors has often relied on ASRS data. For example, Morrison and Wright (1989) grouped controller errors within two broad concepts: control (monitoring, coordination) and communications (clearance composition, read/hear-back errors). Rosenthal and Mellone (1989) investigated anticipatory clearances (e.g., fast sequence clearances to expedite traffic flows in high-volume situations).

Although the value of the ASRS for aviation safety has long been acknowledged, the reporting system has inherent limitations as a research methodology (Prinzo and Britton, 1993; Wickens and McCloy, 1993). For example, the language in which events are described by the participants does not necessarily reflect the concepts used by human factors personnel to define causal relationships (e.g., mental workload, perceptual failure, inappropriate mental model). The controller or pilot may not appreciate the need for a description of the antecedent conditions, or may not be able to articulate them precisely. Some clarification can be achieved through follow-up interviews by ASRS personnel and the use of keywords for incidents.

There is always the concern with reporting systems (and with descriptions of operating irregularities prepared for boards of inquiry) that the data can be constrained, if not specifically determined, by a predetermined conceptual structure. Questionnaires or lists of keywords prepared by persons with an operational background may not capture those aspects of an event that the psychologist or human factors specialist needs in order to interpret an incident within a valid framework.

Harwood et al. (1991) have suggested that a relational schema, based on controllers' conceptualization of air traffic control domain knowledge and overlaid on the ASRS data, would be a helpful organizing tool. The conceptual structures found in their analysis of controllers' representations of relationships between concepts could provide a means of drawing together seemingly disparate incidents.

Finally, the sheer volume of incident data collected each year itself imposes a constraint on the use of the reporting system as an effective research methodology. The ASRS staff are able to follow up only a fraction of the reported incidents and to encode them in the appropriate psychological language. Moreover, a fully user-friendly means of exploring the ASRS database in order to generate hypothetical causal relationships is not yet available. Consequently, this information resource has been underutilized as a methodology.

Subjective Assessments and Verbal Reports

Subjective assessments are frequently used in the air traffic control environment (Hopkin, 1982b). They are convenient, inexpensive, and always available as an option; in some circumstances, they may yield data that can be obtained in no other way (Manning and Broach, 1992). For example, Harwood (1993) has described the role of subjective assessment methods in identifying, first, the human-centered issues associated with air traffic control system upgrades and, second, the required criteria and measures that are applicable during system transition.

Subjective assessments are integral to the rapid-prototyping process (see below) and are commonly obtained in the laboratory investigations, real-time simulations, and field studies that provide a context for measures of controller performance as well as for physiological and biochemical indices. Verbal reports can be very helpful in supplementing and explaining other measures, although they are not adequate as a substitute for them. For example, subjective comments and ratings are often collected to supplement behavioral, physiological, and biochemical measures of workload, effort, fatigue, and stress in air traffic controllers (Costa, 1991; Melton, 1982; Moroney et al., 1995; Smith, 1980; Stein, 1988; Tattersall et al., 1991).

One of the most common applications of subjective assessments and verbal reports is in subjective workload assessment, which we discuss in a later section. In this section, however, the focus is on the use of controllers' own responses as a means of making inferences about their cognitive structure and information processing.

Modeling Controller Processes Through Verbal Reports

From a general systems design perspective, it is understood that the displays and inherent functionality of an operator's workstation (e.g., the nature of computer-human interaction) must be compatible with the operator's mental model of the system characteristics (Edwards, 1991; Hollnagel, 1988; Lind, 1988; Van der Veer, 1987; Waern, 1989) and that the nature of the displays must match the nature and level of information processing at which the operator is working (Moray, 1988; Rasmussen, 1985; Rasmussen and Vicente, 1989).

In air traffic control research, the subjective assessments of controllers, coupled with both structured and unstructured measures of their information processes, provide a means of gaining insight into their cognitive models (Leroux, 1993b, 1995; Mogford, 1991, 1994; Murphy et al., 1989). In some instances, the subjective assessments may depend simply on verbal reports (Whitfield, 1979; Whitfield and Jackson, 1983); in others, on psychometric analyses of subjective assessments, such as multidimensional scaling and cluster analysis (Kellogg and Breen, 1987; Stager and Hameluck, 1986). For example, verbal reports can be used to reveal the cognitive processes underlying a controller's performance during the management of traffic scenarios (Amaldi, 1994; Endsley and Rodgers, 1994) and the parameters that controllers consider in their decision making and that contribute to perceived airspace complexity (Mogford, 1994; Mogford et al., 1994a, 1994b).
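
To make the psychometric route concrete, the sketch below applies multidimensional scaling and hierarchical cluster analysis to a small, invented matrix of pairwise dissimilarity judgments (Python, using scikit-learn and SciPy; the concept labels and ratings are hypothetical). It illustrates the general technique only, not the procedures of the studies cited above.

    import numpy as np
    from sklearn.manifold import MDS
    from scipy.spatial.distance import squareform
    from scipy.cluster.hierarchy import linkage, fcluster

    # Hypothetical mean dissimilarity ratings (0 = identical, 1 = unrelated)
    # for four airspace concepts, averaged over a group of controllers.
    concepts = ["crossing traffic", "in-trail spacing",
                "weather deviation", "handoff"]
    D = np.array([
        [0.0, 0.3, 0.8, 0.7],
        [0.3, 0.0, 0.9, 0.6],
        [0.8, 0.9, 0.0, 0.5],
        [0.7, 0.6, 0.5, 0.0],
    ])

    # Multidimensional scaling: embed the judged dissimilarities in two
    # dimensions so that inter-point distances approximate the ratings.
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)

    # Hierarchical cluster analysis on the same dissimilarities.
    clusters = fcluster(linkage(squareform(D), method="average"),
                        t=2, criterion="maxclust")

    for name, (x, y), c in zip(concepts, coords, clusters):
        print(f"{name:18s} MDS = ({x:+.2f}, {y:+.2f})  cluster {c}")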

Limitations

Subjective assessments can be useful, yet verbal comments reflect what individuals think they do or what they are supposed to do, not always what they actually do. It is often advocated that videotaped records be made in order to ensure the completeness of the information obtained through direct subjective assessments and verbal reports. Assessments and verbal reports are subject to error through distortions of individual emphasis and the fallibility of human memory. On some occasions, users may voice a subjective preference for systems that do not support the best performance (Andre and Wickens, 1995; Yeh and Wickens, 1988; Druckman and Bjork, 1994).

If subjective measures disagree with other measures, this does not justify discarding either type of measure, and the disagreement need not imply that one or the other measure is wrong (Muckler and Seven, 1992). Agreement with other measures may support and help to validate both the subjective and the objective measures. Finally, it is important to emphasize that with any use of subjective data, whether ratings collected in the laboratory or opinions collected in surveys, a good deal of expertise is necessary to design the instrument in such a way that the data obtained will be unbiased.

Workload Assessment

One of the most crucial functions of subjective reports in human factors has been to provide estimates of mental workload. However, this area must be considered very much in concert with other workload assessment methodologies. Four main classes of workload measurement procedures have been proposed and used: primary-task, secondary-task, physiological, and subjective measures. In addition, modeling, or predictive workload assessment, has also been proposed. Only the major aspects of each method and its associated strengths and limitations are considered here. For additional details on each of these classes of workload methods, consult O'Donnell and Eggemeier (1986), Lysaght et al. (1989), and Wickens (1992).

Primary-Task Measures

This method involves measurement of performance on the primary tasks of interest. The techniques available for the assessment of controller performance, particularly as they pertain to selection and training, were discussed in Chapter 3. Given that reliable performance assessment methods are used, the performance of the controller on a particular task is assumed to reflect directly the mental workload associated with achieving that level of performance. Primary-task measures such as the number of aircraft handled per unit of time, the number of control actions, and mean aircraft proximity (Rodgers et al., 1994) have the merit that, if shown to be valid, they can be directly related to operational performance, which can be an advantage in relating workload to system performance, thereby aiding system evaluation.

However, there are at least two nullifying disadvantages. (1) Primary-task performance can be dissociated from mental workload; that is, the same output level of performance may be associated with different degrees of controller workload. Sperandio (1971) showed that controllers often respond to an increase in imposed task load (e.g., increased traffic density) with subtle variations in operating procedures (e.g., shortening the length of verbal messages to pilots) in order to regulate their performance. More generally, controllers may use a variety of strategies to maintain a certain criterion level of performance in response to an increase in task load. This, however, can come at the cost of higher mental workload, leaving potentially little margin for dealing with emergencies or additional tasks. The primary-task workload index is insensitive to this potential problem. (2) Primary-task performance measures may be difficult to obtain in practice, particularly for cognitive activities such as planning and decision making, during which the controller may make very few if any overt responses that can be measured. Furthermore, the overt response represents the end product of a number of considerably demanding information-processing activities and as such may provide only an incomplete index of the workload associated with these processes.
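
As an illustration of how such primary-task indices can be derived in practice, the sketch below counts aircraft under control and control actions per analysis window from a hypothetical sector event log (Python; the log format, field names, and 15-minute window are assumptions made for the example). The bookkeeping is simple; note that nothing in it addresses the dissociation problem just described.

    from collections import defaultdict

    # Hypothetical sector event log: (time in seconds, event type, aircraft id).
    events = [
        (10, "accept", "ACA101"), (95, "clearance", "ACA101"),
        (140, "accept", "UAL22"), (300, "clearance", "UAL22"),
        (310, "clearance", "ACA101"), (620, "handoff", "ACA101"),
        (900, "accept", "DAL7"), (1300, "clearance", "DAL7"),
    ]

    WINDOW = 900  # 15-minute analysis window, in seconds

    aircraft = defaultdict(set)  # window index -> aircraft touched
    actions = defaultdict(int)   # window index -> control actions issued

    for t, kind, ac in events:
        w = t // WINDOW
        aircraft[w].add(ac)
        if kind == "clearance":  # count issued clearances as control actions
            actions[w] += 1

    for w in sorted(aircraft):
        print(f"window {w}: {len(aircraft[w])} aircraft, "
              f"{actions[w]} control actions")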

Secondary-Task Measures

In the secondary-task procedure (Brown and Poulton, 1961; Garvey and Taylor, 1959), the operator is asked to concentrate on performing well on the primary task and to allocate any residual attentional resources or capacity to the secondary task. The basic premise is that performance on the secondary task reflects the workload demands of the task to be assessed, given that the primary and secondary tasks both make demands on the same information-processing resources.[1] Early studies of controller workload used the oral and written communications of the controller as an embedded secondary task and found that, as task load increased, verbal communications became shorter and more stereotyped (Leplat and Browaeys, 1965) and handwriting deteriorated in form and content (Kalsbeek, 1965). More recent studies have used various secondary tasks drawn from the experimental psychology literature on dual-task performance, for example, probe-reaction time, rhythmic tapping, Sternberg memory search, random number generation, time estimation, and combinations of tasks (see O'Donnell and Eggemeier, 1986, for a review).

One of the advantages of the secondary-task procedure is that the choice of secondary task can be theory-driven (e.g., by the multiple resource theory of Wickens, 1984) and therefore potentially diagnostic of the source of workload, rather than simply providing an estimate of overall workload. This method is also one of the few workload techniques that can reveal the upper limit of a controller's capability and hence can be potentially valuable for estimating controller response to emergency events. One of the major disadvantages of the technique is its relative obtrusiveness, particularly when even a transient diversion of the controller's resources away from the primary task may compromise safety. The use of secondary tasks that are embedded as a natural but lower-priority element within the main tasks the controller has to perform (e.g., removal of a flight strip that has been handed off to another controller) may partially overcome this problem.

[1] Various other assumptions must also be met for the secondary-task method to yield interpretable results. For example, both tasks should be resource sensitive and not data limited (Norman and Bobrow, 1975); the secondary task should not be capable of being performed purely automatically; and primary-task performance should not vary with the introduction of the secondary task or with different secondary tasks (see Fisk et al., 1986, and Wickens, 1984, for further discussion).
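
The scoring logic of a probe-reaction-time secondary task is straightforward: slower or more frequently missed probe responses under a given traffic condition are read as less residual capacity, and hence higher primary-task workload. A minimal scoring sketch (Python; the reaction times and condition labels are fabricated, and a real study would also need to satisfy the assumptions in footnote 1):

    from statistics import mean

    # Fabricated probe reaction times in milliseconds (None = missed probe),
    # collected under two hypothetical traffic conditions.
    probe_rt = {
        "light traffic": [310, 295, 330, 305, 290, None],
        "heavy traffic": [455, None, 510, None, 480, 440],
    }

    for condition, rts in probe_rt.items():
        hits = [rt for rt in rts if rt is not None]
        miss_rate = 1 - len(hits) / len(rts)
        # Longer mean RT and more misses imply less residual capacity,
        # i.e., higher workload imposed by the primary (control) task.
        print(f"{condition}: mean RT {mean(hits):.0f} ms, "
              f"miss rate {miss_rate:.0%}")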

Physiological Measures

Physiological measures that reflect aspects of mental workload have been a focus of continuing interest for many years, and early applications included assessment of controller workload (e.g., Kalsbeek, 1965). Although different classification schemes can be used to describe the various physiological measures that are available, in general they cluster around two types: (1) background measures that are not specifically linked to ongoing task events or to the timing of controller activities or responses. Measures in this category include the spontaneous

DESIGN PROTOTYPING

A significant methodological concern in human factors research has been how to bridge the gulf between laboratory and field (Chapanis, 1967; Dipboye, 1990). In response to this concern, advances in contemporary computers and software have contributed to the development of rapid prototyping in a laboratory environment.

Description of Prototyping

Rapid prototyping is a technique for gathering the comments and impressions of future users and others regarding the capabilities and limitations of a simulated air traffic control workstation or workspace while it can still be changed. It is not a substitute for testing, and its functions are more analogous to planning, since it encourages the formulation of alternatives but cannot usually yield quantitative evidence, for example, about capacities or characteristic human errors. Prototyping can be a useful technique for discarding fundamentally flawed options quickly, for identifying crucial combinations of circumstances that require testing, and for discovering the main topics of agreement and of disagreement among controllers.

The display of dynamic and interactive representations of proposed operational controls and displays (i.e., the computer-human interface) on computer screens enables several alternative design concepts to be quickly evaluated by research, design, and operations personnel (Dennison and Gawron, 1995). Suggested modifications can be made to the prototype displays, often on line, with the observers providing further input to the revised interface and the proposed functionality. Contemporary prototyping provides a means of simulating many of the elements of the displays, including the input mechanisms, the visual display characteristics, and the data processing functionality, that will ultimately constitute the controller's interaction with the fielded system.

Task analyses can provide the basis for a workstation design for an air traffic control tower, for example, in order to meet design requirements (e.g., Miller and Wolfman, 1993), but whether the specifications translate into a usable interface in the operational environment has to be determined through prototyping and evaluation by experienced users (Fassert and Pichancourt, 1994). There is a need, for example, to promote human factors experimentation and user involvement during the design process in order to ensure design validity. One of the objectives for design programs in several air traffic control projects has been to ensure active participation by experienced controllers in the developmental work (Day, 1991; Dujardin, 1993; Leroux, 1993a; Simolunas and Bashinski, 1991; Stager, 1991a). After a review of the procurement process for the advanced automation system (AAS), Small (1994) forcefully recommended more effective use of prototyping and more appropriate use of controller teams in design programs.

Limitations

Much of the prototyping process depends on subjective assessments. In some instances, subjective assessments may be the only type of data collected during rapid prototyping. For that reason, as noted above, the concept of performance-preference dissociation (i.e., subjective preferences that are not supported by measures of performance) is discussed in this context (Bailey, 1993). Wickens and Andre (1994) and Andre and Wickens (1995) have reviewed evidence concerning the dissociation between preference and performance in terms of design factors such as display interfaces and color applications (Hopkin, 1994; Narborough-Hall, 1985). Druckman and Bjork (1994) discuss other aspects of this dissociation. The potential for the dissociation of preference and performance measures argues for the regular use of performance measures to augment preference ratings during prototyping and usability testing (Bailey, 1993).

Practical constraints on training prevent controllers from becoming fully proficient with the prototyped interface. The objectives in prototyping are therefore often limited to demonstrating feasibility and viability rather than aspiring to determine ultimate performance capacities and quantification. As an evaluation methodology, rapid prototyping is constrained in the evidence that it can afford the investigator, but it is capable of demonstrating that a certain design response to a given requirement is feasible and warrants further development. Given the apparent fidelity of the computer-controller interaction that can be presented and evaluated in the rapid-prototyping environment, however, there is a concern that acceptability in the design laboratory can be too quickly equated with task validation for the operational environment.

Hopkin (1995a) has observed that it is appropriate to use the prototyping environment to establish the feasibility of design concepts, provided it is recognized that rapid prototyping is, in part, a surrogate for the planning, thinking, and formulation stages of system evolution and cannot function as an alternative form of validation. Similarly, Jorna (1993) has cautioned that subjective opinions gathered during prototyping are not necessarily sensitive to all design factors and can actually lead to nonoptimal designs, and that the relative ease of prototyping can lead designers to skip or postpone the research phases of more systematic, integrative performance evaluations (i.e., evaluations that carefully consider a full range of operating conditions). Such evaluations are best accomplished through real-time simulation.

REAL-TIME SIMULATION

Contemporary human factors research, whether directed toward problem solving in current air traffic control operations or toward system development, usually involves system simulation and the measurement of operator performance in real time (i.e., at the tempo of actual or live operations). Early human factors research, however, depended either on laboratory experimentation (with the expectation that the observations could be extrapolated to operational settings) or on the use of simulators with controls and displays that could not be readily changed (e.g., Fitts, 1947; Green et al., 1995; Taylor, 1947). Even the early attempts at simulating larger parts of systems (e.g., Fitts, 1951; Parsons, 1972; Porter, 1964) depended on electromechanical and hardwired equipment and were closer to complex laboratory experiments than to the advanced high-fidelity simulations familiar today.

Uses

The use of simulation as a research methodology is usually undertaken with the intent to validate human factors design decisions or tentative conceptual models. As a technique for collecting human factors information, real-time simulation has many strengths (Sandiford, 1991). It can also be useful in predicting training requirements, identifying sources of potential error or failure, validating planning decisions (Beevis and St. Denis, 1992), and exploring interactions between human and machine roles (Hollnagel, 1993a). There are three kinds of approaches to real-time simulation for human factors air traffic control research and development.

The first uses a complex air traffic control simulation facility built to replicate the functioning of major air traffic control regions. As many as 20 or 30 staffed control positions may be studied concurrently, together with much of the interaction and communications between them and pilots in the simulation, as well as with adjacent sectors and other agencies (Stager et al., 1980a, 1980b; Transport Canada, 1979). The purpose is to test and quantify the viability of forms of air traffic control proposed for the region simulated or for other regions of airspace that it typifies. Many system measures as well as human factors ones are normally taken, because the simulation is not exclusively or even primarily a human factors evaluation (Stein and Buckley, 1994). Sometimes the purpose is exploratory: to establish feasibility and quantify capacities or to compare options by simulating them. Measures include system functioning, communications demands and characteristics, system errors or deficiencies, and controller performance (and its costs in terms of workload, effort, motivation, attitudes, and team roles and relationships). Particular expertise may be required to determine the validity of the behavioral data collected from a given simulation environment (Narborough-Hall and Hopkin, 1988).
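
Among the system measures listed above, separation-related counts are typical. The sketch below (Python; the positions, the 5-nautical-mile and 1,000-foot separation values, and the flat x-y geometry are all simplifying assumptions for illustration) scans a simulated radar snapshot for aircraft pairs below a separation threshold, the kind of system error a simulation exercise might log.

    from itertools import combinations
    from math import hypot

    # One simulated radar snapshot: aircraft id -> (x_nm, y_nm, altitude_ft).
    # Flat-plane coordinates in nautical miles are an illustrative shortcut.
    snapshot = {
        "ACA101": (12.0, 40.0, 33000),
        "UAL22":  (15.5, 41.0, 33000),
        "DAL7":   (60.2, 10.0, 35000),
    }

    H_MIN_NM = 5.0      # assumed horizontal separation minimum
    V_MIN_FT = 1000.0   # assumed vertical separation minimum

    def conflicts(snap):
        """Return aircraft pairs lacking both horizontal and vertical separation."""
        out = []
        for (a, (ax, ay, az)), (b, (bx, by, bz)) in combinations(snap.items(), 2):
            if hypot(ax - bx, ay - by) < H_MIN_NM and abs(az - bz) < V_MIN_FT:
                out.append((a, b))
        return out

    print(conflicts(snapshot))  # [('ACA101', 'UAL22')]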

Such simulations are seldom continuous; each exercise typically lasts for an hour or two, although simulations of longer duration are not uncommon. Questions that would require continuous running, involving gross fluctuations in traffic demands or shift characteristics, for example, are rarely undertaken.

In the second approach, simulation experiments form an integral part of the design process in order to provide iterative validation assessments of the proposed operational system (see also the section below on when the measurement process should occur). For advanced complex automated systems, Taylor and MacLeod (1994) advocate that progressive acceptance testing and evaluation, using real-time simulation, be embodied throughout the different stages of system design and development.

The third kind of approach to real-time simulation is usually simpler than the first two and often deals more exclusively with human factors. Typically it identifies the question first and simulates whatever is thought necessary, within the resources and time scales available, to answer it validly (Albright et al., 1994; Bussolari, 1991; Collins and Wayda, 1994; Mogford, 1994; Parsons, 1972; Stager and Paine, 1980; Stein, 1985, 1988, 1992; Stein and Garland, 1993; Vortac et al., 1992, 1994). Traditional laboratory experimentation on a wide range of perceptual-motor and cognitive processes relevant to the air traffic control environment is usually required to evaluate and document new technology (e.g., display systems, input devices) (Stager and Paine, 1980; Vortac et al., 1993; Vortac and Manning, 1994). When a new technology is considered for air traffic control (e.g., electronic flight strips), researchers may employ quite a rudimentary air traffic control simulation to discover its potential strengths and weaknesses and to ascertain whether it is a serious candidate for application. The terminal airspace simulation facility (TASF) at the FAA Technical Center provides an example of such a facility, used to examine controller issues in the introduction of automation in both terminal and en route environments (Benel and Domino, 1993). If the question is simple, the simulation might be as well. Comprehensive simulation of every aspect of air traffic control is not sought; the aim is to impose sufficient control to disentangle the effects of the main variables and to ascertain the most sensitive measures.

Limitations

Simulation Fidelity

There has been a tendency in real-time simulation to equate fidelity with validity. Strenuous efforts may be made to replicate air traffic control workspaces, tasks, equipment, and communications so faithfully that even an informed visitor may not recognize at once that the air traffic control is not real. There is no adequate theoretical or empirical basis, however, for prescribing which aspects of air traffic control must be simulated with what fidelity in order to yield findings with a specified validity. In the absence of such guidance, and in order to provide assurance that the desired level of predictive validity will be achieved, the practice has been to strive for similarity between the experimental simulation (i.e., subjects, equipment, tasks, and test environment) and the intended application (Chapanis, 1988).

There is no established procedure for measuring similarity, however, or for knowing how similar is similar enough (Chapanis, 1988; Druckman and Bjork, 1994; Meister, 1991). A knowledge of human capabilities and limitations, and especially of the principles of learning and the transfer of training (Singley and Anderson, 1989; Holding, 1987), often helps to provide insight into which aspects of a system are crucial for the human tasks (and must be simulated faithfully) and which are not.

Most simulations impose restrictions on their usage, and the methods and measures employed must take account of this (Buckley et al., 1983). All participants in simulations receive specific instructions. Initiatives, nonstandard practices and short cuts, the development of professional norms and standards, and team and supervisory roles are typically curtailed in simulation. Many organizational, managerial, and scheduling features of air traffic control, its work-rest cycles, its working conditions, and the interactions between work and domestic life are absent altogether from simulated air traffic control. If simulation exercises run too smoothly, the time absorbed in communications, coordination, liaison, delays, and difficulties in conducting dialogues or in reaching agreement is underrepresented. The roles of teamwork and supervision can be underestimated in evaluations if these elements have not been fully developed. In addition, real-time simulations tend to underplay individual differences between controllers, treating them as unwanted sources of variance in relation to the simulation objectives.

Practical Constraints on Data Collection

Experimenters frequently attempt to enhance the face validity of experimental simulation by using highly trained operators as subjects. Baker and Marshall (1988) suggest that experienced operators are probably more highly motivated to do well, but they will also tend to perform well on any reasonable system. As a result, it is difficult to find differences between alternative designs even though, in operational environments, it is the differences experienced by the less well trained (or less skilled) controller that are most likely to compromise safety (Wickens, 1995b). In many human-machine experiments, the combination of shorter work periods and high motivation can lead to artificially high levels of operator performance or, simply, to invalid estimates of human behavior in the planned system (Baker and Marshall, 1988). A restricted test population often requires elaborate repeated-measures designs, with their attendant problems of fatigue and practice. The restriction to highly trained operators may also yield an inadequate sample size for stable performance estimates.

One of the primary sources of difficulty for human-machine experiments is the restricted time scale under which most system development projects must operate. For this reason, Baker and Marshall have expressed the concern that experimental factors may be manipulated more from considerations of expediency than of validity and that the experiments provide an overly optimistic expectation concerning the input of factors related to shift work, fatigue, and boredom. There is an obvious need to look at longer experimental sessions, low-activity periods, and transitions from inactivity to peak loads (e.g., Hancock, 1987; Huey and Wickens, 1993; Smolensky and Hitchcock, 1993).

In real-time simulations, it is often difficult to establish the validity of data concerning absolute capacities, workloads, strategies, and error rates because of the constraints outlined here. Specialized knowledge is usually required to evaluate their generalizability to actual operations.

FIELD STUDIES

Field investigations enable operational evaluations of elements of a system to be conducted on integral equipment that is being used on-line in the control process (Moody, 1991). In addition, field studies are likely to be undertaken when the interactions of a comprehensive set of variables can be observed dependably only in the actual operational environment (i.e., when the limitations of real-time simulation are most evident). These studies represent a cost-effective method of obtaining human factors information, provided that the constraints imposed on the data collection process do not invalidate the observations because of sampling restrictions (e.g., limited parameter variation and less experimental control of the relevant variables).

Uses of Field Studies

One of the advantages of collecting human factors information in a field setting rather than in real-time simulation is that the live operational environment provides a means of capturing the subtleties in operational practices and work habits that may not carry over into a simulation environment. Direct access to the operational personnel allows the discovery of unexpected feature use and an assessment of the extent to which a proposed tool or functionality will support the controllers. The ability to capture their experience with new technology, for example, is especially important for complex automation, in which the implications of the interactions between system components are largely unknown prior to implementation (Harwood, 1994).

In field studies, the methodology followed for data collection can vary considerably from one study to another, depending on the objectives of each study. In some cases, unobtrusive measures are used to collect data in order to avoid contamination through the observation process itself, but the validity of the data is then highly dependent on the sampling procedures. In their review of the air traffic control communications literature, Prinzo and Britton (1993) point to the advantages of audiotaped databases (e.g., objective, reliable, and verifiable real-time records) versus the ASRS database.

The studies of air-ground communications reported by Cardosi (1993) and by Morrow et al. (1993) were based on off-line analyses of audiotaped samples. Similarly, the cross-validation of the type and frequency of communications errors observed in the bilingual simulations (Transport Canada, 1979) described by Stager et al. (1980a) involved the analysis of audiotaped communications from the actual operational sectors that were also being simulated in the laboratory.

V.D. Hopkin (personal communication, 1995b) has described a field study undertaken to determine the effect of lowering the altitude of the base of the holding patterns used at Heathrow airport. In order to evaluate the impact of the change on adjacent en route sectors, concurrent simulation of Heathrow and the adjacent sectors would have been required, a requirement beyond the available simulation capability. Because the field trial involved live traffic, it could be conducted only when the controllers and their supervisors agreed to try it. Indeed, one of the best and most sensitive measures proved to be the circumstances under which they were willing to try it and those under which they were no longer willing to continue. Some of the main findings concerned variables that would have been held constant in a simulation but could not be controlled live and therefore had to be measured as and when they occurred.

Harwood and Sanford (1994) and Scott (1996) describe recent field evaluations of an element of the center-TRACON automation system (CTAS) (Erzberger et al., 1993; Lee and Davis, 1995), undertaken at Denver International Airport. Harwood and Sanford suggest that early field testing during the development cycle can provide both insight into how the system elements will function in the operational environment and an opportunity to capture and refine meaningful requirements for system certification. The development and evaluation of the CTAS was undertaken at two operational field sites, applying a field development and assessment process to one of the CTAS tools, the traffic management advisor (TMA). Context-sensitive data collection techniques (i.e., techniques based on observation and interpretation in the context of the user's work environment; Whiteside et al., 1988) were used in the evaluation.

Direct behavioral observations in field settings can also provide the data for a research method called the critical incident technique (Flanagan, 1954; Meister, 1985). Researchers have traditionally relied on the critical incident technique to determine measures of proficiency or the attributes relevant to successful performance in complex operations. With the increased emphasis on cognitive activities in operational environments, however, behavioral observation has become more difficult, if not irrelevant, for understanding operator performance; Shattuck and Woods (1994) have cautioned that a new set of principles is needed. This is particularly the case in air traffic control, in which observable behaviors do not adequately reflect what the controller is doing at any given moment.

Limitations of Field Studies

In field studies, there are inherent constraints on variable or parameter control (e.g., traffic volume or complexity) that limit the conditions over which the fielded system can be evaluated. There may also be restricted on-line access to controllers. These constraints, however, have to be weighed against the advantages already described and against the fact that the limitations cited for real-time simulation are less likely to apply in the field environment.

COMBINING SOURCES OF HUMAN FACTORS DATA

Although each source of human factors data has been described independently of the others, it is usually preferable that one source or research setting supplement another. For example, Seamster et al. (1993) have described their cognitive task analysis of expertise in air traffic control using a combination of methods, including simulation exercises, structured and unstructured interviews, critical incident interviews, paired paper problem solving, cognitive style assessment, structured problem solving, and simulated performance modeling. The designs that evolve through rapid prototyping are customarily validated in real-time simulation. The effects of certain constraints in real-time simulation can be evaluated only through systematic observation in supplemental field studies. Sarter and Woods (1995) provide an excellent example of the convergent use of real-time simulation, verbal protocols, and accident and incident analysis for understanding mode errors in flight management systems in transport aircraft.

One of the best examples of using different methodologies to expand the array of evidence on a human factors problem is the investigation of human error in aviation (Nagel, 1988; Wiener, 1980, 1987, 1989), and particularly in air traffic control (Danaher, 1980; Stager, 1991b). The description and analysis of the variables associated with human error tend to lie in the domain of descriptive models of human error and cognitive processes (Hollnagel, 1993a, 1993b; Rasmussen, 1987; Reason, 1990, 1993; Reason and Zapf, 1994; Senders and Moray, 1991; Stager, 1991b; Woods, 1989; Woods et al., 1994). Nagel (1988) suggests that there are four approaches to gathering evidence about errors (all reviewed in this chapter): direct observation of the operational environment; incident analysis (Baker, 1993; Rodgers, 1993); use of the ASRS (although the methodology of using self-reports need not be limited to formal aviation reporting systems and can involve diary studies within one or more control centers; Empson, 1991); and the observation of operator behavior in simulations.

In spite of the inherent difficulty of acquiring low-frequency error data, real-time simulation probably represents the only potential source for such data and therefore the best means of validating models of human error. Even with the operational fidelity of real-time simulations, however, the validity of any error data can be brought into question by the assumptions that are made concerning how real-world failures occur (Hollnagel, 1993a; Maurino et al., 1995; Reason, 1990, 1993; Rubel, 1976). For this reason, observations in the simulation environment need to be seen as complementary to structured observations in the actual operating environment (Hollnagel, 1993a); in fact, observations in either context are influenced by information drawn from incident analysis and reporting systems.

Even with the operational fidelity of real-time simulations, however, the validity of any error data can be brought into question by the assumptions that are made concerning how real-world failures occur (Hollnagel, 1993a; Maurino et al., 1995; Reason, 1990, 1993; Rubel, 1976). For this reason, observations in the simulation environment need to be seen as complementary to structured observations in the actual operating environment (Hollnagel, 1993a); in fact, observations in either context are influenced by information drawn from incident analysis and reporting systems.

MEASUREMENT IN COMPLEX SYSTEMS

All of the methodologies that have been described in this chapter raise issues of measurement, including the associated concepts of measurement validity and system validation. This is particularly true for real-time simulations and field studies, for example, when the studies are undertaken to evaluate proposed design changes. The investigation of human factors questions and design validation in air traffic control requires that three questions be addressed: What is to be measured? How should the measurement be done? When should the measurement process occur? All three aspects can affect the external validity (i.e., the generalizability) of the evidence that has been gathered.

Contemporary human engineering design (and ultimately system validation) is challenged by the requirement to accommodate and to predict the variance in human behavior in complex human-machine systems, in spite of the practical constraints that can be placed on studies of operator behavior (Stager, 1993). Validation can easily be seen as a matter of measurement (Kantowitz, 1992), with the concomitant concerns of what one measures as criterion variables (Harwood, 1993) as well as how one measures the behavior of human-machine systems (Hollnagel, 1993b; Reason, 1993; Woods and Sarter, 1993). A distinguishing feature of many performance measures (Hopkin, 1979, 1980, 1982a) is that they are not simply direct measures of controller behavior and that the same measures are often taken, for example, to be indices of system safety.

What Is to Be Measured?

One of the issues in measurement might be called the criterion problem. Current engineering requirements, as outlined in MIL-H-46855B (U.S. Department of Defense, 1979), call for any contractor to establish and conduct a test and evaluation program to ensure fulfillment of the applicable requirements. Section 3.2.3 states that human engineering testing is to include the identification of criteria for acceptable performance of the test.
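To make the notion of identified acceptance criteria concrete, criteria of this kind might be tabulated as in the minimal sketch below. The measure names, units, and threshold values are hypothetical illustrations; they are not requirements drawn from MIL-H-46855B or from any system specification.

```python
# Hypothetical acceptance criteria for a human engineering test program.
# Every measure name and threshold below is an invented illustration.
acceptance_criteria = {
    "mean handoff acknowledgment time": {"unit": "s", "limit": 5.0},
    "communication error rate": {"unit": "errors/hour", "limit": 0.5},
    "subjective workload rating": {"unit": "0-100 scale", "limit": 70.0},
}

def evaluate(measured: dict) -> dict:
    """Compare measured values against each criterion; True means the criterion is met.

    All criteria here are upper limits, so smaller measured values are better.
    """
    return {
        name: measured[name] <= spec["limit"]
        for name, spec in acceptance_criteria.items()
    }

# Illustrative test results (invented numbers):
print(evaluate({
    "mean handoff acknowledgment time": 4.2,
    "communication error rate": 0.8,
    "subjective workload rating": 55.0,
}))
```

The difficulty described next is precisely that many of the criteria that matter most are never written into such a table in advance.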

The criterion measures that are associated with the operational requirements and detailed in system specifications define only a part of the evaluation process that will be required in system development. The challenge for human engineering evaluation is created by the criteria that are left unspecified and/or are to be identified in the human engineering program (Harwood, 1993). Often the unspecified criteria will relate to error rates, workload, maintenance of situation awareness, and the potential for performance decrements associated with prolonged operational stress or fatigue. The issue of defining criterion measures for the human-system component in complex systems therefore affects system safety and efficiency (Christensen, 1958; Harwood, 1993).

Criterion measures of performance are also required in order to evaluate the relative effectiveness of system design concepts. Only when there is a standard of required performance that can be compared with actual performance can one talk about measures of effectiveness (Meister, 1991). When there are insufficient grounds for extrapolating performance standards from an existing system, human performance models and models of cognition may provide estimates of the level of performance that can be anticipated (Bainbridge, 1988; Pew and Baron, 1983; Rouse and Cody, 1989). Although there are human performance models for basic cognitive processes (Elkind et al., 1990) and for task allocation and workload analysis (McMillan et al., 1989, 1992), there is a fundamental requirement for the development of human performance models that are directly applicable to the air traffic control environment.

How Should the Measurement Be Done?

External validity, and thus generalizability, can be viewed as having three major components: representativeness of subjects, representativeness of variables, and representativeness of setting (i.e., ecological validity) (Kantowitz, 1992; Westrum, 1994). Test situations have to be representative of those encountered in the operational environment, and the functions provided by the interface have to be sufficiently complete if valid measures are to be obtained (Hollnagel, 1993a). The validation methods chosen have to be sufficiently sensitive to detect design errors (Woods and Sarter, 1993).

When Should the Measurement Process Occur?

An encouraging aspect of the trend toward including cost-effectiveness as a criterion in judging the efficacy of research is that it may force more critical consideration of the reliability and validity of air traffic control research and its outcomes (Westrum, 1993). Validation (as iterative evaluation) has to be an integral part of system design rather than a "fig leaf" at the end of the process (Woods and Sarter, 1993).
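The contrast between iterative evaluation and a single terminal check can also be sketched in code. The loop below is purely illustrative: the prototype representation and the measurement and refinement functions are hypothetical stand-ins, and the comparison of measured performance against a required standard follows Meister's (1991) point, above, that effectiveness is defined only relative to such a standard.

```python
# Required performance standard (illustrative value, in seconds).
REQUIRED = {"mean conflict detection time": 8.0}

def measure(prototype: dict) -> dict:
    """Stand-in for measuring the prototype in real-time simulation or the field."""
    return {"mean conflict detection time": prototype["detection_time"]}

def refine(prototype: dict) -> dict:
    """Stand-in for a design change suggested by the evaluation results."""
    prototype["detection_time"] -= 1.0
    return prototype

prototype = {"detection_time": 11.0}
for iteration in range(1, 10):
    measured = measure(prototype)
    # Effectiveness is defined only relative to the required standard:
    meets_standard = all(measured[k] <= REQUIRED[k] for k in REQUIRED)
    print(f"iteration {iteration}: {measured} -> meets standard: {meets_standard}")
    if meets_standard:
        break
    prototype = refine(prototype)
```

The point of the loop is structural: measurement informs redesign on every pass, rather than serving as a one-time justification of the finished design.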

The objective of system evaluation should be to help the designer improve the system and not simply justify the resulting design (see Chapter 11). When system designers are properly seen as experimenters, the measurement process becomes a critical element in the design and interpretation of the converging evidence on system performance (Woods and Sarter, 1993).

CONCLUSIONS

A number of research methodologies are available to obtain human factors data, each with relative merits as well as constraints and limitations. Our review of human factors methodology has identified the following critical needs for air traffic control research:

• Systematic efforts are needed to make access to the aviation safety reporting system database more user-friendly, both to encourage exploratory data analysis and to enable specific questions on human performance to be asked and answered more quickly. The constraints on the usability of the data contained in aviation reporting systems work against an early focus on potential safety areas within the air traffic control system.

• Systematic work is needed to formalize the role and to enhance the contribution of rapid prototyping to the process of determining the characteristics of computer-human interaction. Careful application of current computer technology to the methods, standards, and objectives of rapid prototyping, particularly for multitask evaluations, could significantly advance the activity as an integral methodology for human factors research.

• A cost-effective simulation capability is needed, within each system design program, that will support progressive acceptance testing. At present, there is a tendency to equate acceptability in the design-prototyping laboratory prematurely with task validation for the operational environment.

• There is a need for the development and validation of human performance models applicable to air traffic control research, as well as for approaches to integrating human performance models with system models.

• There is a need for universally recognized quantifiable dimensions of controller performance. Human factors research needs dependent variables that define controller performance across the spectrum of operational contexts and that are sensitive to variation in the determinants of performance (including, for example, the cognitive variables of workload and situation awareness). In the absence of a commonly accepted set of measures, the articulation of the critical (but measurable) variables is likely to be undertaken anew in each project. (A hypothetical sketch of what such a common measure set might look like follows this list.)

• Additional human engineering standards and guidelines, applicable to research undertaken to support system development, are needed for design validation (i.e., beyond the current MIL-H-46855B).
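As a closing illustration of the fifth need above, a commonly accepted measure set might take the form of a shared record type that every project populates in the same way. The sketch below is hypothetical throughout: the fields are invented examples of the kinds of dependent variables discussed in this chapter, not an established or validated standard.

```python
from dataclasses import dataclass

@dataclass
class ControllerPerformanceMeasures:
    """Hypothetical common set of dependent variables for one scenario run.

    Every field is an invented example of the kind of measure discussed in
    the text; none is an established or validated standard.
    """
    scenario_id: str
    peak_traffic_count: int                 # maximum aircraft under control
    losses_of_separation: int               # count of separation-standard violations
    mean_conflict_detection_time_s: float
    communication_errors: int
    workload_rating: float                  # e.g., a 0-100 subjective scale
    situation_awareness_score: float        # e.g., a probe-based score from 0 to 1

# Illustrative record with invented values:
run = ControllerPerformanceMeasures(
    scenario_id="sim-042",
    peak_traffic_count=18,
    losses_of_separation=0,
    mean_conflict_detection_time_s=9.4,
    communication_errors=2,
    workload_rating=62.0,
    situation_awareness_score=0.81,
)
print(run)
```

A shared structure of this kind would at least ensure that the critical variables are articulated once, rather than anew in each project.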