While human-machine collaboration differs substantially from human-only teamwork, it helps first to understand how humans work together in teams, because that experience is the source of the expectations humans bring to teamwork situations. This chapter focuses on several elements that affect human teams, including decision analysis, trust, memory, and accounting for human error; it ends with a brief discussion of task allocation.
Humans rely on their fast, intuitive decision-making capabilities in many situations but, when decisions are complex and the stakes are high, a slower, more deliberative process based on decision theory and decision analysis is worth using instead. Such a process does not try to predict or mimic intuitive human decision making but instead decomposes complex problems into their component parts, so as to take actions based on normative engineering principles.1 It provides more consistency between the actions taken in similar situations and more transparency in the reasoning and judgments used to choose those actions. This transparency also allows machines to use these same component parts to make decisions and to support human decision makers.
Underlying the decision analysis approach is a Bayesian view of the world, where the decision maker’s beliefs about uncertain distinctions and quantities are represented with probabilities, continuously updated to reflect the decision maker’s observations. The decision maker’s initial beliefs are represented explicitly and can therefore be informed by expert judgment. However, those beliefs can become insignificant when there is sufficient relevant data.
The two other components needed to make decisions are alternative choices and preferences over prospective outcomes. The alternative choices are represented by a set of available actions in a given situation, and the preferences by a utility value for each outcome. At any given point in time, the decision maker should take the action that provides the greatest utility, taking into account the probabilities and utilities of the possible outcomes that can result from the action. Therefore, we can distinguish between the quality of a decision, based on the reasoning that went into it, and the quality of the outcome, which was still uncertain when the decision was made. All of this is set within a decision frame, the underlying context that captures the appropriate uncertain distinctions and alternative choices.
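The core computation described above is small enough to sketch. The following Python fragment, with invented actions, probabilities, and utilities, picks the action with the greatest expected utility; it is meant only to illustrate the decomposition into alternatives, beliefs, and preferences, not any particular fielded system.

```python
# Illustrative only: the actions, probabilities, and utilities are invented.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(actions):
    """actions: dict of action name -> list of (probability, utility)."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# A hypothetical decision frame with two alternatives.
actions = {
    "launch_now": [(0.7, 100), (0.3, -50)],  # succeed / fail
    "test_first": [(0.9, 80), (0.1, -20)],   # smaller upside, less risk
}

choice = best_action(actions)
# test_first wins: 0.9*80 + 0.1*(-20) = 70 versus 0.7*100 + 0.3*(-50) = 55.
```

Note that the calculation ranks decisions, not outcomes: "launch_now" could still turn out well, which is exactly the distinction between decision quality and outcome quality drawn above.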
When facing or anticipating a decision, the best action is often to gather more information. The decision maker must weigh the cost of the information in time, money, and other resources against the benefit arising from the ability to change and improve the choice depending on what will be observed. If the decision maker would make the same choice
1 E.g., Hammond, J.S., Keeney, R.L. and Raiffa, H. Smart Choices: A Practical Guide to Making Better Decisions. 1999. Harvard Business School Press.
regardless of what will be revealed, the information would not be worth gathering.
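This test can be made concrete. The sketch below (Python, with invented utilities and prior) computes the expected value of perfect information: the decision is first made on prior beliefs alone, then re-made under each possible revelation, and the difference bounds what the information is worth.

```python
# Illustrative only: the utilities and the prior are invented.

def eu(action, p_good):
    """Expected utility of an action given belief p_good in the 'good' state."""
    table = {"act": (100, -60), "hold": (10, 10)}  # utility in good / bad state
    u_good, u_bad = table[action]
    return p_good * u_good + (1 - p_good) * u_bad

def best_eu(p_good):
    return max(eu(a, p_good) for a in ("act", "hold"))

p_good = 0.5
decide_now = best_eu(p_good)

# A perfect test would reveal the true state; weight the resulting best
# choices by how likely each revelation is (value of perfect information).
after_test = p_good * best_eu(1.0) + (1 - p_good) * best_eu(0.0)
value_of_information = after_test - decide_now
# Positive (here 35) only because bad news would switch the choice to "hold";
# if the same action were best either way, the value would be zero.
```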
The quality of the decision can depend critically on access to all of the relevant information available. It is important to develop methods for recognizing which data sources are useful for particular decisions and which are not. When multiple information sources are available, the decision maker can use a model to account for any relationships among them. In situations involving data with systematic errors or potential deception, the decision maker can interpret the data by a similar modeling process, and thereby learn indirectly about the distinctions of interest.
Many parameter judgments are needed to represent the decision maker’s beliefs and preferences. Performing sensitivity analysis, i.e., seeing how the choices and their values change when these parameters are perturbed, can identify which parts of the model are least robust and where additional information gathering would be most valuable for improving the quality of the decisions.
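A one-way sensitivity sweep is the simplest version of this analysis. The hypothetical sketch below perturbs a single probability judgment and finds where the recommended action flips, showing how robust the nominal recommendation is.

```python
# Illustrative only: a one-parameter model with invented utilities.

def recommend(p_success):
    """Return the better of 'act' (risky) and 'wait' (safe) for a given belief."""
    eu_act = p_success * 100 + (1 - p_success) * (-50)
    return "act" if eu_act > 40 else "wait"   # 40 = utility of waiting

base_choice = recommend(0.65)                 # nominal judgment favors "act"

# Sweep the parameter to find where the recommendation flips.
flip_points = [p / 100 for p in range(101)
               if recommend(p / 100) != base_choice]
crossover = max(flip_points)
# "act" is robust only while p_success stays above roughly 0.60, so the
# probability judgment near that crossover deserves the most scrutiny.
```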
Graphical models, such as Bayesian belief networks and influence diagrams, have been valuable tools for building, communicating, learning, and analyzing models among decision makers, experts, analysts, and machines.2 They also help identify which sources of information might be relevant for particular decisions. There have been many promising and fielded applications of these methods, as diverse as Space Shuttle engine monitoring,3 genetic analysis,4 and breast cancer diagnosis.5 However, there are still many outstanding challenges associated with decision analysis, such as difficulties in determining a utility function or assessing outcome probabilities, and our limited ability to model human behavior.
2 See, e.g., Koller, D., and Friedman, N., Probabilistic Graphical Models: Principles and Techniques. 2009, Cambridge, MA: MIT Press; Miller, A.C., et al., Development of Automated Aids for Decision Analysis. 1976, Menlo Park, CA: Stanford Research Institute; Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 1988, San Mateo, CA: Morgan Kaufmann Publishers; Pearl, J., Causality: Models, Reasoning, and Inference, 2nd Edition. 2009, New York: Cambridge University Press; and Shachter, R.D., Evaluating influence diagrams. Operations Research, 1986, 34(November-December): 871-882.
3 Horvitz, E., Ruokangas, C., and Srinivas, S., A Decision-Theoretic Approach to the Display of Information for Time-Critical Decisions: The Vista Project. In Proceedings of SOAR-92, NASA/Johnson Space Center, Houston, TX, 1992.
4 Fishelson, M., and Geiger, D., Exact genetic linkage computations for general pedigrees. Bioinformatics, 2002, 18: S189-S198.
5 Beck, A.H., Sangoi, A.R., Leung, S., Marinelli, R.J., Nielsen, T.O., van de Vijver, M.J., West, R.B., van de Rijn, M., and Koller, D., Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Science Translational Medicine, 2011, 3(108):108-113.
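To give a flavor of the inference these graphical models perform, here is a minimal two-node Bayesian-network sketch in plain Python. The fault and sensor probabilities are invented; a real application would use a dedicated graphical-model library rather than raw arithmetic.

```python
# Illustrative only: invented fault and alarm-sensor probabilities.

P_FAULT = 0.01                 # prior belief that the component is faulty
P_ALARM = {True: 0.95,         # P(alarm | faulty): sensor usually fires
           False: 0.02}        # P(alarm | healthy): false-alarm rate

def posterior_fault(alarm_seen):
    """P(fault | observation) by Bayes' rule."""
    like = P_ALARM[True] if alarm_seen else 1 - P_ALARM[True]
    alt = P_ALARM[False] if alarm_seen else 1 - P_ALARM[False]
    num = P_FAULT * like
    return num / (num + (1 - P_FAULT) * alt)

belief = posterior_fault(alarm_seen=True)
# One alarm lifts the belief in a fault from 1% to roughly 32%; further
# independent observations would be chained through the same update.
```

This also illustrates how an expert-judged prior (the 1% fault rate) is explicit and can be overwhelmed by evidence, as described earlier.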
Teamwork has become the strategy of choice when organizations are confronted with complex and difficult tasks.6 The study of human team performance has produced a considerable body of knowledge, and recent discoveries have important consequences for human-automation collaboration.7 Technology has had a strong impact on the structure and operation of teams. In particular, coordination tools (e.g., networked systems, bots) are advancing greatly.
In human teams, work is assigned according to a number of considerations, including particular individual competencies, quantity and quality of appropriate resources, time constraints, and availability. Thus, in a human team, someone with enhanced mathematical or statistical expertise would normally be assigned to tasks requiring this kind of knowledge and skill, but during the course of the activity, if some other aspect requires more aid, these people would shift to help. Similarly, if the mathematical or statistical workload rose too high, less qualified workers would assist, ideally doing lower-level assignments that match their abilities. The important point is that team members typically concentrate upon their areas of expertise, but the division of labor remains flexible.
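The flexible division of labor described above can be caricatured in code. The sketch below (invented members, skill scores, and capacities) assigns each task greedily to the most competent available member, with overflow work spilling over to less specialized teammates once a specialist is at capacity.

```python
# Illustrative only: invented members, skill scores, and capacities.

SKILLS = {                      # member -> competence per task type
    "analyst": {"stats": 0.9, "writing": 0.4},
    "writer":  {"stats": 0.5, "writing": 0.8},
}
CAPACITY = {"analyst": 2, "writer": 2}   # concurrent-task limits

def assign(tasks):
    """Greedy allocation: the most competent available member gets each task."""
    load = {m: 0 for m in SKILLS}
    plan = {}
    for task_id, task_type in tasks:
        ranked = sorted(SKILLS, key=lambda m: SKILLS[m][task_type],
                        reverse=True)
        member = next(m for m in ranked if load[m] < CAPACITY[m])
        load[member] += 1
        plan[task_id] = member
    return plan

plan = assign([(1, "stats"), (2, "stats"), (3, "stats"), (4, "writing")])
# Tasks 1-2 go to the analyst; task 3 spills over to the writer once the
# analyst is at capacity, mirroring the flexible division of labor above.
```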
There are benefits to rotating assignments among team members so that every team member experiences the others’ activities. This practice provides training that allows substitutions (albeit, not always perfect ones) when the situation demands. It also provides each team member with a deeper understanding of the requirements and difficulties of colleagues’ tasks. This familiarity enhances the communication and interaction among team members even when they are doing their primary tasks (Nikolaidis and Shah, 2013; Hollan, Hutchins, and Kirsh, 2000; Hutchins, 1995). Indeed, great teams tend to distinguish themselves by how well they manage soft interdependencies, i.e., emergent opportunities to offer and receive help that are not part of one’s explicit job duties (Johnson et al., 2014a).8
This rotation of assignments may or may not include the team leader. In some cases, the team leader’s understanding of external context might not be shared in any depth by the team’s members, and the leader may not have the technical skills to serve on the team. But if those conditions do not hold, and the work schedule can withstand some disruption, rotations involving the team leader can help the team better understand how its work feeds into the bigger context. Plus, the experience can broaden the team’s thinking because a temporary change in leadership can introduce new thoughts about priorities, processes, and relationships.
When automated systems are available, team benefits may accrue if the human team members occasionally take on the tasks of the nonhuman agents. For example, in flying an airplane, the automation might be turned on or off, depending on the overall workload. By deliberately not using the automation, other team members could learn what that component does and, moreover, remain practiced at performing that task in case the automated system fails. Team members may also attain a better understanding of what the automated system cannot do—for example, an airplane pilot might be able to visually spot potential sources of turbulence ahead and take early action, whereas the automated system would rely on different sensors and perhaps be delayed. In some situations, though, such as robot-assisted search in inhospitable
6 See Salas et al., 2008; Cooke et al., 2012; Wildman et al., 2013.
7 See McKendrick et al., 2013; van Wissen et al., 2012; Cuevas et al., 2007.
8 We thank an anonymous reviewer for the thoughts presented in this paragraph.
environments, human substitution might not be possible at all. In others, the person and machine could collaboratively guide the behavior through teleoperation. It would be even better if the automation did not have to be either on or off, but rather could be biased and guided. Humans can guide and teach the automation and, in turn, the automation can guide and teach humans. Ideally, the human team members will learn about the limitations of the automated system, including any points of failure, so as to develop a realistic sense of how much trust they can place in the automated assistance.
Recent research by Tausczik et al. (2013) and by Woolley et al. (2010) elucidates the effectiveness of groups based on characterizations of their composition and of the functioning of ideal groups for problem solving.
Communication is critical, whether the teams are purely human or a mix of humans and machines. Quite often, when difficulties arise, they can be traced to insufficient or inappropriate communication, although a mismatch of skills does play a role (Bradshaw et al., 2013). Three major communication challenges are: (1) what information to convey to other teammates, (2) which teammates to communicate with about this (new) information, and (3) when to communicate. These questions need to be addressed for both human and machine members of a team, and when a communication traverses a human/computer interface, additional care is necessary to ensure that the receiver and sender share the same implicit assumptions about the information and that the receiver knows how to interpret the information.
In a fully cooperative team, all members communicate as needed. Of particular importance is communicating about the status of the tasks they are doing and about their own performance as limits are reached. Thus, when one set of team members starts to become overloaded, they are apt to signal this by stating that they might need some help, alerting other team members to look over their activities and to step in when required. Even when a member is not overloaded, they may be reaching the edge of their comfort zone, in terms of performing tasks at which they are less capable or for which the available information is inadequate. In those cases, adding another team member with different skills, or splitting the workload, may not address the problem. The most important mitigation might be for other team members to recognize that some additional uncertainty may be creeping into the overall process, so they can take steps such as slowing down, adding redundancy, or relying more on other members.
Many failures of automated systems come from a lack of communication about their activities. We see this in the crash of Asiana Airlines Flight 214 at the San Francisco airport on July 6, 2013. The airport’s vertical guidance service for instrument landings was not operative, so manual control of the glideslope was required. Incomplete communication about, and awareness of, the states of the airplane and of the automated equipment, coupled with the pilots’ flawed understanding of those states, turned out to be a factor in this accident.9
9 See The New York Times, June 25, 2014. P. A-11. “Flight crew missed multiple cues before San Francisco crash, board says.”
While all teamwork requires the establishment of appropriate levels of trust, collaboration between humans and machines raises the issue of human trust in the inanimate teammate.10 People on the team will rely and act on the contributions and recommendations of automation only if they have confidence that the teammate will make a positive contribution, along with some sense of how much, and in what ways, they can trust the automation. Such trust must be earned.
Operators’ lack of trust in automation—and the resulting possible disuse of data that it presents—limit the potential that technology offers. However, operators’ inappropriate excessive trust and the resulting automation misuse could lead to complacency and the failure to intervene when the technology fails or degrades (Cummings, Pina and Crandall), or has not been programmed for the appropriate circumstances (Parasuraman and Riley, 1997; Lee and Moray, 1994; Hoffman et al, 2013). A nuanced understanding is needed: for example, a complex decision might build on information from different searches (each with its own blind spots or ambiguities), different databases (with differing levels of quality), statistical inferences (with complex uncertainties), and simulations, which are only imperfect models of reality. Somehow, the team—and the ultimate decision-maker—must aggregate these inputs, taking into consideration the degree of confidence that each can contribute to the decision.
Several elements affect the development of an appropriate level of trust, i.e., trust calibration (Hoffman et al., 2012, 2013). The machine should perform reliably and predictably, measured in timeliness and accuracy of response. It should contribute information that is valuable to the decision-making process and deliver this information to the appropriate people or machine agents. Further, it is important that the people relying on automation understand the basis for the machine’s decision or recommended action. To support this, computer-based participants require algorithms and heuristics that can reason about the information’s importance and significance at a given time to a given individual, and that can receive and display information about the basis for a recommended action. In addition, if a computer can teach or assist a human trainee so that the novice can perform at a higher level, the machine will have gained some trust. One important component of trust is observability: In the absence of appropriate observability (communication), people (or machines) may be unable to calibrate their trust appropriately—undertrusting competent human/machine behavior or overtrusting incompetent human/machine behavior—because the signals that would allow them to perceive problems are insufficiently salient or absent altogether.11
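One deliberately simple way to model the "trust must be earned" dynamic is to treat the automation's reliability as an estimate updated from observed behavior. The sketch below (a Beta-Bernoulli model with invented counts) is an illustration of calibration, not a claim about how any fielded system measures trust.

```python
# Illustrative only: a deliberately simple reliability model.

class TrustModel:
    """Beta-Bernoulli estimate of an automated teammate's reliability."""

    def __init__(self, prior_successes=1, prior_failures=1):
        # Beta(1, 1) is a uniform prior: no opinion before any evidence.
        self.s = prior_successes
        self.f = prior_failures

    def observe(self, correct):
        if correct:
            self.s += 1
        else:
            self.f += 1

    def reliability(self):
        return self.s / (self.s + self.f)

trust = TrustModel()
for outcome in [True] * 18 + [False] * 2:
    trust.observe(outcome)
# After 18 correct calls and 2 errors, estimated reliability is 19/22 (~0.86);
# a later run of failures would pull it back down, keeping trust tied to
# observed behavior rather than fixed at either extreme (disuse or misuse).
```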
Consider two human-machine systems, each of which aims to provide perimeter monitoring around a building complex. Assume that imaging devices are mounted so that, in combination, they maintain a persistent view of the surrounding space. The simple system just records and displays images. Guards watch the images in one or more control rooms. These
10 Automation’s trust of humans bears consideration, but it was not discussed to any extent by the committee.
11 E.g., Hoffman, R. R., J. D. Lee, D. D. Woods, N. Shadbolt, J. Miller, and J.M. Bradshaw. The dynamics of trust in cyberdomains. IEEE Intelligent Systems (2009, Nov/Dec), pp. 5-11; Hoffman, R. R., Matthew Johnson, J.M. Bradshaw, and Al Underbrink. Trust in Automation. IEEE Intelligent Systems, January/February 2013, 28(1) 84-88.
humans are responsible for making all decisions about whether to act, when to act, and what action to take.
The second system includes teamwork between the machines and humans, which requires much more sophistication in the automation and, more importantly, a different design philosophy. In this imagined system, the automation would include image analysis, detection, and recognition algorithms or heuristics. It would alert the humans when it sees unusual changes. It might identify and characterize objects or creatures and communicate those observations. It might be programmed to take action on its own cognizance—to sound alarms, for example, or turn on lights in the area of suspicious activity.
Concurrently, the human operators would observe the same scene (with appropriate displays of the information gleaned by the sensors and perhaps some machine-generated suggestions or inference). They would work with the system and determine what trust they place in the automated teammates. They too might identify and characterize objects or creatures and communicate those observations. They might take action on their own cognizance—to sound alarms, for example, or turn on lights in the area of suspicious activity. They might question the automation’s conclusions and/or direct it to attend more carefully to particular aspects. The human operators might thus enhance and supplement automated actions, helping to continuously train the machines (if so designed), and/or contravene them. Perhaps they would come to trust that the system will always alert them to any suspicious activity. The number of humans on watch might be reduced, and they might discontinue their scrutiny of image displays because they know that the automation will perform reliable detection. But they might find that the analytic software often mischaracterizes entities, mistaking dogs for small unmanned ground vehicles or people, or failing to distinguish multiple trespassers from one. The humans on this team might have confidence that the automation can detect an intrusion, but not that it can identify the intruder. In this situation, the humans might restrict the authority of the automated teammate to nondestructive action. This scenario also points toward issues of assessment—for example, if the machine could accurately report the degree of confidence with which it has identified the interloper, the humans might give it more rights to act if those measures exceeded some threshold. This last item points toward the desirability of research into methods that evaluate the confidence level of a potential decision maker, and some such techniques might apply to humans as well as machines.12
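The confidence-gated authority suggested at the end of this scenario can be sketched directly. In the hypothetical fragment below, the automation may act autonomously only when its self-reported identification confidence clears a threshold; otherwise it is limited to nondestructive alerts or to watching. The thresholds and action names are invented.

```python
# Illustrative only: invented thresholds and action names.

ALERT_THRESHOLD = 0.30   # confidence needed to alert (alarms, lights)
ACT_THRESHOLD = 0.95     # confidence needed for stronger autonomous action

def authorized_response(detect_conf, identify_conf):
    """Restrict the automation's authority by its self-reported confidence."""
    if detect_conf < ALERT_THRESHOLD:
        return "keep_watching"
    if identify_conf < ACT_THRESHOLD:
        return "alert_humans"        # nondestructive actions only
    return "act_autonomously"        # permitted only with a confident ID

# Detection is trusted but identification is shaky: defer to the humans.
response = authorized_response(detect_conf=0.88, identify_conf=0.55)
```

Raising or lowering the thresholds over time is one concrete way the humans could expand or restrict the automated teammate's rights as trust evolves.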
In short, the team members would mutually observe, analyze, and decide upon the course of action, each using the perceptual skills and knowledge that they are best suited for. Furthermore, the level of human or automated involvement could be modified over time, depending on evolving levels of trust. A key observation is that in all of the scenarios the human is supervising the machine, although the level of supervision may diminish as the trust and understanding grow more nuanced.
The military as well as field intelligence and law enforcement teams operate with clearly specified rules of engagement. Decisions about what authority to delegate to an automated element are weighty. The committee considered the extreme end of the military context—whether any circumstance would warrant conferral upon a machine the ability to “pull a trigger” with a human “outside the loop.” The response to that issue depends heavily upon the degree of human trust in automation that has accumulated through observation of the machine’s behavior
in a variety of situations. The committee assumes that for the foreseeable future, these kinds of decisions will remain with humans. However, it also recognizes that decisions that fall short of “pulling the trigger” can also be dangerous—a machine could bias some of the contributing information in a way that leads a human to a decision they would not make if they had better ground truth—and is mindful that decision making that depends on human-machine teams can introduce risks.
Finding 4. Computer assists to human decision making will “come of age” when some of the computational elements are not simply assistive, but perform at a level that they are trusted as “near-peer” teammates in an integrated human-computer system. One of the key challenges of this integration will be the development of new techniques for test and evaluation that build trust between the human partner and the computational elements.
In the past decade, our understanding of human cognition has undergone major change. There is greater understanding of the interplay between the relatively slow, serial mental processes of consciousness and the rapid subconscious mechanisms that involve parallel processing. Progress on computational models of attention is providing new tools to design and test whether a system taps into these fast, parallel processes or overloads deliberative forms of cognition.13 Balancing fast parallel processes with executive processes that test for relevance is a central part of the cognitive work of sensemaking, which in turn is critical both to analytics and to melding the capabilities of humans and machines. Sensemaking is especially important to the ability to critique or test results from machine partners.14
In addition, our understanding of human memory systems is undergoing rapid change. Human memory is a powerful pattern matcher, capable of finding information from prior experiences that are analogous to the current experience. This gives the human unparalleled ability to form new connections and to use related experiences successfully in new applications. However, this same powerful ability is also subject to numerous biases. For one thing, human memory is reconstructive. That is, what is recalled is not a precise compilation of prior experience, but rather a reconstruction based upon current conditions and expectations. This can cause difficulties that lead to error when the reconstruction does not in fact reflect an authentic statement of that prior experience. Worse, the reconstruction then is irreversibly retained along with the original experience. Each memory retrieval therefore impacts what is retained in memory (Oudiette et al., 2013). A danger in memory retrieval is that once a person finds what appears to be a match, they can become locked into that as a solution and therefore are unable to give fair assessment to other alternative possibilities.
A second aspect of human memory is that there are numerous subsystems that retain different kinds of information (e.g., semantic, declarative, episodic) and different temporal durations (e.g., working, or short-term memory; long-term memory). Working memory is
13 For example, see Itti, Laurent; Geraint Rees; and John K. Tsotsos, Neurobiology of Attention, Academic Press, 2005.
14 We thank an anonymous reviewer for contributing important points to this paragraph.
particularly fragile: it holds only a relatively small amount of information at any moment and is highly susceptible to interference from other events, such as intervening tasks.
The best cooperative systems will couple the powerful capabilities of human memory for (a) rapid pattern matching and (b) analogical, metaphorical extrapolation of past events to new situations, with the accuracy and completeness of the memories of computational systems.
Systems can enhance working memory by keeping an active display of all current information, properly grouped and presented so that relevant items are easy to access without adding cognitive load. Ensuring that all items needed for the current decision are readily available effectively extends working memory. If the person is multitasking, having each separate task display its relevant working-memory set in a different, but well-marked, location has the potential to reduce the interference caused by multitasking and to make it easier and faster for a person to recover situation awareness when switching among tasks. Note that the graphical display is critical: it must be designed with good, psychologically derived design principles to ensure minimal cognitive workload.
Pattern-matching memory can be enhanced by providing aids that recover specific stored information relevant to the person’s decision process. If a person thinks “this is just like situation Z,” the ability of a computer to retrieve information about situation Z would minimize the distortion that might accompany the person’s memory reconstruction.
Similarly, the system might also provide other situations that it has determined relevant (much as a book-recommending system points out that the book being looked at is similar to specific other books). This would also lessen the risk of a person prematurely focusing on a similar (but different) early event. Note that computer systems are only partially successful at detecting true relevance, being subject to both misses and false alarms, but if the presentations are done well, the inaccuracies do no harm and might even help in encouraging the human operators to critically assess the suggestions rather than simply accept them blindly.
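A toy version of this retrieval support can be written in a few lines. The sketch below (invented situation features and case base) ranks stored cases by a crude feature-overlap similarity, so the person can review the actual record rather than a possibly distorted reconstruction; real systems would use far richer similarity measures.

```python
# Illustrative only: invented situation features and a toy case base.

CASE_BASE = {
    "situation_Z": {"weather": "fog", "terrain": "coastal", "time": "night"},
    "situation_Y": {"weather": "fog", "terrain": "mountain", "time": "night"},
    "situation_X": {"weather": "clear", "terrain": "urban", "time": "day"},
}

def similarity(a, b):
    """Fraction of matching feature values over the union of features."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def recall(current, k=2):
    """Return the k stored cases most similar to the current situation."""
    return sorted(CASE_BASE,
                  key=lambda c: similarity(current, CASE_BASE[c]),
                  reverse=True)[:k]

matches = recall({"weather": "fog", "terrain": "coastal", "time": "day"})
# "situation_Z" ranks first (2 of 3 features match). Presenting the runner-up
# as well nudges the person to compare alternatives instead of locking onto
# the first apparent match.
```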
Even though we do not know the underlying architecture of human processing and decision making, there is considerable helpful observational evidence about the resulting behavior. It helps us see which kinds of situations lend themselves to rapid, efficient decisions, which situations lead to poor decisions, and what the strengths, weaknesses, and biases of the process are.
A highly oversimplified model that helps put much of the behavioral observations in perspective simply asserts that conscious processes are relatively slow, serial, and limited in the amount of information that can be maintained in an active state, especially in relation to time-stressed decision making. Novel information is particularly difficult to maintain, and conscious attention is severely limited. Conscious processing has very limited computational resources available to it, so much so that only a few different threads can be tracked at the same time. (Some theorists would argue that “few” is one, or perhaps two, if the two are related to one another.)
Subconscious processes are fast, efficient, and parallel, with multiple processes operating at the same time (in different cortical areas of the brain). They tend to perform energy minimization, which is a kind of pattern-matching process. Processing settles quickly into stable configurations (attractors, in the language of dynamical systems) that correspond to well-learned, familiar patterns consistent with the available information. As a result, people can be very efficient when dealing with known situations: Give them a little bit of information and they settle into a stable solution. This
is the basis for many psychological phenomena, where people, objects, and even complex situations can be identified extremely rapidly, well before there is sufficient information to provide a reliable identification.
But the rapid capture by familiar patterns is also a source of bias that can lead to erroneous decisions. The more overlap there is between the current situation and previous ones, the more likely the decision maker is to be trapped by an attractor that represents an earlier situation. Once there, it is very difficult to get out, even when discrepant data arrive. Remember, there is often a superabundance of data, much of which is irrelevant: Sifting out the relevant from the irrelevant is difficult until some sort of working hypothesis is formed, but once the hypothesis exists (often because a stable configuration has been identified), anchoring can occur and discrepant data may be filtered out as irrelevant. Here is where the joint operation of humans and machines can have an advantage: When one system gets stuck in a local energy minimum, the other system can gently nudge it out of that state. Work on computational models of attention gives designers a mechanism to tap into these states and modulate the outcome. Research reveals, for example, that it is possible to track, measure, and model human attention in real time with relevant stimuli.15 This work is corroborated by human psychophysiological studies,16 and these models can be implemented in systems to help provide the “nudge” that a human observer might need to dislodge from prior expectations. As the models are improved by neuroscience studies and, in turn, used to improve human performance, we expect these methods to be widely implemented in visual detection tasks. As the systems become engaged semantically, in addition to capturing visual features, they will approach the collaborative systems that we have been envisioning for complex tasks. These models should be consciously designed into the networked systems.
Otherwise, the well-studied frailties of human judgment and decision making (e.g., Tversky and Kahneman, 1974), especially in the face of uncertain information, will continue to limit the quality of decision making.
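The attractor metaphor used above can be made concrete with a toy Hopfield-style network, a standard model of energy-minimizing pattern completion (the patterns here are arbitrary, and real cortical dynamics are of course far richer).

```python
# Illustrative only: a six-unit toy network with two stored +/-1 patterns.

PATTERNS = [[1, 1, -1, -1, 1, -1],
            [-1, 1, 1, -1, -1, 1]]
N = len(PATTERNS[0])

# Hebbian weights with a zeroed diagonal.
W = [[0 if i == j else sum(p[i] * p[j] for p in PATTERNS)
      for j in range(N)] for i in range(N)]

def settle(state, steps=10):
    """Repeatedly update every unit toward a lower-energy configuration."""
    state = list(state)
    for _ in range(steps):
        state = [1 if sum(W[i][j] * state[j] for j in range(N)) >= 0 else -1
                 for i in range(N)]
    return state

cue = [1, 1, -1, -1, -1, -1]    # first stored pattern with one unit corrupted
recalled = settle(cue)
# The cue is captured by the nearest attractor and comes back as PATTERNS[0]:
# efficient completion of a familiar pattern, but the same pull would occur
# even if the corrupted cue had really come from a new, different situation.
```

The two faces of the mechanism are visible at once: rapid completion from partial information, and capture by a prior pattern whether or not it is the right one.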
15 See Itti, L., and Baldi, P., A principled approach to detecting surprising events in video. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 631-637, June 2005.
16 See, for example, Itti, L., and Koch, C., Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3):194-203, March 2001.
Teams make errors, which arise from a range of sources, including social dynamics, time and resource constraints, inappropriate communication, and erroneous and/or incomplete data and/or thinking. In addition to the multitude of judgment errors that can arise from flawed data and faulty thinking or weak information processing on the part of an individual, other systemic challenges crop up—for example, the difficulty of discovering errors.
Researchers have divided human errors into two broad classes: slips and mistakes (Norman, 1988, 2013; Reason, 1990; Woods and Branlat, 2010). A slip occurs when an intended action is not performed. A mistake occurs when the intention is wrong. Both types of errors transpire in the context of human decision making.
A slip is relatively easy to detect, because a comparison of the intended action with the actual one reveals a discrepancy. Mistakes are difficult to detect, because the actual actions match the intended ones, but the intention is wrong. People’s actions are consistent with their misguided intent and there is nothing to signal that it is the intention that is wrong. Because mistakes are difficult to detect, they are by far the more worrisome error.
Mistakes fall into three major classes: rule based, knowledge based, and memory lapse. In a rule-based mistake, the person has appropriately diagnosed the situation, but then decided upon an erroneous course of action by following the wrong rule. In a knowledge-based mistake, the problem is misdiagnosed because of erroneous or incomplete knowledge. Memory-lapse mistakes take place when forgetting occurs at the stages of goals, plans, and evaluation.
The decision theory perspective is a helpful way to think about how to deal with inevitable errors. In that context, the best action is a function of the current situation, the actions available now, the estimated probabilities of the possible outcomes of those actions, and the estimated utility of each outcome. Mistakes can be made at each point: not knowing what the current situation is, not recognizing all possible actions, missing some possible outcomes of an action (or misjudging their likelihoods), and not knowing how good or bad a given outcome will be. This complements the categorization of errors into the rule-based, knowledge-based, and memory-lapse taxonomy.17
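The decision-theoretic account above can be made concrete with a small sketch. The actions, outcome probabilities, and utilities below are hypothetical, chosen only to illustrate how maximizing expected utility works and why a sound decision can still produce a bad outcome.

```python
# Hedged sketch: choose the action with the highest expected utility.
# Actions, outcome probabilities, and utilities are illustrative only.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(actions):
    """actions: dict mapping action name -> list of (probability, utility)."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# A toy decision frame: continue operating vs. shut down for inspection.
actions = {
    "continue":  [(0.95, 100), (0.05, -1000)],  # small chance of costly failure
    "shut_down": [(1.00, 40)],                  # certain, modest payoff
}

choice = best_action(actions)
print(choice, expected_utility(actions[choice]))
```

Here the sketch picks "continue" (expected utility 45) over "shut_down" (40), yet the 5 percent failure outcome can still occur; the decision was sound even if the outcome is bad, which is exactly the distinction between decision quality and outcome quality.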
Even when a mistake is the result of a faulty diagnosis of the situation, it can be surprisingly difficult to detect. One might expect that the actions would turn out to be ineffective, so the discrepancy would be noticed, leading to a reexamination of the diagnosis. But misdiagnoses are not random: they usually rest on considerable knowledge and logic. The misdiagnosis is often plausible, and acting on it may even relieve the observed symptoms, at least at first. As a result, the initial actions tend to appear relevant and helpful, which makes the challenge of discovery even more difficult and can postpone it for hours or days. Mistakes caused by memory lapses are harder still to detect: the absence of something that should have been done is always more difficult to notice than the presence of something that should not have been done.
A major difficulty in discovering mistakes is that people tend to lock themselves into a solution, blinding themselves to alternative explanations. The mistaken hypothesis or intention is usually rational and, more often than not, appropriate. Even when it is not, many of the observed symptoms are still consistent with the mistaken interpretation, and inconsistent observations are easily explained away. Note that complex situations involve huge quantities of observations, many of which are irrelevant. Distinguishing signal from noise, however, is often possible only after the nature of the signal has been determined. A working hypothesis can help decision makers sift noise from signal, but if the wrong hypothesis is being entertained, inappropriate sifting can occur. Improved methods to identify the sources of variability (or noise) that affect data quality and contribute to decision “correctness” might be useful.
17 We thank an anonymous reviewer for the thoughts presented in this paragraph.
Individuals and teams can also err by explaining away problems when they should not. Seldom does a major accident occur without a prior string of failures: equipment malfunctions, unusual events, or a series of apparently unrelated breakdowns and errors. No single step appears serious on its own, but when these precursors are overlooked, a major disaster can brew. In many such cases, the people involved noticed each item but discounted it, finding a logical explanation for the otherwise deviant observation.
To some extent, this practice is necessary: many of the potentially suspicious events that a team could attend to would turn out to be false alarms or irrelevant minor incidents. Taken to the extreme, however, a team could explain away every apparent anomaly, rationally dismissing each one and thereby missing the true precursors of failure.
Because of their ability to store large bodies of precursor information and sift through it to find patterns, computers might be well suited to helping humans identify potentially problematic patterns. Machines could help focus attention on particular events that have proved problematic in past cases or that deviate too far from their normal range. Automation might be particularly helpful when large quantities of data emerge within a short time frame. Improvements in normality modeling, which should help identify exceptional behavior in any particular context, could help humans identify activities of interest.
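A minimal sketch of such a normality model might learn the typical range of a monitored signal from historical data and flag new readings that stray too far from it. The data, the three-sigma band, and the threshold below are illustrative assumptions, not a fielded method.

```python
# Hedged sketch of a simple "normality model": learn the normal band of a
# signal from historical readings, then flag readings that fall outside it.
# The history, band width k, and readings are illustrative assumptions.
import statistics

def fit_normal_range(history, k=3.0):
    """Return (mean, std, k) describing the normal band mean +/- k*std."""
    return statistics.fmean(history), statistics.stdev(history), k

def is_anomalous(model, reading):
    """Flag a reading that deviates more than k standard deviations."""
    mean, std, k = model
    return abs(reading - mean) > k * std

# Historical readings of some monitored quantity (hypothetical).
history = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.7, 10.3, 10.0]
model = fit_normal_range(history)

# Only readings far outside the learned band are surfaced to the human.
print([r for r in [10.1, 12.9, 9.9] if is_anomalous(model, r)])
```

In this sketch only the 12.9 reading is flagged, so the machine surfaces a handful of candidate anomalies for human judgment rather than demanding attention for every fluctuation.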
Another common error arises because events seem logical in hindsight. The contrast in our understanding before and after an event can be dramatic. The psychologist Baruch Fischhoff (1975) studied explanations given in hindsight, where events seem obvious and predictable after the fact but had not been predicted beforehand.18 When Fischhoff presented people with a number of situations and asked them to forecast what would happen, their predictions were no better than chance. He then presented the same situations, along with the actual outcomes, to another group of people and asked them to state how likely each outcome was. In that condition, the actual outcome appeared plausible and likely, while the other outcomes were ranked as unlikely.
Foresight is difficult. During a complex situation, clear clues do not necessarily emerge. Many things are happening at once; workload, emotions, and stress levels are high. Many events will turn out to be irrelevant, while things that appear irrelevant will turn out to be important. Accident investigators, working with hindsight, focus on the pertinent information, but while the events were unfolding, the operators could not distinguish one from the other (see Woods and Branlat, 2010). Decision makers who are sorting through large amounts of information and complex interplays of options face the same challenge.
18 A modern treatment of this issue is provided by Duncan Watts in his book Everything Is Obvious: Once You Know the Answer. New York: Crown Business (2011). Available at www.everythingisobvious.com.
Historically, engineers have tended to assign operations to either humans or machines depending on their capabilities (Christoffersen and Woods, 2004), or they have automated as much as possible, leaving the leftover tasks to humans. In both scenarios, people are expected to take action when the automation stops or fails. Furthermore, they often must enter data into computer systems in ways that are easiest for the machine to understand and interpret. As a result, precise, unambiguous, numerical inputs dominate, often delivered in a repetitive manner. Humans must remain attentive for long periods, mostly monitoring events that require no attention, yet be ready to respond immediately and effectively to rare emergencies. Finally, people are asked to absorb and synthesize data that are not necessarily presented in a way that suits the human brain.
This approach has long been viewed as problematic.19 It requires the more versatile and capable teammate, the human, to rescue the more limited machine, often with no advance notice. The human frequently must act rapidly, with little situation awareness, and people are not good at responding quickly when they have been out of the loop. Nor are people skilled at precision, repetition, or sustained vigilance; rather, they are versatile, adaptive, and attentive to a wide variety of events. Thus, instead of being matched to human strengths, the machine requirements are often matched to human weaknesses. People’s ability to cope under most circumstances masks the system’s fragility; as a result, failures are blamed on human error instead of on inappropriate overall design. This is at odds with a basic tenet of high-reliability organizations: that systems and processes should be engineered to reduce the risk of errors, which are inevitable, and to be robust when errors do occur.
People and machines possess distinctive capabilities and frailties that are often complementary, and this complementarity provides an opportunity to enhance system performance. Data-presentation choices, for example, might draw on current knowledge about how the brain works, and software might organize otherwise overwhelming datasets. This process would include consideration of different ways to allocate tasks among humans and machines, taking into account how duties might change over time, depending on circumstances.
Finding 5. Humans and computation have different strengths in what they accomplish, and there are several aspects of human decision making that can benefit from computer-aided systems, such as cognition, recognition of errors in judgment, and task allocation. Similarly, there are several aspects of computer processing that can benefit from human guidance, such as prioritization, dealing with unusual or unexpected situations, understanding social and cultural context, and taking environmental and contextual information into account. The committee finds that computational assists to human decision making work best when the human is treated as a partner in solving problems and executing decision processes, with the strengths and benefits of machines and humans treated as complementary co-systems.
In this view, the participants, human and machine, might at some point share the load more evenly and take the lead on duties that naturally fit their respective capabilities. Cross-training combined with awareness of work assignments could allow tasks to transition naturally and gracefully among team members. A human who is overloaded or incapacitated (perhaps from injury or sleep deprivation) might ask a machine to take over some lower-level work. An overloaded or incapacitated machine (perhaps because of a system failure) might alert people that it is reaching its limits. Humans and machines might hand control and authority back and forth.20
19 The critique goes back at least as far as Paul Fitts (1951).
With this perspective, one aims to understand the potential for joint collaboration between computational systems and people and to determine the design criteria and strategies needed to ensure a real collaboration, in which each party contributes its best strengths and in which communication among team members, including between people and machines, is always in the appropriate language and interactive form. A key challenge is to ensure that designs honor human characteristics, in contrast to today’s interactions, which are typically dictated by the needs of the machine. That is, systems should adjust or adapt to people rather than presuming that people will adjust to them, an arrangement that is often stressful or simply does not work.