The previous chapter looked at several of the human elements that bear on decision making. This chapter examines several technological areas that play a role in collaborative human-machine decision making and appear to have promise for enabling advances in human-machine collaboration. The chapter ends with a brief discussion of metrics that can help assess human-machine collaboration for decision making.
While there are extensive studies of human teamwork in varied contexts,1 further studies of the characteristics of successful decision-aiding automation in the context of hybrid human-automation teams are warranted.2
Recent work on the foundations of team cognition helps to fill the need for further empirical studies of team performance that can elicit key attributes for the design of decision-aiding automation. The dominant perspective in psychology on team cognition is shared cognition, which assumes as its basic construct that some form of mental model is shared among individual team members.3 This has recently been critiqued as inadequate to explain decision-making performance in large, spatially distributed teams because in such settings, individuals can hold only partial views of the situation. Thus, for spatially distributed decision making, coordination across collections of partial knowledge is key. In fact, it has been argued that team cognition is grounded in the interactions among team members rather than in their shared knowledge structures (Cooke et al., 2013). This appears to be a promising direction.
Many approaches to designing team-like cooperation between humans and machines have been proposed, including adaptive supervisory control, adaptive automation, dynamic task
1 See, for example, Cummings et al., 2010; McKendrick et al., 2013; Salas et al., 2008; Dekker and Woods, 2002; de Winter and Dodou, 2014; Pritchett, Kim, and Feigh, 2014; Jarrasse, Sanguineti, and Burdet, 2014; Woods and Branlat, 2010; Cuevas et al., 2007.
2 The Human Factors and Ergonomics Society dedicated its fifth annual contest, in 2014, to the best paper on “human factors/ergonomics research that pertains to effective and satisfying interaction between humans and automation” (http://www.hfes.org/web/pubpages/hfprize.html).
3 See, for example, S. Fiore and J. Schooler. Process mapping and shared cognition: Teamwork and the development of shared problem models. In Team Cognition: Understanding the Factors that Drive Process and Performance, E. Salas and S. Fiore, eds., American Psychological Association, 2004. See also E. Entin and D. Serfaty. Adaptive team coordination. Human Factors, 41, 1999. S. Fiore, E. Salas, and J. Cannon-Bowers. Group dynamics and shared mental model development. In How People Evaluate Others in Organizations: Person Perception and Interpersonal Judgment in Industrial/Organizational Psychology, M. London, ed., Lawrence Erlbaum Associates, 2001. R. Hoeft, J. Kochan, and F. Jentsch. Automated team members in the cockpit: Myth or reality. In Advances in Human Performance and Cognitive Engineering Research, A. Schulz and L. Parker, eds., Elsevier Science, 2006.
allocation, adjustable autonomy and mixed-initiative interaction. To underpin novel design requirements, researchers in cognitive systems and artificial intelligence have identified a number of general requirements for team-like interactions among humans and automation (e.g., Christoffersen and Woods, 2004; Klein et al., 2004; Johnson, 2014b; Bradshaw et al., 2013). Of particular relevance to this report are the concepts of (a) mutual predictability of teammates, (b) establishment and maintenance of common ground, and (c) ability to redirect and adapt to one another. The discussion on coordination in joint activity follows Klein et al. (2004).
Mutual Predictability (Klein et al., 2004): To be a team player, an intelligent agent—like a human—must be reasonably predictable and reasonably able to predict others’ actions (Sycara and Lewis, 2004). It should act neither capriciously nor unobservably, and it should be able to observe and correctly predict its teammates’ future behavior. One risk of making automation more adaptable is that it might make its behavior less predictable. To make actions sufficiently predictable, targets, states, capacities, intentions, changes, and upcoming actions should be obvious to the people and automation components that supervise and coordinate with them. Note that this requirement runs counter to the advice sometimes given to automation developers to create systems that are barely noticed.
Common Ground (Klein et al., 2004): Perhaps the most important basis for interpredictability is common ground (Clark and Brennan, 1991), which refers to the pertinent mutual knowledge, mutual beliefs, and mutual assumptions that support interdependent actions in a joint activity. Maintaining common ground is an ongoing process of communicating, testing, updating, tailoring, and repairing mutual understandings; it permits people to use abbreviated forms of communication, such as head-nods (or an automation analogue), and still be reasonably confident that potentially ambiguous messages and signals will be understood. Common ground also includes what parties know about each other prior to engagement, for example, the others’ background and training, habits, and ways of working.
Directability and Mutual Adaptation (Klein et al., 2004): Directability refers to deliberate attempts to modify the actions of the other partners as conditions and priorities change. For example, as part of maintaining common ground during coordinated activity, and relying on mental models of each other, team members must expend effort to appreciate what each other needs to notice within the context of the task and the current situation. It pushes the limits of technology to get automation to communicate even close to as fluently as a member of a well-coordinated human team working in an open, visible environment. The automation will have to signal when it is having trouble and when it is taking extreme action or moving toward the extreme end of its range of authority. Such capabilities will require interesting relational judgments about agent activities: How does an agent tell when another team member is having trouble performing a function but has not yet failed? How and when does automation effectively reveal or communicate that it is moving toward its limit of capability? (Christoffersen and Woods, 2004).
The major computational models of collaboration developed by researchers in multiagent systems4 all treat teams as more than a collection of individuals and collaborative activities as
4 See, for example, Levesque, Cohen, and Nunes, 1990; Grosz and Kraus, 1996; Kinny et al., 1992.
more than the summation of individual activities, and to varying extents have specific computational mechanisms to capture the above-mentioned requirements. The formal specifications in these models include commitments by team members to the team activity and each other’s actions, requirements for communication to ensure team members are in sync and aware of the state of each other’s activities, and requirements or mechanisms for reasoning about the skills of potential team members and allocating (and possibly reallocating) tasks among team members.
Other relevant work includes efforts to provide people with new tools and platforms that enable them to solve problems jointly and to tap into larger crowds of people and their intellect. Many relevant studies have been done in the Computer-Supported Collaborative Work (CSCW) community, including work to develop tools that allow multiple problem solvers to participate in problem solving. (See, e.g., H. Zhang, et al. Human Computation Tasks with Global Constraints, CHI 2012, Austin, TX, May 2012. http://dl.acm.org/citation.cfm?id=2207708.)
Other efforts and examples with importance for the topics discussed include work to develop more flexible representations of the degree of autonomy that machines have in hybrid human-computer systems. An example may be found in Scerri et al. (Paul Scerri et al. Towards Adjustable Autonomy for the Real World. Journal of Artificial Intelligence Research, 2003. http://www.cs.cmu.edu/~pscerri/papers/JAIR-AA.pdf).5
The previous discussion focused on the cognitive expectations that humans have when working in human or mixed human-computer teams. In addition, humans expect their counterparts to be able to work—that is, to function properly—and to be flexible. Yet automated systems have tended to be brittle. They are rigid and, when overloaded, they break down suddenly, often without warning (e.g., see Smith, McCoy, and Layton, 1997; Bass, 2013); alternatively, they do not completely stop working, but they lack the flexibility to catch up with ongoing activities. Brittleness poses multiple problems. For example, in aviation, automated systems fail when the demands upon them become too high—ironically, when they are needed the most.
Human performance, in contrast, tends to degrade gracefully: it deteriorates slowly while maintaining partial effectiveness.
Brittleness undermines a desired feature of collaborative work: the ability of members to adapt to teammates’ changing capacities or behavior. If teammates know what others are doing and when they might be reaching their cognitive or system limits, they might anticipate when assistance is needed. Toward this end, enhanced self-awareness and knowledge by the automated systems could help combat brittleness and its negative effects in decision making.
People often are blamed for problems that arise from the brittleness of the systems they operate. Furthermore, routine and reliability are often emphasized and promoted among human team members—yet flexibility is often a hallmark of successful troubleshooting.
Potential approaches for addressing these challenges come through multiple avenues, including the development of “resilient systems” (e.g., Hollnagel, Woods, and Leveson, 2006)
5 We thank an anonymous reviewer for the thoughts in this paragraph and the one that precedes it.
through so-called resilience engineering. Typically, an adverse event triggers investigations into what went wrong. Fewer efforts, however, tend to probe what goes right most of the time (under normal conditions) or after a positive outcome to surprise circumstances. As stated in a recent document from the National Academies, “resilience engineering focuses on the story of the accident that never happened.”6 Information that grows out of such explorations might reveal adaptive interactions or practices that “routinely produce safe and reliable performances in the presence of hazards and opportunities for failure” (ibid.). Successes as well as failures offer constructive lessons.
Resilience engineering places value on a system’s ability to monitor and recognize variable and/or unforeseen events and behavior, respond, and gain knowledge from the experience. The power of this strategy stems from the observation that unexpected conditions, in the context of complex decision making, are normal and can be expected. The ability to foresee challenges and adjust accordingly increases performance quality. Unpredictability is an inherent part of any complex system and task.
Many of the powerful advances today in computational vision, language processing and translation, and reasoning rely upon statistical and sub-symbolic techniques, such as neural networks, Bayesian networks, and other machine-learning approaches instead of techniques based on deterministic logic. Because they depend on statistics and probability rather than rigid rules, these systems are frequently less brittle than systems that are purely rule- or logic-based. Another source of flexibility for some of the newer algorithms comes from incorporating self-learning instead of relying on hand-crafted rules. For instance, they can read reports and update themselves rather than relying on a static set of input data.
Although such systems can be extremely effective, they lack deep understanding of the domain to which they are applied; they can make inferences, but those inferences have a non-zero chance of being wrong. People bring a variety of contextual information to bear on the interpretation of data, deriving meaning that extends far beyond the raw data. Computer systems remain limited in their ability to tie individual pieces of information together and connect them with prior experience. Fundamental research breakthroughs in various subfields of artificial intelligence (including natural-language processing, automated reasoning, and probabilistic inference) are needed to increase computers’ abilities to reason effectively with contextual information. For instance, sophisticated systems such as Watson and Siri interpret each utterance in isolation. Some of the funniest errors Watson made during its Jeopardy appearance occurred when it failed to take context into account.7
Several commercial and academic tools are available for automatic speech recognition (ASR), all employing some variant of statistical supervised learning. However, the performance of most systems is still relatively poor in non-laboratory environments, especially when the
6 See Ideas to Innovation: Stimulating Collaborations in the Application of Resilience Engineering to Healthcare. Meeting Summary. 2013. Government-University-Industry Research Roundtable, The National Academies. Available at http://sites.nationalacademies.org/PGA/uidp/PGA_055253.
7 Other promising results are starting to materialize as well. In 2012, researchers discovered that computers can identify cat faces, even when they are not directed to do so. After looking at millions of YouTube thumbnail frames, the 1000-machine network figured out—on its own—that something about cat faces is important. They accomplished this without human assistance or prelabeling of the images, based solely on patterns detected in the data. This capability mimicked, to some degree, humans’ expertise at recognizing patterns that matter to us. Available at http://www.wired.com/wiredscience/2012/06/google-x-neural-network/. Last accessed March 19, 2014.
language that the dialogue participants are using is not restricted. For example, Google Voice search services have an error rate of 17 percent; Carnegie Mellon University’s Sphinx system shows similar accuracy in laboratory settings but a much higher error rate in field settings (e.g., 68 percent for the “Let’s Go Public” project [Raux et al., 2005a]).8 Improved performance can be gained by using additional inputs, such as facial gestures.
Another element of natural language processing is sentence-level processing, whose input is the transcription produced by the ASR. In dialogue systems, the main purpose of sentence-level processing is often to extract dialogue acts, which capture the action a speaker performs in uttering a sentence.9 These dialogue acts provide higher-level “building blocks.” Recent work also attempts to identify the emotional attitude of the human interlocutor (Forbes-Riley and Litman, 2011).10 Current approaches to extracting dialogue acts include using a context-free grammar of dialogue acts and inferring from it the current dialogue act. More advanced methods use Hidden Markov Models (HMMs), in which the states are the dialogue acts and the observations are the words recognized by the ASR. The HMM is then used to return the most likely dialogue-act sequence. Other statistical machine-learning methods have also been proposed, using keywords as features.
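As a rough illustration of the HMM approach just described, the sketch below runs Viterbi decoding over a handful of dialogue acts. The acts, keywords, and all transition and emission probabilities are invented for exposition, not drawn from any cited system.

```python
# Minimal Viterbi decoding over dialogue acts (illustrative, hand-set probabilities).
# States are dialogue acts; observations are keywords recognized by the ASR.

STATES = ["question", "statement", "backchannel"]

# Hypothetical transition probabilities P(next act | current act).
TRANS = {
    "question":    {"question": 0.1, "statement": 0.7, "backchannel": 0.2},
    "statement":   {"question": 0.3, "statement": 0.4, "backchannel": 0.3},
    "backchannel": {"question": 0.2, "statement": 0.6, "backchannel": 0.2},
}
START = {"question": 0.4, "statement": 0.5, "backchannel": 0.1}

# Hypothetical emission probabilities P(keyword | act).
EMIT = {
    "question":    {"what": 0.5, "yes": 0.1, "okay": 0.1, "weather": 0.3},
    "statement":   {"what": 0.1, "yes": 0.2, "okay": 0.2, "weather": 0.5},
    "backchannel": {"what": 0.05, "yes": 0.45, "okay": 0.45, "weather": 0.05},
}

def viterbi(observations):
    """Return the most likely dialogue-act sequence for a keyword sequence."""
    # prob[s] = best probability of any path ending in state s; path[s] = that path
    prob = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    path = {s: [s] for s in STATES}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prob[p] * TRANS[p][s])
            new_prob[s] = prob[best_prev] * TRANS[best_prev][s] * EMIT[s][obs]
            new_path[s] = path[best_prev] + [s]
        prob, path = new_prob, new_path
    best = max(STATES, key=lambda s: prob[s])
    return path[best]

print(viterbi(["what", "weather", "yes"]))  # → ['question', 'statement', 'backchannel']
```

Note that the transition model is what distinguishes this from keyword classification: the final “yes” is tagged as a backchannel partly because it follows a statement, not because of the word alone.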
Scientists are approaching the communication goal in numerous ways. Some are attempting to create computers whose architecture mimics that of the nervous system. If successful, such “neuromorphic” computers might rely on as-yet-undiscovered knowledge about how individual neurons and the circuits they compose enable spontaneous learning, adaptability, and multisensory integration—and how the brain achieves its renowned plasticity. Mimicking some degree of that plasticity might one day help engineers build robust automation that can rewire itself, as the brain does, if a portion degrades or fails.11
Future systems could integrate the approaches mentioned above, combining large datasets, computational power, and statistical and subsymbolic processing with the deeper understanding provided by appropriate sets of concepts, representations of domain-specific knowledge, and symbolic reasoning. The resulting machines might well contribute more fully to decision making. The new field of “deep learning” may produce advances along these lines.
Data analytics is the loosely defined term for the set of capabilities that enables the winnowing and analysis of massive amounts of data and its presentation in a form that is interpretable by (usually) a human. It has become a key enabler in the path from data to decision. An important part of the final step—representing the information in a format that is readily and reliably interpretable—is often abetted by “visual analytics,” which is considered here to be part of data analytics.
8 Antoine Raux, Brian Langner, Dan Bohus, Alan W. Black, and Maxine Eskenazi. Let’s go public! Taking a spoken dialog system to the real world. In Proc. of Interspeech, 2005.
9 A. Stolcke et al. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373, 2000.
10 Kate Forbes-Riley and Diane Litman. When does disengagement correlate with learning in spoken dialog computer tutoring? In Artificial Intelligence in Education, pages 81–89. Springer, 2011.
11 According to the 2008 National Academies study, Emerging Cognitive Neuroscience and Related Technologies, achieving the full vision of these goals with significant depth is still decades away.
Traditionally, data analytics focuses on using descriptive and predictive models created with statistics, operations research, and, more recently, machine-learning techniques to gain insights from data. The insights are typically in the form of correlations and patterns, such as association rules and groupings discovered by posing open-ended queries, or optimal values based on mathematical models computed with predefined objectives. With the advent of big data, a current focus of analytics research is to address the unprecedented volume of data. Various computational algorithms and architectures (cloud, Hadoop, etc.) designed to accommodate ultralarge data volumes have emerged and achieved commercial success. In addition, traditional methods of analysis must be adjusted to remain efficient at large scales or be replaced by different algorithms that can scale to terabytes and beyond. Analytical methods for big data must be developed with a clear understanding of the reliability of the inferences being made, because it is, if anything, easier to spot patterns and correlations in massive data, and some or many of them could be spurious (false positives).
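The spurious-correlation risk is easy to demonstrate. In the sketch below, every feature is pure noise with no relation to the target, yet with enough features some will correlate with it strongly just by chance; the sample and feature counts are arbitrary choices for illustration.

```python
# Sketch: with enough random features, some correlate with a target by chance.
# All data here are pure noise; any "discovered" correlation is spurious.
import random

random.seed(0)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n_samples, n_features = 30, 2000
target = [random.gauss(0, 1) for _ in range(n_samples)]
features = [[random.gauss(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)]

best = max(abs(pearson(f, target)) for f in features)
print(f"strongest (entirely spurious) correlation: {best:.2f}")
```

A pattern-mining system that reported the strongest of these correlations as a finding would be reporting noise, which is why multiple-comparison corrections and held-out validation matter at scale.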
Because the majority of today’s data are unstructured, another key focus of data analytics is information extraction—extracting useful information and features from raw data into structured and machine-readable formats suitable for analysis. The raw data can be in structured format—for example, spatial and temporal information in the form of GPS data for vehicles, or location information of mobile phone users. It can be in semistructured or unstructured text format, such as machine logs or tweets and blogs on social media. It can even be in multimedia formats such as images and videos from surveillance cameras. In the age of big data, automated information extraction technologies must be able to process large, complex, and dynamic datasets, and analyze them together with structured data stored in traditional relational databases, often in real time. It is also important to be able to handle both human-generated and machine-generated data. Much of the current research on information extraction and data-analysis methodologies focuses on data generated by humans, such as through social media. However, as more machines begin to communicate with other machines, the fastest growing and most pervasive segments of big data will be those generated by machines for machines, through websites, applications, servers, networks, and mobile devices.
A key issue that arises in such a setting is that the data will be accumulating not just in unprecedented volume but also with ferocious velocities, thereby making their storage infeasible. As such, data analysis will have to be performed dynamically, as the data stream through the processor, often in real time. This poses a fundamental algorithmic challenge, as conventional data-mining and machine-learning approaches often assume the availability of a large set of static data. Current research in data-stream processing, event detection, and online machine learning seeks to address these issues.
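A minimal example of this single-pass style of analysis is Welford's streaming algorithm for running statistics: each value updates the summary and is then discarded, so nothing need be stored. The sensor readings below are made up for illustration.

```python
# Sketch: streaming (single-pass) statistics via Welford's algorithm.
# The stream is never stored; each value updates the running summary, then is dropped.

class StreamingStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = StreamingStats()
for reading in [12.1, 11.8, 12.4, 55.0, 12.0]:  # 55.0 might be flagged downstream
    stats.update(reading)
print(stats.n, stats.mean, stats.variance())
```

The same update-and-discard structure underlies online machine-learning methods generally; the model state, not the data, is what persists.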
In addition, the data might not only be changing continuously, but they could also be sparse, scattered, and noisy. One approach to addressing this problem is to integrate the data with additional information, for example, by incorporating domain information (e.g., metadata annotations) or combining them with additional data sources to fill some of the gaps. In traditional, small-scale data analysis, humans had the luxury of examining raw data for keystroke errors, duplications, or obvious outliers and cleaning up the data set before analysis. Humans are good at such functions, whereas machines are not. With even moderately sized data sets, however, and certainly with massive data, this sort of manual inspection is not feasible, so algorithms are being devised to emulate these human capabilities and handle the preprocessing automatically. Data generated with the intent to deceive could also be embedded in the source. It is therefore important to develop intelligent analytics techniques that can detect deception and misinformation embedded within otherwise accurate real-world data. Researchers working on the development of human-machine decision-making systems in such a context need access to data that are highly heterogeneous and streaming in order to create appropriate methods; small-scale, controllable data sources are often qualitatively different.
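One common way to automate the "eyeball the outliers" step of manual data cleaning is a robust z-score based on the median absolute deviation; ordinary mean and standard deviation are themselves distorted by the very outliers being hunted. The threshold and readings below are illustrative assumptions.

```python
# Sketch: automating manual outlier inspection with a robust z-score
# (median absolute deviation, MAD), since mean/std are skewed by the outliers themselves.
import statistics

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    # 0.6745 scales MAD so the score is comparable to a standard z-score for normal data
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

readings = [10.2, 10.5, 9.8, 10.1, 10.4, 97.0, 10.3]  # 97.0 mimics a keystroke error
print(flag_outliers(readings))  # → [5]
```

Such filters only approximate human judgment, of course; a value flagged as an outlier may be a genuine rare event, which is one reason humans remain in the cleaning loop for consequential decisions.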
The physical world itself is fast becoming a type of information system: Networked sensors are being embedded in devices ranging from mobile phones, smart energy meters, and cars to personal health monitoring devices and industrial machines that can sense, create, and communicate data about the state of the physical world. Because most sensor data monitor some aspect of the physical world, “cyber-physical-aware” analytics algorithms that can leverage physical constraints (e.g., temporal, spatial) are useful for addressing some of these analytics challenges, a capability that has not been typical of past information technologies.
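A simple instance of such a physical constraint is screening GPS fixes against a maximum plausible vehicle speed: a fix that would require teleportation is rejected. The speed limit and the track below are illustrative assumptions, not values from any deployed system.

```python
# Sketch: using a physical constraint (maximum plausible speed) to screen GPS fixes.
MAX_SPEED = 60.0  # m/s; an assumed ceiling for a road vehicle

def plausible_track(fixes):
    """Keep only fixes consistent with MAX_SPEED. Each fix is (t, x, y) in s and m."""
    kept = [fixes[0]]
    for t, x, y in fixes[1:]:
        t0, x0, y0 = kept[-1]
        dist = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5
        if dist / (t - t0) <= MAX_SPEED:
            kept.append((t, x, y))
        # else: drop the fix as physically implausible
    return kept

# The third fix implies 4,970 m/s and is rejected; later fixes are checked
# against the last *accepted* fix, so the track recovers afterward.
track = [(0, 0, 0), (1, 30, 0), (2, 5000, 0), (3, 90, 0)]
print(plausible_track(track))  # → [(0, 0, 0), (1, 30, 0), (3, 90, 0)]
```

Comparing against the last accepted fix, rather than the immediately preceding raw fix, is what lets a single bad reading be discarded without corrupting the rest of the track.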
Data mining and machine learning discover historical patterns, associations, and relationships hidden in the data, but the interpretation of the discovered patterns to extract knowledge for decision making is done primarily by the human decision maker. As data-mining algorithms are enhanced by automated reasoning about the statistical relations discovered, intelligent decisions can be made with deep knowledge of risks, options, and consequences. Not all of this information can be derived from the data by the machines themselves. Humans, machines, and networks need to be intimately involved in the decision-making process, interactively and collaboratively.
In networked environments, decision-making processes are increasingly supported by technology. Orchestrating collaboration among humans and automation in scenarios that involve large numbers of participants and highly interconnected networks of people and machines brings challenges that do not apply to smaller teams. In particular, the scale shift in complexity, brought about by the many interdependencies across processes and activities, changes responses to key questions about what it means to be “in control.”
In networked decision making, the acts of gathering data as well as data analysis and comprehension can occur over a distributed network with many humans and automated agents adding data to the system, often nearly simultaneously. Furthermore, the disparate subteams might gather the data in different ways or it might exist in different modalities at different locations; funneling and transforming this collection into a single, uniform collection poses significant challenges.
As discussed above, a key limiting factor in most human-machine interactions is communication among people and machines. Machine-design strategies often expect people to be precise and unambiguous in the issuance of commands and information (although people’s skills in this area are weak), with limited information back to the people and, even then, often in forms understandable only by the technical elite. Even in relatively simple settings, studies of human performance illustrate how communication can break down between human and automation due to factors such as attention being misdirected or misfocused or goal conflicts being missed or misprioritized (Cuevas et al., 2007). Environments in which humans “control” a coupled collection of automated subsystems place increasing emphasis on complex cognitive functions such as goal monitoring, which enables shifts of goal priorities, and management of a complicated set of constraints.
Team cognition is a challenge for developers of networked human-automation systems. Shared understanding allows management of uncertainties that machines may have about humans’ goals and focus of attention, as well as uncertainty that humans have about automation’s plans and status. Regardless of the machine’s role, creating machine understanding of human intent and making the machine’s results intelligible to a human are problems to be addressed by any human-automation system. Conversely, finding ways to increase human confidence in a machine’s activities, with an appropriate degree of caution, is increasingly important as computers are doing more of the predecisional work. In time-sensitive settings or where the amount of incoming data is large, humans may not be able to work through the details of what the machine has done.
In complex, networked scenarios, the imperative for establishing and supporting team cognition results in a technological need for computer-based support that can promote collaborative processes and tasks. This need is further exacerbated and complicated by the fact that the machines translate data from the real world through sensors and computers that often must process or delay the raw data. Such issues are the primary reason that human decision makers are needed, particularly in networked environments, to resolve uncertainties that result.
Finally, transitions in authority and control in cooperative systems become crucial as authority and autonomy relationships shift. Roles adjust in line with the changing demands of situations and capabilities of the team members. As systems become multilayered, these facets become harder to identify and manage. New polycentric control architectures are being developed to dynamically manage and adapt these relationships across diverse but interdependent roles, organizations, processes, and activities (Woods and Branlat, 2010).
To illustrate by analogy some attributes of a collaborative human-machine activity that incorporates features of shared cognition, Flemisch et al. (2003) consider horseback riding, where the horse is an analogue of a powerful, intelligent automaton. In normal situations, the rider directs the horse at a high level, but the horse takes over the details of movement, including local path planning, navigation over or around obstacles, and so on. In cases of perceived danger, the horse alerts the rider (Norman, 2007).
Flemisch and colleagues (2003) have shown how this analogy can be applied to a person’s control of an automobile. Experienced horseback riders signal the horse about the degree of autonomy to be permitted. When in “tight-rein” mode, the rider exerts considerable control, even directing individual foot movements. In “loose-rein” mode, the horse is in charge, allowing the rider to relax, perhaps even to fall asleep, while the horse traverses a known trail or an easy one. These two modes are signaled by the tightness of the reins; in an automobile, similar options can be exercised by the degree of control the human exerts over the steering wheel or joystick.
A similar idea occurs in the design of the Segway, where the rider controls the vehicle speed and direction, but if the Segway determines the speed to be unsafe, it pushes back the control lever, causing the driver to lean backwards, which reduces the vehicle speed. Flemisch and associates use a similar scheme for their automobile, so if the vehicle is going too fast or approaching another car or an obstacle, the wheel-controlling device pushes back at the driver,
thus signaling the difficulty and also taking control. (The driver can force the vehicle to do the action anyway, just as a horseback rider can force a horse to do an action that it resists.)
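The tight-rein/loose-rein idea can be read as a control-authority blend in which the human's grip sets how much weight each partner's command receives. The linear blending law and the numbers below are illustrative assumptions, not the scheme Flemisch and colleagues actually implemented.

```python
# Sketch of "rein tightness" as a control-authority blend: the human's grip on
# the wheel determines how much weight the automation's command receives.
# The linear law and example values are illustrative assumptions.

def blended_steering(human_cmd, auto_cmd, grip):
    """grip in [0, 1]: 1.0 = tight rein (human in charge), 0.0 = loose rein."""
    return grip * human_cmd + (1.0 - grip) * auto_cmd

# Tight rein: the output tracks the human's command almost exactly.
tight = blended_steering(0.4, -0.2, grip=0.9)
# Loose rein: automation dominates; human input only nudges the result.
loose = blended_steering(0.4, -0.2, grip=0.1)
print(tight, loose)
```

Because authority varies continuously with grip rather than switching between discrete modes, the handoff of control can be gradual, much as a rider eases the reins in and out.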
In the horse-rider situation, members of the duo do not perform equivalent roles, but both contribute to the goal. Crucial information—although not every new piece of data that each member is gathering—flows from one to the other. Decision authority passes back and forth, yet choices about some aspects of the task remain firmly assigned to either the animal or the human. Together, the horse and rider arrive at their goal faster, more safely, and more accurately than if either one had tried to make the journey alone.
The committee found this work quite innovative; the interplay between a horse and rider is richer and more fluid than what can currently be attained between humans and computers. For that reason, the metaphor may be useful in pointing to possible extensions of human-computer interplay.
Other studies of shared control between a robot and a human performing interactive motor tasks illustrate that although there is considerable interest in design for such interactions, little work so far has achieved a deep understanding of the physical interaction issues or implemented even simple collaborative behaviors (Jarrasse, Sanguineti, and Burdet, 2014).
With the growth in research, development, and operational deployment of complex, networked systems, a need is emerging to judge whether a particular technology is adding value above and beyond a legacy system. It is often difficult to compare competing systems, because standardized performance metrics for the system or its operator or operators either do not exist or are flawed. In principle, one might want to know how much the technology or information system enhances human reasoning and understanding, how well and how rapidly it aids decision making, and how successful the decisions are.12
Many evaluation programs gather large sets of metrics, which often include traditional human factors such as reaction time, error rates, and so forth; such metrics, however, fail to capture the effectiveness of the human-system interaction and do not diagnose the cause of problems they expose. To gauge system effectiveness, vague and context-dependent mission performance characteristics, such as situation awareness and time to mission completion, are often collected. Although these attributes are important, it is not clear how they can equitably be compared across networked systems that involve different human-system interactions. An alternative strategy, in which massive amounts of data are collected without a clear evaluation focus at the time, might provide the raw material for standardized comparisons, but such a shotgun approach is expensive in terms of time and money.
Recent work has explored the development of metric classes for human interaction with automated systems. Much of the following discussion follows Cummings, Pina, and Donmez (2008). A metric class is defined as the set of metrics that quantify a certain aspect or component of a system. The rationale for defining metric classes stems from the assumption that particular metrics are mission specific, but metric classes might apply across different missions. Other efforts have probed robot-effectiveness metrics, human-robot interaction metrics, and single human–multiple robot metric classes.13 Metric selection is inherently linked to the practitioner’s objectives and depends on the context and resources available, which reflects the inherent cost-benefit nature of such endeavors. Detailed discussions about selecting appropriate metrics through consideration of criteria such as experimental constraints, construct validity, statistical efficiency, and measurement-technique efficiency can be found elsewhere (Donmez and Cummings, 2009; Cummings and Donmez, 2013).
12 Note that this section is dealing with metrics about the performance of human-machine teams. It is not meant to address metrics for gauging the quality of decisions.
When humans and automation are working in complex, networked arrangements, it is essential to evaluate not only the performance of individual humans and machines but also the complex interactions among team members. Furthermore, one would like to know how these metrics relate to the overall system. Such a task poses numerous challenges, especially in situations where team members and the jobs they are executing are distributed in space and time.
Researchers (Pina, Donmez, and Cummings, 2008) have proposed five metric classes with which to assess individual components as well as holistic systems:
- Mission Effectiveness – For example, key mission-performance parameters relating to the whole human-automation system.
- Autonomous Platform Behavior Efficiency – For example, usability, adequacy, autonomy, learnability, errors, user satisfaction, automation speed, accuracy and reliability, and neglect time.
- Human Behavior Efficiency – Operators perform multiple tasks, such as monitoring autonomous platform health and status, identifying critical exogenous events, and communicating with others as needed. How humans sequence and prioritize these multiple tasks provides valuable insight into system design effectiveness.
  - Information processing efficiency (e.g., decision making)
  - Attention allocation efficiency (e.g., scan patterns, prioritization)
- Human Behavior Precursors – The underlying cognitive processes that lead to specific operator behavior, as distinct from the human behavior metric class, which captures explicit behavior.
  - Cognitive precursors (e.g., situation awareness, mental workload, emotional state)
  - Physiological precursors (e.g., physical comfort, fatigue)
- Collaboration Metrics – That is, team-level metrics.
  - Human-automation collaboration
  - Automation-automation collaboration
  - Human-human collaboration
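The taxonomy above can be sketched as a simple data structure. The following is a minimal, hypothetical illustration of how an evaluation program might organize metrics under these classes; the class and subclass names follow the list above, but the `MetricClass` container itself is an assumption of this sketch, not a construct from Pina, Donmez, and Cummings (2008):

```python
from dataclasses import dataclass, field

@dataclass
class MetricClass:
    """A hypothetical container for one metric class.

    Holds optional subclasses and a mapping of mission-specific
    metric names to measured values.
    """
    name: str
    subclasses: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)  # metric name -> value

# The five metric classes proposed by Pina, Donmez, and Cummings (2008).
TAXONOMY = [
    MetricClass("Mission Effectiveness"),
    MetricClass("Autonomous Platform Behavior Efficiency"),
    MetricClass("Human Behavior Efficiency",
                subclasses=["Information processing efficiency",
                            "Attention allocation efficiency"]),
    MetricClass("Human Behavior Precursors",
                subclasses=["Cognitive precursors",
                            "Physiological precursors"]),
    MetricClass("Collaboration Metrics",
                subclasses=["Human-automation collaboration",
                            "Automation-automation collaboration",
                            "Human-human collaboration"]),
]
```

The point of such a structure is that the classes are stable across missions, while the `metrics` dictionaries would be filled in with whatever mission-specific measures a given evaluation selects.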
The final class—collaboration metrics—addresses the degree to which the humans and automation are aware of one another and can adjust their behavior accordingly. As discussed above, effective collaborative teams are notable for their cohesion and flexibility. To achieve this state, it is not enough for people to understand their machine colleagues; machines should understand aspects of humans and their goals as well. Toward this end, machines need to model people in ways that capture their expectations, commands, and constraints and also be able to understand what people “say” (in whatever language—formal or natural—they are using). What does the human expect the computer to do? What is the human telling the computer to do? What are the constraints on the human, such as fatigue and bias, that might affect the human’s behavior? Although machines that can “understand” humans in such ways are not typically found in current operational settings, relevant work is emerging from research laboratories. With increasing deployment of these features, the automated parts of human-machine systems could modify their actions in response to human behavior and predicted states.
13 See, for example, D. R. Olsen and M. A. Goodrich, 2003; A. Steinfeld et al., 2006; and J. Crandall and M. L. Cummings, 2007.
The human-automation collaboration subclass revolves around measures of team cognition and trust. Evaluation of these parameters can inform system design requirements as well as the development of training material. Objective measurement of trust is difficult but important, particularly when low system reliability or a culture in which different knowledge domains exist in distinct silos could create trust barriers.
In the automation-automation collaboration subclass, the quality and efficiency of the collaboration among the machines can be measured through metrics such as speed of data sharing and decision making among automated agents, quality of the system response to unexpected events, and the ability of the system to handle network disruptions.
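One of the automation-automation measures named above, speed of data sharing among automated agents, can be computed directly from message timestamps. The sketch below is a hypothetical illustration; the log values and variable names are invented for the example, not drawn from any deployed system:

```python
# Hypothetical timestamp log of messages passed between automated agents:
# (time sent, time received), in seconds since mission start.
messages = [(0.00, 0.12), (1.50, 1.58), (3.10, 3.45)]

# Speed of data sharing: per-message latency, plus summary statistics
# that could feed an automation-automation collaboration metric.
latencies = [received - sent for sent, received in messages]
mean_latency = sum(latencies) / len(latencies)
max_latency = max(latencies)
```

In practice, such latencies would be tracked over time; a sustained rise in `max_latency`, for instance, could serve as one indicator of degraded system response to network disruptions.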
The last collaboration metric subclass is human-human collaboration, also referred to as team collaboration. In networked settings, human teams must work together to perform collaborative tasks, so performance should be measured at the holistic team level rather than by aggregating team members’ individual performance (Cooke et al., 2004). Because team members must consistently exchange information, reconcile inconsistencies, and coordinate their actions, one way to measure holistic team performance is through human-human coordination, which includes written, oral, and gestural interactions.
Human-human coordination is generally assessed through communication analysis, which can include quantitative physical measures such as how long team members spend communicating, as well as more qualitative measures that focus on the communication content. In addition, the measures can focus on a single point in time or they can address dynamic features, such as patterns of communication. Measures of behavioral patterns such as communications and social networks are traditional metrics in team research (Entin and Entin, 2001; Morrow and Fischer, 2013).
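The quantitative side of communication analysis can be illustrated with a small sketch. The log below is hypothetical (the roles and durations are invented for the example); it shows two of the measures mentioned above: how long each team member spends communicating, and a simple directed communication network of the kind used in social-network analyses of teams:

```python
from collections import Counter

# Hypothetical communication log for a three-person team:
# (sender, receiver, duration of the exchange in seconds).
log = [
    ("pilot", "operator", 12.0),
    ("operator", "pilot", 8.5),
    ("pilot", "analyst", 4.0),
    ("operator", "pilot", 6.0),
]

# Quantitative physical measure: total time each member spends communicating.
talk_time = Counter()
for sender, _, seconds in log:
    talk_time[sender] += seconds

# Simple communication network: directed who-talks-to-whom edge counts,
# the raw material for pattern and social-network measures of coordination.
edges = Counter((sender, receiver) for sender, receiver, _ in log)
```

Qualitative content analysis, and dynamic measures such as shifts in these patterns over the course of a mission, would build on top of such raw counts.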
In addition to measuring team coordination for the human-human metric subclass, assessing team cognition, which refers to the thoughts and knowledge of the team, can be valuable in evaluating team performance and identifying effective training and design interventions (Fiore and Schooler, 2004). Because efficient team performance has been shown to be related to the degree to which team members agree on, or are aware of, task, role, and problem characteristics (Fiore and Schooler, 2004), team mental models and team situation awareness should be considered.
Determining which and how many metrics to gather depends on many details of a given situation. Designing a solution can occur only in the context of a specific system.
The metrics discussed in this section are intended to measure the first-order effects of human-autonomous system interaction, but they do not assess the larger sociotechnical impact of a technology and its potential derivative effects. For example, when a new automated decision support tool is introduced into financial trading services, how the introduction of such a tool could affect market trading patterns is generally not known. Such behavior emerges after some time, with potential subsequent problems that must be addressed by regulatory agencies after the fact. Other examples include the use of drones, which save the lives of attackers but can kill innocent people and negatively affect the attitudes of the affected population. Similarly, decisions about what kind of car to buy, or how far to live from work, can affect climate change, which is usually not consciously measured while such decisions are debated. What metrics can be used to capture the large-scale impacts of important decisions? We do not yet have strong capabilities for predicting the sociotechnical impacts of decisions, for determining how those impacts could be measured, or for accurately anticipating how a system might evolve. This is an open area of research, and one that deserves more focus.