The third workshop panel showcased cutting-edge work on the science of perception and cognition as it applies to the way individuals make sense of information. Moderator Barbara Dosher, University of California, Irvine, commented on the observation that intelligence analysts are faced with mounting volumes of data as a result of societal behaviors increasingly tied to digital enterprises. She pointed out that humans are decidedly limited in the amount of information they can remember and process. This panel, she said, would discuss some of people’s perceptual and cognitive limitations. It would also consider tools designed to help process large-scale datasets so they can be understood by users, as well as the role of the human in human–machine partnerships.
Edward Awh, University of Chicago, discussed the concepts of working memory and attention and how they can be measured using robust scientific techniques. He defined working memory as the online memory system—the things that are currently held “in mind” at any given point in time. Attention, he continued, involves selecting those parts of the environment that are given access to the mental workspace of working memory, adding that it is under some degree of voluntary control.
Awh then provided an example illustrating that people’s working memory and attention are both very limited. He showed a video in which someone disguised as a construction worker asks a passerby for directions on a university campus. The passerby begins giving the man directions, looking right at him at times, and the two are then momentarily interrupted by other “workers” passing between them, carrying a large slab of wood. Behind the cover provided by this slab of wood, two of the “workers” switch places so that once the “workers” carrying the wood have passed, the direction giver is talking to an entirely different person wearing a shirt of a different color. In this study, according to Awh, only 30 percent of people who were giving directions noticed the change in the construction worker to whom they were speaking. He stated that this example demonstrates that “minds are very limited in the bandwidth of information that can be taken in at any given moment, and after that information is taken in, the number of thoughts that you can actually hold in mind at any given moment is much more limited than the average person’s intuition would suggest.”
Awh continued by observing that working memory and attention are highly correlated with each other and show a large degree of variability across individuals. He added that these abilities appear to be core components of intelligent behavior because they are linked to performance in real-life situations. He explained that the limits to working memory and attention are fairly definitive, stating that if a person exceeds either capacity, “things which are extremely salient and obvious in the environment may [be missed].”
Over the last two decades, Awh reported, robust measures of the capacity of working memory and attention have been developed. He described research on one measure of the capacity of working memory in which people were asked to remember a set of colors, and the number of colors they were able to keep in mind at one time was assessed. Results showed a broad distribution of abilities, with a range of zero to five and an average of three colors remembered. This measurement, Awh noted, was shown to be stable over time and largely resistant to practice effects. He added that studies have shown that as much as 30 to 40 percent of the observed variation in intelligence across individuals can be explained by this color memory test, clearly indicating that working memory is important for intelligent behavior.
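Capacity estimates of this kind are conventionally computed with Cowan’s K, a standard formula in the change-detection literature: K = set size × (hit rate − false-alarm rate). The specific analysis behind the figures Awh cited is not detailed here, so the numbers below are purely illustrative:

```python
def cowans_k(set_size, hit_rate, false_alarm_rate):
    """Estimate working memory capacity (Cowan's K) from a
    change-detection task: K = N * (H - FA), where N is the number
    of items shown, H the hit rate, and FA the false-alarm rate."""
    return set_size * (hit_rate - false_alarm_rate)

# A hypothetical subject shown 6 colors who detects 70% of changes
# and false-alarms on 20% of no-change trials:
capacity = cowans_k(set_size=6, hit_rate=0.70, false_alarm_rate=0.20)
print(round(capacity, 1))  # about 3 items, the average Awh reported
```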
Since the 1970s, Awh continued, advances have been made in the ability to relate brain activity to the cognitive operations of working memory and attention, including developments in animal neurophysiology and human neuroimaging techniques. Positron emission tomography (PET) is one such technique, which, he explained, allows scientists to see regions of the brain that have increased blood flow—an indirect measure of the brain’s neural activity. According to Awh, PET has helped scientists map the brain regions that participate in memory and attention. He added that more recently, such techniques as functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and magnetoencephalography (MEG) have allowed scientists to see not just which brain areas are active during different tasks but also to look at the specific pattern of activity within these areas. Furthermore, he noted, EEG and MEG, used to examine electrical activity in the brain, can make it possible to view brain activity with excellent temporal resolution.
Awh then provided the example of a spatial working memory task during which neuroimaging techniques were used to study brain activity.1 In this study, he said, subjects with electrodes attached to their scalp were asked to focus on a stimulus at a certain position on a computer screen for 1.75 seconds, and then asked to remember the angle of that position after the screen went blank. He explained that an approach called pattern classification2 allowed the researchers to predict which screen position the subjects would remember by using the unique patterns of electrical activity associated with various screen locations. He pointed out that because of the high temporal resolution of this electrical approach, the brain activity data gathered can be plotted over time, providing a representation of not just what subjects are thinking about but also when they are thinking about it.
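The study itself used alpha-band EEG topographies analyzed with an inverted encoding model. As a simplified illustration of pattern classification in this general sense, the sketch below decodes a “remembered location” label from synthetic multi-electrode activity patterns using a nearest-centroid rule; the electrode counts, trial counts, and noise levels are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 8 remembered screen angles, 40 trials each,
# 20 "electrodes". Each angle is assumed to evoke a distinct
# (noisy) spatial pattern of activity -- an assumption made for
# illustration, not the actual topographies in the study.
n_angles, n_trials, n_electrodes = 8, 40, 20
patterns = rng.normal(size=(n_angles, n_electrodes))  # true pattern per angle
X = np.repeat(patterns, n_trials, axis=0)
X += rng.normal(scale=0.8, size=X.shape)              # trial-by-trial noise
y = np.repeat(np.arange(n_angles), n_trials)

# Split trials into train/test halves, then classify each test
# trial by the nearest class centroid in "electrode space".
train = np.arange(len(y)) % 2 == 0
centroids = np.stack([X[train & (y == k)].mean(axis=0) for k in range(n_angles)])
dists = np.linalg.norm(X[~train, None, :] - centroids[None, :, :], axis=2)
predicted = dists.argmin(axis=1)
accuracy = (predicted == y[~train]).mean()
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / n_angles:.2f})")
```

Decoding well above the 1-in-8 chance level is the signature that the activity patterns carry information about the remembered location, which is the logic behind the prediction Awh described.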
Awh explained that the ability to look at the current content of memory and attention with temporal resolution will have practical applications in building powerful brain–computer interfaces, such as robotic limbs that can be controlled by signals from electrodes implanted in the brain. These techniques may also be useful, he suggested, for assessing cognitive differences among individuals, which may be valuable for determining which individuals are better suited to certain tasks. He concluded by expressing his belief that as the research in this area moves forward, the new developments in neuroscience that allow tracking of working memory and attention using brain activity will be a powerful complement to behavioral measures.
Danielle Albers Szafir, University of Colorado Boulder, discussed the benefits and challenges of using visualization to support people in using data to solve problems of different scales. She explained that the growth in the number of people who want to leverage data for science and decision making has outpaced the growth in the number of people with formal training in statistics. According to Szafir, existing statistical tools are not always sufficient to provide users with a full understanding of datasets, given the increasing amount, variety, and complexity of data and the growing complexity of the questions the data are asked to answer. She added that, although statistics can be powerful tools, people understand patterns, so visualizations can make data more intuitive, particularly for users who are not trained in statistics. “The way we can deal with challenges and scalability in our current data economy,” she said, “is to harness the power of the human visual system to help people find the right kinds of patterns in their data.”
1 Foster, J.J., Sutterer, D.W., Serences, J.T., Vogel, E.K., and Awh, E. (2016). The topography of alpha-band activity tracks the content of spatial working memory. Journal of Neurophysiology, 115(1), 168–177.
2 “Pattern classification” is a neuroscience term of art for the computational algorithms developed to explain patterns of brain activity in relation to behavioral tasks.
Szafir described a ranking system, developed in the 1980s, that is still used in conventional visualization systems today.3 This system ranks the effectiveness of different aspects of the visual presentation of data, including position, length, orientation, area, lightness, and color. Szafir explained that the choice of the visual presentation can fundamentally affect the kinds of patterns people extract from the data, illustrating this point with research in which study participants were shown a simple bar graph with two bars, A and B, of different sizes.4 When asked to describe the data, participants generally said, “B is bigger than A,” but when the same data were plotted as a line graph, participants said, “the values are increasing.” According to Szafir, the results of this research demonstrate that the type of presentation affects the kinds of patterns people extract from visual displays.
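The ranking Szafir cites can be expressed as a simple ordered lookup. The function below is an illustrative sketch, not part of any actual visualization system, that picks the most effective encoding a given chart type has available:

```python
# The Cleveland-McGill effectiveness ordering for quantitative data,
# as cited in the talk (most to least accurately decoded by viewers).
EFFECTIVENESS_RANKING = [
    "position", "length", "orientation", "area", "lightness", "color",
]

def best_encoding(available):
    """Return the highest-ranked visual channel among those a chart
    offers. Purely illustrative -- real systems also weigh data type,
    task, and context when assigning channels."""
    for channel in EFFECTIVENESS_RANKING:
        if channel in available:
            return channel
    raise ValueError("no ranked encoding available")

print(best_encoding({"area", "color", "length"}))  # length
```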
However, Szafir qualified this point by noting that much of the experimental evidence used to support it was generated using low-level comparison tasks with only two data points. This, she observed, is a much different situation from the terabyte-plus-sized datasets available today. As the size of a dataset increases, she explained, questions can be asked on fundamentally different scales. As an example, she showed a map of the United States with superimposed circles of varying sizes indicating the poverty rate in various areas.5 She noted that questions could be asked about these data on two levels: the first being a low-level question, such as a two-point comparison (e.g., “How does poverty in Los Angeles compare with that in Phoenix?”), and the second being a high-level question involving data synthesis (e.g., “How does the poverty level in the southeast compare with that in the northwest?”). Effective visualizations of large datasets, emphasized Szafir, must support both low- and high-level questions about the data.
According to Szafir, much is known about the way different data presentations support low-level, two-point comparisons, but much less is known about visual displays for high-level tasks. She described four different categories of high-level tasks that visualization can address: (1) identifying interesting data points in a distribution, (2) summarizing the statistical properties of a data distribution, (3) segmenting like data elements, and (4) identifying emergent structures that occur across a set of data. She shared findings from a controlled study conducted to evaluate how different visualizations present data and support these four tasks.6 “This experiment,” she said, “revealed a fundamental dissociation between the scale of task that was performed and the visualizations that supported these tasks.” For example, she elaborated, visualizations that support high-level tasks, in which information is synthesized across many data points, may not allow for precise, low-level comparisons.
3 Cleveland, W.S., and McGill, R. (1984). Graphical perception: Theory, experimentation and application to the development of graphic methods. Journal of the American Statistical Association, 79(387), 531–554.
4 Zacks, J., and Tversky, B. (1999). Bars and lines: A study of graphic communication. Memory and Cognition, 27(6), 1073–1079.
5 Available: http://www.nytimes.com/newsgraphics/2014/01/05/poverty-map [February 2018].
Szafir then discussed two strategies that could allow users to navigate data effectively across different scales—harnessing human vision and collaborating with computation. To illustrate harnessing human vision, she described a text analysis system in which 5.2 million books were analyzed to understand how written language has evolved since 1660. After attempting multiple methods of presenting this large dataset visually, the researchers found that the visualizations that were normally the least effective according to the ranking system described above did the best job of presenting data of the scale in this study.7 Szafir underscored the fact that scale matters in such cases because (1) scales change the questions that are asked; (2) scales change the visual presentations that work for the data; and (3) scales require multiple perspectives on the data to enable viewing both high-level and low-level information.
Turning to the strategy of collaborating with computers to help people analyze large-scale data, Szafir described an ongoing project that involves analyzing satellite imagery data to detect targets of interest, such as intercontinental ballistic missiles. In this project, the system allows the computer to query the analysts at important times in order to draw their attention to significant events or targets in the data, and also allows the analysts to integrate their interpretations and contextual information into the dataset. Szafir reported initial findings of the project showing that this type of human–machine collaboration can increase the accuracy of prediction by 40 percent and decrease prediction time by about 3 minutes.
In closing, Szafir posed several open questions that she believes should be addressed by future research. First, how can formal, quantified models be developed to enable building systems that can be optimized for and adapt to various analysis needs? Second, what should be done with imperfect data, and how can different visual presentations bias or change the interpretation of the data? Finally, what means can be used to avoid multiplying biases when machines and people are integrated in data analysis tasks?
6 Albers, D., Correll, M., Gleicher, M., and Franconeri, S. (2014). Ensemble processing of color and shape: Beyond mean judgments. Journal of Vision, 14(10), 1056.
7 Szafir, D.A., Stuffer, D., Sohail, Y., and Gleicher, M.L. (2016). TextDNA: Visualizing word usage with configurable colorfields. Computer Graphics Forum, 35(3), 421–430.
Remco Chang, Tufts University, discussed how the physical limits of display technology and perceptual limits of humans can be understood to improve the data visualization pipeline—for example, by decreasing the wait time experienced during searches of large databases. He began by describing the three current components of analysis of large amounts of data: machine learning techniques, storage of data in large databases, and visualization tools used to present data and analyses in ways that help users understand the data. He explained that visualization systems (such as Tableau, Spotfire, and SAS Visual Analytics) have become useful commercial tools but are insufficient for larger amounts of data. He alluded to the problem of latency, or the wait time experienced by the user as the amount of data scales up, which interrupts the flow of analysis. A delay of even 500 milliseconds, he argued, has enough of an effect on short-term memory to change what an analyst finds.
Chang proposed as a solution to this problem the development of highly interactive systems that would enable high-throughput analyses with increased efficiency and accuracy. To create such systems, he asserted, the database and visualization components of analysis of large amounts of information should work in the opposite direction, starting from the display side. “Start with understanding what people can perceive and understand,” he said, “and [let that] drive the visualization.” The visualization, he added, will retrieve enough data to illustrate what the user can comprehend. And, he continued, the visualization can be “simplified” as long as the user cannot perceive the difference. He noted that the technique of simplification is commonly used in image compression, and that even with a 10-to-1 compression, images look the same to people from a distance. These techniques, argued Chang, can help cut down on latency.
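The display-driven retrieval Chang describes can be sketched as reducing a large series to what a fixed number of pixel columns can actually show, for example by keeping only a per-bin minimum and maximum so the rendered shape is preserved. This is a generic sketch of the idea, not the algorithm used in Chang’s systems:

```python
def minmax_downsample(values, n_bins):
    """Reduce a long series to per-bin (min, max) pairs -- roughly
    what a chart drawn at n_bins pixel columns can convey. The
    on-screen outline is unchanged even though most raw points are
    never retrieved, which is the latency-cutting idea in the text."""
    bin_size = max(1, len(values) // n_bins)
    out = []
    for i in range(0, len(values), bin_size):
        chunk = values[i:i + bin_size]
        out.append((min(chunk), max(chunk)))
    return out

series = [(i * 37) % 101 for i in range(1_000_000)]  # 1M synthetic points
reduced = minmax_downsample(series, 800)             # ~800-column display
print(len(series), "->", len(reduced), "bins")
```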
Chang then pointed out that two related questions arise from the proposed techniques of retrieving limited data from the database and simplifying their visualization: (1) How will the data points be selected? and (2) How will what people can see be determined? He described a mathematical model, called Weber’s law, that captures the ability of people to discriminate among different stimuli. Weber’s law includes the concept of “just noticeable difference,” defined as the smallest change in the intensity of a stimulus that can be perceived. Chang described a 2010 study that found that human perception of correlation in a scatterplot also follows Weber’s law.8 Chang’s group expanded on this finding in work conducted in 2014, testing the perception of correlation using other types of bivariable visualizations. Chang reported that the perception of correlation in all the tested visualizations could be modeled using Weber’s law,9 and described three exciting facets of this finding: (1) a model built around these results will be able to describe both what people do and do not see from the data, as well as predict what people will be able to see if they are presented with new data; (2) this model could be used to compare the effectiveness of different visualizations; and (3) this model could also be used to speed up computation through appropriate data sampling and approximation.
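Weber’s law can be written ΔI = k·I: the just noticeable difference grows in proportion to the stimulus intensity. For correlation, the effective intensity behaves roughly like the distance of r from 1, which a sketch can capture as follows; the constants here are invented for illustration, not the values fitted in the studies cited:

```python
def jnd_correlation(r, k=0.2, base=0.05):
    """Illustrative Weber-style model of correlation perception:
    the just noticeable difference in r shrinks as r approaches 1,
    treating (1 - r) as the effective stimulus intensity. The
    constants k and base are made up, not published fits."""
    return base + k * (1.0 - r)

# Viewers need a larger change in r to notice a difference when the
# correlation is weak than when it is strong:
for r in (0.3, 0.6, 0.9):
    print(r, "->", round(jnd_correlation(r), 3))
```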
Chang also acknowledged, however, a limitation of this model in that it addresses only correlation, and not other kinds of visualized relationships. To resolve this limitation, Chang’s group recently began studying visualization metamers—things that are different but appear the same under certain conditions. Visualization metamers, Chang continued, are different visual presentations of data that lead the user to derive the same conclusion or make the same decision. So, he asked, “Can I come up with visualizations that are close enough that you can’t tell the difference, but still get the job done?” He explained that his focus thus far has been on one-dimensional visualizations, such as bar charts and line graphs. He noted that some of the properties required of the model have been identified and that these visualizations should be progressively refined, so that a somewhat coarse visualization can be seen as soon as the data load, with quality improving over time.
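The progressive refinement Chang describes (a coarse view shown immediately, improving as more data arrive) can be sketched as a generator that yields successively larger random samples for redrawing; this is a toy illustration of the idea, not his implementation:

```python
import random

def progressive_samples(data, sizes, seed=0):
    """Yield successively larger random samples of a dataset, so a
    chart can draw a coarse view at once and refine it over time.
    Each sample extends the previous one, so redraws only add detail."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    for size in sizes:
        yield shuffled[:min(size, len(shuffled))]

data = list(range(100_000))
for sample in progressive_samples(data, sizes=[100, 1_000, 10_000]):
    print(f"rendering with {len(sample)} points")
```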
Chang concluded with some ideas about where research could be in 10 years. He argued that progress needs to be made in the science of perception to enable better understanding of what people can and cannot see when they perceive data. He also suggested that data systems engineering needs to improve so that it is based on scientific principles of cognition and perception, and that data systems should be able to predict what people can and cannot see given the visualization of the dataset. He argued as well that perceptually driven computation should be able to highlight aspects of the data that the model predicts an analyst will miss. These advances, he said, will require collaboration between the fields of psychology, cognitive science, and computer science and the community of intelligence analysts.
8 Rensink, R.A., and Baldridge, G. (2010). The perception of correlation in scatterplots. Computer Graphics Forum, 29(3), 1203–1210.
9 Harrison, L., Yang, F., Franconeri, S., and Chang, R. (2014). Ranking visualizations of correlation using Weber’s law. IEEE Transactions on Visualization and Computer Graphics, 20(12), 1943–1952. Available: http://visualthinking.psych.northwestern.edu/publications/Harrison-weberlaw-infovis2014.pdf [April 2018].
Peter Pirolli, Institute for Human & Machine Cognition, discussed multilevel models that can be used to describe sensemaking tasks and how such models can promote effective collaboration between humans and artificial intelligence (AI). Sensemaking tasks, he explained, involve a problem to be solved or a decision to be addressed through the processing of data. He added that the utility of the decision or solution changes with the amount of knowledge that is extracted from the data. “The problem,” he said, “is that you’ve got massive amounts of data that you’ve got to process in order to get that knowledge.” To deal with the enormous amounts of data available and come to a decision, he asserted, technologies are clearly needed to augment and accelerate the human sensemaking process.
In Pirolli’s view, a multilevel model of the human sensemaking process is necessary to enable the development of optimally performing human–AI interaction systems. Such a model, he said, would account for different types of processes (biological, psychological, rational, and social) that occur at different time scales. For example, he observed, neuropsychological and biological processes take 10 to 100 milliseconds; psychological processes (goals, memories, and motivations) take seconds to tens of seconds; and rational processes (incentive mechanisms) and social processes (communication and collaboration) take much longer. He explained that this kind of multilevel model is important because events at either end of the time scale can percolate up or down the scale, affecting processes at other levels, adding that interventions can take place at any one of these levels and subsequently impact higher or lower levels.
Cognitive models can be used to understand how people find information using visual displays, Pirolli continued, and to predict how new kinds of visual displays will affect a user’s ability to seek and find information. He described an example in which study subjects were asked to learn about a given topic. The subjects were first given a pretest to assess what they already knew about the topic, and were then allowed time to do research and produce a simple report from the information they had learned. In a post hoc analysis, a computational cognitive model traced their actions and the displays they were viewing so it could later predict what they would learn from their research. The model used in the study, Pirolli explained, can be used by researchers to understand how visual systems should be arranged so they can maximize the knowledge users gain from them.
Pirolli then discussed trust, or the credibility of information sources, as another important component of successful human–AI interactions that can be studied using a multilevel cognitive model. He described work performed to understand how people form credibility judgments about different information sources and how their level of trust influences their subsequent decision making.10,11 Taking the computational cognitive modeling approach, he said, the researchers were able to predict, with fairly good accuracy, the level of credibility people would attribute to an information source and to develop a novel credibility ranking algorithm that could predict credibility assessments.
Pirolli next summarized the benefits of multilevel models of the human sensemaking process: they allow researchers to predict how difficult it will be for a person to find certain information, how much a person will learn from using a particular system, whether people will be biased in their information searches and sensemaking, and what kinds of credibility judgments people will make about information sources. He also identified establishing successful human–AI interdependence as a major challenge going forward. The emerging standard model of cognitive processing, he said, “is about a fairly constrained, reasonably well-defined set of tasks and domains,” but “the world we live in is not well defined, it is very open ended.” He suggested, however, that research has shed light on some of the necessary aspects of successful human–AI collaboration, and cited three such factors: (1) the observability of one system to another, or transparency; (2) the ability of one system to predict what the other will do; and (3) the ability of one system to direct the other to perform a task.12 Regarding the first factor, he noted that current AI systems are extremely difficult to understand, even when good visualizations are available. This influences trust, he asserted, since people tend to trust what they understand; thus, he emphasized the need to improve the transparency of the machine for the end user. Regarding directability, he stated that “we are nowhere near having very easily done interactive tasks between a human and AI system where I can direct it to do tasks that are reasonably complicated.”
Pirolli predicted that in 3 to 5 years, there will be better understanding of how to create and use multilevel models, explainable AI for visual analytics and simulated drone operations, and interactive task learning for simple robots with well-defined tasks. Looking 10 years out, he highlighted the need for foundational science around human–AI collaboration that would take into account continuous human-in-the-loop collaboration, dynamically changing tasks, and co-adaptation of the human and AI components. He concluded by emphasizing that analyzing and modeling human–AI collaborations at multiple levels of detail will have applications both within and beyond the Intelligence Community (IC).
10 Canini, K., Suh, B., and Pirolli, P.L. (2011). Finding credible information sources in social networks based on content and social structure. IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing. Available: http://www.parc.com/content/attachments/finding-credible-information-preprint.pdf [April 2018].
11 Liao, Q.V., Pirolli, P., and Fu, W.-T. (2012). An ACT-R model of credibility judgment of micro-blogging web pages. Proceedings of the International Conference on Cognitive Modeling, 103–108. Available: https://pdfs.semanticscholar.org/6370/68e384b6f2d383c392b79f93b6bbf9f15896.pdf [April 2018].
12 Johnson, M., Bradshaw, J.M., Feltovich, P.J., Jonker, C.M., van Riemsdijk, M.B., and Sierhuis, M. (2014). Coactive design: Support for interdependence in joint activity. Journal of Human–Robot Interaction, 3(1), 43–69.
Following the presentations summarized above, panelists participated in a discussion and responded to audience questions. Sallie Keller, Virginia Polytechnic Institute and State University, asked whether making AI explainable might limit its development by forcing it to be too simplistic. Pirolli replied that the user must have some degree of knowledge about how the AI works in order to trust it, but that such knowledge can be limited to a “common ground interface” in which the user appreciates the capabilities of the machine and when tasks should or should not be handed off to it. Chang added that understanding how something works is not a prerequisite for trusting it enough to use it, saying, “I don’t really know how my car works, but I can drive.” He suggested that if AI is built to be very intuitive and if it is accurate, trust will develop such that over time, people will trust the technology enough that they will no longer be concerned about how it works.
Jeremy Wolfe, Harvard Medical School, asked whether visualization methods will reach the point at which they can help general users with cases of multiple interacting variables, as opposed to simpler two- or three-way interactions. Szafir responded that the average person will have difficulty perceiving significant interactions in high-dimensional graphs unless the correlations are extremely strong. However, she added, dimensionality reduction and other multidimensional scaling techniques are available to support users’ exploratory interaction with the data. She argued that visualization developers should ensure that these techniques and the results generated with them are interpretable by the average user, which in turn requires expressing statistical methodologies in ways that are intuitive and anchored to the data. Chang added that the burden of interpreting data should not be restricted to the visualization tool; the human user will have a role, and individual differences in abilities should be considered. He referred to a study on whether visualization tools could help users understand Bayesian or conditional probability. When participants were divided into groups of high and low spatial ability, he said, the tools could be improved using data from interactions of the group with high spatial ability, whereas it was difficult to use combined data derived from both groups to improve on the effectiveness of the tools.
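The conditional-probability task Chang refers to is typically of the following form: judge the chance of a condition given a positive test, from a base rate and test accuracy. The sketch below computes the posterior such a visualization would need to convey, using made-up numbers:

```python
def posterior(base_rate, sensitivity, false_positive_rate):
    """Bayes' theorem for a binary test:
    P(cond | pos) = P(pos|cond) P(cond) /
                    [P(pos|cond) P(cond) + P(pos|not cond) P(not cond)]"""
    true_pos = sensitivity * base_rate
    false_pos = false_positive_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)

# Made-up example: a 1% base rate and a 90%-sensitive test with a 9%
# false-positive rate -- the kind of problem participants in such
# visualization studies are asked to reason about. The posterior is
# far lower than most people intuit.
p = posterior(base_rate=0.01, sensitivity=0.90, false_positive_rate=0.09)
print(f"P(condition | positive test) = {p:.2f}")
```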
Gary Klein, MacroCognition LLC, asked whether there are other traits, in addition to those of working memory capacity, attention, and spatial ability, that differ among individuals and are relatively stable as people gain experience. Chang replied that it is difficult to determine a core set of stable traits. He gave the example of a trait he has studied known as locus of control,13 which has been associated with differences in how people explore or search for data. However, he said, the trait has been found to be manipulable such that some priming can change an individual’s natural search strategy.
Another participant asked the panelists whether they had experienced resistance among research subjects to participating in research that involves observing and dissecting cognitive processes. Awh responded that such resistance is sometimes seen, but the tasks subjects are asked to perform generally are not personal or invasive, which mitigates resistance. He suggested further that technology to monitor brain activity will continue to improve and become less invasive and more widespread, normalizing these kinds of observations and making them easier to obtain.
Fran Moore, CENTRA Technology, Inc., commented on the idea that high-level skills of working memory and spatial reasoning will be useful for some human–machine data analysis. She cautioned that these skills may need to be balanced with other types of skills given the range of capabilities needed across the analytic workforce. She encouraged the panelists to think about how the science could advance understanding of how different people think and interface with other people, as well as with machines and visualization tools, and what range of skills is needed to do the work of intelligence analysis.
The panelists concurred with respect to the value of a broad range of skills. In addition, their responses called attention to the importance of experience and developed expertise. Expertise and long-term memories, argued Awh, can speed identification of the most relevant aspects of a situation and compensate for any limitations in working memory. Pirolli agreed, noting, for example, that younger analysts may on average score higher than their more experienced counterparts on cognitive measures of working memory and spatial reasoning, but experienced analysts can often work more efficiently with the knowledge they have accumulated. He suggested that the IC focus on training and practices to support gaining expertise as quickly as possible.
13 “Locus of control,” according to Chang, is a term denoting a measure of whether people think they are in control of the events around them or things just happen to them.