7

A Way Forward

During the workshop’s second day, Earl Hunt, an emeritus professor of psychology at the University of Washington in Seattle, presented “Assessing Cognitive Skills: Case History, Diagnosis, and Treatment Plan.” Although his focus was specifically on cognitive skills, rather than the broader range of traits of interest to the military, his talk captured a number of possible ways forward, as the Army seeks to improve its assessment and selection processes. He offered a useful way of thinking about the constraints and the challenges in that endeavor. Although Hunt’s presentation occurred in the second day’s panel on individual differences and predicting individual behavior, it has general relevance to how the Army might proceed to take the next leap forward in assessments, in light of the information shared at the workshop.

THE BORING BOX

When the French psychologist Alfred Binet developed the first broadly usable intelligence test in the early 1900s, Hunt said, what he really discovered was “drop in from the sky” testing. “He found that you can drop in from the sky and, out of context, ask a bunch of questions of somebody and get a reasonable—not perfect, but a reasonable—idea of their cognitive skills with very little cost.” Hunt attributed the phrase “drop in from the sky” to Robert Mislevy of the Educational Testing Service and the University of Maryland, College Park.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Binet’s “drop in from the sky” tests, Hunt continued, could be done in a limited time period, and they reasonably assessed a significant portion, but not all, of an individual’s cognitive skills. A century of testing has refined Binet’s paradigm, he said, but it has not expanded the range of skills tapped by Binet’s test in any major way.

Hunt said that this testing approach, while very valuable, is limited by a conceptual problem that is inherent in the approach. In 1923, the psychologist Edwin Boring defined “intelligence” as “what the tests test,” which were those cognitive skills that could fit into the box of time allowed for the “drop in from the sky” testing, and thus, Hunt observed, came the term “Boring’s box.” Although most psychologists will demur that this is not what they mean by “intelligence,” in practice this is exactly how it has been defined. A century of work has produced useful cognitive models for the skills that the tests evaluate—but those are only the behaviors that fit in the box, he emphasized.

As an aside, Hunt noted that Binet’s test was still a valuable achievement because there is a great deal of overlap between the cognitive skills required for life and the cognitive skills evaluated by the test. “It’s not perfect, but there is enough overlap that this information is quite useful.”

With respect to the current status of testing, Hunt said that tests for behaviors and attributes fit into Boring’s box, and almost a century of competent research has made these tests very good—within the limits of the box. He cited as an example the Armed Services Vocational Aptitude Battery (ASVAB), saying it is the distant heir to the Army Alpha test, which was created almost 100 years ago. The developers of ASVAB “have been competent people,” Hunt said.
“They have known what they’re doing, and it is arrogant to consider that you are so much smarter than the people who went before, that you are going to make a great improvement.”

Bending the Box

According to Hunt, there are ways to bend Boring’s box a little to fit a few more things into it. The development of computer testing, he said, was one such example, as was adaptive testing and item response theory. To understand ways in which cognitive testing may be improved in the future, one can look for other ways to bend the box. Perhaps the biggest way, he suggested, is to use new constructs of the types that were discussed in the workshop presentations by Michael Kane and Christopher Patrick (see Chapter 3). In particular, he pointed to working memory capacity and the ability to focus and control attention as offering very promising ways to bend the box. The relevant tests include dichotic stimulus paradigms, which stress the control of attention; N-back tests, which stress short-term memory functioning; and Stroop-like tests, which require a resolution of conflict between signals.

A second type of box bender is the dual cognition task. “For a long time,” Hunt said, “people have talked about two cognitive systems”: the fast gut response and the slower, more thoughtful response. The psychologist Jonathan Haidt (2006) has used the analogy of an elephant and its rider to describe how these two systems interact. The job of the rider, Hunt recounted, is to keep the elephant on task, but while the rider is rational and develops plans for how things should be done, the elephant is “pretty stupid.” For example, the elephant operates according to statistical associations in the environment. “The elephant says that crime is going up because you get a lot of reports on TV. It’s the rider who says, ‘No, crime is going down. Look at the statistics.’” The elephant is also prone to emotional reactions.

And what happens, Hunt continued, is that the elephant biases the rider. When conscious thought—the rider—comes up with various possible solutions to a cognitive task, the elephant can bias the rider to choosing one solution over the other (Kahneman, 2011). Hunt discussed two particular areas where managing the elephant is important, and these, he suggested, offer the possibility for tests that bend the box.

One of the areas is overcoming decision biases. Today, there is a very large body of literature on decisions and decision biases which is largely not reflected in current cognitive testing, Hunt said. A well-known example is the “Linda problem,” which requires a person to overcome the normal tendency to apply a representativeness heuristic to solving a problem (Tversky and Kahneman, 1982). The second area is overcoming unconscious social biases.
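The working-memory “box benders” Hunt listed lend themselves to precise statement in code. The sketch below is an editorial illustration, not material from the workshop: it scores a hypothetical 2-back run, where a trial is a “target” when the stimulus matches the one shown two steps earlier, and accuracy is the fraction of yes/no responses agreeing with that target structure. The letter stream and responses are invented.

```python
def n_back_targets(stimuli, n=2):
    """Indices of 'target' trials: the stimulus repeats the one shown n steps earlier."""
    return [i for i in range(n, len(stimuli)) if stimuli[i] == stimuli[i - n]]

def n_back_accuracy(stimuli, responses, n=2):
    """Fraction of scoreable trials (trial n onward) on which a yes/no
    response matches whether the trial really was a target."""
    targets = set(n_back_targets(stimuli, n))
    scored = range(n, len(stimuli))
    correct = sum((i in targets) == responses[i] for i in scored)
    return correct / len(scored)

# A made-up 2-back letter stream and one participant's yes/no responses.
stimuli = list("ABABCACA")
responses = [False, False, True, True, False, False, True, False]
print(n_back_targets(stimuli))            # which trials were targets
print(n_back_accuracy(stimuli, responses))
```

Stroop-like tasks would be scored analogously, except that the quantity of interest is usually the response-time cost on conflict trials rather than a simple hit rate.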
Blindspot, a recent book by Mahzarin Banaji and Tony Greenwald (2013), offers a number of examples of social situations in which bias affects the way people deal with other people. Racial prejudice is the obvious example, Hunt said. “People who don’t consider themselves racist, who will not endorse racist attitudes publicly, can be shown to have what’s best described as an elephant-style response to members of other races, and that will influence their behavior, although they are not aware of it.”

Hunt suggested two other ways of bending the box. One is to work with orientation skills. There are a number of jobs in which orientation skills are important, such as maintenance on Navy ships where people must work in confined spaces surrounded by live wires. Orientation skills are also important in firing artillery and in a number of other tasks. They can be evaluated using virtual environment techniques (Allahyar and Hunt, 2003).

The other approach is to measure processing speed, which has been claimed to be a very important factor in intelligence (Jensen, 2006). The idea is that simple reaction times can reveal something about the state of a person’s nervous system. However, Hunt said that he is very suspicious of these data. “I think that if you do the experiments the way they did, you will get the results they did, but there are some aspects [open to] interpretation, and also it is an open question whether this is a significant source of variation in young adult populations.”

In all of these efforts to bend the box and squeeze more in, a number of constraints must be kept in mind, Hunt said. First, the Boring box is full, so if something new goes in, something old must come out. The issue is not the bivariate correlation; it is the incremental validity (will the new method increase the predictive ability of an existing method of assessment?). Furthermore, many of these new methods, especially the working memory methods, are time hogs. They require time to familiarize the people taking the test with the equipment or the procedure.

A second issue is that the nature of some of these new tasks can actually change with practice. “This is why I’m suspicious of the reaction time data,” Hunt said, “because they allow very little time for practice. The result is that the factor structure of a reaction time task, including the working memory task, may change over practice and over days.” Hunt mentioned research by Bittner and colleagues (1986) that found that, while the within-day reliability of the tasks they were measuring was very high, the across-day reliability was not as high. “That doesn’t mean they’re invalid,” he said, “but it means that it would be harder to fit them into Boring’s box.”

Breaking the Box

The other approach to improving the measurement of cognitive skills significantly would be to move outside Boring’s box, Hunt said.
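Hunt’s incremental-validity criterion, noted among the constraints above, can be made concrete with a small simulation. Everything in this sketch is invented for illustration (the variable names g, wm, and perf and the effect sizes are assumptions, not workshop data): a criterion is regressed first on an existing predictor alone, then with a new working-memory-style measure added, and the question is how much the squared multiple correlation rises.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

g = rng.normal(size=n)                    # existing test score (illustrative)
wm = 0.5 * g + rng.normal(size=n)         # hypothetical new measure, correlated with g
perf = g + 0.4 * wm + rng.normal(size=n)  # criterion, e.g. rated job performance

def r_squared(predictors, y):
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared([g], perf)       # existing predictor alone
r2_full = r_squared([g, wm], perf)   # existing predictor plus the new measure
print(f"incremental validity (delta R^2): {r2_full - r2_base:.3f}")
```

Because the models are nested, R-squared cannot decrease when the new measure is added; the substantive question is whether the increment is large enough to justify the testing time it consumes, which is exactly Hunt’s “full box” constraint.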
There are certain abilities “that we have to find a way to evaluate if we are going to increase predictivity on the basis of cognition,” he said, adding that evaluating those abilities will almost certainly require breaking the box.

One example Hunt gave is people’s ability to take multiple perspectives—to not just jump to a conclusion but to realize that the problem can look very different when considered from different angles. This is important in a variety of areas. One is trouble-shooting mechanical problems. Another is dealing with social problems, which have been a particular problem when members of the military come in contact with indigenous populations, Hunt said. “You don’t necessarily have to agree with another person’s perspective on the problem, but you’d better know it.”

Another example of breaking the box is studying performance in groups and teams, as presented during this workshop by Tannenbaum, Woolley, and DeChurch (see Chapter 4). As an aside, Hunt noted that he distinguishes between a group and a team when characterizing performance. Much of the academic research is done on groups, meaning collections of people who have not met previously but who are assembled for the purpose of the research. Sometimes researchers will even have leadership studies in which they bring a group of people together who do not know each other and assign one of them to be the “leader.” “That’s not the way the drill sergeant works,” Hunt said. “Military teams exist over time. Privates do not lead teams, but they can disrupt them. So the Army has a real need to be able to predict a person’s performance in a team.”

A third way to break the box would be to study how well people are able to organize and carry out plans. “Basically this is the ability to get up, organize yourself, order goals, often delay gratification,” he explained. The ability to delay gratification has been shown to be particularly important—its degree in children in preschool can predict performance on the Scholastic Aptitude Test (SAT) 10 to 12 years later (Duckworth et al., 2010). Setting goals and then meeting them is also a critically important trait. “These skills can be measured,” Hunt said, “but not inside the box. I can’t think of any way to measure these cognitive skills within Boring’s box.”

How might they be measured? Hunt noted that relevant information is already physically obtainable in people’s electronic footprints: school records, credit transactions, Facebook friends, court records, and so on. Should these things be used? To that question, Hunt said there are various ethical and legal issues to consider, but it might be possible to get informed consent under certain situations. Another approach would be to monitor recruit training carefully, including selected situational tests.
This would provide a great deal of information and could probably improve classification, but it would also increase the expense of recruit training.

“So here’s my take-home message,” Hunt said, “Boring’s box has been cleaned out.” He offered an analogy with mining gold in California. “It was a very good idea in 1849. I point out that the only real fortune that came out of it was Levi Strauss, and what he did was make miners’ trousers. . . . He took a different perspective on the problem and remembered that the goal was to get rich, not to mine gold.”

Hunt predicted it will continue to be possible to make minor improvements by further research inside the box or by bending the box. But major advancements in assessment, he concluded in his written notes provided to all workshop attendees, “will have to move out of Boring’s box, to examine cognitive talents that simply cannot be revealed in a two or three hour testing session. Any such movement will be costly. However, something can be costly and still be cost-effective. The military cannot make the best use of recruits’ talents unless those talents are known.”

FINAL THOUGHTS

As the workshop’s final speaker, committee member Randall Engle revisited the major points of the invited presentations and offered his thoughts on emerging themes and the future of measuring human capabilities. Throughout his summary, Engle repeatedly reminded the workshop participants of the study sponsor’s perspective, as presented by Gerald Goodwin, of the U.S. Army Research Institute for the Behavioral and Social Sciences, who had noted that the statement of task for the larger study calls for the committee to make recommendations on basic research.

Engle said he was thrilled to see this call for recommendations on basic research because the field seems to him to need fresh thinking about important concepts such as working memory, rather than just redoing the same constructs over and over again. Working memory, he explained, is actually many things, and if researchers just stop at measuring the same constructs that have been assumed to capture it, because they know how to make those measurements, then the field has basically died. By undertaking research at a more fundamental level, as Engle understands basic research, new ways of thinking about concepts such as working memory can emerge and move the field forward. And he believes that kind of research is important.

Engle also identified a recurring theme across many of the presentations: the emergence of an electronic and technological revolution in the way we communicate with each other. “Web-based testing is going to happen,” he predicted.
“I think there are really exciting things about that, and some real dangerous things about it—dangerous in a very broad sense.” And while the workshop discussions had not delved deeply into those dangerous issues, he continued, Rodney Lowman’s presentation on ethics provided the participants with difficult questions to think about as modes and methods of assessments move forward. Engle also noted that Paul Sackett’s discussion of real-time faking in testing (see Chapter 5) presents significantly new implications for Web-based testing. Engle also validated Sackett’s concerns by describing his own research, which suggests working memory tests performed online using verbal measures are more susceptible to faking than spatial tasks because test-takers will write down the verbal cues rather than retaining them in memory. “Many people are just going to Web-based things,” Engle said, “without really looking at the consequences, costs, and benefits of doing that. And I think that’s a really, really important issue to think about here. The faking, I think, is all part and parcel of that.”

Over the course of the workshop, Engle also noted a relationship between testing knowledge and learning. Fred Oswald (see Chapter 2) spoke extensively about declarative knowledge, but “he didn’t say anything about how one gets that declarative knowledge.” Engle noted that he found it curious that the idea of learning did not receive much attention during the workshop except in relation to group composition, especially during the final panel presentations (see Chapter 4). The workshop’s keynote speaker, Alina von Davier, also emphasized that valid assessments should reflect that people learn and use skills in collaborative ways (see Chapter 5). Cognitive abilities, Engle said, are important because they relate to learning, and the ability to learn, to acquire a lot of information appropriate to a situation quickly, is important to performance and improving performance. But learning had not been talked about much at the workshop, he observed.

In speaking about the utility of noncognitive skills in performance predictions, Engle indicated his understanding that (at least some time ago) research in personality factors demonstrated “almost no relationship between the noncognitive measures and task performance.” “They just were not very predictive,” he said.
He then related a personal story about a meeting with Navy personnel, during which he inquired, “So why would the Navy keep using these noncognitive measures?” The answer he received captured much of the spirit of this workshop: “It’s because they predict attrition very well, and for every one percent of attrition that we can reduce in the Navy, we’re saving the American public about $10 million.” Engle recalled his reaction, “Okay, well, that’s an important one percent.” He continued, “So I think thinking about all of the variety of ways that these different assessments can become important is a big deal.”

Engle also noted an emerging theme of creating valid tests for administration across the population, especially concerning tests of personality. He noted that very different models may be required to understand both cognitive and noncognitive attributes of different groups. “These things are much more complicated than we would ever like to believe,” Engle admitted.

In the workshop’s final moments, committee chair Jack Stuster reiterated the challenge of predicting performance through conventional testing: “Tests are clearly analogs for the actual observed behaviors that you would prefer to have to inform your selection decisions. But practical issues greatly constrain the fidelity and, as a consequence, the validity of those predictions.” While admitting that practical issues may place certain constraints on the way forward, Engle concluded his summary with an encouraging word. Recalling Earl Hunt’s earlier presentation on Boring’s box, Engle said, “Most of the tests we have are inside that box, and it seems to me that a big part of what this workshop is about is sort of erasing the lines of that box and finding out, ‘Are there better ways that we can do these things?’ And I think there are.”

REFERENCES

Allahyar, M., and E. Hunt. (2003). The assessment of spatial orientation using virtual reality techniques. International Journal of Testing, 3(3):263-275.

Banaji, M., and A.G. Greenwald. (2013). Blindspot: Hidden Biases of Good People. New York: Delacorte.

Bittner, A.C., R.C. Carter, R.S. Kennedy, M.M. Harbeson, and M. Krause. (1986). Performance evaluation tests for environmental research (PETER): Evaluation of 114 measures. Perceptual and Motor Skills, 63(2):683-708.

Duckworth, A.L., E. Tsukayama, and H. May. (2010). Establishing causality using longitudinal hierarchical linear modeling: An illustration predicting achievement from self-control. Social Psychological and Personality Science, 1(4):311-317.

Haidt, J. (2006). The Happiness Hypothesis: Finding Modern Truth in Ancient Wisdom. New York: Basic Books.

Jensen, A.R. (2006). Clocking the Mind: Mental Chronometry and Individual Differences. Amsterdam, The Netherlands: Elsevier.

Kahneman, D. (2011). Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.

Tversky, A., and D. Kahneman. (1982). Judgments of and by representativeness. In D. Kahneman, P. Slovic, and A. Tversky (Eds.), Judgment Under Uncertainty: Heuristics and Biases (pp. 84-100). Cambridge, UK: Cambridge University Press.