Looking to the Future
Over the two days of the workshop, a significant portion of the presentations plus a large percentage of the discussion focused on the future. Presenters and participants talked about the obstacles to field evaluation of techniques derived from the behavioral sciences and intended for use by the intelligence community, about general lessons from other fields about what it takes to implement field evaluations in a serious and comprehensive way, and about some of the particular implementation issues in the intelligence arena. The discussions were realistic about the obstacles but optimistic about the possibility of eventually developing a culture within the intelligence community in which field evaluation is accepted as a necessary and usual feature. The discussions also included a focus on the best path forward.
OBSTACLES TO FIELD EVALUATION
In one of the discussion periods, Neil Thomason commented that he had been struck by the difference in testing and evaluation between law enforcement and the intelligence community. Christian Meissner had identified many hundreds of research papers from the past several decades that applied to eyewitness identification, Thomason noted, while Thomason himself had been able to identify only six papers on the Analysis of Competing Hypotheses (ACH) from the same period. “It is just two totally different worlds,” he said. But why should this be, he asked. Why is it that when a technique or a device is developed for use by the intelligence community, there is so little attempt to evaluate it in the field to see if it really works?
It is particularly puzzling, he said, in light of a comment by Steven Kleinman, who had suggested that one of the weaknesses of the American intelligence community is that it has too much money. Because so much money is thrown at intelligence work, he said, “there is a built-in assumption that if we don’t get it right, somebody else will.” If the HUMINT (human intelligence) groups don’t figure something out, then the SIGINT (signal intelligence) people will, and if SIGINT doesn’t get it, then IMINT (imagery intelligence) will. But why, Kleinman asked, hasn’t more of this money been used for field evaluation studies?
A number of the workshop presenters and participants spoke about various obstacles to field evaluation inside the intelligence community—obstacles they believe must be overcome if field evaluation of techniques and devices derived from the behavioral sciences is to become more common and accepted.
Lack of Appreciation of the Value of Field Evaluations
Perhaps the most basic obstacle is simply a lack of appreciation among many of those in the intelligence community for the value of objective field evaluations and how inaccurate informal “lessons learned” approaches to field evaluation can be. Paul Lehner of the MITRE Corporation made this point, for instance, when he noted that after the 9/11 attacks on the World Trade Center there was a great sense of urgency to develop new and better ways to gather and analyze intelligence information—but there was no corresponding urgency to evaluate the various approaches to determine what really works and what doesn’t.
David Mandel commented that this is simply not a way of thinking that the intelligence community is familiar with. People in the intelligence and defense communities are accustomed to investing in devices, like a voice stress analyzer, or techniques, such as ACH, but the idea of field evaluation as a deliverable is foreign to most of them. Mandel described conversations he had with a military research board in which he explained the idea of doing research on methods in order to determine their effectiveness. “The ideas had never been presented to the board,” he said. “They use ACH, but they had never heard of such a thing as research on the effectiveness of ACH.” The money was there, however, and once the leaders of the organization understood the value of the sort of research that Mandel does, he was given ample funding to pursue his studies.
One of the audience members, Hal Arkes of Ohio State University, made a similar point when he said that the lack of a scientific background among many of the staff of executive agencies is a serious problem. “If we have recommendations that we think are scientifically valid or if there are tests done that show method A is better than method B, a big communication need is still at hand,” he said. “We have to convince the people who make the decisions that the recommendations that we make are scientific and therefore are based on things that are better than their intuition, or better than the anecdote that they heard last Thursday evening over a cocktail.”
A Sense of Urgency to Use Applications
A number of people throughout the meeting spoke about the pressures to use new devices and techniques once they become available because lives are at stake. For example, Anthony Veney spoke passionately about the people on the front lines in Iraq and Afghanistan who need help now to prevent the violence and killings that are going on. But, as other speakers noted, this sense of urgency can lead to pressure to use available tools before they are evaluated—and even to ignoring the results of evaluations if they disagree with the users’ conviction that the tools are useful.
Robert Fein described a relevant experience with polygraphs. The National Research Council had completed its study on polygraphs, which basically concluded that the machines have very limited usefulness for personnel security evaluations, and the findings were being presented in a briefing (National Research Council, 2003). It was obvious, Fein said, that a number of the audience members were becoming increasingly upset. “Finally, one gentleman raised his hand in some degree of agitation, got up and said, ‘Listen, the research suggests that psychological tests don’t work, the research suggests that background investigations don’t work, the research suggests interviews don’t work. If you take the polygraph away, we’ve got nothing.’” A year and a half later, Fein said, he attended a meeting of persons and organizations concerned with credibility assessment, at which one security agency after another described how they were still using polygraph testing for personnel security evaluations as often as ever. It seemed likely, Fein concluded, that the meticulously performed study by the National Research Council had had essentially no effect on how often polygraphs were used for personnel security.
The reason, suggested Susan Brandon, is that people want to have some method or device that they can use, and they are not likely to be willing to give up a tool that they perceive as useful and that is already in hand if there is nothing to replace it. This was probably the case, she said, when the U.S. Department of Defense (DoD) decided to stop using voice stress analysis–based technologies because the data showed that they were ineffective. The user community had thought they were useful, and when they were taken away, a vacuum was left. The users of these technologies then looked around for replacement tools. The problem, Brandon said, is that the things that get sucked into this vacuum may be worse than what they were replacing. So those doing field evaluations must think carefully about what options they can offer the user community to replace a tool that is found ineffective.
Philip Rubin offered a similar thought. The people in the field often do not want to wait for further research and evaluation once a technology is available, he said, and “there are those out there that will exploit some of these gray areas and faults and will try to sell snake oil to us.” The question is, How to push back? How to prevent the use of technology that has not been validated, given the sense of urgency in the intelligence field? And how does one get people in the field to understand the importance of validation in the first place? These are major concerns, he said.
Some of the most intractable obstacles to performing field evaluations of intelligence methods are institutional biases. Because these can arise even when everyone is trying to do the right thing, such biases can be particularly difficult to overcome.
Paul Lehner began his talk with a story about field evaluation that illustrated how such biases can come into play. He had been involved in a study that evaluated how much analysts should rely on a certain type of information that they use fairly routinely. He and his colleagues had developed a simple method for retrospectively evaluating the accuracy and value of the information that the analysts were using, and they compared that retrospectively analyzed value with what the analysts had been told at the time about the value and accuracy of the information.
Their results indicated that the system being used to evaluate the information the analysts were getting was very inaccurate. Indeed, according to their study, information that was thought to be of less value was seen retrospectively as being substantially more accurate than information that had been labeled as having higher accuracy.
It was a small study, so it could not be definitive, but the important fact was that the study was easy to do and could have been repeated half a dozen times for probably less than a year of staff time, and then the results most likely would have been definitive. But that never got done.
The original sponsor who had championed the study had moved on to a new position. The new sponsor saw that the results ran counter to conventional wisdom and decided not to release the study until it had been reviewed. So the study was sent out for review, and it went to the organization that created the particular sort of information that was the subject of the study. This made sense, Lehner noted, since the people of that organization were the experts on the subject. But the senior expert in that organization did not believe the results and so never responded to the request for release. That made sense as well, Lehner said. “If I was that person, I would probably do the same thing. I would never say go ahead and release it, because clearly the results were wrong. Also, I would never send a formal reply recommending that the study not be released, because then I would be on record for suppressing a negative study.” So the smart thing to do was simply not to respond, which is what happened. As a result, the study was never published, and no one else ever got to see it.
This is a common way that things can go wrong with a field evaluation, Lehner said. He had experienced the same thing in slightly different versions three times in the previous six years.
What went wrong? A number of factors combine to produce this sort of situation, Lehner said. The first factor is the requirement in the intelligence community to get permission for anything you want to do. This makes sense, given that the release of the wrong information could result in people getting killed, but it creates a situation in which it is easy for information to be suppressed.
A second factor is practitioner overconfidence. People tend to have confidence in the tools and methods they have experience with and to believe that their own experience is more trustworthy than the results of a researcher who comes into an area and conducts experiments.
The third factor is organizational and bureaucratic. Field research generally requires a champion to obtain the funding and pave the way politically, but senior people tend to move around a great deal in bureaucracies, and the chances are that the champion will have been reassigned before the study is complete. The new manager is unlikely to push for—or even believe—the study that the previous manager had championed. And so the study dies of neglect.
All of this points to a basic conclusion, Lehner said: in the intelligence community there is a strong institutional bias against obtaining or reporting negative results. The bias does not arise for political reasons or from people protecting their turf. Everybody involved is trying to do what they think is the right thing. Still, the combination of factors creates a situation in which it is very difficult to perform and report field evaluations that call into doubt methods that are being used.
Something similar happens when new techniques are introduced. The people who introduce new methodologies and tools generally believe in their practices; otherwise they would not be introducing them. So most of these people believe that if a good field evaluation were to be performed, the particular methods they are introducing would pass. A corollary is that if these people are given the choice between putting their method into practice or waiting until a field evaluation is performed, they would generally go ahead. Why wait when you’re sure it works?
But that leads to a problem. Once the new method has been put into practice, there are now people who are experienced with it and are certain that it works. No matter how good or bad it is, there will be at least some experiences in which everything works out well and the practitioner now has faith in the method. As Lehner phrased it, “It becomes part of the tried-and-true methods.”
The workshop had already provided a couple of examples of this pattern, Lehner noted. As Thomason had pointed out, the technique of ACH has achieved a cult-like status in the intelligence community without ever having had a serious field evaluation. Similarly, Veney described the Preliminary Credibility Assessment Screening System (PCASS) as a “godsend on the battlefield” even though it has never had a true field evaluation.
The main reason that such methods become part of the intelligence toolkit, Lehner said, is that they satisfy a need. New methods and tools are not put into the field because there is a great deal of evidence showing that they work. They are put into the field because something is needed to fill a void. And once they become part of the accepted set of methods, it becomes very difficult to produce negative evaluations of them, for all the reasons described above.
This in itself wouldn’t be a problem if most of the new methods worked, but that is not the case, Lehner said. Even many of the ideas that are supported by validating field experiences don’t work. Expert judgment and field experience are surprisingly poor at discriminating between what works and what doesn’t. “You see this over and over again in lots of different fields. We see it here, too.”
Lehner predicted that if the three promising methods described earlier in the workshop—ACH, PCASS, and APOLLO—were field evaluated, only one of them would pass. “I have no idea which one,” he said, “because most good ideas don’t work, even those supported by experience (but not objective testing). So just going with the base rates, I would guess that one of these methods works and two do not.”
LESSONS FOR THE PATH FORWARD
Although there are many obstacles to reaching a point at which field evaluations are a regular and accepted part of the process of adapting techniques from the behavioral sciences for use in intelligence and counterintelligence, workshop speakers identified a number of things that can make that path easier. In particular, they accumulated a number of lessons that offer components of a potential framework for taking something from the laboratory to the field.
In reviewing his presentation on research into eyewitness testimony, Meissner described a number of the factors that brought the field to the point of having a wealth of research papers bearing on the issue. The first was what he termed a “key sociological event”—the DNA exonerations proving that a number of people convicted on the basis of eyewitness testimony were actually innocent. “That shocked the system,” he said. “It not only spurred additional research on the part of experimental psychologists but also encouraged the system to change.” In short, the DNA exonerations acted as a trigger that set a number of things in motion, including increases in funding and a heightened interest in the subject on the part of researchers.
Meissner noted that the 9/11 attacks also served as a trigger of sorts for increased interest in the issue of interrogation. He had already been doing research on interrogation in the criminal justice realm, but it was only after the attacks that funding began to be available for research on interrogation in the areas of intelligence and counterintelligence. “There were just a handful of folks doing research in this area,” he said, “but now more and more researchers are coming to the table.”
A second lesson is the importance of funding for field evaluations. Grover Whitehurst made the point explicitly in talking about lessons from the field of education: “We need more investment. We need fair and open ways for people to compete for the funds from those investments to create knowledge. We need to develop priorities for those investments that move the university-based research community towards questions that are important to practitioners and policy makers. Most academics want to talk to themselves, not to people in the field, and there are ways to incentivize them to move from the bench to the trench.”
Meissner offered the same lesson from the area of psychology and law. “Having a mechanism that is constant, that is competitive, that is independent is really important to getting good science funded,” he said. If field evaluation of techniques in intelligence and counterintelligence is to advance, it will require a steady, reliable funding stream that is structured to attract academic researchers to work with those in the field to develop a body of evidence.
A Research Base
If field evaluations are to be convincing and useful to practitioners, Meissner said, they need to be part of a larger, multimethodological research base in which the different pieces are consistent and support each other. For example, if he and other researchers in psychology and the law had had only a few studies about eyewitness testimony, they would not have been able to convince the legal community that they needed to change. But in fact, he said, they had a very robust research literature that was both high quality and extensive. They also had a consistency of findings across a diversity of methods and analytic approaches, which indicated a general agreement among scientists.
Basic research is an important part of it, Meissner said. The plethora of studies he mentioned include not only focused eyewitness studies but also studies that examined how memory works and how people recognize faces, models of face recognition, models of memory, models of social influence, and much else. In the intelligence area, he said, there is a great deal of basic research being done in the laboratory that on the surface doesn’t seem to have any relevance for what analysts do; in fact it is highly relevant to the basic processes that influence analysts’ decision making.
Finally, he said, the research on eyewitness identification also includes a strong theoretical grounding. Indeed, there are formal mathematical models of eyewitness identification that not only replicate previous work but also predict future findings.
Ongoing work on interrogation, Meissner said, is also engaging in a systematic program of research. It includes experimental laboratory studies, field research, and surveys. It includes research on experts in the art of interviewing in an attempt to determine what makes a person an effective interrogator. It is surveying the literature. And researchers are collaborating with practitioners. This is consistent with the tiered approach suggested by Charles Twardy, in which initial research might be done with psychology students and more refined testing in an intelligence academy or with working analysts.
In her talk on policing, Cynthia Lum made the point that a solid body of research is important in getting practitioners to accept and use the work. A number of police, particularly lieutenants and higher, come to her center’s website and use the interactive tools to find studies that give them ideas for how to deal with particular issues.¹ The response has been very positive, she said, because many of these police officers are being pressured to say how they are going to deal with a particular crime problem and they need to be able to back up their answer with some proof that it is going to work. The collection of research studies available on Lum’s site provides exactly that sort of evidence.
Engagement with Practitioners
A recurrent point to emerge from the discussions at the workshop was the importance of researchers establishing and maintaining a good relationship with practitioners. Meissner stated it succinctly: “It is really important to collaborate and engage the practitioners, to bring the practitioners into the laboratory, to work with them on the very problems that you are facing, to understand the issues of implementation.” This includes ensuring that practitioners understand that a method implemented differently than designed, or adapted inappropriately, is no longer a validated approach.
What are the keys to a successful engagement with practitioners? The workshop participants offered several different perspectives. The group discussed the potential value, for researchers who wish to communicate well with practitioners, of being able to transmit information through stories. The practitioners themselves, whether intelligence analysts, police officers, or educators, tend to pass information along through stories, so if researchers are to communicate their results effectively to the practitioners, they would do well to become good storytellers.
George Brander of the UK Ministry of Defence agreed that telling stories is vitally important to practitioners. The model that has evolved in the United Kingdom, he said, is that people join the research community with skills in anthropology, psychology, sociology, or some other area of behavioral science; they start doing their research, they get closer to the practitioners and learn how best to interact with them, and eventually they figure out how to effectively provide them with advice and guidance—which often includes telling stories.
Kleinman added that storytelling is important because “there is frequently an inverse relationship between authority and expertise” in which the people who make the decisions generally will understand relatively little about the scientific details. This is why, he said, the “snake oil salesmen” are able to convince people to use techniques for which there is little or no evidence of effectiveness. They are excellent storytellers, he said. “They would have very weak data, so they don’t spend much time on it, and they definitely make sure their audience is carefully selected so that people like those in this audience, who would cut them to shreds, are noticeably absent.” Thus, he said, it is important for researchers to be able to step outside their normal linguistic comfort zones and communicate in the way these decision makers do—that is, with stories, clear images, and a strong focus on what is in it for them.
Heather Kelly of the American Psychological Association said that storytelling is particularly important when dealing with Congress, and it is Congress that ultimately controls what gets funded and what does not. The importance of storytelling is one reason why it is easier to sell applied research, such as that done by the Department of Defense, than it is to sell basic research, such as that done by the National Science Foundation. “I would like for you all to be thinking about the best stories that we can tell on Capitol Hill,” she said. “It is particularly powerful when it comes from outside basic researchers versus inside researchers.”
Mandel offered a different perspective, suggesting that a more important skill than storytelling for scientists is being able to listen and being open to looking at scientific issues from the point of view of the practitioners. Research scientists are generally more interested in testing theories than in examining practical problems that are of importance to the practitioner community, and the scientists who will be able to engage best with the practitioners are those who can become interested in the challenge of trying to solve their problems, rather than just working to test theories.
Mandel added that he did not see storytelling as a particularly important skill beyond simply having the ability to communicate with the practitioner community in terms that are not full of jargon. “If [researchers] can’t talk in a clear way to directors and analysts then they are going to turn those people off,” he said, “because they are not going to want to hear about theory X or theory Y or all of these strange terms that psychologists would normally employ when talking with their academic colleagues.”
Fein suggested that researchers who are able to work with, hang out with, and gain the trust of those in the practitioner community are likely to be more effective. In particular, researchers should be able to really listen to other people, understand their interests, and try to figure out what they can do that is useful.
Researchers also need to be careful not to oversell what they can do. In particular, practitioners are always interested in getting results they can use as quickly as possible. Researchers need to be honest and objective about just how long it will take to obtain results. They need to be able to say, “I really wish I could help you in the short term, but it would not be fair to you for me to tell you that.”
The last lesson that Meissner offered was the importance of a positive focus. In the eyewitness memory field, he said, the emphasis always seems to be on false memories and mistaken eyewitness identification. Few researchers talk about the positive things that could be done to improve eyewitness identification, and that is a problem. “I think it is really important to have a positive focus,” he said. “If you want to change an applied field, you don’t go to them wagging your finger saying, ‘You are doing this poorly. Stop doing this. Stop doing this.’ In fact, what you need to have are positive alternatives: ‘Here is a way that we can improve what you do.’” With that sort of message, practitioners are much more likely to listen and respond in a useful way.
In addition to the general lessons learned from other fields, the workshop participants discussed a number of issues more specific to the task of doing field evaluations of methods from the behavioral sciences applied to intelligence and counterintelligence.
One of the issues that was returned to again and again during the workshop was how to judge the effectiveness of various practitioners in the intelligence community. Particularly in the case of analysts, it is difficult to come up with ways to measure outcomes, so a large number of the metrics are based on process instead. Gary McClelland of the University of Colorado reported that of the eight standards listed on an intelligence community directive² that he had seen, seven were based on process. Only one of them was based on outcomes: to make accurate judgments and assessments.
This will make it very difficult for researchers to perform useful field evaluations, McClelland said, and it will make it very difficult to convince practitioners to switch to more effective methods. “When we talk about when things will change,” he said, “I think it has to come from the intelligence community deciding they will keep score.”
Brandon echoed McClelland’s comments. Without a clear metric, she noted, it is impossible to set a baseline of where the field is right now, so it is equally impossible to know with any certainty when performance has improved.
McClelland observed that if one looks at the thousands of judgments and forecasts that the intelligence community makes, one would find that most of them are pretty good. But there is absolutely no way of really assessing that, he said, and so the intelligence community ends up being assessed on the basis of a few spectacular events that may be very atypical, such as the failure to foresee the 9/11 attacks and the judgment that Iraq had weapons of mass destruction. But a standardized scoring system would make it possible to keep score. The intelligence community would know how well it was doing and would also be able to see if a new technique improved things or made them worse. And once the intelligence community starts to measure outcomes, then it becomes possible for researchers to compare different methods according to the outcomes that are important to the intelligence community. Researchers should keep in mind that the outcomes they measure should be ones that matter to the intelligence community, rather than ones that seem important to researchers.
² Intelligence Community Directive (ICD) 203, Analytic Standards (June 2007). Available at http://www.fas.org/irp/dni/icd/icd-203.pdf.
Kleinman noted that in all organizations the tendency is to measure what is easy to measure. “It makes for a nice report and a great statistical presentation, but it rarely tells us what we need to know.” Things that are really valuable to measure are often quite difficult to measure, he added, and require creativity and constant learning. In the end, he said, it is almost always worth the additional thought and effort.
Take the example of a metric for rating intelligence analyses. “One could argue that good intelligence analysis provides policy makers with meaningful options about what they can do to influence situations. You have just told them they can do these six things, and each has the potential to influence the situation. But how do you measure whether those were good options? You are clear on what you want to try to achieve, but the metric may be incredibly vexing.” Still, that doesn’t mean it isn’t possible to devise suitable metrics, he said. That’s what scientists do: find clever ways to measure things. It is often simple, but rarely easy.
Lehner added that having metrics often has the additional value of making problems obvious and creating momentum for change. For example, the DNA exonerations created a very clear metric—people wrongfully convicted on the basis of eyewitness identification—and led to the push to study eyewitness identification with the goal of improving it.
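As an illustration of what keeping score on outcomes might look like, the following minimal sketch compares two sets of probabilistic judgments using the Brier score, the mean squared difference between a stated probability and what actually happened. The workshop participants did not name a particular scoring rule; the Brier score is an assumption chosen here only because it is a widely used way to score probabilistic forecasts. The `Forecast` class, the `brier_score` function, and all of the forecast data are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Forecast:
    """One judgment: a stated probability and the eventual outcome (hypothetical structure)."""
    probability: float  # stated probability that the event will occur, between 0 and 1
    occurred: bool      # whether the event actually occurred


def brier_score(forecasts: Sequence[Forecast]) -> float:
    """Mean squared error between stated probabilities and outcomes.

    0.0 is a perfect score; lower is better. Always answering 0.5 scores 0.25.
    """
    if not forecasts:
        raise ValueError("at least one forecast is required")
    total = 0.0
    for f in forecasts:
        if not 0.0 <= f.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        outcome = 1.0 if f.occurred else 0.0
        total += (f.probability - outcome) ** 2
    return total / len(forecasts)


# Invented data: the same four questions judged with current practice
# and with a hypothetical new technique.
current_practice = [
    Forecast(0.9, True),
    Forecast(0.7, False),
    Forecast(0.2, False),
    Forecast(0.6, True),
]
new_technique = [
    Forecast(0.9, True),
    Forecast(0.4, False),
    Forecast(0.1, False),
    Forecast(0.8, True),
]

baseline = brier_score(current_practice)
candidate = brier_score(new_technique)
print(f"current practice: {baseline:.3f}")
print(f"new technique:    {candidate:.3f}")
print("new technique scores better" if candidate < baseline else "no improvement")
```

A score of this kind provides both the baseline that Brandon described and a way to see whether a new method moves performance in either direction. A real comparison would need far more judgments than this, and the questions would have to be ones whose outcomes can eventually be determined.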
Test and Field Versus Field and Test
Because of the pressure to put new methods out in the field as quickly as possible, one school of thought holds that the best approach is to skip detailed laboratory testing and experimentation and do the testing out in the field once the method has been put to work—the “field-and-test” approach. Others believe that more testing should be done before any method is fielded in order to avoid the problem of practitioners getting attached to—and wasting their time with—methods that eventually prove to be ineffective. Workshop participants discussed the pros and cons of the two approaches.
Meissner commented that there is probably a continuum between the test-and-field and the field-and-test approaches; it is not simply an either/or issue. As a scientist, he tends to be more on the test-and-field side, he said. In part this is because he has found it so difficult to get the legal system to work with him, so whenever he has had the opportunity, he wanted to make sure he went in with his best stuff. With too many failures, the people he worked with might decide it wasn’t worth their while.
Dennis Buede from Innovative Decisions commented that a basic question when trying to determine the correct approach is, How good is good enough? When has something been tested enough to put it into the field? In some domains, more testing is needed early, he said, while others require less. “I would suggest something like APOLLO, which is focused more on thinking, would need a lot less testing prior to fielding than something like a voice stress analyzer where it is conceivable that you may not only be giving the wrong advice but may be sending them in the wrong direction.” It is important to apply some common sense when deciding how much testing to do before putting something in the field.
Lehner argued strongly for the field-and-test method. “It is flat out impractical to do full scientific validation before fielding new methods and tools,” he said. “The need is urgent, and, quite frankly, good science is just way too slow.”
Since it is practically impossible to do the testing first, it will have to be done afterward, and that can be an effective approach if practitioners learn to become effective evaluators of methods. To do this, he said, it is necessary to foster a culture of being open to negative evaluations of current practices—that is, a culture that is just the opposite of the circle-the-wagons mentality that dominates now. Managers and users should be encouraged to ask, “Does this stuff really work? I know it seems to work, but does it really work?” Once a technology or method has been fielded, practitioners should be encouraged to do rigorous evaluations, and negative results should be rewarded. Practitioners should get the message that much of what is fielded may not work, and they need good evaluation practice to sort out what really does work.
By the same token, he said, the scientific community needs to get over the idea that one has to complete all of the scientific research before something is put into the field. Scientists can help by figuring out ways to improve evaluations and by studying what constitutes a good process for evaluations based on case experience and personal field experience. Such work will never have the qualities of randomized controlled trials, but it should at least be possible to come up with evaluation methods that are better than what is being done now.
By contrast, Kleinman argued the case for test-and-field. “It has been my experience,” he said, “that we would be better off in many cases just not fielding anything new without some high level of confidence, and I mean confidence from the scientific perspective, not the confidence of a program manager.” Some people might argue that it is important to try new things in an effort to find some that work, he said, but when you are talking about national security policy or military affairs, the stakes are way too high. Guessing, or hoping, can often prove to be an expensive proposition with severe strategic consequences.
Eduardo Salas sided with Kleinman. Noting that he has spent the past 25 years conducting field evaluations of systems that attempt to improve human performance in various domains, he said he would never recommend that any agency field and then test. “I think that is dangerous.”
Lehner responded by suggesting that the field-and-test approach could lead practitioners to push for better science. Once the practitioners decide that most things don’t work and start evaluating everything rigorously, they will quickly get to a point at which they are frustrated with the large number of technologies that fail the evaluations. Ultimately, he suggested, they will say, “Don’t send me this stuff until you have good evidence. I already have three things that you have sent out, none of which in the end worked.” So it is very possible, he said, that a push toward good science could become a by-product of more aware practitioners.
Getting Practitioners to Use New Techniques
Steven Rieber from the Office of the Director of National Intelligence observed that, depending on the particular area in the intelligence community, it can be easy or difficult to get practitioners to try new techniques. In the area of deception detection, people tend to want tools immediately. But intelligence analysts are often reluctant to use new tools or techniques. So he asked the group if there was any research or anecdotal evidence to suggest how best to convince these practitioners to try new techniques.
Mandel responded that time constraints are one of the biggest issues for analysts. During training, he said, the analysts are taught various methods, such as analysis of competing hypotheses, but when the analysts get on the job, “they say they don’t have the time to use those things because they just get bogged down right away and then [are] always trying to catch up.” He added that he believes that the organizational constraints that affect the uptake of even good techniques are an important topic for research.
Jim Powlen of Logos Technologies expanded on Mandel’s comments. In his discussions with analysts, he said, he finds them as eager as anybody for more effective tools, but at the same time they complain that they have too many of them. They say, “I have 500 tools. I have more tools than I can possibly remember or ever use. I don’t need another tool.”
But if you pursue it a little further, he said, you discover that what they really want is one-stop shopping: a suite that will help them consolidate the information that they need, so that instead of spending 80 percent of their time gathering information and 20 percent doing analysis, they can reverse it to spend only 20 percent of their time bringing in relevant information and 80 percent on analysis. “My perception is that they’re as eager for help in the technology arena as anybody else,” he said. “They have too many single-action tools, and that isn’t really helping them.”
Whitehurst offered his perspective from the education field: “A romantic idea that I used to hold is that if you found out something that was truly useful to practice, and you made it available in a pamphlet or a publication, and you even got practitioners to read it, that they would change their behavior as a result. It was, as I label it in retrospect, a hopelessly romantic view.” He now believes that, in many cases at least, the uptake of new technologies will not happen unless there are contingencies that require it to happen. “Nothing changes unless there are contingencies in the system to require change, accountability, or on-the-job requirements, or something. Then the teachers will change just like police change, just like university professors change, just like intelligence officers change—because they have to.”
Several workshop speakers and participants spoke of the value of creating an intelligence institute dedicated to producing solid research on issues of importance to intelligence, much as the National Institutes of Health produce solid research on issues of importance to health. There were a number of arguments for such an institute.
Thomason offered two basic reasons for creating an intelligence institute. The first is that there really isn’t an internal research tradition within the intelligence community, and an intelligence institute could go a long way toward establishing such an internal tradition. The second is that there are many well-trained people outside the intelligence community who would be very interested in working on intelligence-related issues if the opportunity arose, and an intelligence institute could, if it was well financed, accelerate the collaboration process.
Robert Boruch commented that unless a clear place for scientific evidence is set aside in a governmental organization, no science will be introduced into that organization. That is the idea behind the National Science Foundation, for example. Furthermore, once a science-based entity is set up, it is important to protect it and its science from nonscientific influences. For instance, federal statistical agencies such as the Census Bureau and the Bureau of Labor Statistics have special statutory provisions intended to insulate them from the influences of theology, politics, ideology, and so on. Understanding how to build that protection into an intelligence institute is very important, he said.
In many ways, the Defense Personnel Security Research Center, or PERSEREC, parallels what people in the workshop were discussing as an intelligence research institute, and Eric Lang of PERSEREC described a bit of its history to offer some insight into what it might take to set up an intelligence institute.
After a rash of espionage cases in the mid-1980s, a security review commission recommended that the Department of Defense develop an organic research capability to understand the problems better. PERSEREC was set up with a sunset clause: it had three years to prove its worth, or it would be shut. “What we did,” Lang said, “is develop a strategic plan that had a mix of quick-hitting research studies and longer term programmatic research, and we became the institutional memory for DoD and for much of the rest of the government because there is no other similar size research entity dedicated to personnel security.”
There was constant pressure to provide devices and methods that could be used immediately—to take the “low-hanging fruit”—and PERSEREC did provide some of this. “This is part of how we earn our keep,” Lang said. But PERSEREC also devotes a significant portion of its time to long-term programmatic research, and that has paid off. Even though some of the studies have taken three years, five years, or longer, they are valued and many have resulted in policy improvements at the DoD and national levels. The clients at the undersecretary level value the programmatic research, Lang said. “We have a critical mass of mostly Ph.D.-level social scientists and psychologists who provide a stable source of knowledge and hands-on experience for understanding personnel security needs, working with the key players in the field and leadership positions, and conducting both long-term and short-term research. And we can make a case for the practical value that both kinds of research provide.”
Lang argued that the intelligence community needs something similar—an organic, ongoing research infrastructure and capability, rather than just commissioning an isolated project here and a collaboration there. Part of the value of PERSEREC, he said, is that it has been around for more than 20 years. “People in the community know our staff, track record, and capabilities. They know we will help them think through the problem, do the research, and, if needed, help with implementation and follow-up evaluation. But it takes that kind of ongoing institutional memory and critical mass of applied and basic researchers to get that job done.”
Either the Defense Intelligence Agency or the Office of the Director of National Intelligence is a logical place for an intelligence research institute, Lang said. But regardless of location, it is important for it to be established with the proper charter, one that sets up a suitable research capability that allows the institute to delve into issues on a regular basis and not simply at workshops or in the form of consensus studies.
None of this is easy. “This is a very tough problem,” Fein commented. The workshop discussions, particularly those that presented experiences from other fields, made it clear there are many obstacles to effective field evaluations of behavioral science techniques. They also made it clear that such evaluations are possible with the right approach and enough effort—and, furthermore, that such evaluations are indeed crucial to determining which methods should be put to work. It requires patience and a long-term view, but it can be done. “I emerge from these discussions sobered but actually more hopeful than before,” Fein said, if only because the workshop demonstrated that quite a number of good minds are already at work on the problem.