Conclusions and Recommendations
We have reviewed the scientific evidence on the polygraph with the goal of assessing its validity for security uses, especially those involving the screening of substantial numbers of government employees. Overall, the evidence is scanty and scientifically weak. Our conclusions are necessarily based on the far from satisfactory body of evidence on polygraph accuracy, as well as basic knowledge about the physiological responses the polygraph measures. We separately present our conclusions about scientific knowledge on the validity of polygraph and other techniques of detecting deception, about policy for employee security screening in the context of the U.S. Department of Energy (DOE) laboratories, and about the future of detection and deterrence of deception, including a recommendation for research.
Polygraph Accuracy Almost a century of research in scientific psychology and physiology provides little basis for the expectation that a polygraph test could have extremely high accuracy. The physiological responses measured by the polygraph are not uniquely related to deception. That is, the responses measured by the polygraph do not all reflect a single underlying process: a variety of psychological and physiological processes, including some that can be consciously controlled, can affect polygraph measures and test results. Moreover, most polygraph testing procedures allow for uncontrolled variation in test administration (e.g., creation of the emotional climate, selecting questions) that can be expected to result in variations in accuracy and that limit the level of accuracy that can be consistently achieved.
Theoretical Basis The theoretical rationale for the polygraph is quite weak, especially in terms of differential fear, arousal, or other emotional states that are triggered in response to relevant or comparison questions. We have not found any serious effort at construct validation of polygraph testing.
Research Progress Research on the polygraph has not progressed over time in the manner of a typical scientific field. It has not accumulated knowledge or strengthened its scientific underpinnings in any significant manner. Polygraph research has proceeded in relative isolation from related fields of basic science and has benefited little from conceptual, theoretical, and technological advances in those fields that are relevant to the psychophysiological detection of deception.
Future Potential The inherent ambiguity of the physiological measures used in the polygraph suggests that further investments in improving polygraph technique and interpretation will bring only modest improvements in accuracy.
Evidence of Polygraph Accuracy
Source of Evidence The evidence for polygraph validity lies primarily in atheoretical, empirical studies showing associations between summary scores derived from polygraph measures and independent indicators of truth or deception, in short, in studies that estimate the accuracy of polygraph tests. Accuracy—the ability to distinguish deceptive from truthful individuals or responses—is an empirical property of a test procedure administered under specific conditions and with specific examinees. Consequently, it may vary with a number of factors, such as the population of examinees, characteristics of individual examinees or examiners, relationships established in the interview, testing methods, and the use of countermeasures. Despite efforts to create standardized polygraph testing procedures, each test with each individual has significant unique features.
Realism of Evidence The research on polygraph accuracy fails in important ways to reflect critical aspects of field polygraph testing, even for specific-incident investigation. In the laboratory studies focused on specific incidents using mock crimes, the consequences associated with lying or being judged deceptive almost never mirror the seriousness of those in real-world settings in which the polygraph is used. Polygraph practitioners claim that such studies underestimate the accuracy of the polygraph for motivated examinees, but we have found neither a compelling theoretical rationale nor a clear base of empirical evidence to support this claim; in our judgment, these studies overestimate accuracy. Virtually all the observational field studies of the polygraph have been focused on specific incidents and have been plagued by measurement biases that favor over-estimation of accuracy, such as examiner contamination, as well as biases created by the lack of a clear and independent measure of truth.
Overestimation For the reasons cited, we believe that estimates of polygraph accuracy from existing research overestimate accuracy in actual practice, even for specific-incident investigations. The evidence is insufficient to allow a quantitative estimate of the size of the overestimate.
Estimate of Accuracy Notwithstanding the limitations of the quality of the empirical research and the limited ability to generalize to real-world settings, we conclude that in populations of examinees such as those represented in the polygraph research literature, untrained in countermeasures, specific-incident polygraph tests can discriminate lying from truth telling at rates well above chance, though well below perfection.
Accuracy may be highly variable across situations. The evidence does not allow any precise quantitative estimate of polygraph accuracy or provide confidence that accuracy is stable across personality types, sociodemographic groups, psychological and medical conditions, examiner and examinee expectancies, or ways of administering the test and selecting questions. In particular, the evidence does not provide confidence that polygraph accuracy is robust against potential countermeasures. There is essentially no evidence on the incremental validity of polygraph testing, that is, its ability to add predictive value to that which can be achieved by other methods.
Utility Polygraph examinations may have utility to the extent that they can elicit admissions and confessions, deter undesired activity, and instill public confidence. However, such utility is separate from polygraph validity. There is substantial anecdotal evidence that admissions and confessions occur in polygraph examinations, but no direct scientific evidence assessing the utility of the polygraph. Indirect evidence supports the idea that a technique will exhibit utility effects if examinees and the public believe that there is a high likelihood of a deceptive person being detected and that the costs of being judged deceptive are substantial. Any technique about which people hold such beliefs is likely to exhibit utility, whether or not it is valid. For example, there is no evidence to suggest that admissions and confessions occur more readily with the polygraph than with a bogus pipeline, that is, an interrogation accompanying the use of an inert machine that the examinee believes to be a polygraph. In the long run, evidence that a technique lacks validity will surely undercut its utility.
Criterion of Truthfulness There are inherent difficulties in assessing the accuracy of polygraph testing in the screening situations of greatest concern to this study. Although the criterion of truthfulness is easy to establish in laboratory simulations, we have seen no indication of a clear and stable agreement on what criteria are used in practice for assessing the accuracy of security screening polygraph tests in any federal agency that uses the tests. In particular, there is inconsistency about whether the polygraph test is being judged on its ability to detect major security violations or on its ability to elicit admissions of security violations of any magnitude. Moreover, the federal agencies that use the polygraph for screening do not collect data in a form that allows data from the ongoing administration of polygraph programs to be used to assess polygraph accuracy.
Generalizing from Research Because the studies of acceptable quality all focus on specific incidents, generalization from them to uses for screening is not justified. For this reason, uncertainty about the accuracy of screening polygraphs is greater than for specific-incident polygraph testing.
Estimate of Accuracy Because actual screening applications involve considerably more ambiguity for the examinee and in determining truth than arises in specific-incident studies, polygraph accuracy for screening purposes is almost certainly lower than what can be achieved by specific-incident polygraph tests in the field. Accuracy can be expected to be lower because of two major differences between screening and specific-incident polygraph testing. First, because a screening examiner does not know what specific transgressions an examinee may be concealing, it is necessary to ask generic questions rather than specific ones. Such questions create considerably more ambiguity for examinees than specific questions, such that two examinees who have committed the same minor infraction might have very different interpretations of its relevance to a test question, and very different emotional and physiological reactions. Instructions to examinees may reduce, but will not eliminate such variations, which can only degrade the accuracy of a test. Second, the appropriate criteria for judging accuracy are different in the two situations. In the typical screening situation, it is difficult in principle to assess whether a negative answer is truthful, and therefore it is much harder to establish truth and estimate accuracy than in event-specific testing. Moreover, the experimental studies that somewhat approximate screening situations all have serious methodological flaws. These studies typically involve mock-crime simulations very much like those used in other polygraph research; consequently, we believe these studies have more relevance for real-world specific-incident settings than for real-world screening settings.
Preemployment Screening The relevance of available research to preemployment polygraph screening is highly questionable because such screening involves inferences about future behavior on the basis of polygraph evidence about past behaviors that are probably quite different in kind. The validity for such inferences depends on specifying and testing a plausible theory that links evidence of past behavior, such as illegal drug use, to future behavior of a different kind, such as revealing classified information. We have not found any explicit statement of a plausible theory, let alone evidence appropriate for judging either construct or criterion validity for this application. Conclusions about polygraph accuracy for these applications must be drawn by educated extrapolation from research that addresses situations that differ systematically from the intended applications.
Locus of Deception Evidence from screening simulation studies is inconsistent concerning the ability of screening polygraph tests to identify which of several question areas is the correct locus of deception.
Effectiveness Basic science and polygraph research give reason for concern that polygraph test accuracy may be degraded by countermeasures, particularly when used by major security threats who have a strong incentive and sufficient resources to use them effectively. If these measures are effective, they could seriously undermine any value of polygraph security screening. All of the physiological indicators measured by the polygraph can be altered by conscious efforts through cognitive or physical means, and there is enough empirical research to justify concern that successful countermeasures may be learnable. Research does not clarify, however, whether users of countermeasures can be detected in contexts in which systematic efforts are made to detect and deter them. The available evidence does not allow us to determine whether innocent examinees can increase their chances of achieving nondeceptive outcomes by using countermeasures. It is possible that classified information on countermeasures and their detection exists; however, our specific requests to the relevant federal agencies for such information, including a classified briefing, did not reveal any such research. Thus, we cannot verify its existence or relevance.
Alternatives and Enhancements to the Polygraph
Alternative Techniques Some potential alternatives to the polygraph show promise, but none has yet been shown to outperform the polygraph. None shows any promise of supplanting the polygraph for screening purposes in the near term. Some potential alternatives may be useful as supplements, though the necessary research to explore that potential has not been done. Some, particularly techniques based on measurement of brain activity through electrical and imaging studies, have good potential on grounds of basic theory. However, research is at a very early stage with the most promising techniques, and many methodological, theoretical, and practical problems would have to be solved for these techniques to yield improvements on the polygraph. Not enough is known to tell whether it will ever be possible in practice to identify deception in real time through brain measurements.
Computerized Analysis Computerized analysis of polygraph records may be able, in theory, to improve test accuracy. This potential has not yet been demonstrated, however, either in research or in practice, and it is likely to be only modest. There have been major developments in computerized acquisition, summarization, display, and scoring of polygraph data, and further advances are likely. Computerized polygraph scoring procedures have the theoretical potential to increase the accuracy of polygraph interpretation because they allow analysis to use more information from the polygraph record and to weight different polygraph features more appropriately than do traditional scoring methods. Despite considerable government investment in computerized polygraph scoring methods, however, the existing approaches have at best an empirical base and are only loosely justified in terms of the features they extract from the polygraph record. These methods have a problematic statistical basis and have not been tested widely enough to generate confidence that their accuracy is any greater than that of traditional scoring methods. The difficulties that exist with computerized scoring of polygraph tests also exist, and may be multiplied, with possible expert systems for combining polygraph results with other forms of data.
Combining Information Sources It may be possible to improve the ability to identify major security risks by combining polygraph information with information from other screening techniques, for example, in serial screening protocols such as those used in medical diagnosis. We found no serious investigations of such multicomponent screening approaches.
DOE POLYGRAPH SCREENING POLICY
Every situation in which polygraph testing might be contemplated, including each security screening situation, has its own characteristics in terms of the types and magnitudes of the costs and benefits presented by polygraph testing. These costs and benefits are of many types, some of which are impossible to estimate quantitatively with available knowledge. The choices should therefore be evaluated for each application on the basis of the characteristics of that application, available scientific knowledge about the test’s performance, and informed judgments about the values at stake. We have carefully examined the situation of employee security screening at the DOE laboratories, and the conclusions below apply to that situation. They are likely also to apply to other situations in which the base rates of the target transgressions are extremely low, the costs of false negative results can be very high, and the costs associated with using a screening procedure that produces a large number of false positive results would be very high.
Limitations for Detection The polygraph as currently used has extremely serious limitations for use in security screening to identify security risks and to clear valued employees. In populations with extremely low base rates of major security violations, such an application requires greater accuracy than polygraph testing achieves. In addition, there is a realistic possibility that the polygraph might be defeated with countermeasures, at least by the most serious security violators. The potential that a polygraph policy may deter security threats and elicit admissions and confessions may justify using the polygraph in security screening, but this rationale does not rest on the validity of the polygraph for psychophysiological detection of deception. Rather, it rests on the expectation that examinees’ behavior will be shaped by their concerns that they may be judged (rightly or wrongly) to be deceptive on the polygraph. Because of these limitations, even if the polygraph has some accuracy in actual field use, it does not follow that it should be used for screening because of the potential costs of such use, including the possibilities that it will lower morale and productivity in national security organizations and deter people with scarce and highly valuable skills from working, or continuing to work, in these organizations.
False Positives with “Suspicious” Thresholds Polygraph screening protocols that can identify a large fraction of serious security violators can be expected to incorrectly implicate at least hundreds, and perhaps thousands, of innocent employees for each spy or other serious security violator correctly identified. Given the range of scientifically plausible accuracy levels for polygraph testing, this conclusion applies to any population of examinees that has the very low base rates of major security violations, such as espionage, that almost certainly exist among the employees subjected to polygraph screening in the DOE laboratories. Because the innocent will be indistinguishable from the guilty by polygraph alone, investigative resources would have to be expended to investigate hundreds of cases in order to find whether there is indeed one guilty individual (or more) in a pool of many individuals who “fail” a polygraph test. The alternative is to terminate or interrupt the careers of hundreds of innocent and productive individuals in an attempt to prevent the activity of one potential spy or saboteur.
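The base-rate arithmetic behind this conclusion can be sketched in a few lines. The figures used here (10,000 examinees, 10 serious violators, a threshold that catches 80 percent of violators while flagging 16 percent of innocent examinees) are illustrative assumptions chosen for the sketch, not estimates taken from this section.

```python
def screening_outcomes(population, violators, sensitivity, false_positive_rate):
    """Expected counts for one round of screening.

    All inputs are hypothetical; the function only applies base-rate arithmetic.
    """
    innocent = population - violators
    true_positives = violators * sensitivity          # violators who "fail"
    false_positives = innocent * false_positive_rate  # innocent examinees who "fail"
    failures = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_positives_per_detection": false_positives / true_positives,
        "share_of_failures_who_are_violators": true_positives / failures,
    }


# Assumed scenario: a "suspicious" threshold applied to a low-base-rate population.
result = screening_outcomes(
    population=10_000, violators=10, sensitivity=0.80, false_positive_rate=0.16
)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed inputs, roughly 200 innocent examinees are flagged for every violator detected, and well under 1 percent of those who "fail" are actual violators. Lowering the assumed base rate makes the ratio worse in direct proportion.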
Failure to Detect with “Friendly” Thresholds Polygraph screening programs can reduce the costs associated with false positive findings by adopting techniques that reduce the likelihood that innocent examinees will “fail” a polygraph test. However, polygraph screening programs that produce very small proportions of positive results, such as those reported by DOE, the U.S. Department of Defense (DoD), and the Federal Bureau of Investigation (FBI), can do so only at the cost of failing to accurately identify the majority of deceptive examinees. This conclusion applies to any population with extremely low base rates of the target transgressions, and it holds true even if none of the deceptive examinees uses countermeasures.
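The trade-off described above can be illustrated with a simple signal-detection sketch. It assumes an equal-variance binormal model (truthful scores distributed as N(0, 1), deceptive scores as N(d', 1)) and an accuracy index (area under the ROC curve) of 0.90; both the model and the numbers are assumptions made for illustration, not properties established in this section.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def sensitivity_at(false_positive_rate, auc):
    """Sensitivity implied by a false positive rate under the assumed binormal model."""
    # Under equal variances, AUC = Phi(d' / sqrt(2)), so d' follows from the AUC.
    d_prime = sqrt(2) * STD_NORMAL.inv_cdf(auc)
    # Decision threshold that lets through exactly the requested share of truthful scores.
    threshold = STD_NORMAL.inv_cdf(1 - false_positive_rate)
    # Share of deceptive scores that fall above that threshold.
    return 1 - STD_NORMAL.cdf(threshold - d_prime)


# Hypothetical thresholds: a "suspicious" one and a "friendly" one.
for fpr in (0.16, 0.004):
    print(f"false positive rate {fpr:.3f} -> sensitivity {sensitivity_at(fpr, auc=0.90):.2f}")
```

With these assumptions, a threshold strict enough to fail only about 0.4 percent of truthful examinees detects only about one deceptive examinee in five, which is the pattern the paragraph above describes.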
Use in DOE Employee Security Screening Polygraph testing yields an unacceptable choice for DOE employee security screening between too many loyal employees falsely judged deceptive and too many major security threats left undetected. Its accuracy in distinguishing actual or potential security violators from innocent test takers is insufficient to justify reliance on its use in employee security screening in federal agencies. If polygraph screening is considered because of its potential utility for such purposes as deterrence and elicitation of admissions, it should be remembered that a policy with a relatively friendly threshold that might enhance these forms of utility cannot be counted on to detect more than a small proportion of major security violators.
Danger of Overconfidence Overconfidence in the polygraph (a belief in its accuracy not justified by the evidence) presents a danger to national security objectives. A false faith in the accuracy of polygraph testing among potential examinees may enhance its utility for deterrence and eliciting admissions. However, we are more concerned with the danger that can arise from overconfidence in polygraph accuracy among officials in security and counterintelligence organizations, who are themselves potential examinees. Such overconfidence, when it affects counterintelligence and security policy choices, may create an unfounded, false sense that because employees have appeared nondeceptive on a polygraph, security precautions can be relaxed. Such overconfidence can create a false sense of security among policy makers, employees in sensitive positions, and the public that may in turn lead to inappropriate relaxation of other methods of ensuring security. It can waste public resources by devoting to the polygraph funds that would be better expended on developing or implementing alternative security procedures. It can lead to unnecessary loss of competent or highly skilled individuals because of suspicions cast on them as a result of false positive polygraph exams or because they avoid or leave employment in federal security organizations in the face of such prospects. And it can lead to credible claims that agencies that use polygraphs are infringing on individuals’ civil liberties for insufficient benefits to national security.
Broader Approaches The limited usefulness of the polygraph for security screening justifies efforts to look more broadly for ways to improve security. Modifications in the overall security strategies used in federal agencies, such as have been recommended by the Hamre Commission for the U.S. Department of Energy (Commission on Science and Security, 2002), deserve consideration. Ways of improving the accuracy of screening, including alternatives and supplements to the polygraph and innovative ways to combine information sources, also deserve consideration.
Recent Policy Recommendations on Polygraph Screening Two recent reports that advocate continued use of polygraph tests for security screening in federal agencies are partly, but not completely, consistent with the scientific evidence on polygraph accuracy. The Hamre Commission report recommends more restricted use in DOE; the Webster Commission report (Commission for the Review of FBI Security Programs, 2002) recommends expanded polygraph testing in the FBI. Both reports recommend using the polygraph only on individuals who are in positions where they could gravely threaten national security, a stance consistent with the objective of reducing the total costs of false positive errors in testing.
Both reports presumably based their recommendations at least in part on a belief in the utility of the polygraph that goes beyond issues regarding the scientific validity and accuracy.
Neither report explicitly addresses two inherent problems of using a test with the approximate accuracy of the polygraph for screening in populations with very low base rates of spies and terrorists. One is the false positive problem created by the likelihood that the great majority of positive test results will come from innocent examinees. The other, potentially more serious, problem is the false negative problem created by the likelihood that with polygraph screening programs such as are being operated at both DOE and FBI, which yield a very low proportion of positive results, the majority of spies are likely to “pass” at least one polygraph test without being detected, even if they do not use countermeasures. Thus, as we note above, a policy of screening that may be justified on the basis of utility for deterrence and elicitation of admissions cannot be counted on to detect more than a small proportion of major security violators.
Federal officials need to be careful not to draw the wrong conclusions from negative polygraph test results. Our discussions with polygraph program and counterintelligence officials in several federal agencies suggest that there is a widespread belief in this community that someone who “passes” the polygraph is “cleared” of suspicion. Acting on such a belief with the results of security screening polygraph tests could pose a danger to national security because a negative polygraph test result in a population with a low base rate, especially when the test protocol produces a very small percentage of positive test results, provides little information on deceptiveness beyond what was already known prior to the test, namely, that the probability of true transgression is very low.
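A short Bayes' rule calculation shows why a "pass" carries so little information in this setting. The prior of 1 in 1,000, the sensitivity of 0.20, and the specificity of 0.996 are hypothetical values chosen to represent a low-base-rate population screened with a protocol that yields few positive results.

```python
def probability_violator_given_pass(prior, sensitivity, specificity):
    """Posterior probability of a violation after a negative ("pass") result."""
    pass_and_violator = prior * (1 - sensitivity)  # violator who is missed
    pass_and_innocent = (1 - prior) * specificity  # innocent examinee correctly passed
    return pass_and_violator / (pass_and_violator + pass_and_innocent)


# Assumed inputs, for illustration only.
prior = 0.001
posterior = probability_violator_given_pass(prior, sensitivity=0.20, specificity=0.996)
print(f"before the test: {prior:.5f}")
print(f"after a pass:    {posterior:.5f}")
```

With these inputs the probability barely moves, from 0.0010 before the test to roughly 0.0008 after a pass, so treating a passed examinee as "cleared" adds almost nothing to what the base rate already implied.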
Although the scientific base for detecting deception remains weak, scientific analysis remains the best way for government agencies to assess techniques that are presented as useful for detecting and deterring criminals and national security threats and to develop improved methods. This section suggests ways that federal agencies should evaluate purported techniques for detection of deception or of concealed information. The next section recommends a program of research aimed at improving the capability for detection and deterrence.
Evaluating Methods for Detecting Deception
Need for Scientific Evaluation Techniques for detecting deception should be subjected to independent scientific evaluation before any agency relies on them. Government agencies will continue to seek accurate ways to detect deception by criminals, spies, terrorists, and others who threaten public safety and security interests. These agencies need to be able to make objective evaluations of new techniques offered to them by entrepreneurs who claim that these techniques are based on science. Recent experience suggests that many such techniques are likely to be developed in the coming years and that many of them will be oversold. In particular, proponents are likely to present evidence that a technique discriminates accurately between truthfulness and deception in a particular sample of examinees as proof of the overall validity of the technique. As Chapters 2 and 3 make clear, such evidence is insufficient to demonstrate general validity.
Our efforts in conducting this study may be useful in suggesting what kinds of scientific evaluation are needed for future claims of scientific detection of deception. We offer a set of questions that indicate the kinds of studies that would provide credible evidence for supporting techniques for the detection of deception. We have also identified a set of characteristics of high-quality studies that address issues of accuracy. We present these questions and characteristics with the hope that they may help government agencies to use solid independent evidence as the basis for their judgments about proposed techniques for the scientifically based detection of deception, including some that may not yet have been developed.
Questions for Assessing Validity
Does the technique have a plausible theoretical rationale, that is, a proposed psychological, physiological, or brain mechanism that is consistent with current physiological, neurobiological, and psychological knowledge?
Does the psychological state being tested for (deception or recognition) reliably cause identifiable behavioral, physiological, or brain changes in individuals, and are these changes measured by the proposed technique?
By what mechanisms are the states associated with deception linked to the phenomena the technique measures?
Are optimal procedures being used to measure the particular states claimed to be associated with deception?
By what mechanisms might a truthful response produce a false positive result with this technique? What do practitioners of the technique do to counteract or correct for such mechanisms? Is this response to the possibility of false positives reasonable considering the mechanisms involved?
By what means could a deceptive response produce a false negative result? That is, what is the potential for effective countermeasures? What do practitioners of the technique do to counteract or correct for such phenomena? Is this response to the possibility of false negatives and effective countermeasures reasonable considering the mechanisms involved?
Are the mechanisms purported to link deception to behavioral, physiological, or brain states and those states to the test results universal for all people who might be examined, or do they operate differently in different kinds of people or in different situations? Is it possible that measured responses do not always have the same meaning or that a test that works for some kinds of examinees or situations will fail with others?
How do the social context and the social interactions that constitute the examination procedure affect the reliability and validity of the recordings that are obtained?
Are there plausible alternative theoretical rationales regarding the underlying mechanisms that make competing empirical predictions about how the technique performs? What is the weight of evidence for competing theoretical rationales?
Research Methods for Demonstrating Accuracy
Claims that a technique is valid for the detection of deception should be accompanied by evidence of accuracy. The broader the range of examinees, examiners, situations, and social contexts in which accuracy is demonstrated, the greater the confidence that a technique will perform well across various applications. Agencies assessing claims of accuracy should consider the degree to which the studies offered to support the claims embody a number of features shared by good validation research in this area.
Randomized Experimentation In analog studies, this means that examinees are randomly assigned to be truthful or deceptive. It is also useful to have studies in which examinees are allowed to decide whether to engage in the target behavior. Such studies gain a degree of realism for what they lose in experimental control.
Manipulation Checks If a technique is claimed to measure arousal, for example, there should be independent evidence that experimental manipulations actually create different levels of arousal in the different groups.
Blind Administration and Blind Evaluation of the Technique Whoever administers and scores tests based on the technique must do so in the absence of any information on whether the examinee is truthful or deceptive.
Adequate Sample Sizes Most of the studies we examined were based on relatively small sample sizes that were sometimes adequate to allow for the detection of statistically significant differences but were insufficient for accurate assessment of accuracy. Changing the results of only a few cases might dramatically affect the implications of these studies.
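The sample-size concern can be made concrete with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is one standard choice and not a method named in this section; the counts (34 of 40 and 340 of 400 correct classifications) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(correct, total, confidence=0.95):
    """Wilson score interval for an observed proportion of correct classifications."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Same observed accuracy (85 percent) at two hypothetical sample sizes.
for correct, total in ((34, 40), (340, 400)):
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total}: {low:.3f} to {high:.3f}")
```

A study of 40 examinees can show a statistically significant departure from chance and still leave the accuracy estimate uncertain by more than 20 percentage points, whereas the tenfold larger sample narrows the interval considerably.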
Appropriate Comparison Conditions and Experimental Controls These conditions and controls will vary with the technique. A suggestion of what may be involved is the idea in polygraph research of comparing a polygraph examination with a bogus polygraph examination, with neither the examiner nor the examinee knowing that the test output might be bogus.
Cross-Validation of Any Exploratory Data Analytic Solution on Independent Data Any standardized or computerized scoring system for measurements from a technique cannot be seriously considered as providing accurate detection unless it has been shown to perform well on samples of examinees different from those on whom it was developed.
Examinees Masked to Experimental Hypotheses if Not to Experimental Condition It is important to sort out precisely what effect is being measured. For example, the results of a countermeasures study would be more convincing if examinees were instructed to expect that the examiner is looking for the use of countermeasures, among other things, rather than being instructed explicitly that this is a study of whether countermeasures work and can be detected.
Standardization An experiment should have sufficient standardization to allow reliable replication by others and should analyze the results from all examinees. It is important to use a technique in the same way on all the examinees, which means: clear reporting of how the technique was administered; sharply limiting the examiner’s discretion in administering the technique and interpreting its results; and using the technique on all examinees, not only the ones whose responses are easy to classify. If some examinees are dropped from the analysis, the reasons should be stated explicitly. This is a difficult test for a procedure to pass, but it is appropriate for policy purposes.
Analysis of Sensitivity and Specificity or Their Equivalents Data should be reported in a way that makes it possible to calculate both the sensitivity and specificity of the technique, preferably at multiple thresholds for diagnostic decision making or in a way that allows comparisons of the test results with the criterion on other than binary scales.
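As a sketch of the reporting this implies, the example below computes sensitivity and specificity at several decision thresholds from examinee-level scores and ground-truth labels. The data are hypothetical, and the convention that a score at or above the threshold is called "deceptive" is an assumption made for the illustration.

```python
def sensitivity_specificity(scores, is_deceptive, threshold):
    """Return (sensitivity, specificity) when score >= threshold is called 'deceptive'."""
    pairs = list(zip(scores, is_deceptive))
    tp = sum(1 for s, d in pairs if d and s >= threshold)      # deceptive, flagged
    fn = sum(1 for s, d in pairs if d and s < threshold)       # deceptive, missed
    tn = sum(1 for s, d in pairs if not d and s < threshold)   # truthful, cleared
    fp = sum(1 for s, d in pairs if not d and s >= threshold)  # truthful, flagged
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scores for five truthful and five deceptive examinees.
scores = [2, 5, 7, 8, 9, 1, 3, 4, 6, 2]
is_deceptive = [False, True, True, True, True, False, False, False, False, True]

for threshold in (3, 5, 7):
    sens, spec = sensitivity_specificity(scores, is_deceptive, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting the pair at each threshold, rather than a single percent-correct figure, lets a reader see how raising the threshold trades missed deceptive examinees against falsely flagged truthful ones.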
A PROGRAM OF RESEARCH
Our conclusions make clear that polygraph testing, though exhibiting accuracy considerably better than chance under a variety of conditions, has characteristics that leave it far short of what would be desirable for screening programs to distinguish individuals who pose threats to national security from innocent examinees. The research base for precisely quantifying the accuracy of polygraph testing is also far from what would be desirable. During our deliberations we repeatedly discussed how polygraph research might have been done better, what alternatives to the current instruments and tests would most sensibly take modern psychophysiological understanding into account, and what evidence we ourselves would find compelling as support for a technique for the physiological detection of deception. We also asked ourselves whether there would be much practical or scientific gain from incremental research on polygraph testing and scoring techniques and on the other detection techniques discussed throughout this report.
Expanded Research Effort We recommend an expanded research effort directed at methods for deterring and detecting major security threats, including efforts to improve techniques for security screening. Research offers one promising strategy for meeting the national need to deter and detect security threats. It is not, of course, the only appropriate strategy. Traditional methods of maintaining the security of classified material, controlling and monitoring access, investigating security threats, and so forth, continue to be extremely important. In fact, to the extent that techniques of detecting deception are likely to remain imperfect, such other security strategies gain in importance because they decrease the burden that detection techniques must carry in meeting security objectives.
We cannot guarantee that research related to techniques for detecting deception will yield valuable practical payoff for national security, even in the long term. However, given the seriousness of the national need, an expanded research effort appears worthwhile.
Objectives The research program we envision would seek any edge that science can provide for deterring and detecting security threats. It would have two major objectives: (1) to provide federal agencies with methods of the highest possible scientific validity for protecting national security by deterring and detecting espionage, sabotage, terrorism, and other major security threats; and (2) to make these agencies fully aware of the strengths and limitations of the techniques they use.
Deterring and Detecting Security Threats
If the government continues to rely heavily on the polygraph in the national security arena, some of this research effort should be devoted to developing scientific knowledge that could either put the polygraph on a firmer scientific foundation or lead to its supplementation or replacement. We have identified a considerable number of open scientific questions about the polygraph throughout this report that could be addressed as part of the research program. We do not think, however, that national security is best served by a narrow focus on polygraph research.
Scope The research program should have a far broader scope than polygraph testing, broader even than psychophysiological detection of deception and specific alternative approaches to detecting deception (discussed in Chapter 6). It should include, but not necessarily be limited to, approaches involving testing, interrogating, and investigating individuals. For instance, the recommendations of the Hamre Commission (Commission on Science and Security, 2002) suggest the need for research on approaches to deterrence and detection that can be implemented at the organizational level as well as through the testing of individuals. Research on such approaches would be appropriate for consideration and support under the program. It is important that the research program be broadly conceived and open to supporting alternative ways of looking at these problems because there is no single research approach that clearly holds the most promise for meeting national security objectives. Thus, the research program might support research ranging from very basic work on fundamental psychological, physiological, social, and organizational processes related to deterring and detecting security threats, on one hand, to applied studies implementing scientifically sound methods in practical situations, on the other.
We have investigated only a part of this large domain. We present below some ideas about potentially promising lines of research in the areas we have investigated, and our expectations about what concerted research efforts along each line of research might yield.
Scientifically based efforts could be made to develop, define, and validate improved indicators derived from polygraph measurements for use in computerized scoring. These efforts would have to improve on the approaches currently being used. They might lead to marginal improvements in the overall performance of polygraph testing over several years, but major increases in accuracy are unlikely to be achieved.
Serious investigations could be focused on explaining the variation in accuracy estimates from polygraph research. This might yield more confident estimates of accuracy, which would help inform decisions about the conditions under which polygraph testing is useful and about how much reliance to place on the results when it is used.
The previous line of investigation would have to be supplemented by research into the major threats to polygraph validity. Two that deserve special attention are polygraph performance with stigmatized populations and as a function of examiners’ expectancies. Such studies would resolve concerns that polygraph accuracy may be seriously reduced with certain examinees or under certain conditions. It is possible that such research would result in reduced confidence in the scientific value of the polygraph. In our judgment, even such a result would be positive because it would help agencies make more accurate interpretations of the information they have.
Research could be conducted on the effectiveness of polygraph countermeasures and on their detectability. Great progress can be made in learning how polygraph measures respond to different kinds of countermeasures, how much effort is needed to learn effective countermeasures, and how otherwise effective countermeasures can be detected. The value of this research depends on the usefulness of the polygraph for detection in particular contexts, which could be made clearer with the other suggested research.
Careful documentation of polygraph examinations as they are being administered, combined with individual background information and reports on subsequent outcomes, would generate a valuable body of epidemiological data that could provide better estimates of the accuracy of field polygraph testing, both generally and with specific populations.
Planned experiments, embedded in the operation of an ongoing polygraph program, in which examiners might potentially be experimental subjects uninformed about certain aspects of the research design, might be used to separate the effects of different components of the polygraph examination, elucidate the impact of expectancies, and more generally improve understanding of the polygraph examination process in real-world populations of examinees on whom the outcome has potentially serious impact.
Other Approaches to Detection of Deception in Individuals
Research on indicators of deception from demeanor has not been given much systematic attention, even though some of these indicators might yield measures of comparable or perhaps greater accuracy than the polygraph. This line of research might yield practical supplements or complements to the polygraph in the relatively near term because demeanor may provide indicators of deception that are somewhat different from those measured by the polygraph.
Investigations of brain activity through electrical and imaging studies may yield basic understanding of neural processes in deception. Such investigations, especially if theoretically grounded in central nervous system psychophysiology, have the potential in principle to yield techniques for detecting deception that are more accurate than the polygraph, as well as to supplement information from polygraph and other sources and to identify signatures in the brain of particular polygraph countermeasures. Not enough is known, however, to tell whether it will ever be possible in practice to identify deception in real time through brain measurement. We are confident that it will not happen within the next decade. Moreover, brain-based indicators will not necessarily be resistant to countermeasures.
Research could be conducted to seek physiological measures other than brain measures, developed since the advent of the polygraph, that might have greater validity than the polygraph or yield improvements in accuracy when combined with polygraph or other measures. Such research will be most promising if it is guided by empirically supported theory about the underlying psychological and physiological mechanisms. We anticipate that research on such measures will, at best, yield incremental improvement over the performance of the polygraph.
Investigation of statistical and computer-based ways to combine diverse indicators of truthfulness or deception might yield composite indicators or serial testing protocols that would noticeably improve accuracy of detection beyond what the polygraph achieves with general populations. This strategy may be the most promising way to achieve noticeable improvements in the accuracy of detection of deception in the fairly short run. We caution, however, that this research is likely to be atheoretical, so that it will be very important to investigate carefully threats to validity, including the threat of countermeasures, for both composite indicators and serial testing protocols.
Explicit research on policies for detection of deception would help agencies make better informed decisions on how to use uncertain information. This research might address questions of the incremental validity of new information, the policy implications of setting thresholds for tests of deception, and the estimation of tradeoffs involved in alternative detection policies.
Systematic research on the bogus pipeline phenomenon can help with deterring and detecting security threats in more than one way. It can clarify the extent to which the practical value of the polygraph (or analogous techniques) for eliciting admissions results from test validity or merely from examinees’ beliefs and concerns. This will help agencies better interpret the information they get from using the polygraph and analogous techniques. It may also help improve interrogation techniques. We note that ethical issues will arise with some uses of interrogation techniques that rely on elements known to be bogus.
The problem of deterrence of security threats might be addressed explicitly with research. It is, after all, an empirical question how polygraph policies or other security policies affect the behavior of federal employees and potential employees, both those who may act against the national security and those who will not, but whose productivity or employment futures may be affected by security policies aimed at deterring breaches of security. Better understanding of such effects could give valuable insight to decision makers in the near term.
Various lines of organizational research may also be useful in developing effective policies for deterring and detecting security threats. We have not considered the possibilities, but are convinced that useful research can be done on deterrence and detection from the perspective of policy design and implementation.
Potential Payoff We cannot predict with confidence that an investment by the federal government in the kind of research program envisioned here will yield substantial improvement in the ability to deter and detect threats to the national security. We would expect at least marginal improvement in this ability and more significant improvement in the government’s ability to evaluate the information available from techniques for detecting deception. The basic research may have large practical value in the long run, as well as spillover effects through contributions to basic science, but these cannot be foreseen with any confidence.
The approaches that have the greatest overall promise for detecting deception, such as direct measurement of brain activity, will take a long time to produce any practical payoff. Even then, we have much more confidence that they will advance cognitive and social psychophysiology than that they will advance practical detection of deception. They constitute a long-term speculative investment. At the other extreme, research on the polygraph may have quick benefits, but they are likely to be small. Such research may also undermine confidence in the technique, leaving the government with the task of finding new instruments and new approaches to deterrence and detection. It is because of this real possibility that we advocate a program that has a broad vision: some of the best practical ideas may be ones that have not yet been researched. Some of them may not even directly involve efforts to detect deception.
Organization of a Research Program
Organizational Emphasis A substantial portion of our recommended expanded research program should be administered by an organization or organizations with no operational responsibility for detecting deception and no institutional commitment to using or training practitioners of a particular technique. The research program should follow accepted standards for scientific research, use rules and procedures designed to eliminate biases that might influence the findings, and operate under normal rules of scientific freedom and openness to the extent possible while protecting national security.
We recommend this organizational emphasis because many past research efforts on detection of deception in the U.S. government, though well intentioned, have suffered from a separation from mainstream scientific thinking and from their organizational location within agencies strongly committed to one technique. This has hampered progress in polygraph research and largely prevented the government from giving adequate attention to alternative and supplementary approaches.
We wish to note explicitly that in recent years, the DoD Polygraph Institute (DoDPI) has been working to put polygraph research on a more scientific footing. For example, technical reports are being submitted to peer-reviewed journals, and outside academic reviewers are providing advice on improving the scientific quality of DoDPI-funded research. These are salutary developments for polygraph science and should be commended, but they have not gone far enough. The effectiveness of DoDPI as a source of solid scientific knowledge on detecting deception is significantly undermined by two structural/institutional factors: (1) that its mission is narrowly defined in terms of the polygraph rather than the larger purpose of detecting deception; and (2) that the research activities are housed in an organization whose mission involves promoting and training personnel in a specific technique of detecting deception. These factors create real and perceived conflicts of interest with respect to research that might question polygraph validity or support an alternative method as superior.
The organizations that carry out the expanded research program should support both basic and applied research. They should follow standard scientific advisory and decision-making procedures, including external peer review of proposals, and they should support research that is conducted and reviewed openly in the manner of other scientific research. Classified and restricted research should be limited to matters of identifiable national security.
The fundamental research sponsored in the research program should not be totally separate from other related scientific efforts (for example, research on brain imaging supported by basic science and health research agencies), but some separation is essential to ensure that mechanisms are in place for periodically assessing progress toward national security goals and for assuring that promising approaches move from the laboratory to testing in applied settings.
Expanding basic research on deception and deterrence as outlined above does not lessen the need for the government to review and assess the implications and uses of the research for defense and homeland security, and specifically to develop and test operational versions of procedures that can enhance such security and to train those who will be charged with implementing these procedures. Thus, at least some of the applied research in the expanded program should be sponsored by or linked to organizations with operational responsibilities for national security to ensure its relevance to these missions. Mission-oriented agencies should continue to conduct implementation-focused research, such as studies of quality control, examiner training effectiveness, and so forth. In addition, mission-oriented agencies should be encouraged and even mandated to cooperate with the broader research effort, for example, by providing archival data and cooperating in field research.
Countermeasures and Classified Research The problem of countermeasures highlights some important questions about how future research on detecting deception should be structured. Concerns about countermeasures arise in all lie detection contexts, not only polygraph testing. Research on countermeasures poses the prospect of discovering techniques that might be exploited by the very people lie detectors seek to catch. Thus, many people have argued that research on countermeasures should be classified or otherwise conducted outside the public domain. It is true that removing countermeasures research from public view may lessen the danger that these techniques will fall into the wrong hands, but such removal would also carry with it certain possible negative consequences. Classification would limit the number and, in all likelihood, the quality of the scientists available to study countermeasures. The more robust the scientific exploration of the subject, the more likely the dangers of countermeasures can be identified and nullified. Interestingly, the decision on whether to classify this research is not entirely unrelated to the physiological character of countermeasure techniques. If countermeasures have unique physiological signatures that cannot be masked or otherwise concealed, then classifying this research would be unnecessary. Lie detection would invariably identify countermeasures by these signatures whenever they were used, and potential examinees would learn to expect that countermeasures would be detected. Unfortunately, until the research is done, one cannot know whether countermeasures have such signatures. Ultimately, therefore, the decision whether to classify such research is a policy choice. Policy makers must weigh the danger of public knowledge of countermeasure techniques against the benefits of a robust research program that could be expected (though not guaranteed) to be more successful at identifying and nullifying countermeasure techniques.