Evidence from Polygraph Research: Qualitative Assessment
The basic science relevant to the polygraph suggests that it can at best be an imperfect instrument, but it leaves unclear the degree of imperfection. In this and the next chapter we evaluate the empirical evidence on error rates from scientific studies of polygraph testing. Our dual purposes are to gauge the levels of accuracy (in technical terms, criterion validity) that have been observed in research contexts and to assess the extent to which results of past empirical polygraph research can be relied upon for estimates of the test’s accuracy in real-world settings. We undertook this task through a systematic literature review (detailed in Appendix G). The literature review includes studies of specific-incident as well as screening polygraph testing, even though the main purpose of this study is to draw conclusions about screening. We examined the broader literature because the empirical research on polygraph screening is too limited to support any judgments and because it is possible to gain useful insights about the potential value of polygraph screening from examining the evidence on polygraph test accuracy in specific-incident applications.
This chapter provides a qualitative assessment of research on polygraph validity. The next chapter discusses the collective quantitative findings of the studies we reviewed and the empirical data pertaining to specific issues, including questioning technique, subpopulations of examinees, and countermeasures.
There have been a number of previous reviews of the validity of the polygraph and related techniques (e.g., Levey, 1988; U.S. Office of Technology Assessment, 1983; see also Lykken, 1981; Murphy, 1993), each of which has examined partially overlapping sets of studies, though it is unlikely that any review (including ours) covers every study done. What is remarkable, given the large body of relevant research, is that claims about the accuracy of the polygraph made today parallel those made throughout the history of the polygraph: practitioners have always claimed extremely high levels of accuracy, and these claims have rarely been reflected in empirical research. Levey’s (1988) analysis suggests that conclusions about the accuracy of the polygraph have not changed substantially since the earliest empirical assessments of this technique and that the prospects for improving accuracy have not brightened over many decades.
We used several methods to gather as many polygraph validation studies for review as possible (see Appendix G). Our search resulted in 217 research reports of 194 separate studies (some studies appeared in more than one report). The committee next determined which studies were of sufficient quality to include in our review. We agreed on six minimal criteria for further consideration:
documentation of examination procedures sufficient to allow a basic replication;
independently determined truth;
inclusion of both guilty and innocent individuals as determined by truth criteria;
sufficient information for quantitative estimation of accuracy;
polygraph scoring conducted blind to information about truth; and,
in experimental studies, appropriate assignment to experimental groups germane to estimating accuracy (mainly, guilt and innocence).
A detailed review by staff selected 102 studies that deserved further examination by the committee because they met all the criteria or were of sufficient interest on other grounds. Each of these studies was assigned to two committee members for coding on 16 study characteristics that the committee judged to be potentially relevant to an assessment of the polygraph’s accuracy. (Appendix G provides details on the committee’s process.)
We conducted a systematic review of research but not a meta-analysis for two basic reasons (see note 1). First, the studies of adequate quality are too heterogeneous and the numbers of each type too few to allow us to deal with the heterogeneity in an adequate statistical way. Second, because most of the available studies bear only indirectly on applications to security screening, using precise statistical models to summarize the findings would not contribute much to our purpose. Rather than developing and testing meta-analytic models, we have taken the simpler and potentially less misleading approach of presenting descriptive summaries and graphs. Because the studies vary greatly in quality and include several with extreme outcomes due to small size, sampling variability, bias, or nongeneralizable features of their study designs, we did not give much weight to the studies with outcomes at the extremes of the group. Instead, we focused on outcomes in the middle half of the range in terms of accuracy. For the purpose of this study, this focus reveals what the empirical research shows about the accuracy of polygraph testing.
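One way to read "the middle half of the range" is as the studies whose accuracy falls between the first and third quartiles. The sketch below illustrates that reading only. The accuracy values are invented placeholders rather than data from the studies reviewed, and the choice of inclusive quartiles and the function name `middle_half` are assumptions made for illustration.

```python
import statistics


def middle_half(values):
    """Return the values lying between the first and third quartiles.

    Assumption: "middle half" is taken to mean the interquartile range,
    computed with the inclusive quartile method.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [v for v in sorted(values) if q1 <= v <= q3]


# Hypothetical accuracy indices for ten studies (placeholders, not real data).
hypothetical_accuracy = [0.55, 0.70, 0.78, 0.81, 0.84, 0.86, 0.88, 0.90, 0.93, 0.99]

# Studies at either extreme (for example 0.55 and 0.99) drop out of the summary.
print(middle_half(hypothetical_accuracy))
```

Summarizing this central group, rather than the full range, keeps a few extreme results from small or unusual studies from dominating the overall picture.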
The polygraph studies that met our criteria for consideration do not generally reach the high levels of research quality desired in science. Only 57 of the 194 studies (roughly 30 percent) that we examined both met minimal standards of scientific adequacy and presented useful data for quantifying criterion validity. Of these 57, only 18 percent and 9 percent, respectively, received average internal validity and salience ratings of 2 or better on a 5-point scale (on which 1 is the best possible score; see Appendix G for the rating system). These ratings mean that relatively few of the studies are of the quality level typically needed for funding by the U.S. National Science Foundation or the U.S. National Institutes of Health. This assessment of the general quality of this literature as relatively low coincides with the assessments in other reviews (e.g., U.S. Office of Technology Assessment, 1983; Levey, 1988; Fiedler, Schmid, and Stahl, 2002). It partly reflects the inherent difficulties of doing high-quality research in this area. The fact that a sizable number of polygraph studies have nevertheless appeared in good-quality, peer-reviewed journals probably reflects two facts: the practical importance of the topic and the willingness of journals to publish laboratory studies that are high in internal validity but relatively low in salience to real-world application.
The types of studies that are most scientifically compelling for evaluating a technology with widespread field application are only lightly represented in the polygraph literature. Laboratory or simulation studies are most compelling when they examine the theoretical bases for a technique or when they provide information on its performance that can be extrapolated to field settings on the basis of a relevant and empirically supported theoretical foundation. Field studies are most valuable when they involve controlled performance comparisons, in which either the field system is experimentally manipulated according to the subtraction principle (see Chapter 3) or observational data are collected systematically from the field system to develop models suggesting what actual manipulation might produce.
The relevance of the available research to security screening applications is far less than would be desirable. Only one flawed study investigates a real polygraph screening program, and the simulated screening studies are too closely tied to specific mock crimes to simulate adequately the generic nature of polygraph screening questions. Moreover, all of the studies available to us were conducted on samples with base rates of guilt far above the extremely low rates typical of employee security screening programs, so that generalization from those studies to screening applications is quite problematic. (We address the base rate problem in detail in Chapter 7.)
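The reason base rates matter for generalization can be seen with Bayes' rule. The sketch below computes the share of positive ("deceptive") test results that are correct at several base rates of guilt. The sensitivity and specificity values are hypothetical and chosen only for illustration; they are not estimates drawn from the studies reviewed.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that an examinee who tests positive is actually guilty."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, held fixed while the base rate varies.
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# A research-like sample (half guilty) versus much rarer guilt.
for base_rate in (0.5, 0.1, 0.001):
    ppv = positive_predictive_value(base_rate, SENSITIVITY, SPECIFICITY)
    print(f"base rate {base_rate}: share of positives that are correct = {ppv:.3f}")
```

With the test characteristics held constant, the share of positive results that are correct falls sharply as guilt becomes rare, which is why results from samples with high base rates cannot be carried over directly to screening populations.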
For a variety of understandable practical reasons, the great majority of polygraph validation studies have been laboratory based. This research has consisted predominantly of efforts to measure test accuracy in simulated settings or compare accuracy across methods of testing or test interpretation. There has been relatively little attention to issues of theory, as noted in Chapter 3. For instance, very few studies have investigated threats to validity that seem potentially important on theoretical grounds, such as effects of stigma and expectancy. As a result, serious open questions remain about the basis for generalizing beyond the laboratory situations. The laboratory studies are also inconsistent regarding their attention to methodological controls. We found numerous studies that provide tight control in one or more respects but omit control in others. In addition, most studies have presented the data in terms of one or two cutoff points for scoring, preventing exploration of how the tradeoff between false positives and false negatives might vary with slightly different applications of the same testing approach. Although valuable laboratory studies have been done, they are relatively few in number and leave us with limited enthusiasm for this body of research as a whole.
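The tradeoff between false positives and false negatives across cutoff points can be illustrated with a small sketch. The scores below are invented for illustration and are not taken from any study; the scoring convention (higher scores indicate deception, and a score at or above the cutoff is called deceptive) is likewise an assumption.

```python
def error_rates(deceptive_scores, truthful_scores, cutoff):
    """Return (false_positive_rate, false_negative_rate) at a given cutoff.

    Assumed convention: a score at or above the cutoff is called deceptive.
    """
    false_negatives = sum(score < cutoff for score in deceptive_scores)
    false_positives = sum(score >= cutoff for score in truthful_scores)
    return (false_positives / len(truthful_scores),
            false_negatives / len(deceptive_scores))


# Hypothetical chart scores for examinees whose true status is known.
deceptive = [2, 4, 5, 6, 7, 8, 9, 9]
truthful = [0, 1, 1, 2, 3, 3, 5, 6]

# Sweep the cutoff to trace the whole tradeoff rather than one or two points.
for cutoff in range(1, 10):
    fpr, fnr = error_rates(deceptive, truthful, cutoff)
    print(f"cutoff {cutoff}: false positive rate {fpr:.3f}, false negative rate {fnr:.3f}")
```

Raising the cutoff lowers the false positive rate at the cost of more false negatives. A study that reports results at only one or two cutoffs reveals only one or two points on this curve.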
EXPERIMENTAL FIELD STUDIES
The most compelling type of field validation study is an experimental field study, one in which a variable of interest is manipulated among polygraph examinations given in a real-life polygraph testing context, for example, the context of an actual security screening program. The variable of greatest interest is usually guilt/innocence or deception/truthfulness on relevant questions, a variable that is difficult, though not impossible, to manipulate in a field setting. Other variables are also of considerable interest, including whether the polygraph leads are connected to a polygraph or a bogus source of chart output, how the physiological responses are translated into chart output (e.g., electrodermal response measured as resistance or conductance), how the questions are asked, and how often screening is done. We found no such field experiments in the entire literature on polygraph validity.
Significant obstacles to high-quality polygraph field research are readily apparent. Good field research may require substantial funding, interagency cooperation, and enough time to resolve major logistical, ethical, interprofessional, and political problems, especially when experimental manipulation is intended. Nevertheless, so long as these obstacles are allowed to impede research, the scarcity of good field studies will remain a substantial impediment to appraising the scientific validity of the polygraph.
Some of these obstacles could be overcome. For the sake of discussion, we suggest what field studies of polygraphy would be like if they adhered to the highest standards of scientific rigor. Experimental studies would randomly assign subjects to one of two or more methods for detection of deception. These might be selected using the subtraction principle: e.g., one method might be the Test of Espionage and Sabotage (TES), conducted according to current U.S. Department of Energy practice, while the other might be the same test using polygraph tracings fed into the instrument from another subject, perhaps in an adjacent room (a bogus pipeline). Or one method might be a specific-incident control question polygraph test that represented electrodermal response as either skin conductance or skin resistance, with all other factors totally comparable. In either case, research subjects and, to the extent feasible, polygraph examiners and quality control chart scorers would be blinded to which form of testing was used. Subsequently, information would be obtained about test accuracy for each individual by some method that assesses truth independently of the polygraph test result. (Perhaps the test results would be filed away and not acted on.) The data to support the truth categorization would be collected uniformly and in a standard fashion over time, without regard to which form of polygraph test the subject had taken or the test results. After testing a large number of examinees and observation over a sufficient period to determine truth, the best procedure would be determined according to some predetermined criterion, such as the method that identifies the most spies for each false positive result. If randomized experimentation could not be done, data would be collected in a uniform fashion on whatever testing was performed and compared against truth, determined in a uniform fashion independent of test results.
It is easy to see from an organizational point of view why such research has not been done. The logistics of blind administration of alternative polygraph tests would require a large staff, would be technically complex, and might even require the use of custom-designed physical facilities. A method for ultimately assessing truth independently of the polygraph test may not be readily available, or it may be unavailable at an acceptable cost (or even at any cost). Moreover, polygraph examiners and the law enforcement and intelligence agencies that employ them are confident from experience in the value of polygraph testing. They might therefore find any research that might degrade test performance or that requires withholding of the test results from use to be ethically unacceptable. Furthermore, in today’s litigious environment, errors made under research conditions might expose individual researchers and government agencies to a liability risk. In combination, these are powerful impediments to high-quality experimental field research on polygraph testing.
However, polygraph testing leads to important, even life and death, decisions about the examinee, and it also affects families, associates, and national security; consequently, it is worth making an effort to use the best feasible research designs to evaluate it. All of the above obstacles have close counterparts in clinical medical research, and research methods have been developed over half a century to largely overcome them or limit their effects. Billions of dollars are now spent annually on medical clinical trials because the importance of high-quality research is clear, and researchers have developed effective ways of dealing with the obstacles. During this period the federal government, through the U.S. National Institutes of Health, promoted the development of an entire field of methodological research for medical science that now has its own professional societies and journals and provides the scientific basis for an evidence-based medicine movement that is growing rapidly worldwide. Important, related progress has been made in other fields of practice, such as education and public health. We do not mean to conclude that a methodologically clean, definitive “clinical trial” of polygraph testing is now or necessarily ever will be possible. The problems of designing experiments that randomly assign examinees to be truthful or deceptive in a situation with stakes high enough to approach those in a criminal investigation or employee security screening situation are extreme, and they may be insurmountable. For example, examinees assigned to be deceptive could be expected to differentially withdraw from the experiment. Nevertheless, the medical research experience shows that major scientific advances occur even in the face of methodological limitations similar to those affecting polygraph research and that such limitations can often be successfully addressed. 
Some polygraph researchers appreciate the potential gains from using stronger research designs, but the lesson has not been applied to field experimentation.
OBSERVATIONAL FIELD STUDIES
Observational field studies are useful when laboratory experimentation has limited external validity, and they are necessary when field experiments are impossible or impractical. Methodology for the design and interpretation of observational research has seen extensive development over many decades by researchers in the social sciences and public health. As with clinical experimentation, issues once addressed only with qualitative methods, such as causal inference from observational data, are now the focus of competing quantitative mathematical models.
In typologies of observational studies, the top rungs of a generally accepted quality hierarchy are occupied by studies that, despite the absence of experimental control, do incorporate controls for potential biases and for confounding by extraneous factors that most closely mimic those of designed experiments. The highest rated among these studies are prospective cohort studies, often termed quasi-experimental studies, in which a cohort (a sample that is scientifically chosen from a carefully defined population) is followed over time with data collected by a design specified in advance. Such studies differ from actual experiments in a single respect: the exposure of subjects to respective levels of the experimental variable of interest is not randomly assigned and is outside of the experimenter’s control. In other respects, such studies incorporate uniform observational protocols designed to minimize measurement biases and to detect and allow statistical adjustment for inequities, due to selection biases or serendipity, that might distort (confound) statistical relationships of primary interest. Thus, measurement and collection of appropriate research data is under the control of the experimenter even though the experimental variable is not. For the polygraph, an example would be a screening program in which the decision about how often employees are retested is made by agency staff rather than assigned at random. It would be possible, at least in principle, to assess the deterrent value of polygraph rescreening by comparing the rates of independently verified security violations among subgroups that have been retested at different intervals.
Lower in the quality hierarchy are observational studies in which the selection, implementation, and recording of measurements, and hence data quality and potential for bias, are less subject to the experimenter’s control. Since the timing of observation and data collection corresponds less closely to that of an experiment, there is the possibility of inconsistency in the temporal sequencing of events and, thus, confusion between causes and effects. As a general rule, the best such studies are retrospective cohort studies, that is, cohort studies with data collection after the events of interest, and population-based case-control studies. An example is the comparison of past performances on screening polygraph examinations between a group of employees later found to have violated computer security protocols and another group of employees of the same agency, similarly observed, for whom no violations were found.
Below these in the hierarchy are case-control studies not linked to a defined population; cross-sectional surveys, in which correlations are observed among multiple variables ascertained at the same time (e.g., polygraph tests and intensive security investigations); case series without comparison groups for control; and finally, individual case studies. All these can provide useful information, especially for generating hypotheses, but they are vulnerable to error from too many sources to be considered scientifically reliable on their own except in very rare circumstances. We note that no matter how well they are conducted, none of these study designs is capable of estimating the probability of any future event, because they do not observe forward in time a representative group of individuals to determine the actual probability of the target events occurring in subgroups of interest (e.g., people given or not given polygraph examinations as part of a security investigation).
Two additional observations are necessary to place our views on polygraph field studies in perspective. First, the scientific value of any observational study assessing the connection between two variables, such as polygraph result and deception, or medical therapy and survival (and therefore the study’s position in the above hierarchy), is critically dependent on the manner in which the study sample is assembled. In particular, if inclusion in the sample is related to both variables in the study design, there is a serious risk of major distortion of the statistical accounting process and of spurious scientific results. An example is the common procedure in polygraph field research of defining truth by confession of the polygraph examinee or someone else. Such research necessarily omits cases in which there was no confession. This procedure probably yields an upward bias in the estimates of polygraph accuracy because the relationship between polygraph results and guilt is likely to be stronger in cases that led to confessions than in the entire population of cases. This bias can occur because definitive polygraph results can influence the likelihood of confession and the direction taken by criminal investigations (see Iacono, 1991, for a discussion; we offer a quantitative example below).
Second, the effectiveness of opportunistic studies that do not control the data collection process is largely determined by the degree of completeness, objectivity, and accuracy with which relevant variables are recorded by individuals with no awareness of the research process. The reliability of clinical and administrative data tends to vary in proportion to the relevance and immediacy of their use to the staff recording the data (or their supervisors). In medical charts, for example, observations of the variables critical to immediate patient care are generally accurate, while others, perhaps needed later for retrospective research, are often omitted or present only by implication. Polygraph research would present a similar situation.
We appreciate the inherent difficulty of determining the truth for observational polygraph field studies. Although we applaud the labor of those investigators who have undertaken such studies, we are unable to place a great deal of faith in this small body of work, especially regarding its implications for screening. We found only one field study of polygraph screening with verifiable outcome data relevant to assessing accuracy; its results and limitations are discussed in Chapter 5. The annual reports that polygraph programs provide to Congress do not provide a basis for assessing the accuracy of polygraph testing, as we have discussed.
We found no specific-incident field investigations at the higher levels of the research hierarchy outlined above. The literature revealed no experiments and no cohort or case-control studies that were prospectively designed and implemented. The best criminal field investigations we reviewed were observational case-control studies using data on truth obtained retrospectively from administrative databases. In these studies, the past polygraph judgments (or reevaluations of past polygraph records) with respect to individuals whose deceptiveness or nondeceptiveness had subsequently been established were reviewed, tabulated, and compared. This case-control approach is an observational research design of intermediate strength, weakened in most of these studies by heterogeneity of polygraph procedure; lack of prospective, research-oriented data collection; and the probable contamination of sample selection by the polygraph result. Data were generally not provided on whether confessions occurred during the polygraph examination or subsequently as a direct consequence of being judged deceptive on the polygraph examination. Neither were data provided on the extent to which a suspect’s polygraph results led an investigation to be redirected, leading to the determination of the truth. Both these outcomes of the polygraph examination are good for law enforcement, but they lead to overestimates of polygraph accuracy.
Although we excluded studies that lack independent evidence of truth, field study procedures still tend to overestimate the accuracy of the polygraph. The problem, in technical terms, is that these studies use the probabilities of past truthful or deceptive polygraph outcomes among subsets of examinees later proven to be truthful or deceptive to estimate the probabilities of future polygraph outcomes among all examinees, including those for whom the truth cannot be independently established. The failure to establish truth independently and the consequent reliance on the easy cases can lead to seriously distorted inferences.
We provide an example to show how this might occur. Suppose that in a certain city (a) the polygraph correctly detects deception in two-thirds of guilty suspects; and (b) due to belief of both police and suspects in the polygraph’s accuracy, police are three times as likely to elicit a confession from guilty suspects who appear deceptive on the polygraph as from those who appear truthful. Concretely, suppose that of 300 guilty suspects, 200 fail the polygraph and 100 pass it, and that 30 percent of guilty suspects who fail the polygraph confess, compared with only 10 percent of guilty suspects who pass. Then 10 percent of the 100 passing suspects, or 10 suspects, would be expected to confess, as would 30 percent of the 200 failing suspects, or 60 suspects. If none of the remaining 230 guilty suspects is definitively proven innocent or guilty, only the 70 confessed suspects enter the population of a case-control study as guilty cases. Although only 67 percent of all guilty suspects appeared deceptive on the polygraph, the case-control study would show that 60 out of 70, or 86 percent of the guilty cases confirmed by confessions, had given deceptive polygraph results. A validity study that uses cases confirmed by confession would therefore estimate a sensitivity of 86 percent, while the sensitivity under actual field conditions is only 67 percent. If, instead of 67 percent, we suppose that the polygraph has a sensitivity of 80 percent, a similar calculation shows that the case-control study would include 78 guilty suspects and would overestimate the sensitivity as 92 percent. A similar bias could exaggerate the test’s specificity and any other measures of polygraph accuracy estimated from the case-control sample.
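The arithmetic of this example can be reproduced with a short sketch. The function name and parameter names are illustrative choices, and the inputs are the hypothetical figures from the example above rather than empirical estimates.

```python
def confession_sample_sensitivity(n_guilty, true_sensitivity,
                                  p_confess_if_fail, p_confess_if_pass):
    """Return (confirmed_cases, apparent_sensitivity).

    Only guilty suspects who confess enter the case-control sample, so the
    apparent sensitivity is computed among confessed cases alone.
    """
    n_fail = n_guilty * true_sensitivity      # guilty suspects who fail the test
    n_pass = n_guilty - n_fail                # guilty suspects who pass the test
    confess_after_fail = n_fail * p_confess_if_fail
    confess_after_pass = n_pass * p_confess_if_pass
    confirmed = confess_after_fail + confess_after_pass
    return confirmed, confess_after_fail / confirmed


# True sensitivity of two-thirds: 70 confirmed cases, apparent sensitivity 86 percent.
confirmed, apparent = confession_sample_sensitivity(300, 2 / 3, 0.30, 0.10)
print(round(confirmed), round(apparent * 100))

# True sensitivity of 80 percent: 78 confirmed cases, apparent sensitivity 92 percent.
confirmed, apparent = confession_sample_sensitivity(300, 0.80, 0.30, 0.10)
print(round(confirmed), round(apparent * 100))
```

The overestimate arises entirely from the unequal confession rates. If both confession probabilities are set to the same value, the apparent sensitivity equals the true sensitivity.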
In summary, we were unable to find any field experiments, field quasi-experiments, or prospective research-oriented data collection specifically designed to address polygraph validity and satisfying minimal standards of research quality. The field research that we reviewed used passive observational research designs of no more than moderate methodological strength, weakened by the admittedly difficult problem that truth could not be known in all cases and by the possible biases introduced by different approaches to dealing with this problem. In addition, because field examiners normally have background information about the examinees before the test begins, there is the possibility that their expectations have direct or indirect effects on the polygraph test data that cannot be removed even if the charts are independently scored. Thus, field studies contain a bias of potentially serious magnitude toward overestimating the accuracy that would be observed if the truth were known for everyone who took a polygraph test.
AN APPROACH FOR PLANNED FIELD RESEARCH
Polygraph field research poses difficult design issues, and we readily acknowledge the lack of a template for dealing simultaneously with all the problems and obtaining rapid, definitive results. Nevertheless, it is possible to do better field research than we have found in the literature and, over time, to use admittedly imperfect research designs, both experimental and observational, to advance knowledge and build methodological understanding, leading to better research design in the future. To accomplish these ends requires a key ingredient that has been missing from polygraph field research: active, prospective research planning. Prospectively planned field research generally produces better information than that obtained from opportunistic samples. As is true in most areas of human activity, higher quality comes at higher cost. Such research would require extensive participation by agencies that currently use polygraph testing and a dramatically higher level of research funding than is currently available for polygraph investigations.
We provide a few examples of the types of planned approaches that might be considered, but that we have not found in the publicly available polygraph research literature.
Prospective, research-oriented polygraph logs might be recorded for an extended series of routine field examinations. These logs would include information on exactly which question or questions produced responses indicating deception, precisely when in the polygraph examination admissions were made (in particular, whether these were before, during, or after testing), and whether admissions were made in response to an examiner’s claim of deception supported by a polygraph chart, or to other stimuli.
Actors or other mock subjects could be trained to be deceptive or nondeceptive, much as in laboratory mock crime experiments but more elaborately, and inserted sporadically for polygraph testing in field settings: for example, they could be presented to polygraph examiners as applicants for sensitive security positions.
Selected physiological responses of genuine polygraph subjects could be concealed from the examiner in favor of dummy tracings, for instance, of an alternate subject listening to the same questions in another room. The genuine responses of the examinee could be retained and still used to guide a follow-up interrogation or investigation if the charts indicate such a need.
Polygraph machines that can record a physiological response in more than one way (e.g., electrodermal response presented as conductance or resistance or presented as a bogus signal) might be used in field or laboratory testing. The form of chart output provided to the examiner could be varied randomly, and the examiners’ conclusions compared. In the example of electrodermal response, polygraph theory and basic physiology imply that conductance should give superior performance. This sort of test would bear on the construct validity of electrodermal response as an indicator of deception.
“Blind” scorers might be used to score sets of polygraph charts, including charts of confessed foreign espionage agents whose activities were uncovered by methods independent of the polygraph and charts of other randomly selected individuals who underwent examinations in the same polygraph programs but who are not now known to be spies. While the bias issue raised above in connection with criminal incident field studies is also of concern here, its importance would be diminished by restricting the analysis to agents uncovered without the polygraph, by random selection of the comparison group, and by appropriately narrow interpretation of the results.
This list is not offered as a set of research recommendations, but as examples of the kinds of research activities that might be considered in a program of actively designed field research on methods for the psychophysiological detection of deception. Such a program would not be expected to yield dramatic short-term results, nor would its long-term evolution be predictable. Experience in many areas of science suggests, however, that a program of actively designed field research would lead to innovations and improvements in methodology and to observations that might justify the effort. (We discuss research priorities in Chapter 8.)
BIAS, CONFLICT OF INTEREST, AND UNSCIENTIFIC DECISION MAKING
In the course of our study we have seen or heard numerous disturbing allegations about the way polygraph research decisions have been made, particularly in federal agencies that have supported this research. We have seen or heard reports of researchers being prohibited from presenting studies at professional society meetings (see, e.g., Honts, 1994: Note 5); a report of a researcher being required to remove his name from a refereed journal article, apparently because the content displeased his employer (Furedy, 1993); a report of potentially inflammatory findings being suppressed and recalled from distribution; and various reports of researchers having been removed summarily from their duties or their positions, with reasons to believe that this might have been done because of the directions or results of their research. These reports are not ancient history, though they are not current either: most appear to have dated from the early 1990s (see note 2). We have not investigated these reports to determine their veracity, as this was not our charge, but they appear to us to be sufficient in number and credibility to deserve mention. It is important that polygraph research be organized so as to minimize the possibility of such situations in the future.
We have also experienced difficulty in gaining access to material necessary to evaluate reports of polygraph research. We wrote to all federal agencies that use the polygraph for employee screening to request studies and other information necessary to conduct a scientific evaluation of polygraph validity, including both unclassified and classified information. In some ways, the agencies were highly responsive. We received large amounts of useful information, and we learned that the kinds of data we wanted on some topics are not collected by any of the agencies in the desired form. In other instances, though, we were left unsatisfied. Two agencies did not provide us with specific unclassified research reports that we requested.3 Also, we were advised by officials from DOE and DoDPI that there was information relevant to our work, classified at the secret level, particularly with regard to polygraph countermeasures. In order to review such information, several committee members and staff obtained national security clearances at the secret level. We were subsequently told by officials of the Central Intelligence Agency and DoDPI that there were no completed studies of polygraph countermeasures at the secret level; we do not know whether there are any such studies at a higher level of classification. Accordingly, our analyses of research on countermeasures are based only on unclassified studies.
These experiences leave us with unresolved concerns about whether federal agencies sponsoring polygraph research have acted in ways that suppress or conceal research results or that drive out researchers whose results might have questioned the validity of current polygraph practice. If the agencies have done or are doing these things, the result would be to introduce a pro-polygraph bias into polygraph research in general, as well as to raise doubts about whether it is advisable for reviewers to apply the usual practice of trusting in the accuracy and completeness of reports in the scientific literature. In addition, any review of the literature, including this one, would be subject to question on the grounds of bias in the entire body of polygraph research.
Such bias is possible because a large segment of polygraph research in the United States has been supported by a small number of agencies that depend on the polygraph in their counterintelligence work. The effect might be something like the “file-drawer effect” commonly noted in meta-analytic research (Rosenthal, 1979, 1980). The nature of the file-drawer problem is that studies that fail to find significant effects or associations are believed to be less likely to be published because journals are disinclined to publish studies that lack clear findings. Thus, they are not submitted for publication or are rejected, and the published literature is, in effect, incomplete. This effect biases the literature in the direction of appearing to show stronger relationships than would otherwise be evident. If research funding agencies are suppressing research, the effects would be similar, though for a different reason. Studies that call the validity of polygraph testing into question, whether by failing to find accurate detection or by finding that accuracy is not robust across the range of situations in which polygraph tests are used, would fail to appear in literature searches.
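The mechanism of the file-drawer effect can be illustrated with a small simulation. The sketch below is not drawn from any study reviewed here; the true effect size, the study size, the number of studies, and the one-tailed significance cutoff are all hypothetical values chosen only for illustration. It generates many study estimates around a modest true effect, treats only the statistically significant ones as "published," and compares the two averages.

```python
import random
import statistics

# All numbers below are illustrative assumptions, not estimates from the literature.
TRUE_EFFECT = 0.2      # hypothetical true standardized effect
N_PER_STUDY = 25       # hypothetical sample size per study
N_STUDIES = 2000       # hypothetical number of studies conducted
Z_CRIT = 1.645         # one-tailed 5 percent significance cutoff


def simulate_studies(true_effect, n_per_study, n_studies, seed=0):
    """Return one observed effect estimate per simulated study."""
    rng = random.Random(seed)
    # Standard error of a mean of unit-variance observations.
    se = 1.0 / n_per_study ** 0.5
    return [rng.gauss(true_effect, se) for _ in range(n_studies)]


def published_only(estimates, n_per_study, z_crit=Z_CRIT):
    """Keep only estimates that reach significance (the rest stay in the file drawer)."""
    se = 1.0 / n_per_study ** 0.5
    return [e for e in estimates if e / se > z_crit]


all_estimates = simulate_studies(TRUE_EFFECT, N_PER_STUDY, N_STUDIES)
published = published_only(all_estimates, N_PER_STUDY)

mean_all = statistics.mean(all_estimates)
mean_published = statistics.mean(published)

print(f"studies conducted: {len(all_estimates)}, published: {len(published)}")
print(f"mean effect, all studies:       {mean_all:.3f}")
print(f"mean effect, published studies: {mean_published:.3f}")
```

Under these assumptions, the average of the published estimates exceeds the average of all estimates, because the selection step discards the smaller ones. Suppression of unwelcome results by a funding agency would act as a different filter on the same pool of studies, with a similar direction of distortion.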
We have not investigated the various allegations, so we are not in a position to evaluate the extent to which the alleged activities may have biased the literature. In Chapter 5 we do compare the polygraph accuracy estimates that come from studies with different sources of funding as a way of shedding some light on the possible effect of bias on the research literature, and find little difference. However, the distinctions between funding sources of these studies were often blurred.
Issues of conflict of interest reflect a serious structural problem with polygraph research. For the most part, the scientists involved in this area and the agencies involved in sponsoring and funding this research have a vested interest in supporting particular sets of conclusions about the reliability and validity of the polygraph (Levey, 1988). For example, U.S. agencies charged with initiating and sponsoring polygraph research (e.g., the U.S. Department of Defense Polygraph Institute) are also charged with the mission of training polygraph examiners and developing new polygraph applications. The dual mission of acting as a sponsor for polygraph research and as a sponsor for polygraph practice creates an obvious conflict of interest. Any reasonable investigator would anticipate that certain research questions (e.g., those that question the theory or logic of the polygraph) or certain patterns of results (e.g., those that suggest limited validity or strong susceptibility to countermeasures) will be less welcome to such research sponsors than empirical demonstrations that the polygraph “works.”
Because the great bulk of polygraph research has been funded by agencies that rely on the polygraph for law enforcement or counterintelligence purposes, there is a significant potential for bias and conflict of interest in polygraph research. Serious allegations suggest that this potential has at some times been realized. This possibility raises warnings that the entire body of research literature may have a bias toward claims of validity for the polygraph. Using a crude classification method (see Chapter 5), we did not see systematic differences in outcomes of polygraph validation studies between those conducted at or funded by polygraph-related agencies and those with a greater presumed degree of independence. However, this issue remains a concern because of the insularity and close connections among polygraph researchers in government and academia, the associations between some prominent researchers and manufacturers of polygraph equipment, and the limited accessibility of field polygraph data to researchers independent of the organizations that conduct polygraph tests. The credibility of future polygraph research would be enhanced by efforts to insulate it from such real or perceived conflicts of interest (see Chapter 8).
We find the general quality of research on the criterion validity of the polygraph to be relatively low. This assessment agrees with those of previous reviewers of this field. This situation partly reflects the inherent difficulties of doing high-quality research in this area, but higher quality research designs and methods of data analysis that might have been implemented have generally not been used. Laboratory studies, though important for demonstrating principles, have serious inherent limitations for generalizing to realistic situations, including the fact that the consequences associated with being judged deceptive are almost never as serious as they are in real-world settings. Field studies of polygraph validity have used research designs of no more than moderate methodological strength and are further weakened by the difficulties of independently determining truth and the possible biases introduced by the ways the research has addressed this issue.
1. Our definition of meta-analysis is presented in Appendix G, along with a more detailed discussion of our rationale for not conducting one.
2. In recent years, the U.S. Department of Defense Polygraph Institute has been working to put polygraph research on more of a scientific footing by adopting a number of standard procedures for scientific quality control that can only serve to improve research management at the institute and that may already be having such an effect.
3. One of these agencies informed us that it could not provide the requested report in order to protect its sources and methods. The other agency informed us that it would handle our request under the Freedom of Information Act and advised us that its response would not be received until January 2003 at the earliest, well after the scheduled completion of our study. Both of these unclassified reports have been cited in the open literature.