graph, and it is likely that some of this research would be classified. Elsewhere, we advocate open public research on the polygraph. In areas for which classified research is necessary, it is reasonable to expect that the quality and reliability of this research, even if conducted by the best available research teams, will necessarily be lower than that of unclassified research, because classified research projects do not have access to the self-correcting mechanisms (e.g., peer review, free collaboration, data sharing, publication, and rebuttal) that are such an integral part of open scientific research.
Theoretical considerations and data suggest that any single-value estimate of polygraph accuracy in general use would likely be misleading. A major reason is that accuracy varies markedly across studies. This variability is due in part to sampling factors (small sample sizes and different methods of sampling); however, undetermined systematic differences between the studies undoubtedly also contribute to variability.
Among the laboratory studies of specific-incident polygraph testing that we found to be of at least minimal scientific quality and that presented data in a form amenable to quantitative estimation of criterion validity, the accuracy index fell between 0.81 and 0.91 for the middle 26 of the values from 52 datasets (that is, the middle half of the values). Field studies suggest a similar, or perhaps slightly higher, level of accuracy. These numerical estimates should be interpreted with great care and should not be used as general measures of polygraph accuracy, particularly for screening applications. First, none of the studies we used to produce these numbers is a true study of polygraph screening. For the reasons discussed in this chapter, we expect that the accuracy index values that would be estimated from such studies would be lower than those in the studies we have reviewed.7
Second, these index values do not represent the percentage of correct polygraph judgments except under particular, very unusual circumstances. Their meaning in terms of percent correct depends on other factors, particularly the threshold that is set for declaring a test result positive and the base rate of deceptive individuals tested. In screening populations with very low base rates of deceptive individuals, even an extremely high percentage of correct classifications can give very unsatisfactory results. This point is illustrated in Table 2-1 (in Chapter 2), which presents an example of a test with an accuracy index of 0.90 that makes 99.5 percent correct classifications in a hypothetical security screening situation, yet lets 8 of 10 spies pass the screen.
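The arithmetic behind this point can be sketched in a few lines. The example below is illustrative only and rests on assumptions that are not taken from the text above: a hypothetical screening population of 10,000 examinees that includes 10 spies, a threshold set so that 20 percent of spies are flagged (so that 8 of 10 pass), and an equal-variance binormal model to convert an accuracy index of 0.90 into a false positive rate. The function name `screening_outcome` and its parameters are likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def screening_outcome(accuracy_index, sensitivity, n_examinees, n_deceptive):
    """Expected screening counts, ASSUMING an equal-variance binormal model."""
    # Under the assumed model, accuracy index (ROC area) = Phi(d' / sqrt(2)).
    d_prime = sqrt(2) * STD_NORMAL.inv_cdf(accuracy_index)
    # Cutoff (on the truthful examinees' scale) that flags the requested
    # share of deceptive examinees, whose scores are centered at d'.
    threshold = d_prime - STD_NORMAL.inv_cdf(sensitivity)
    false_positive_rate = 1 - STD_NORMAL.cdf(threshold)

    n_truthful = n_examinees - n_deceptive
    caught = sensitivity * n_deceptive              # deceptive examinees flagged
    passed = n_deceptive - caught                   # deceptive examinees missed
    false_alarms = false_positive_rate * n_truthful # truthful examinees flagged
    correct = caught + (n_truthful - false_alarms)  # all correct classifications
    return {
        "spies_passed": passed,
        "false_alarms": false_alarms,
        "percent_correct": 100 * correct / n_examinees,
    }


# Hypothetical inputs: 10,000 examinees, 10 spies, 20 percent of spies flagged.
result = screening_outcome(0.90, 0.2, 10_000, 10)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, roughly 40 truthful examinees are flagged and about 99.5 percent of all classifications are correct, yet 8 of the 10 spies still pass. The overall percentage is dominated by the very large number of truthful examinees who are correctly cleared, which is why it says little about how well the screen detects the rare deceptive examinee.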