erogeneous and the numbers of each type too few to allow us to deal with the heterogeneity in an adequate statistical way. Second, because most of the available studies bear only indirectly on applications to security screening, using precise statistical models to summarize the findings would not contribute much to our purpose. Rather than developing and testing meta-analytic models, we have taken the simpler and potentially less misleading approach of presenting descriptive summaries and graphs. Because the studies vary greatly in quality and include several with extreme outcomes due to small size, sampling variability, bias, or nongeneralizable features of their study designs, we did not give much weight to the studies with outcomes at the extremes of the group. Instead, we focused on outcomes in the middle half of the range in terms of accuracy. For the purpose of this study, this focus reveals what the empirical research shows about the accuracy of polygraph testing.
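To make the idea of the "middle half of the range" concrete, the following is a minimal sketch of how one might compute the interquartile range of a set of study accuracy values and keep only the studies that fall inside it. The function name `middle_half`, the choice of the inclusive quartile method, and the accuracy values themselves are assumptions made for illustration; they are not the committee's procedure or data.

```python
from statistics import quantiles


def middle_half(accuracies):
    """Return (q1, q3, values lying within [q1, q3]) for a list of accuracy scores.

    Quartiles are computed with the 'inclusive' method, which is one of
    several common conventions (an assumption of this sketch).
    """
    ordered = sorted(accuracies)
    q1, _median, q3 = quantiles(ordered, n=4, method="inclusive")
    central = [a for a in ordered if q1 <= a <= q3]
    return q1, q3, central


# Hypothetical accuracy values, one per study (NOT data from the report).
example_accuracies = [0.99, 0.60, 0.85, 0.70, 0.95, 0.80, 0.90]

q1, q3, central = middle_half(example_accuracies)
print(f"interquartile range: {q1:.3f} to {q3:.3f}")
print("studies in the middle half:", central)
```

Summarizing by the interquartile range rather than the full range limits the influence of the few studies with extreme outcomes, which is the rationale given above for giving them little weight.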
The polygraph studies that met our criteria for consideration do not generally reach the high levels of research quality desired in science. Only 57 of the 194 studies (29 percent) that we examined both met minimal standards of scientific adequacy and presented useful data for quantifying criterion validity. Of these 57, only 18 percent and 9 percent, respectively, received average internal validity and salience ratings of 2 or better on a 5-point scale (on which 1 is the best possible score; see Appendix G for the rating system). These ratings mean that relatively few of the studies are of the quality level typically needed for funding by the U.S. National Science Foundation or the U.S. National Institutes of Health. This assessment of the general quality of this literature as relatively low coincides with the assessments in other reviews (e.g., U.S. Office of Technology Assessment, 1983; Levey, 1988; Fiedler, Schmid, and Stahl, 2002). It partly reflects the inherent difficulties of doing high-quality research in this area. The fact that a sizable number of polygraph studies have nevertheless appeared in good-quality, peer-reviewed journals probably reflects two facts: the practical importance of the topic and the willingness of journals to publish laboratory studies that are high in internal validity but relatively low in salience to real-world application.
The types of studies that are most scientifically compelling for evaluating a technology with widespread field application are only lightly represented in the polygraph literature. Laboratory or simulation studies are most compelling when they examine the theoretical bases for a technique or when they provide information on its performance that can be extrapolated to field settings on the basis of a relevant and empirically supported theoretical foundation. Field studies are most valuable when they involve controlled performance comparisons, either where the field system is experimentally manipulated according to the subtraction principle (see Chapter 3) or where observational data are collected systematically from