vided by the study. Typically, these data could be conveyed in simple tabular form to show test outcomes for deceptive and nondeceptive examinees. If studies included multiple conditions or internal comparisons, either a primary summary table was extracted, or tables were reported for each of several conditions or subgroups. This process yielded from one to over a dozen datasets from the individual studies, depending on the number of conditions and subpopulations considered. Often, multiple datasets reflected the same subjects tested under different conditions or different scoring methods applied to the same polygraph examination results.
To gain a baseline impression of empirical polygraph validity, we used data primarily from the studies that passed the six first-stage screening criteria. After committee review of the reports that staff had passed on with unresolved status, 74 were determined to satisfy the initial criteria. Those criteria were relaxed to allow 6 others that failed no more than one criterion, on grounds of either documentation or impracticality in a field context, and that either came from a source of particular relevance (U.S. Department of Defense Polygraph Institute, DoDPI) or exhibited features of special interest (e.g., field relevance). During this process, we identified redundant reports of the same study and used the report with the most comprehensive data or the one that reported data in the form most suitable for our purpose.
Some studies that had passed our screen and initially appeared suitable for ROC analysis were not ultimately used for this purpose. Specifically, studies that exclusively reported polygraph decisions made on the basis of averaging raw chart scores of multiple examiners were excluded. While this approach shares with computer scoring the laudable intent of reducing errors due to examiner variability, to our knowledge such a scoring method is never used in practice, and it will often exaggerate the validity of a single polygraph examination.
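The statistical reason that averaged multi-examiner scores exaggerate single-examination validity can be illustrated with a small simulation. The sketch below is purely hypothetical: the group means, the examiner noise level, and the number of examiners are illustrative assumptions, not values drawn from any of the studies reviewed. Averaging independent examiners' chart scores shrinks the scoring-noise variance, which widens the effective separation between deceptive and nondeceptive examinees and raises the apparent ROC area above what a single examiner's scores would yield.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(pos, neg):
    """Mann-Whitney estimate of ROC area: P(pos score > neg score)."""
    pos = np.asarray(pos)[:, None]
    neg = np.asarray(neg)[None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()

# Hypothetical setup: n examinees per group, k examiners per examination.
n, k = 200, 3
signal_deceptive, signal_nondeceptive = 1.0, 0.0  # assumed true mean scores
noise_sd = 1.0                                    # assumed examiner scoring noise

# Each examiner's chart score = true signal + independent scoring noise.
dec = signal_deceptive + rng.normal(0.0, noise_sd, (n, k))
non = signal_nondeceptive + rng.normal(0.0, noise_sd, (n, k))

single_auc = auc(dec[:, 0], non[:, 0])                # one examiner's scores
pooled_auc = auc(dec.mean(axis=1), non.mean(axis=1))  # scores averaged over k

print(f"single-examiner ROC area: {single_auc:.3f}")
print(f"averaged-score ROC area:  {pooled_auc:.3f}")  # typically higher
```

Under these assumptions the averaged scores produce a noticeably larger ROC area, which is why decisions based on averaged multi-examiner scores cannot be treated as estimates of single-examination accuracy.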
We also excluded, for this particular purpose, data from an otherwise interesting research category: studies of concealed information tests using subjects as their own controls that did not also include subjects who had no concealed information about the questions asked. These studies compared responses of research subjects to stimuli about which they had information to responses to other stimuli, in various multiple-choice contexts. In them, each examinee was deceptive to some questions and nondeceptive to others. Some of these studies were quite strong in the sense of controlling internal biases and quite convincing in demonstrat-