Process for Systematic Review of Polygraph Validation Studies
“Systematic review” describes a relatively formal approach to evaluating a body of research literature that has over the past two decades gradually been supplanting the classical “expert summary” review article. The latter, while often an intellectual tour de force, is nevertheless prone to idiosyncratic literature selection and overemphasis on the reviewer’s experience and predispositions. Systematic reviews incorporate a common set of steps, conducted and documented so that, as with primary scientific studies, it is possible for other researchers to replicate the systematic review process to confirm its results. The five common steps, each of which may be elaborated in a variety of ways, are question formulation, literature search and compilation, critical characterization and data extraction, integration of results, and contextual evaluation. Our systematic review was less formal than many, due largely to the breadth of the task and the scope of available resources, but we are confident in the approach and the resulting primary scientific conclusions.
The questions addressed by this review were largely dictated by the committee’s charge. These are:
How strong is the correspondence between polygraph test results and actual deception in empirical studies that allow such assessment?
Does the strength of correspondence vary substantially across different test settings, questioning methods, study populations, or other variables of potential practical importance?
To what degree are the quality and generalizability of the polygraph research literature sufficient to support policy decisions regarding use of the polygraph, with particular emphasis on national security screening applications?
LITERATURE SEARCH AND COMPILATION
Many thousands of works have been written on the polygraph. An extensive bibliography compiled two decades ago (Ansley, Horvath, and Barland, 1983) listed some 3,400 references, and there have certainly been thousands of works on the subject since then. Our interest for this review was in the small proportion of this literature that includes polygraph validation studies, that is, studies that (a) report measurements of one or more of the physiological responses measured by the polygraph and (b) link these physiological responses to the respondent’s truth or deception. Only such studies offer empirical evidence that can be used to assess the criterion validity of the polygraph.
We used several approaches in an effort to obtain as much as possible of the entire corpus of polygraph validation studies. One was a normal literature search using computerized bibliographic databases such as PsycInfo, Social Science Citation Index, Medline, and so forth, using relevant keywords. In addition, we sent requests by regular or electronic mail to a variety of individuals and organizations that we believed might have, or be able to lead us to, research reports useful for this study. These requests went to all U.S. government agencies that do security screening by polygraph, to polygraph websites known to us, and to leading researchers of all persuasions in the polygraph controversy. All contacted were additionally asked to forward our request to others who might also have information potentially useful to us. Finally, we periodically checked our growing bibliography against major published and unpublished bibliographies and reviews of the polygraph literature (e.g., Ansley, Horvath, and Barland, 1983; U.S. Office of Technology Assessment, 1983; Kircher, Horowitz, and Raskin, 1988; Urban, 1999; Ben-Shakhar, personal communication; Defense Information Systems Agency, 2001; U.S. Department of Defense Polygraph Institute, personal communication; Ben-Shakhar and Elaad, 2002). We sought out validation studies regardless of whether or not they had undergone peer review. Through this procedure, we attempted to be as inclusive as possible in collecting material to review, in order to limit publication bias and make our own judgments of research quality.
CRITICAL CHARACTERIZATION AND DATA EXTRACTION
The many documents we collected included 217 reports of 194 unique polygraph validation studies. These varied greatly in quality of research design, choice and standardization of measurement techniques, thoroughness of control for confounding variables, statistical analyses, and various other factors that affect their scientific value.
We used a four-stage process to select studies from the polygraph validation literature for qualitative evaluation and to extract data from those studies for quantitative summarization. The process involved: (1) initial staff screening of collected research reports by a set of basic criteria for acceptability and for special interest; (2) detailed reading of reports by committee members, with characterization by a larger set of criteria; (3) resolution of issues left unresolved by the initial staff screen, and elimination of remaining redundant reports and of those without appropriate data for baseline receiver operating characteristic (ROC) curve assessment; and (4) extraction of datasets for ROC assessment from the remaining study reports. Stages (3) and (4) were performed by a subgroup of committee statisticians and staff.
Initial Staff Screen
Polygraph validation reports were reviewed by staff for conformity to six basic criteria of scientific acceptability and potential usefulness for baseline ROC assessment. The criteria were initially discussed by all involved staff and a committee research methodologist. Multiple reviewers evaluated a substantial selection of the reports and discussed and collectively resolved discrepancies, in the process clarifying policies for classification. The rest of the reports were evaluated by two staff coders, who discussed any discrepancies and agreed on classifications. We used six screening criteria:
1. Documentation of examination procedures sufficient to allow a basic replication. To meet this criterion, a study had to pass all the following tests:
Question selection. Studies passed if they specified the questions used for each polygraph test, provided a superset of questions from which the questions used were selected and a reproducible selection process, or otherwise provided enough detail about the method of question selection or construction, as for instance in field application of a comparison question technique, to allow for essential replication of the process.
Physiological measures used. Studies passed if they specified the measures recorded (even if these were of questionable value).
Instrumentation. Studies passed if they specified the equipment used to collect physiological measures.
Scoring method. Studies passed if they specified how the physiological measures were converted to the dependent-variable measures that were compared to truth.
2. Independently determined truth. Studies passed if (a) guilt or innocence was predetermined by the conditions of an experiment or (b) in a nonexperimental study, if truth was defined by a confession, adjudication by a legal process, or review of the case facts by a panel who were uninformed about the results of the polygraph examination. An experimental study was defined as one in which guilt or innocence is manipulated by the researcher. Such studies may be carried out either in laboratories or in more realistic settings. In nonexperimental studies, examinees are tested with regard to crimes or transgressions committed in the world outside the laboratory, of which they are innocent or guilty.
3. Inclusion of both guilty and innocent individuals, as determined by criterion 2 (truth). Studies also passed this screen if they used a within-subjects design in which the same individual provided truthful and deceptive responses to highly similar questions.
4. Sufficient information for an accuracy analysis. Studies passed if: (a) scores were classified as deceptive, nondeceptive, and inconclusive (or the equivalent) for both innocent and guilty respondents; (b) inconclusive cases were absent because of an explicit decision rule that forced a definite choice on all cases; or (c) data were recorded on an ordinal, interval, or ratio scale, allowing for accuracy analysis with multiple cutoff points. Studies failed if charts that were scored inconclusive were rejected from the data analysis and not reported.
5. Scoring conducted with masking to truth. Experimental studies passed the screen if they stated or showed that both the polygraph examiners and scorers were kept unaware of the examinee’s guilt or innocence, even if the procedures to achieve this masking might have been flawed. Nonexperimental studies passed if scorers were kept uninformed about all case information relevant to determining truth, even if the original polygraph examiners were not uninformed. Studies using scoring methods that left no room for individual judgment (e.g., automated scoring methods) also passed.
6. Appropriate assignment to experimental groups germane to validity assessment (mainly, guilt and innocence). This criterion was applied only to experimental studies, and they passed if (a) they stated that or explained how subjects were randomly assigned to groups; (b) they compared truthful and deceptive responses of the same individual in a within-subjects design (e.g., concealed information technique studies); or (c) they put subjects in a situation that tempted them to guilty action and allowed them to choose whether or not to commit the acts or to deceive.
In applying the above criteria, staff were instructed to err in the direction of inclusiveness by forwarding reports with ambiguities about whether all the criteria were met to the committee for full examination and resolution. In addition, reports that appeared to have uniquely interesting design features, that seemed particularly relevant to screening, or that considered other issues of particular importance to the committee’s charge (e.g., countermeasures, effects of examinee differences) were also forwarded to the committee even if they failed the above screening criteria.
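As a sketch only, the six screening criteria and the staff's rule of erring toward inclusiveness can be expressed as boolean checks. The class name, field names, and disposition strings below are hypothetical illustrations; the actual screen relied on reviewers' judgment, which a single true/false flag can only summarize.

```python
from dataclasses import dataclass


@dataclass
class ScreeningRecord:
    """Hypothetical record of one report's staff screen (field names are illustrative)."""
    # Criterion 1 is met only if all four documentation tests pass.
    question_selection: bool
    measures_specified: bool
    instrumentation_specified: bool
    scoring_method_specified: bool
    independent_truth: bool          # criterion 2
    guilty_and_innocent: bool        # criterion 3 (or a within-subjects design)
    accuracy_data: bool              # criterion 4
    masked_scoring: bool             # criterion 5
    appropriate_assignment: bool     # criterion 6, applied to experimental studies only
    experimental: bool
    ambiguous: bool = False          # staff unsure whether every criterion is met
    special_interest: bool = False   # e.g., countermeasures, screening relevance

    def documentation_ok(self) -> bool:
        return all([self.question_selection, self.measures_specified,
                    self.instrumentation_specified, self.scoring_method_specified])

    def passes_screen(self) -> bool:
        checks = [self.documentation_ok(), self.independent_truth,
                  self.guilty_and_innocent, self.accuracy_data, self.masked_scoring]
        if self.experimental:
            # Assignment to groups is only meaningful when the researcher controls guilt.
            checks.append(self.appropriate_assignment)
        return all(checks)


def staff_disposition(rec: ScreeningRecord) -> str:
    """Err toward inclusiveness: forward passes, ambiguous cases, and special-interest reports."""
    if rec.passes_screen():
        return "forward (passed)"
    if rec.ambiguous or rec.special_interest:
        return "forward (for committee review)"
    return "not forwarded"
```

For example, a field study with all documentation, truth, group, accuracy, and masking flags set passes even though `appropriate_assignment` is false, because criterion 6 is not applied to nonexperimental studies.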
Of the 217 reports reviewed in this initial screen, 23 were later found to be duplicate reports of the same research, leaving 194 unique studies.¹ Staff forwarded 102 unique reports to the entire committee, which conducted a detailed review of them. Of the total, there were 61 studies that clearly satisfied the six criteria above. Reports of 41 other studies also received detailed review because they either (a) appeared to fail only one screen, with the possibility that the failure was due only to omission of a detail in the written report or, for observational field studies, an inherent logistical limitation; (b) considered factors of particular relevance on which literature was sparse (e.g., countermeasures); or (c) exhibited special features of research design that staff judged potentially important enough to justify further examination, despite failing the screen. These studies were provided to all committee members along with information on how they had been classified according to the screening criteria. Additional studies from the full list of 189 were made available to members as requested.
All committee members read many studies, with choices dictated by their particular interests and areas of expertise, testimony to the committee, and background readings. Committee meetings included comprehensive discussions of the body of literature and specific subsets of it. Designated subgroups reviewed and commented upon all reports in special categories of studies (e.g., of countermeasures).
Two members were assigned to review each of the 115 reports forwarded from the initial screen. The assigned committee members classified each study with regard to 16 study characteristics.
Setting. Studies were categorized as laboratory or field studies. “Laboratory” refers to studies in a controlled environment using polygraph examinations conducted specifically for research purposes. “Field” refers to studies of polygraph performance using examinations conducted to detect deception primarily for practical purposes other than research, e.g., in criminal investigations, civil litigation, or employee screening.
Test format. Studies were classified as using comparison question, concealed information, relevant-irrelevant, or other techniques. Comparison question techniques include both probable-lie and directed-lie variants.
Question range. Studies were classified as to whether relevant questions referred to knowledge of specific facts or participation in specific events or, instead, addressed only categories of events, as is commonly the case with screening polygraphs.
Skin measurement. Studies that measured electrodermal response were classified as to whether skin conductance or skin resistance was recorded.
Primary outcome scale. Studies were classified in terms of whether polygraph results were reported in two categories (e.g., deception indicated or no deception indicated), in three categories (including an inconclusive category), in multiple categories indicating degrees of evidence pointing to deception or truthfulness, or as summary scores on numerical scales.
Masking to base rate. Studies were classified as to whether polygraph examiners or scorers knew the base rate of deceptive individuals in the examinee population.
Scoring reliability. Studies were placed in one of three categories based on the stringency of control for observer variability: human scoring without data on inter-rater reliability; multiple human scorers with inter-rater reliability data; or automated (computerized) scoring.
Consequences of test. Studies were classified according to the seriousness (trivial, moderate, or severe) of the reward for appearing nondeceptive and, separately, of the punishment for appearing deceptive.
Case selection. Scorers noted whether or not the examinees came from a defined population potentially allowing replication (e.g., military recruits, people tested in criminal investigations in a particular jurisdiction).
Truth. Field studies were classified by how truth was determined: by confession, retraction, judicial procedures (including jury trials and jury-like panels), or other methods.
Documentation of research protocol. Scorers rated the research report as providing detailed and clear, adequate, or minimal documentation of the study procedures covered in screening criterion 1. “Detailed and clear” required use of generally sound methods.
Quality of data analysis. Scorers rated the quality of procedures used for analyzing the polygraph data as high, adequate, or low.
Internal validity. Scorers rated each study comprehensively on a 1-5 scale, with 1 representing the highest scientific standards for research methodology and 5 representing the minimum standards consistent with the initial screening criteria. Scorers considered the above factors, additional potential sources of bias and confounding, sample size, and discussion of their ramifications for conclusions.
Overall salience to the field. Each study was similarly rated 1-5, incorporating internal validity as well as broader issues. For experimental studies, considerable weight was given to external validity, including how closely an experiment mimicked actual polygraph testing situations with regard to examinees’ choices to engage in or refrain from the target activity and to be deceptive or forthcoming, and with regard to the consequences of being found deceptive on the test. Scorers also considered the importance of the measures and variables examined to the major practical questions concerning polygraph validity.
Funding. Studies were classified on the basis of information in the research reports as follows: intramural research funded by an agency with a polygraph program; extramural research funded by such an agency; extramural research funded by another source; research locally funded by an academic or research organization; and other or unable to determine.
Comparative analyses. Reviewers noted whether each study included internal comparisons on variables of special interest: examinees’ age, gender, or race; type of crime or transgression; levels of motivation or “stakes”; examinees’ psychopathology; use of countermeasures; or other internal comparisons.
Disagreements between qualitative categorizations were resolved by a third committee member acting as a judge or, in some cases, through discussion by the raters. Ordinal numerical scores within one unit on a five-point scale were averaged. Disparities of more than one unit were resolved by discussion among the reviewers, by averaging if such discussion did not produce a consensus or, in a few cases where this discussion was difficult to arrange, by adjudication by a third committee member.
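The reconciliation rule for the ordinal 1-5 ratings can be sketched as a small function. This is a hypothetical illustration: `reconcile_ratings` and its `resolved` parameter are our own labels, and the discussion or third-member adjudication steps are represented only by a value passed in, not modeled.

```python
from typing import Optional


def reconcile_ratings(a: float, b: float, resolved: Optional[float] = None) -> float:
    """Combine two reviewers' ratings on the 1-5 scale (1 = highest standards).

    `resolved` is a hypothetical stand-in for a value reached by discussion
    or by a third committee member; it is consulted only when the raters
    differ by more than one unit.
    """
    for r in (a, b):
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    if abs(a - b) <= 1:
        return (a + b) / 2          # within one unit: average the two ratings
    if resolved is not None:
        return resolved             # consensus or third-member adjudication
    return (a + b) / 2              # discussion without consensus: fall back to averaging
```

For instance, ratings of 2 and 3 are simply averaged to 2.5, while ratings of 1 and 4 are averaged only if discussion fails to settle on a value.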
Reviewers also extracted the basic data on polygraph accuracy provided by the study. Typically, these data could be conveyed in simple tabular form to show test outcomes for deceptive and nondeceptive examinees. If studies included multiple conditions or internal comparisons, either a primary summary table was extracted, or tables were reported for each of several conditions or subgroups. This process yielded from one to over a dozen datasets from the individual studies, depending on the number of conditions and subpopulations considered. Often, multiple datasets reflected the same subjects tested under different conditions or different scoring methods applied to the same polygraph examination results.
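To illustrate how such a table supports ROC assessment, the sketch below treats the decision categories as ordered from most to least indicative of deception, moves the cutoff one category at a time, and records the false-positive and true-positive rates at each step; the trapezoidal area under the resulting points gives a crude accuracy index. This is an assumed, illustrative procedure rather than necessarily the method the committee used to estimate A, and the counts in the usage lines are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_roc(deceptive: Sequence[int],
                  nondeceptive: Sequence[int]) -> List[Tuple[float, float]]:
    """ROC points (false-positive rate, true-positive rate) from outcome counts.

    Both sequences list counts per decision category, ordered from most to
    least indicative of deception, e.g. [deception indicated, inconclusive,
    no deception indicated].
    """
    if len(deceptive) != len(nondeceptive):
        raise ValueError("both groups must use the same categories")
    n_dec, n_non = sum(deceptive), sum(nondeceptive)
    if n_dec == 0 or n_non == 0:
        raise ValueError("both truth groups need at least one examinee")
    points = [(0.0, 0.0)]
    cum_dec = cum_non = 0
    # Each step calls every examinee at or above the current category "deceptive".
    for d, n in zip(deceptive, nondeceptive):
        cum_dec += d
        cum_non += n
        points.append((cum_non / n_non, cum_dec / n_dec))
    return points


def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical counts for 50 deceptive and 50 nondeceptive examinees.
roc = empirical_roc([40, 6, 4], [8, 10, 32])
print(roc)
print(round(trapezoid_area(roc), 3))  # 0.85
```

A three-category table yields only two interior ROC points, so the trapezoidal area tends to understate the area under a smooth concave ROC curve; model-based fits such as the binormal model are a common alternative.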
Resolution of Unresolved Issues and Extraction of Datasets for ROC Analysis
To gain a baseline impression of empirical polygraph validity, we used data primarily from the studies that passed the six first-stage screening criteria. After committee review of the reports that staff had forwarded with unresolved status in this regard, 74 were determined to satisfy the initial criteria. Those criteria were relaxed to allow 6 others that failed no more than one criterion, either on grounds of documentation or impracticality in a field context, and that either came from a source of particular relevance (U.S. Department of Defense Polygraph Institute, DoDPI) or exhibited features of special interest (e.g., field relevance). During this process, we identified redundant reports of the same study and used the report with the most comprehensive data reporting or the one that reported data in the form most suitable for our purpose.
Some studies that had passed our screen and initially appeared suitable for ROC analysis were not ultimately used for this purpose. Specifically, studies that exclusively reported polygraph decisions made on the basis of averaging raw chart scores of multiple examiners were excluded. While this approach shares with computer scoring the laudable intent of reducing errors due to examiner variability, to our knowledge such a scoring method is never used in practice, and it will often exaggerate the validity of a single polygraph examination.
We also excluded, for this particular purpose, data from an otherwise interesting research category: studies of concealed information tests using subjects as their own controls that did not also include subjects who had no concealed information about the questions asked. These studies compared research subjects’ responses to stimuli about which they had information with their responses to other stimuli, in various multiple-choice contexts. In them, each examinee was deceptive to some questions and nondeceptive to others. Some of these studies were quite strong in the sense of controlling internal biases and quite convincing in demonstrating a statistical association between polygraph responses and deception in uncontaminated laboratory settings. However, various design features of these studies seriously limited the relevance of their measurements of polygraph accuracy. Some of them designated deception or truthfulness based on relative rankings within a group of examinees rather than for each individual, or used extremely artificial stimulus sets (e.g., playing cards or family names). Most importantly, these studies lacked uncontaminated nondeceptive control subjects, so that their assessments of accuracy are based on a priori assumptions about how such subjects would have responded, and do not account for the possibility that nondeceptive examinees may respond differentially to stimuli that commonly have emotional connotations even for nondeceptive individuals.
Since our purpose was to use multiple studies to get a general sense of polygraph accuracy, we excluded from this analysis studies in which examinees came only from population subgroups distinguished by psychological aberration. Finally, we excluded from the quantitative analysis any study with fewer than five individuals in either the deceptive or nondeceptive groups, on the grounds that results from such studies were inherently too statistically unstable to provide much useful information.
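The exclusion rules described in this subsection can be summarized as a filter applied to each remaining study. The `CandidateStudy` fields below are hypothetical flags standing in for judgments that reviewers made by reading each report; only the minimum group size of five comes directly from the rule stated above.

```python
from dataclasses import dataclass
from typing import Optional

MIN_GROUP_SIZE = 5  # fewer than five in either truth group was judged too unstable


@dataclass
class CandidateStudy:
    """Hypothetical summary of a screened study (field names are illustrative)."""
    name: str
    n_deceptive: int
    n_nondeceptive: int
    averaged_multi_examiner_scores_only: bool = False
    own_control_cit_without_innocents: bool = False
    deviant_population_only: bool = False


def exclusion_reason(s: CandidateStudy) -> Optional[str]:
    """Return why a study is left out of the ROC analysis, or None to keep it."""
    if s.averaged_multi_examiner_scores_only:
        return "decisions based only on averaged raw scores of multiple examiners"
    if s.own_control_cit_without_innocents:
        return "concealed information study without nondeceptive control subjects"
    if s.deviant_population_only:
        return "examinees drawn only from psychologically aberrant subgroups"
    if min(s.n_deceptive, s.n_nondeceptive) < MIN_GROUP_SIZE:
        return "fewer than five examinees in a truth group"
    return None


studies = [CandidateStudy("hypothetical A", 20, 20),
           CandidateStudy("hypothetical B", 4, 30)]
kept = [s.name for s in studies if exclusion_reason(s) is None]
print(kept)
```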
This winnowing process left 57 unique studies (listed below) judged useful for gaining a general sense of polygraph validity through ROC analysis. Most of these studies reported multiple datasets. To avoid implicitly weighting studies by the multiplicity of conditions and subgroups considered, in all but two instances (noted under rule 3 below) we extracted only one set of validation data for further examination from each study from which reviewers had reported multiple datasets. These datasets were determined by one or more committee members and the consultant, working under the following rules:
1. Multiple polygraph channels. In studies that evaluated polygraph tracings from separate channels independently and reported the results separately, we used the results based on the composite of all tracings if these were reported, and the results based on skin conductance/resistance if no composite results were provided. Studies comparing the contributions of skin resistance, cardiovascular, and respiratory responses have generally found skin resistance to have the most discriminating power of the polygraph channels and most have found the additional contributions of cardiovascular and respiratory responses to be modest.
2. Demographically distinct subgroups. Results from demographic subgroups tested under the same conditions were pooled, after excluding subgroups selected for extreme deviancy, such as psychopaths. While deviant subgroups were potentially of interest in their own right, they were considered inappropriate for a core evaluation of polygraph validity.
3. Subgroups tested under different conditions. Results from subgroups tested under different conditions (e.g., variants of the same questioning method, different sets of instructions or methods of psychological preparation, modestly different mock crime scenarios) were pooled. Statistically important differences in results of such variants were rare. For studies contrasting major variants against testing under a standard paradigm, such as a probable lie comparison question test, we used data from the control group tested using the standard paradigm. Two reports included data from administration of comparison question and concealed information polygraph tests to different groups of subjects. We extracted one dataset for each type of testing procedure from each of these two studies. In studies of countermeasures in which certain groups were instructed to use countermeasures, we used data only from examinees who were not instructed to use countermeasures. In studies of “spontaneous” countermeasure use by examinees who were not instructed to use countermeasures, we pooled all examinees.
4. Different scoring methods or scorers. Data from human scorers masked to information about truth were selected in preference to those from human scorers not so masked, such as the original examiner. Results from masked scorers separate the information in the polygraph charts from other information present during the examination (e.g., demeanor) or in the examinee’s history (e.g., past offenses) that might influence expectations of the scorer and hence scoring results.
Despite the fact that computer scoring shares these advantages with masked human scoring, we chose the results of a human scorer over those of computer scoring when both were available, even when the human scorer was not masked. Computers are not commonly used for primary scoring in current polygraph practice. In the studies we reviewed, computer scoring was not noticeably superior to human scoring except on data used to train the computer, where computer success rates are known to be spuriously elevated. (See Appendix F for more detailed discussion of issues involving studies of computer scoring.)
Some studies reported separate results of multiple human scorers in the same generic category, e.g., three masked scorers. In such cases, the proportions of examinees allocated to each decision category were averaged across scorers to form a single dataset. Some studies reported results of different methods of scoring, for instance, variations in the cutoffs applied to summary chart scores to distinguish charts suggesting deception from nondeceptive or inconclusive charts. Often these scoring methods were applied to the same set of charts. In such instances, we chose data reflecting the “control,” that is, the most widely accepted scoring paradigm.
5. Indistinguishable datasets. In a very few (< 5) instances, multiple (usually two) datasets remained with none taking precedence on the above grounds. In these instances, the dataset most favorable to polygraph testing was used.
This stage of review was accomplished by a small subgroup of committee members, staff, and the consultant, under oversight of a committee member specializing in research methodology.
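Several of the dataset-selection rules above (channel choice, scorer preference, and averaging across scorers of the same type) are mechanical enough to sketch in code. The function names, the decision-category labels "DI", "INC", and "NDI", and the channel and scorer labels are hypothetical; rules that required judgment, such as identifying the “control” scoring paradigm or pooling subgroups, are not modeled.

```python
from typing import Dict, List, Sequence

# Hypothetical labels; the order follows the scorer preference stated above:
# masked human, then unmasked human (e.g., the original examiner), then computer.
SCORER_PREFERENCE = ("masked_human", "unmasked_human", "computer")


def choose_channel(available: Sequence[str]) -> str:
    """Composite of all tracings if reported, otherwise skin conductance/resistance."""
    if "composite" in available:
        return "composite"
    if "electrodermal" in available:
        return "electrodermal"
    raise ValueError("neither composite nor electrodermal results reported")


def choose_scorer_type(available: Sequence[str]) -> str:
    """Return the most preferred scorer type for which results are available."""
    for kind in SCORER_PREFERENCE:
        if kind in available:
            return kind
    raise ValueError("no recognized scorer type")


def average_scorers(tables: List[Dict[str, float]]) -> Dict[str, float]:
    """Average decision-category proportions across scorers of the same type.

    Apply separately to deceptive and to nondeceptive examinees; each table
    maps a decision category to the proportion of examinees placed in it.
    """
    if not tables:
        raise ValueError("need at least one scorer")
    categories = tables[0].keys()
    if any(t.keys() != categories for t in tables):
        raise ValueError("scorers must use the same decision categories")
    return {c: sum(t[c] for t in tables) / len(tables) for c in categories}
```

For example, three masked scorers classifying deceptive examinees as "DI" in proportions 0.8, 0.7, and 0.9 would contribute a single averaged proportion of 0.8 to the extracted dataset.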
INTEGRATION OF RESULTS AND CONTEXTUAL EVALUATION
We have conducted a systematic review but not a meta-analysis. A meta-analysis is a systematic review that integrates the compiled results of either the totality of selected studies or homogeneous subgroups of them into one or a few simple numerical summaries, each of which usually addresses both statistical significance (e.g., p-value) and the magnitude of an observed relationship (effect size). The best meta-analyses also include a search for systematic explanations of heterogeneity in the results of the studies compiled. Initially proposed to overcome the sample size limitations of individual studies and misinterpretations of negative statistical hypothesis tests, meta-analysis has seen widespread application as a general tool for research synthesis in the social and health sciences. Others have made efforts to do meta-analyses for all or part of the literature on the use of the polygraph for the detection of deception or the presence of concealed information (e.g., see Kircher et al., 1988; Urban, 1999; and Ben-Shakhar and Elaad, 2002). We have not attempted such numerical reduction here. In view of the widespread expectation that critical literature reviews lead to such comprehensive summaries, we offer some explanation for this decision.
There are both technical and substantive reasons for not using meta-analytic methods in this report. We do not use these methods in part because the literature does not allow us to deal adequately with the heterogeneity of the available studies. The laboratory studies employ instruments measuring different physiological parameters, multiple scales of measurement and systems of scoring, varying methods of interviewing, examiners of different levels of experience, and multiple study populations. The field studies present all these kinds of heterogeneity and more: they include variation within studies in the deceptions of concern, in examiners’ expectancies, and in multiple unrecorded aspects of the social interaction during the polygraph examination. Appropriate meta-analytic summaries would handle this diversity either by hypothesizing that these variations do not affect the relationship between polygraph measurement and deception and empirically testing this hypothesis, or by modeling heterogeneity across the studies as a random effect around some central measure of the relationship’s strength, perhaps also correcting estimates of the observed variability in effect sizes for sampling error, which is likely to be a serious concern in a research literature where small samples are the norm. However, the literature contains too few studies of adequate quality to allow meaningful statistical analysis of such hypotheses or models. Without such analysis, it is not clear that there is any scientific or statistical basis for aggregating the studies into a single population estimate. Were such an estimate obtained, it would be far from clear what combination of population and polygraph test conditions it would represent.
Our main substantive concern is with the relevance of the available literature to our task of reviewing the scientific evidence on polygraph testing with particular attention to national security screening applications. There is only a single study that provides directly relevant data addressing the performance of the polygraph in this context (Brownlie et al., 1998), and because it uses global impressionistic scoring of the polygraph tests, its data do not meet our basic criteria for inclusion in the quantitative analysis. The great majority of the studies address the accuracy of specific-issue polygraph tests for revealing deception about specific criminal acts, real or simulated. Even in the few studies that simulate security screening polygraph examinations, the stakes are low for both the examiners and the examinees, the base rate for deception is quite high (that is, the examiners know that there is a high probability that the examinee is lying), and there is little or no ambiguity about ground truth (both examiners and examinees know what the specific target transgression is, and both are quite clear about the definitions of lying and truthfulness). Given the dubious relevance to security screening of even the closest analog studies, as well as the heterogeneity of the literature, we do not believe there is anything to be gained by using precise distributional models to summarize their findings.
Rather than developing and testing meta-analytic models, we have taken the simpler and less potentially misleading approach of presenting descriptive summaries and graphs. The studies vary greatly in quality and include several with extreme outcomes due to sampling variability, bias, or non-generalizable features of the study design. Thus, we do not give much weight to the studies with outcomes at the extremes of the group, and we summarize the sample of studies with the values of the accuracy index (A) that are most representative of the distribution of study outcomes: the median and the interquartile range. As Chapter 5 and Appendix I show, such a tabulation is sufficient for our purposes to reveal the main findings of the empirical research on the accuracy of polygraph testing, particularly inasmuch as the literature does not adequately represent the performance of polygraph tests in screening contexts.
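The descriptive summary described above, a median and interquartile range of study-level A values, can be computed with Python's standard `statistics` module. This is a minimal sketch: the A values in the usage lines are hypothetical, not results from any study in this review, and the choice of the "inclusive" quartile method is an assumption (other conventions give slightly different cut points for small samples).

```python
import statistics
from typing import Sequence, Tuple


def summarize_accuracy(a_values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower quartile, median, upper quartile) of study-level A values."""
    if len(a_values) < 2:
        raise ValueError("need at least two studies")
    q1, _, q3 = statistics.quantiles(a_values, n=4, method="inclusive")
    return q1, statistics.median(a_values), q3


# Hypothetical A values for seven studies.
a_values = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99]
print(summarize_accuracy(a_values))
```

Unlike a mean, the median and quartiles are unaffected by how extreme the most extreme studies are: replacing the largest value above with an even larger one leaves the summary unchanged, which matches the intent of giving little weight to outlying studies.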
¹ The duplications usually involved master’s theses, Ph.D. dissertations, or agency reports that were subsequently published.
STUDIES INCLUDED IN QUANTITATIVE ANALYSIS
Barland, G.H., and D.C. Raskin 1975 An evaluation of field techniques in detection of deception. Psychophysiology 12(3):321-330.
Barland, G.H., C.R. Honts, and S.D. Barger 1989 Studies of the Accuracy of Security Screening Polygraph Examinations. Report No. DoDPI89-R-0001. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Ben-Shakhar, G., and K. Dolev 1996 Psychophysiological detection through the Guilty Knowledge technique: Effects of mental countermeasures. Journal of Applied Psychology 67(6):701-713.
Blackwell, N.J. 1996 PolyScore: A Comparison of Accuracy. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Blackwell, N.J. 1998 PolyScore 3.3 and Psychophysiological Detection of Deception Examiner Rates of Accuracy When Scoring Examinations from Actual Criminal Investigations. Report DoDPI97-R-0006. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Bradley, M.T. 1988 Choice and the detection of deception. Perceptual and Motor Skills 66(1):43-48.
Bradley, M.T., and M.C. Cullen 1993 Polygraph lie detection on real events in a laboratory setting. Perceptual and Motor Skills 76(3/Pt. 1):1051-1058.
Bradley, M.T., and M.P. Janisse 1981 Accuracy demonstrations, threat, and the detection of deception: Cardiovascular, electrodermal, and pupillary measures. Psychophysiology 18(3):307-315.
Bradley, M.T., and K.K. Klohn 1987 Machiavellianism, the Control Question Test and the detection of deception. Perceptual and Motor Skills 64:747-757.
Bradley, M.T., and J. Rettinger 1992 Awareness of crime-relevant information and the Guilty Knowledge Test. Journal of Applied Psychology 77(1):55-59.
Bradley, M.T., V.V. MacLaren, and S.B. Carle 1996 Deception and nondeception in Guilty Knowledge and Guilty Actions polygraph tests. Journal of Applied Psychology 81(2):153-160.
Craig, R.A. 1997 The Use of Physiological Measures to Detect Deception in Juveniles: Possible Cognitive Developmental Influences. A Ph.D. dissertation submitted to the faculty of the Department of Psychology, The University of Utah.
Davidson, P.O. 1968 Validity of the guilty-knowledge technique: The effects of motivation. Journal of Applied Psychology 52(1):62-65.
Dawson, M.E. 1980 Physiological detection of deception: Measurement of responses to questions and answers during countermeasure maneuvers. Psychophysiology 17(1):8-17.
Driscoll, L.N., C.R. Honts, and D. Jones 1987 The validity of the positive control physiological detection of deception technique. Journal of Police Science and Administration 15(1):46-50.
Elaad, E., and M. Kleiner 1990 Effects of polygraph chart interpreter experience on psychophysiological detection of deception. Journal of Police Science and Administration 17:115-123.
Giesen, M., and M.A. Rollison 1980 Guilty knowledge versus innocent associations: Effects of trait anxiety and stimulus context on skin conductance. Journal of Research in Personality 14:1-11.
Hammond, D.L. 1980 The Responding of Normals, Alcoholics and Psychopaths in a Laboratory Lie-Detection Experiment. A Ph.D. dissertation submitted to the California School of Professional Psychology, San Diego.
Honts, C.R. 1986 Countermeasures and the Physiological Detection of Deception: A Psychophysiological Analysis. A Ph.D. dissertation submitted to the faculty of the Department of Psychology, The University of Utah.
Honts, C.R., and S. Amato 1999 The Automated Polygraph Examination: Final Report to the Central Intelligence Agency. Applied Cognition Research Institute. Boise, ID: Boise State University.
Honts, C.R., and B. Carlton 1990 The Effects of Incentives on the Detection of Deception. Report No. DoDPI90-R-0003. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Honts, C.R., and M.K. Devitt 1992 Bootstrap Decision Making for Polygraph Examinations. Report No. DoDPI92-R-0002. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Honts, C.R., and D.C. Raskin 1988 A field study of the validity of the directed lie control question. Journal of Police Science and Administration 16(1):56-61.
Honts, C.R., S. Amato, and A. Gordon 2000 Validity of Outside-Issue Questions in the Control Question Test. Final Report on Grant No. N00014-8-1-0725. Boise, ID: The Applied Cognition Research Institute.
Honts, C.R., R.L. Hodes, and D.C. Raskin 1985 Effects of physical countermeasures on the physiological detection of deception. Journal of Applied Psychology 70(1):177-187.
Honts, C.R., D.C. Raskin, and J.C. Kircher 1987 Effects of physical countermeasures and their electromyographic detection during polygraph tests for deception. Journal of Psychophysiology 1(3):241-247.
Honts, C.R., M.K. Devitt, M. Winbush, and J.C. Kircher 1996 Mental and physical countermeasures reduce the accuracy of the concealed knowledge test. Psychophysiology 33:84-92.
Horowitz, S.W. 1989 The Role of Control Questions in Physiological Detection of Deception. A Ph.D. dissertation submitted to the faculty of the Department of Psychology, The University of Utah.
Iacono, W.G., G.A. Boisvenu, and J.A. Fleming 1984 Effects of diazepam and methylphenidate on the electrodermal detection of guilty knowledge. Journal of Applied Psychology 69(2):289-299.
Iacono, W.G., A.M. Cerri, C.J. Patrick, and J.A.E. Fleming 1992 Use of antianxiety drugs and countermeasures in the detection of guilty knowledge. Journal of Applied Psychology 77(1):60-64.
Ingram, E.M. 1994 Effects of Electrodermal Lability and Anxiety on the Electrodermal Detection of Deception with a Control Question Technique. Report No. DoDPI94-R-0004. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Ingram, E.M. 1996 Test of a Mock Theft Scenario for Use in the Psychophysiological Detection of Deception: I. Report No. DoDPI96-R-0003. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Ingram, E.M. 1996 Test of a Mock Theft Scenario for Use in the Psychophysiological Detection of Deception: III. Report No. DoDPI97-R-0003. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Ingram, E.M. 1998 Test of a Mock Theft Scenario for Use in the Psychophysiological Detection of Deception: VI. Report No. DoDPI98-R-0002. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Ingram, E.M. 1998 Test of a Mock Theft Scenario for Use in the Psychophysiological Detection of Deception: VII. Report No. DoDPI98-R-0003. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Jayne, B.C. 1990 Contributions of physiological recordings in the polygraph technique. Polygraph 19(2):105-117.
Kircher, J.C., and D.C. Raskin 1988 Human versus computerized evaluations of polygraph data in a laboratory setting. Journal of Applied Psychology 73(2):291-302.
Lykken, D.T. 1959 The GSR in the detection of guilt. Journal of Applied Psychology 43(6):385-388.
Matte, J.A., and R.M. Reuss 1989 A field validation study of the quadri-zone comparison technique. Polygraph 18(4):187-202.
O’Toole, D., J.C. Yuille, C.J. Patrick, and W.G. Iacono 1994 Alcohol and the physiological detection of deception: Arousal and memory influences. Psychophysiology 31:253-263.
Patrick, C.J., and W.G. Iacono 1989 Psychopathy, threat, and polygraph test accuracy. Journal of Applied Psychology 74(2):347-355.
Patrick, C.J., and W.G. Iacono 1991 Validity of the control question polygraph test: The problem of sampling bias. Journal of Applied Psychology 76(2):229-238.
Podlesny, J.A., and J.C. Kircher 1999 The Finapres (volume clamp) recording method in psychophysiological detection of deception examinations. Forensic Science Communications 1(3).
Podlesny, J.A., and D.C. Raskin 1978 Effectiveness of techniques and physiological measures in the detection of deception. Psychophysiology 15(4):344-359.
Podlesny, J.A., and C.M. Truslow 1993 Validity of an expanded-issue (Modified General Question) polygraph technique in a simulated distributed-crime-roles context. Journal of Applied Psychology 78(5):788-797.
Raskin, D.C., and R.D. Hare 1978 Psychopathy and detection of deception in a prison population. Psychophysiology 15(2):126-136.
Raskin, D.C., and J.C. Kircher 1990 Development of a Computerized Polygraph System and Physiological Measures for Detection of Deception and Countermeasures: A Pilot Study (Preliminary Report). Contract No. 88-L655330-000. Salt Lake City, UT: Scientific Assessment Technologies, Inc.
Reed, S. no date TES Expansion Study. Unpublished paper. U.S. Department of Defense Polygraph Institute, Fort McClellan, AL.
Rovner, L.I. 1979 The Effects of Information and Practice on the Accuracy of Physiological Detection of Deception. A Ph.D. dissertation submitted to the faculty of the Department of Psychology, The University of Utah.
Stern, R.M., J.P. Breen, T. Watanabe, and B.S. Perry 1981 Effect of feedback of physiological information on responses to innocent associations and guilty knowledge. Journal of Applied Psychology 66:677-681.
Suzuki, A., K. Ohnishi, K. Matsuno, and M. Arasuma 1979 Amplitude rank score analysis of GSR in the detection of deception: Detection rates under various examination conditions. Polygraph 8:242-252.
U.S. Department of Defense Polygraph Institute 1998a Test of a Mock Theft Scenario for Use in the Psychophysiological Detection of Deception: IV. Report No. DoDPI97-R-0007. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
U.S. Department of Defense Polygraph Institute 1998 Test of a Mock Theft Scenario for Use in the Psychophysiological Detection of Deception: V. Report No. DoDPI98-R-0001. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Waid, W.M., S.K. Wilson, and M.T. Orne 1981 Cross-modal physiological effects of electrodermal lability in the detection of deception. Journal of Personality and Social Psychology 40(6):1118-1125.
Waid, W.M., E.C. Orne, M.R. Cook, and M.T. Orne 1981 Meprobamate reduces accuracy of physiological detection of deception. Science 212:71-72.
Yankee, W.J. 1993 An Exploratory Study of the Effectiveness of Event-Related Stimuli as a Control Procedure in the Psychophysiological Detection of Deception. Report No. DoDPI93-R-0003. Fort McClellan, AL: U.S. Department of Defense Polygraph Institute.
Yankee, W.J., and D. Grimsley 2000 Test and retest accuracy of a psychophysiological detection of deception test. Polygraph 29(4):289-298.
REFERENCES
Ansley, N., F. Horvath, and G.H. Barland 1983 Truth and Science: A Bibliography (Second Edition). Linthicum Heights, MD: American Polygraph Association.
Ben-Shakhar, G., and E. Elaad 2002 The Validity of Psychophysiological Detection of Information with the Guilty Knowledge Test: A Meta-analytic Review. Unpublished manuscript. Hebrew University, Israel.
Brownlie, C., G.J. Johnson, and B. Knill 1998 Validation Study of the Relevant/Irrelevant Screening Format. Unpublished paper. National Security Agency, Baltimore, MD.
Defense Information Systems Agency 2001 Technical Report Bibliography: Polygraph. Search Control No. (T95332 01/06/ 12 – BIB). Unclassified. U.S. Department of Defense.
Kircher, J.C., S.W. Horowitz, and D.C. Raskin 1988 Meta-analysis of mock crime studies of the control question polygraph technique. Law and Human Behavior 12(1):79-90.
Urban, G.D. 1999 A Meta-Analysis of the Validity of the Polygraph for the Detection of Deception. Unpublished manuscript. Northern Michigan University.
U.S. Office of Technology Assessment 1983 Scientific Validity of Polygraph Testing: A Research Review and Evaluation, A Technical Memorandum. OTA-TM-H-15, NTIS order #PB84-181411. Washington, DC: U.S. Government Printing Office.