In practice, test-retest reliability can be affected by memory effects, the effects of the experience of testing on the examinee, the effects of the experience on the examiner, or all of these effects.


In most applications of the comparison question technique, for example, examiners select comparison questions on the basis of information gained in the pretest interview, choosing questions they believe will produce a desired level of physiological responsiveness in examinees who are not being deceptive on the relevant questions. It is plausible that tests using different comparison questions (for example, tests by different examiners with the same examinee) might yield different test results, compromising test-retest reliability. Little research has been done on the test-retest reliability of comparison question polygraph tests. Some forms of the comparison question test, notably the Test of Espionage and Sabotage used in the U.S. Department of Energy's security screening program, restrict examiners to a very limited selection of possible relevant and comparison questions; this reduced variability can reasonably be expected to benefit test-retest reliability relative to test formats that allow an examiner more latitude.


The polygraph examination for preemployment or preclearance screening may serve purposes other than the diagnostic one. For example, an employer may want to learn information about the applicant's past or current situation that might be used to "blackmail" the individual into committing security violations such as espionage; such information could not be used in this way if the employer already had it.


Policies for use of the polygraph in preemployment screening vary considerably among federal agencies.


We were told that the FBI administered approximately 27,000 preemployment polygraph examinations between 1994 and 2001. More than 5,800 of these tests (21 percent) led to the decision that the examinee was being deceptive. Of these, almost 4,000 tests (approximately 69 percent of "failures") involved direct admissions, either of information that disqualified applicants from employment (about 2,300 tests) or of information not previously disclosed in the application process that led to a judgment of deceptiveness (about 1,700 tests). The remaining individuals, more than 1,800, were judged deceptive without providing direct admissions; the proportion of these judgments attributable to detected or suspected countermeasures is not known. Thus, only in this remainder, fewer than one-third of the "failures," was the judgment of deception a direct and unambiguous result of readings of the polygraph chart.
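The arithmetic behind these percentages can be checked directly. The counts below are the approximate round figures quoted in the text, not exact published totals:

```python
# Approximate FBI preemployment screening figures, 1994-2001, as quoted above.
total_exams = 27_000             # preemployment polygraph examinations
judged_deceptive = 5_800         # tests judged deceptive ("failures")
disqualifying_admissions = 2_300 # admissions of disqualifying information
undisclosed_admissions = 1_700   # admissions of previously undisclosed information
with_admissions = disqualifying_admissions + undisclosed_admissions  # ~4,000

failure_rate = judged_deceptive / total_exams            # share of exams "failed"
admission_share = with_admissions / judged_deceptive     # share of failures with admissions
chart_only = judged_deceptive - with_admissions          # failures resting on chart readings alone

print(f"failure rate: {failure_rate:.0%}")               # ~21% of all exams
print(f"failures with admissions: {admission_share:.0%}")  # ~69% of failures
print(f"chart-reading-only failures: {chart_only:,}")    # ~1,800 individuals
```

The last figure confirms the point in the text: roughly 1,800 of 5,800 judgments of deception, under one-third, rested on the polygraph chart alone.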


The false positive index is not commonly used in research on medical diagnosis but seems useful for considering polygraph test accuracy.
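The text does not define the false positive index here. One plausible reading, assumed for this sketch only, is the number of false positive results per true positive when a test with a given sensitivity and false positive rate is applied to a population with a given base rate of deception:

```python
def false_positive_index(sensitivity, false_positive_rate, base_rate):
    """False positives per true positive at a given base rate of deception.

    This formula is an illustrative assumption about what a "false
    positive index" could mean, not a definition taken from the report.
    """
    true_positives = sensitivity * base_rate                  # deceivers caught
    false_positives = false_positive_rate * (1.0 - base_rate) # truthful examinees flagged
    return false_positives / true_positives

# Example: even a fairly accurate test, applied in a low-base-rate
# screening setting, produces many false positives per true positive.
print(false_positive_index(0.80, 0.10, 0.001))  # ~125 false alarms per detection
```

This kind of index makes vivid why base rates matter so much in screening contexts, which is presumably why it is attractive for thinking about polygraph accuracy.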


Many statistics other than the ROC accuracy index (A) might have been used, but they have drawbacks relative to A as a measure of diagnostic accuracy. One class of measures of association assumes that the variances of the distributions of the two diagnostic alternatives are equal; these include the d' of signal detection theory. Such measures are adequate when the empirical ROC is symmetrical about the negative diagonal of the ROC graph, but empirical curves often deviate from the symmetrical form. The measures A and d' are equivalent in the special case of symmetric ROCs, but even then A has the conceptual advantage of being bounded, by 0.5 and 1.0, while d' is unbounded. Some measures of association, such as the log-odds ratio and Yule's Q, depend only on the four internal cells of the 2-by-2 contingency table of test results and true conditions (e.g., on their cross product) and are independent of the table's marginal totals. Although they make no assumptions about
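The relationships among these measures can be sketched briefly. Under the equal-variance Gaussian assumption of signal detection theory, a symmetric ROC satisfies A = Φ(d′/√2), which makes the bounds on A explicit; the log-odds ratio and Yule's Q depend only on the cross product of the 2-by-2 table's cells. The numbers below are illustrative, not data from the report:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def a_from_d_prime(d_prime):
    # For a symmetric (equal-variance) ROC, A = Phi(d'/sqrt(2)):
    # A = 0.5 at d' = 0 (chance) and approaches 1.0 as d' grows.
    return norm_cdf(d_prime / math.sqrt(2.0))

def log_odds_ratio(a, b, c, d):
    # Depends only on the cross product a*d / b*c of the internal
    # cells, not on the table's marginal totals.
    return math.log((a * d) / (b * c))

def yules_q(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)

print(a_from_d_prime(0.0))  # 0.5: chance-level accuracy
print(a_from_d_prime(1.0))  # ~0.76

# Illustrative 2x2 table: a = true pos., b = false pos., c = false neg., d = true neg.
print(log_odds_ratio(40, 10, 10, 40))
print(yules_q(40, 10, 10, 40))
```

Note that doubling any row or column of the table (changing the marginals) leaves the cross product ratio, and hence both measures, unchanged.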

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.