adjudication setting, when other evidence indicates that the likelihood of a true positive outcome is high, because any set percentage of false positive errors will cost less when there are few negative cases to get wrong. In a screening setting, when the base rate of truly positive cases is low, a suspicious cutoff like S will lead to a very large number of false positives.

Accuracy and decision thresholds have very different practical implications depending on the base rate of deception in the population being tested. Because of the inherent properties of accuracy and thresholds, a test that may be acceptable for use on a population with a high base rate of deceivers (e.g., criminal suspects) may look much less attractive for use with a low base-rate population (e.g., employees in a nuclear weapons laboratory). This generalization, which holds true for all diagnostic techniques, is illustrated in Table 2-1 for a test with an accuracy of A = 0.90 applied at two base rates of deception (see Chapter 6 for more detailed discussion). Table 2-1A shows the results of using this test on two hypothetical populations, with a threshold that correctly identifies 80 percent of deceivers. In a population of 10,000 criminal suspects of whom 5,000 are expected to be guilty, the test will identify 4,800 examinees (on average) as deceptive, of whom 4,000 would actually be guilty. The same test, used to screen 10,000 government employees of whom 10 are expected to be spies, will identify an average of 1,606 as deceptive, of whom only 8 would actually be spies. Table 2-1B and Table 2-1C show that the high number of false positives in the screening situation can be reduced by changing the threshold, but the result is that more of the spies will get through the screen.
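The arithmetic behind these figures can be reproduced with a short sketch. It assumes a false positive rate of 0.16, which is not stated in the text but is implied by the criminal-suspect example (800 of the 5,000 truthful examinees are labeled deceptive). The function and variable names are illustrative only.

```python
def expected_outcomes(population, n_deceptive, sensitivity, false_positive_rate):
    """Expected (average) counts when a test with fixed error rates is
    applied to a population with a given number of deceptive examinees."""
    n_truthful = population - n_deceptive
    true_positives = n_deceptive * sensitivity          # deceivers correctly flagged
    false_positives = n_truthful * false_positive_rate  # truthful examinees flagged
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "missed_deceivers": n_deceptive - true_positives,
        # Share of flagged examinees who are actually deceptive.
        "positive_predictive_value": true_positives / flagged,
    }


SENSITIVITY = 0.80          # from the text: 80 percent of deceivers identified
FALSE_POSITIVE_RATE = 0.16  # assumption: inferred from 800 / 5,000 in the text

# High base rate: 10,000 criminal suspects, 5,000 expected to be guilty.
suspects = expected_outcomes(10_000, 5_000, SENSITIVITY, FALSE_POSITIVE_RATE)

# Low base rate: 10,000 employees, 10 expected to be spies.
employees = expected_outcomes(10_000, 10, SENSITIVITY, FALSE_POSITIVE_RATE)

for label, result in (("suspects", suspects), ("employees", employees)):
    print(
        label,
        "flagged:", round(result["flagged"]),
        "true positives:", round(result["true_positives"]),
        "PPV:", round(result["positive_predictive_value"], 3),
    )
```

With the same error rates in both populations, roughly five of every six flagged suspects are guilty, whereas only about one in 200 flagged employees is a spy. The difference comes entirely from the base rate.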

Empirical Variation in Decision Threshold

As already noted, polygraph examiners may vary considerably in the decision thresholds they apply. A study by Szucko and Kleinmuntz (1981) gives an idea of the variation in threshold that can occur across experienced polygraph interpreters under controlled conditions. In their mock crime study, six interpreters viewed the physiological data (the charts) of 30 individuals (15 guilty and 15 innocent) and rated, on an eight-category scale, their confidence that a given subject was guilty or not. An eight-category scale allows for seven possible thresholds for dividing the charts into groups judged truthful or deceptive. One indication of how much the interpreters' thresholds differed is the false positive proportion that would have resulted if each interpreter had set the threshold at the fifth of the seven possible thresholds and had made binary, yes/no judgments at that cutoff. In that case, the proportion of false positives would have varied across interpreters by almost 0.50, from 0.27 for the most conservative interpreter to 0.76 for
