The Polygraph and Lie Detection
and Axciton Systems, Inc., provided the CPS developers with a text-formatted version of the data (see below).
Dollins and associates (Dollins, Krapohl, and Dutton, 2000) report that there were no statistically significant differences in the classification power of the algorithms. All programs agreed in correctly classifying 36 deceptive and 16 nondeceptive cases. All of them also incorrectly classified the same three nondeceptive cases, but no single case was scored as inconclusive by every algorithm. CPS had the greatest number of inconclusive cases and the smallest difference between the false positive and false negative rates. Four other algorithms all showed a tendency to misclassify a greater number of innocent subjects. The results, summarized in Table F-3, show false negative rates ranging from 10 to 27 percent and false positive rates of 31 to 46 percent (if inconclusives are counted as incorrect decisions).
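The effect of counting inconclusives as incorrect decisions can be illustrated with a short sketch. The counts below are hypothetical and are not taken from Table F-3; the names `Tally` and `error_rate` are likewise illustrative assumptions, not part of any of the scoring programs.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Decision counts for one group of subjects (deceptive or nondeceptive)."""
    correct: int
    incorrect: int
    inconclusive: int


def error_rate(t: Tally, inconclusive_as_error: bool = True) -> float:
    """Return the error rate for one group.

    With inconclusive_as_error=True, inconclusive results count as errors
    and the denominator is all cases. Otherwise inconclusives are dropped
    and the denominator is only the cases that received a decision.
    """
    if inconclusive_as_error:
        total = t.correct + t.incorrect + t.inconclusive
        if total == 0:
            raise ValueError("no cases in tally")
        return (t.incorrect + t.inconclusive) / total
    decided = t.correct + t.incorrect
    if decided == 0:
        raise ValueError("no decided cases in tally")
    return t.incorrect / decided


# Hypothetical counts for illustration only (NOT the Table F-3 data).
deceptive = Tally(correct=40, incorrect=4, inconclusive=6)
nondeceptive = Tally(correct=30, incorrect=12, inconclusive=8)

# Errors on deceptive subjects are false negatives;
# errors on nondeceptive subjects are false positives.
false_negative_rate = error_rate(deceptive)
false_positive_rate = error_rate(nondeceptive)
false_positive_rate_decided = error_rate(nondeceptive, inconclusive_as_error=False)
```

Under these assumed counts, treating inconclusives as errors raises the false positive rate noticeably compared with computing it over decided cases only, which is why the convention used matters when reading the reported ranges.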
As Dollins and colleagues (Dollins, Krapohl, and Dutton, 2000) point out, there are a number of problems with their study. The most obvious is a sampling or selection bias associated with the cases chosen for evaluation. The data were submitted by various federal and nonfederal agencies to the DoDPI, and most of these cases were correctly classified by the original examiner and are supported by confessions. This database is therefore not representative of any standard population of interest. If the analyzed cases correspond, as one might hypothesize given that they were “correctly” classified by the original examiner, to the easily classifiable tests, then one should expect all algorithms to do better on the test cases than in uncontrolled settings. Because all algorithms produce relatively high rates of inconclusive tests even in such favorable circumstances, performance with more difficult cases is likely to degrade. There was no control over the procedures that the algorithm developers used to classify these cases, and they might have used additional editing and manual
TABLE F-3 Number of Correct, Incorrect, and Inconclusive Decisions by Subject’s Truth