The Scientific Basis for Polygraph Testing
Evidence relevant to the validity of polygraph testing can come from two main sources: basic scientific knowledge about the processes the polygraph measures and the factors influencing those processes, and applied research that assesses the criterion validity or accuracy of polygraph tests in particular settings. This chapter considers the first kind of evidence; the second is considered in Chapters 4 and 5.
We begin by discussing the importance of establishing a solid scientific basis, including empirically supported theory, for detection of deception by polygraph testing. We then present the main arguments that have been used to provide theoretical support for polygraph testing and evaluate them in relation to current understanding of human psychological and physiological responses. We also consider arguments based on current knowledge of psychology and physiology that raise questions about the validity of inferences of deception made from polygraph measures. We conclude with an assessment of the strength of the scientific base for polygraph testing.
THE SCIENTIFIC APPROACH
To an investigator interested in practical lie detection, basic science may seem irrelevant. The essential question is whether a technique works in practice: whether it provides information about guilty or deceptive individuals that cannot be obtained from other available techniques. As Chapter 2 makes clear, however, it can be very difficult in field situations to determine scientifically whether or how well the polygraph (or any other technique for the psychophysiological detection of deception) “works.” The appropriate criterion of validity can be slippery; truth is often hard to determine; and it is difficult to disentangle the roles of physiological responses, interrogators’ skill, and examinees’ beliefs in order to make clear attributions of practical results to the validity of the test. Given all these confounding factors in the case evidence, even the most compelling anecdotes from practitioners do not constitute significant scientific evidence.
Evidence of scientific validity is essential to give confidence that a test measures what it is supposed to measure. Such evidence comes in part from scientifically collected data on the diagnostic accuracy of a test with certain examiners and examinees. Evidence of accuracy is critical to test validation because it can demonstrate that the test works well under specific conditions in which it is likely to be applied. Evidence of accuracy is not sufficient, however, to give confidence that a test will work well across all examiners, examinees, and situations, including those in which it has not been applied. This limitation is important whenever a test is used in a situation or on a population of examinees for which accuracy data are not available and especially when scientific knowledge suggests that the test may not perform in the same way in the new situation or with the new population. This limitation of accuracy data is particularly serious for polygraph security screening because the main target populations, such as spies and terrorists, have not been and cannot easily be subjected to systematic testing. Confidence in polygraph testing, especially for security screening, therefore also requires evidence of its construct validity, which depends, as we have noted, on an explicit and empirically supported theory of the mechanisms that connect test results to the phenomenon they purport to be diagnosing. A test with good construct validity is one that uses methods that are defensible in light of the best theoretical and empirical understanding of those mechanisms, the external factors that may alter the mechanisms and affect test results, and the measurement issues affecting the ability to detect the signal of the phenomenon being measured and exclude extraneous influences. Only to the extent that a diagnostic test meets these construct validity criteria can one have confidence that it will work well in new situations and with different kinds of examinees.
A well-supported theory of the test is also essential to provide confidence that the test will work well in the face of efforts examinees may make to produce a false negative result. Spies and terrorists may be strongly motivated to learn countermeasures to polygraph tests and may develop potential countermeasures that have not been studied. To have confidence that such measures will fail or will be detected requires basic understanding of the physiological measures used in polygraph testing and of the ways they respond to various intentional activities of examinees. Issues of construct validity such as these are likely to arise in courts operating under Daubert and the Federal Rules of Evidence or under analogous state rules, which require that the admissibility of evidence be judged on the basis of the validity of the underlying scientific methods (see Saxe and Ben-Shakhar, 1999).
For polygraph lie detection, scientific validity rests on the strength of evidence supporting all the inferential links between deception and the test results. Inferences from polygraph tests presume that deception on relevant questions uniquely causes certain psychological states different from those caused by comparison questions, that those states are tied to certain physiological concomitants, that those physiological responses are the ones measured by the polygraph instrument, that polygraph scoring systems reflect the deception-relevant aspects of the physiological responses, and that the interpretation of the polygraph scores is appropriate for making the discrimination between deception and truthfulness.1 Inferences also presume that factors unrelated to deception do not interfere with this chain of inference so as to create false test results that misdiagnose the deceptive as truthful or vice versa.
A knowledge base to support the scientific validity of polygraph testing is one that adequately addresses those inferences. It would include evidence that answers such questions as the following:
Are the procedures used to measure the physiological changes said to be associated with deception standardized and scientifically valid?2
Does the act of deception reliably cause identifiable changes in the physiological processes the polygraph measures (e.g., electrodermal, cardiovascular)?
Is deception the only psychological state that would cause these physiological changes in the context of the polygraph test?
Does the type of lie (rehearsed, spontaneous) affect the nature of the physiological changes?
If the correlation between deception and the physiological response is not perfect, what are the mechanisms by which a truthful response can produce a false positive?
Considering such mechanisms, how can the test procedure minimize the chances of false positive results?
If the correlation between deception and the physiological response is not perfect, what are the mechanisms by which a deceptive response could produce a false negative result (i.e., mechanisms that would allow for effective countermeasures)?
Considering such mechanisms, how can the test procedure minimize the chances of false negative results?
Are the mechanisms relating deception to physiological responses universal for all people who might be examined, or do they operate differently in different kinds of people or in different situations? Is it possible that measured physiological responses do not always have the same meaning or that a test that works for some kinds of examinees or situations will fail with others?
How might the test results be affected by the examinee’s personality or frame of mind? For example, can recent stress change the likelihood that an examinee will be judged deceptive?
How might expectancies and personal interactions between an examiner and an examinee affect the reliability and validity of the physiological measurements? For example, might a test result have been different if a different examiner had given the test?
How might the wording or presentation of the relevant or comparison questions affect an examinee’s differential physiological responses? For example, if a test procedure gives the examiner latitude in formulating relevant or comparison questions, might the test results be affected by the particular questions that are used?
Which theory of psychophysiological detection of deception has the strongest scientific support? Which testing procedures are most consistent with this theory?
These questions are central to developing an approach to the psychophysiological detection of deception that is scientifically justified and that deserves the confidence of decision makers. Although many of the questions are in the realms of basic science in psychology, physiology, and measurement, answering them also has major practical importance. For example, a well-supported theory of the physiological detection of deception can clarify how much latitude, if any, examiners can be given in question construction without undermining the validity of the test. It may also specify countermeasures by which an examinee can act intentionally to create false readings that lead to misinterpretations of polygraph results and thus can help examiners anticipate their use and develop counterstrategies.
Research focused only on establishing accuracy does not provide an adequate basis for confidence in a test because it inevitably leaves many critical questions unanswered. Consider, for example, some inherent limitations of a standard research approach in which some individuals are asked to lie about a mock crime they have committed and the polygraph is used to distinguish those examinees from others who have only witnessed the mock crime or who have no knowledge of it. If the polygraph performs well in this experiment, one can only conclude that it “works” for people like the examinees in situations like the mock crime. There would be many unanswered questions, including:
Would the physiological responses be the same if the crime had been real?
Would the test procedure perform as well if the deceptive examinees had been coached in ways to make it difficult for examiners to discriminate between their responses to relevant and comparison questions?
Would the test procedure have performed as well if the examinees had been from different cultural backgrounds?
Would the test procedure work as well for the people most likely to commit the target infractions as for other people (for example, are there systematic differences between these groups of people that could affect test results)?
Would a polygraph test procedure that performs well in specific-event investigations perform as well in a screening setting, when the relevant questions must be asked in a generic form?
Would different examiners who constructed the relevant and comparison questions in slightly different ways have produced equally good results?
Such questions can sometimes be answered by additional research, for instance, using different kinds of examinees or training some of them in countermeasures. But it is never possible to test all the possible kinds of examinees or countermeasures. A solid theoretical and scientific base is also valuable for improving a test because it can identify the most serious threats to the test’s validity and the kinds of experiments that need to be conducted to assess such threats; it can also tell researchers when further experiments are unlikely to turn up any new knowledge. In such ways, a solid scientific base is important for developing confidence in any technique for the psychophysiological detection of deception and critical for any technique that may be used for security screening.
THEORIES OF POLYGRAPH TESTING
Polygraph specialists have engaged in extensive debate about theories of polygraph questioning and responding in the context of a controversy about the validity of comparison question versus concealed information test formats. We are more impressed with the similarities among polygraph testing techniques than with the differences, although some of the differences are important, as we note at appropriate places in this and the following chapters. The most important similarities concern the physiological responses measured by the polygraph instrument, which are essentially the same across test formats. Factors that affect these physiological responses, including many factors unrelated to deception or attempts to conceal knowledge, have similar implications for the validity of all tests that measure those responses.
Polygraph practice is built on comparing physiological responses to questions that are considered relevant to the investigation at hand, which evoke a lie from someone who is being deceptive, with responses to comparison questions to which the person responds in a presumably known way (e.g., tells the truth or a probable or directed lie). The responses are compared only for one individual because it is recognized that there are individual differences in basal physiological functioning, physiological reactivity, and physiological response hierarchies (for more information, see Davidson and Irwin, 1999; Cacioppo et al., 2000; Kosslyn et al., 2002). Because of individual differences, the absolute magnitude of an individual’s physiological response to a relevant question cannot be a valid indicator of the truthfulness of a response.
According to contemporary theories of polygraph questioning, individuals who are being deceptive or truthful in responding to relevant questions show different patterns of physiological response when their reactions to relevant and comparison questions are compared. In the relevant-irrelevant test format, the theory is that a guilty person, who is deceptive only to the relevant questions, will react more to those questions; in contrast, an innocent person, who is truthful about all questions, will not respond differentially to the relevant questions. In the comparison question format, a guilty person lies both to the relevant and the comparison questions (which are constructed to generate probable or directed lies), while the innocent person lies to the comparison but not the relevant question. The theory is that the innocent person will show equal or less physiological responsiveness to relevant than comparison questions and that the guilty person will show greater responsiveness to relevant than comparison. In the concealed information format, the theory is that examinees will respond most strongly to questions related to their actual knowledge and experience, so that concealed information will be revealed by a stronger response to questions that touch on that information than to the comparison questions. Examinees without special information to conceal will not respond differentially across questions.
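The within-examinee logic of the comparison question format can be illustrated with a minimal sketch. The function name, the numeric response values, and the use of a simple difference of means are all assumptions made for illustration; they are not a scoring system described in this chapter or used by any agency.

```python
from statistics import mean

def differential_response(relevant, comparison):
    """Mean response to relevant questions minus mean response to
    comparison questions, computed within a single examinee.

    Hypothetical helper: the theory predicts a positive value for a
    deceptive examinee and a zero or negative value for a truthful one.
    """
    return mean(relevant) - mean(comparison)

# Invented response magnitudes in arbitrary units (illustration only).
# Examinee A is highly reactive overall but reacts slightly more to
# the comparison questions than to the relevant ones.
examinee_a = differential_response([9.0, 8.5, 9.5], [9.5, 9.0, 10.0])

# Examinee B is much less reactive overall but reacts more to the
# relevant questions than to the comparison ones.
examinee_b = differential_response([3.0, 3.5, 4.0], [2.0, 2.5, 3.0])

print(examinee_a)  # -0.5
print(examinee_b)  # 1.0
```

In this hypothetical example the first examinee reacts more strongly in absolute terms to every question yet shows a negative differential, which is why absolute magnitude alone cannot serve as the indicator.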
The specific nature of the relevant and comparison questions depends on the purpose and type of test. In specific-incident tests using the relevant-irrelevant format, the relevant question(s) focus on specifics of the target event about which a guilty individual would have to lie to conceal guilt. The typical comparison questions are very unlikely to yield deceptive responses (e.g., “Is today Friday?”).
Specific-incident polygraph tests using comparison question formats resemble those using the relevant-irrelevant format, except in their comparison questions. The comparison questions are specially formulated during a pretest interview with the intent to make an innocent examinee very concerned about them and either lie with high likelihood (a probable lie comparison question) or lie under instruction (a directed lie comparison question, such as, “During the first 18 years of your life did you ever steal something from someone who trusted you?”). Such comparison questions are often very similar to those used in lie scales or validity scales on personality questionnaires, except that the polygraph examiner is usually given latitude in choosing questions, so that different examinees may be asked different comparison questions at the same point in the test. The comparison questions tend to be more generic than the relevant questions in that they do not refer to a specific event known to the examiner.
Concealed knowledge specific-incident tests ask about specific details of the target event that the examinee would be unlikely to know unless present at the scene (e.g., “Was the victim wearing a red dress? A yellow dress? A blue dress?”). The relevant questions are those that note accurate details; the comparison questions present false details of the same aspect of the event. If the stimuli that produce the strongest responses consistently correspond to actual details of the incident, the respondent is judged to have concealed information about the incident.
In employee and preemployment screening tests, the relevant questions focus on generic acts, plans, associations, or behaviors (e.g., “Have you engaged in an act of sabotage?”) because the examiner does not know of a specific event. Comparison questions are typically also generic, but unrelated to the target event, and may in fact be the same questions used in specific-incident testing using the comparison question format. The concealed information format cannot be used if the examiner lacks specific knowledge that can be used in formulating relevant questions.
Polygraph testing is based on the presumptions that deception and truthfulness reliably elicit different psychological states across examinees and that physiological reactions differ reliably across examinees as a function of those psychological states. Comparison questions are designed to produce known truthful or deceptive responses and therefore to produce physiological responses that can be compared with responses to relevant questions to detect deception or truthfulness. To have a well-supported theory of psychophysiological detection of deception, it is therefore necessary to identify the relevant psychological states and to understand how those states are linked to characteristics of the test questions intended to create the states and to the physiological responses the states are said to produce.
Marston (1917), Larson (1922), and Landis and Gullette (1925) all found elevated autonomic (blood pressure) responses when individuals engaged in deception. Marston (1917) described the underlying psychological state as fear; other writers have conceived it as arousal or excitement. The idea that fear or arousal is closely associated with deception provides the broad underlying rationale for the relevant-irrelevant test format.3 Subsequent research has confirmed that the polygraph instrument measures physiological reactions that may be associated with an examinee’s stress, fear, guilt, anger, excitement, or anxiety about detection or with an examinee’s orienting response to information (see below) that is especially relevant to some forbidden act.
The comparison question test and related formats are presumed to establish a context such that an examinee who is innocent of the acts identified in the relevant questions will be at least as concerned and reactive, if not more so, in relation to lying on the comparison questions as about giving truthful answers to the relevant questions. In contrast, the examinee guilty of some forbidden acts is assumed to be more fearful, anxious, or stressed about being detected for lying—and, therefore, more reactive—to the relevant questions than the comparison questions. Several theoretical accounts have been offered to lend support to these assumptions. Although there is evidence bearing on some of the propositions underlying some of these theories, none of them has been subjected to detailed investigation in the polygraph context.
According to the theory of conflict (Davis, 1961), two incompatible reaction tendencies aroused at the same time produce a large physiological reaction that is greater than the reaction to either alone. A life of answering questions straightforwardly would create one reaction tendency, and the circumstances that would motivate an examinee to deny the truth would create an incompatible reaction tendency. The assumption underlying variants of the comparison question technique is that a stronger reaction tendency (and, hence, greater reaction tendency incompatibility) will be aroused in response to relevant than control questions in guilty individuals than in others. Ben-Shakhar (1977) noted that the conflict hypothesis has trouble accounting for responses that are seen even when participants do not respond verbally to questions (e.g., Gustafson and Orne, 1965; Kugelmass, Lieblich, and Bergman, 1967). Moreover, a conflict between an examinee and examiner, for instance, about persistent questioning of a response to a relevant question or an expectation of being falsely accused, could in theory also create especially large and repeatable responses to relevant questions even in wrongly accused examinees.
Conditioned Response Theory
The conditioned response theory (Davis, 1961) holds that the relevant questions play the role of conditioned stimuli and evoke in deceptive individuals an emotional (and concomitant physiological) response with which lying has been associated during acculturation. A variation of this theory holds that the stimuli associated with a major transgression serve as conditioned stimuli while the act itself (e.g., a homicide), an unconditioned stimulus, elicits a dramatic autonomic response (an unconditioned response) at the time of the transgression and produces single-trial emotional conditioning. Accordingly, the recollection of the act, elicited by the relevant question, acts as a conditioned stimulus for guilty individuals and elicits a major autonomic response (conditioned emotional response). Innocent individuals, according to this theory, never undergo this conditioning and therefore do not show a conditioned emotional response to stimuli about the target act. There is substantial evidence that autonomic responses can be classically conditioned (Diven, 1937; Tursky et al., 1976; LeDoux, 1995).
If this theory is correct, there are significant possibilities for the polygraph to misinterpret an examinee’s truthfulness because in conditioned response theory, lying is not the only possible elicitor of an autonomic response, and innocent individuals may show a conditioned emotional response triggered by some other feature of the relevant question or the manner in which it is asked. For example, questions related to traumatic experiences may produce large conditioned physiological responses even if the examinee responds truthfully (consider the psychological state of a victim or an innocent witness asked to recall specifics of a violent crime), while a lie about a trivial matter may elicit a much smaller response. According to this theory, relevant questions might also produce large responses in innocent examinees who have in the past experienced unfounded accusations that were associated with upsetting or punitive consequences that elevated autonomic activity. In such an examinee, a relevant question might serve as a conditioned stimulus for anger or fear similar to that associated with false accusations in the past.
Psychological Set and Related Theories
Psychological set theory (e.g., Barland, 1981) holds that when a person being examined fears punishment or anticipates serious consequences should he or she fail to deceive, such fear or anticipation produces a measurable physiological reaction (e.g., elevation of pulse, respiration, or blood pressure, or electrodermal activity) if the person answers deceptively. A variation on this theory, the threat-of-punishment theory (Davis, 1961), posits that lying is an avoidance reaction with considerably less than 100 percent chance of success, but the only one with any chance of success at all. If a person anticipates there is a good likelihood and serious consequences of being caught in the lie, then the threat of punishment when the person tries to deceive will be associated with a large physiological response. Because the consequences of lying to the comparison questions are thought to be less than lying to the relevant questions, the theory is that lying to relevant questions will be associated with larger physiological responses than lying to control questions. These theories suggest that the detection of deception will be more robust in real-life situations involving strong emotions and punishment than in innocuous interrogations or laboratory simulations. In another variation of this theory, Gustafson and Orne (1963) suggest that an individual’s motivation to succeed in the detection task will be greater in real-life settings (because the consequences of failing to deceive are grave), and this elevated motivational state will also produce elevated autonomic activation.
This theoretical argument also leaves open significant possibilities for misinterpretation of the polygraph results of certain examinees. It is plausible, for instance, that a belief that one might be wrongly accused of deceptive answers to relevant questions (or the experience of actually being wrongly accused of a deceptive answer to a relevant question) might produce large and repeatable physiological responses to relevant questions in nondeceptive examinees that mimic the responses of deceptive ones.
The related arousal theory holds that detection occurs because of the differential arousal value of the various stimuli, regardless of whether or not there is associated fear, guilt, or emotion (Ben-Shakhar, Lieblich, and Kugelmass, 1970; Prokasy and Raskin, 1973). The card test illustrates this theory. The card test is an information test in which an examinee selects one item from a set of matched items (e.g., a card from a deck). This item produces a different response from the others, whether the examinee denies special knowledge about any of the items (i.e., lies about the selected item) or claims special knowledge about all of the items (i.e., lies about all but the selected item) (Kugelmass, Lieblich, and Bergman, 1967).
A related theory, Ben-Shakhar’s (1977) dichotomization theory, is built on the concepts of orienting, habituation, and signal value (Sokolov, 1963). According to dichotomization theory, stimuli are represented in terms of one of two categories—relevant and neutral—which habituate independently. A response to a given stimulus is an inverse function of the number of previous presentations of stimuli in its category and is unrelated to the number of previous presentations of stimuli in the other category (Ben-Shakhar, 1977). Dichotomization theory is seen as additive with rather than in competition with other theories. Thus, dichotomization theory emphasizes a “relevance” factor, based on the signal value of the stimulus (Sokolov, 1963), in which stimuli that are personally relevant for historical reasons yield stronger responses than neutral material made relevant in the experimental context.
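The habituation rule at the center of dichotomization theory can be sketched as a toy model. The theory as stated here says only that the response is an inverse function of prior presentations within the same category; the specific form used below (initial strength divided by one plus the count), the function name, and the starting values are assumptions chosen for illustration.

```python
from collections import Counter

def simulate_responses(sequence, initial=None):
    """Toy model of category-specific habituation.

    sequence: iterable of category labels, "relevant" or "neutral".
    initial:  assumed response strength at first presentation.

    The response to a stimulus shrinks with the number of earlier
    stimuli from the SAME category and ignores the other category.
    The 1 / (1 + n) form is an illustrative assumption, not a claim
    about the real shape of habituation.
    """
    if initial is None:
        initial = {"relevant": 1.0, "neutral": 1.0}
    seen = Counter()   # presentations so far, per category
    responses = []
    for category in sequence:
        responses.append(initial[category] / (1 + seen[category]))
        seen[category] += 1
    return responses

# Three neutral items followed by the first relevant item.
print(simulate_responses(["neutral", "neutral", "neutral", "relevant"]))
```

In this sketch the neutral responses decline with repetition, while the first relevant item still draws its full initial response, because habituation to neutral items does not carry over to the relevant category.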
The above theoretical accounts, all of which have been used as justification for the comparison question test format, predict that deceptive individuals will show stronger physiological reactions on relevant than on comparison questions; however, they also predict that truthful examinees, under certain conditions, will show physiological response patterns similar to those expected from deceptive examinees. They thus suggest that comparison question polygraph testing has a significant potential to lead to inferences of deception when none has occurred: that is, they suggest that the polygraph test may not be specific to deception because other psychological states that can result from stimuli arising during the test mimic the physiological signs of deception. The possibility that truthful examinees will occasionally exhibit stronger physiological responses to relevant than control questions based on chance alone also increases the possibility of false alarms.
To address this issue, Lykken (1959, 1998) devised the guilty knowledge test (called here the concealed information test), based in part on orienting theory. The notion of an orienting or “what-is-it” response emerged from Pavlov’s studies of classical conditioning in dogs. Pavlov (1927:12) observed that a dog’s conditioned response to a stimulus would fail to appear if some unexpected event occurred:
It is this reflex [the orienting response] which brings about the immediate response in men and animals to the slightest changes in the world around them, so that they immediately orientate their appropriate receptor organ in accordance with the perceptible quality in the agent bringing about the change, making a full investigation of it. The biological significance of this reflex is obvious.
An orienting response occurs in response to a novel or personally significant stimulus to facilitate a possible adaptive behavioral response to the stimulus (Sokolov, 1963; Kahneman, 1973). The phenomenon of orienting is illustrated in a cocktail party in which a person can converse with another, apparently oblivious to the din created by the conversations of others, yet the person stops and orients toward the source when his or her name is spoken in one of these other conversations. Lynn (1966) has summarized the physiological profile of an orienting response as decreased heart rate, increased sensitivity of the sense organs, increased skin conductance, general muscle tonus (but a decrease in irrelevant muscle activity), pupil dilation, vasoconstriction in the limbs and possibly vasodilation in the head, and more asynchronous, low-voltage electrical activity in the brain. There are individual differences in the presence and relative magnitude of these responses, however, and the orienting response is subject to habituation, which implies that false negatives may be particularly likely among the most sophisticated and well-prepared examinees.
The concealed information test format is designed to provide a quantitative specification of the relative probability of a given outcome based on the elicitation of an orienting response to a specific piece of information that differs from the other items only in the mind of an individual who is knowledgeable about details of a crime or other target incident. An innocent examinee would be expected to respond most strongly to the relevant item in a series of five similar items (e.g., “How much money was taken? $10, $20, $30, $40, $50”) by chance with a probability of 1 in 5 (0.20). Such a response on one question would not engender much confidence in the interpretation that the person had concealed knowledge of the true amount. However, if an examinee consistently responded most strongly to the one relevant item out of five, over five separate questions, then the probability of that combined outcome occurring by chance in the absence of concealed information is presumed to be 1 in 5^5, that is, 1 in 3,125 (0.00032).
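The chance probabilities in the paragraph above follow from a simple calculation, sketched here under the same presumption the text makes: that an examinee without concealed knowledge is equally likely to respond most strongly to any of the items, and that the questions are independent of one another. The function name is hypothetical, and the extension to "at least k matches out of n questions" (a binomial tail) goes beyond what the text states.

```python
from math import comb

def chance_probability(n_questions, n_items, min_matches):
    """Probability that an examinee with no concealed knowledge gives
    the strongest response to the relevant item on at least
    `min_matches` of `n_questions` questions.

    Assumes each of the `n_items` alternatives is equally likely to
    draw the strongest response and that questions are independent.
    """
    p = 1.0 / n_items
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_matches, n_questions + 1)
    )

# One question with five alternatives: 1 in 5.
print(chance_probability(1, 5, 1))

# Five questions, strongest response to the relevant item every time:
# (1/5) ** 5, or 1 in 3,125 (about 0.00032).
print(chance_probability(5, 5, 5))

# Illustrative extension: at least four matches out of five.
print(chance_probability(5, 5, 4))
```

If responses to successive questions are not in fact independent, or if some alternatives are inherently more evocative than others, these figures would not hold, which is why the text describes the combined probability as presumed.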
It is important to keep in mind that there might be a distinction between physiological reactions to the stimuli (i.e., the questions) and reactions to the response (e.g., attempted deception). Arousal theory and orienting theory, both of which are commonly cited as justifications for the concealed information test format and related techniques, focus on reactions to the questions. From the perspective of these theories, it might not even be necessary for examinees to respond, and reactions might be the same regardless of whether the response is deceptive or honest. The theories that underlie the comparison question technique (e.g., set theory, theory of conflict, conditioned response theory) assume that it is the deceptive response that causes the reactions recorded by the polygraph.
Polygraph tests that use the comparison question technique are also sometimes justified in terms of orienting theory. Such a justification has been offered for the Test of Espionage and Sabotage (TES) used for security screening in the U.S. Department of Energy (DOE) and some other federal agencies (U.S. Department of Defense Polygraph Institute, 1995a). Strong responses to relevant questions are taken to indicate an orienting response, in turn indicating “the significance of the stimulus,” though not necessarily deception (U.S. Department of Defense Polygraph Institute, 1995a:4). Responses to the TES are scored as “significant responding” or “no significant responding” rather than the more traditional “deception indicated” or “no deception indicated.” Orienting theory has recently been offered as theoretical justification for polygraph testing in general (e.g., Kleiner, 2002).
The claim that orienting theory provides justification for the comparison question technique of polygraph testing is radically at odds with the practices of polygraph examiners using that technique. If it is the orienting response to the stimulus rather than the physiological response to deceptiveness that drives the responses, many of the procedures that are common practice in comparison question polygraph testing should be revised. First, the practice of previewing questions with examinees is problematic under orienting theory. Exposure to the relevant questions prior to the examination would tend to decrease the differential orienting response to the relevant and comparison questions and weaken the test’s ability to discriminate. Also, comparison questions would probably be constructed differently for a test based on orienting theory. Instead of designing them to induce reactions in nondeceptive subjects, they would probably be designed to be nonevocative, as they are in the relevant-irrelevant technique. Finally, a polygraph examination based on orienting theory would typically include multiple administrations of each class of questions (e.g., there would be several variations on an espionage question), to allow for a clear differentiation of orienting responses from others. Thus, we do not take very seriously the argument that the TES or other polygraph examination procedures based on the comparison question technique can be justified in terms of orienting theory.
It is possible that different theories are applicable in different situations. The dichotomization and orienting theories, for instance, may be more applicable to tests in which the signal value of the stimulus is more pertinent than the threat of severe consequences of detection: for example, when an investigation is aimed at identifying witnesses with knowledge about an incident even if they are innocent. The conflict, set, punishment, and arousal theories, in contrast, may be more applicable for identifying individuals guilty of serious crimes or those hiding dangerous plans or associations.
The early theoretical work assumed that polygraph responses associated with deception, or the fear of deception, were involuntary and quite large in comparison to other anxieties aroused by the test (Marston, 1917). Consistent with this line of thinking, theories of the psychophysiological detection of deception by polygraph assume that relevant questions, in contrast to comparison questions, are more stimulating to those giving deceptive answers than to those giving truthful answers. Interpretation of a polygraph test has typically been based on the relative size of the physiological responses elicited by relevant questions and the associated comparison questions (e.g., Podlesny and Raskin, 1977; Lykken, 1998). If the assumptions about large and involuntary responses to relevant questions were true, the polygraph test would be characterized by high sensitivity and specificity (that is, it would discriminate very accurately between deception and truthfulness) and it would be immune to countermeasures.
Such assumptions are not tenable in light of contemporary research on individual and situational determinants of autonomic responses generally (Lacey, 1967; Coles, Donchin, and Porges, 1986; Cacioppo, Tassinary, and Berntson, 2000a) and on the physiological detection of deception in particular (e.g., Lykken, 2000; Iacono, 2000). There is no unique physiological response that indicates deception (Lykken, 1998). If deceivers in fact have stronger differential responses to relevant questions, it does not necessarily follow that an examinee who shows this response pattern was lying (see Strube, 1990; Cacioppo and Tassinary, 1990a) because differences in people’s anticipation of and responses to the relevant and comparison questions other than differences in truthfulness can also produce differential physiological reactions. For example, relevant questions are sometimes inherently more threatening than comparison questions. Asking a weapons scientist “Have you committed espionage?” might generate a stronger response in some innocent examinees than “Have you ever taken something that did not belong to you?” Also, as noted above, individuals who have experienced punitive outcomes from being wrongly accused in the past or who believe the examiner suspects them of being the culprit may, in theory, be more reactive to relevant than control questions even when responding truthfully. No independent evidence has been reported in mock crime studies to verify that relevant questions are more stimulating than comparison questions to those giving deceptive answers or that comparison questions are equally or more stimulating than relevant questions to those giving truthful responses.
Most comparison question testing formats face the difficult challenge of calibrating the emotional content of relevant and comparison questions to elicit the levels of response that are needed in order to correctly interpret the test results. It has been argued that an unethical examiner could manipulate the questions and the way they are presented to produce desired test results (Honts and Perry, 1992), and if this can be done intentionally, it might also be done unintentionally by an examiner who holds a strong expectancy about the examinee’s guilt or innocence (we discuss the expectancy phenomenon later in this chapter). Even if this calibration is not influenced by an examiner’s intended or unintended bias, it may be tipped one way or another by subtle variations in the ways an examiner introduces or conducts the test (Abrams, 1999). This source of inconsistency and potential unreliability in test administration was a stimulus for developing comparison question testing techniques that standardize the relevant and comparison questions across examinations and examiners. For example, directed-lie comparison question test formats have been advocated as superior to probable-lie variants because in the latter format, “it is difficult to standardize the wording and discussion of the questions” (Raskin and Honts, 2002:22). Concealed information test formats have also been advocated as superior to comparison question formats in this respect.
While orienting theory appears somewhat more plausible than the theories that underlie comparison question approaches, using the theory in devising polygraph procedures is not without problems. In particular, it is not clear how differences in stimulus familiarity affect orienting responses. Descriptions of this theory usually start with the assumption that responses to familiar and important stimuli will be different from those to novel, irrelevant stimuli, but in fact, the characteristics of stimuli should be thought of as a continuum rather than a dichotomy. That is, some stimuli are highly familiar and relevant and attract strong orienting responses, while others are moderately familiar and might or might not attract these responses. Orienting responses to familiar and important stimuli might generalize to other similar stimuli in ways that would make it difficult to distinguish true orienting responses from those brought on by stimulus generalization. For example, suppose a murder is committed using a nickel-plated revolver, and suppose an examinee owns an unregistered pistol (a blue-steel semi-automatic). That examinee might show enhanced responses to a variety of questions about handguns, even though he has no concealed information about the actual murder weapon.
The possibility of systematic individual differences or variability in physiological response has not been given much attention in polygraph theories. For example, the unresolved theoretical questions about the basis of inferences from the polygraph leave open the possibility, discussed below, that responses may be sensitive to effects of examiner expectations or witting or unwitting biases or to examinees’ beliefs about the polygraph’s validity. Polygraph theories have been largely silent about these possibilities, and empirical polygraph research has made little effort to assess their influence on polygraph readings or interpretation.
Most alternative technologies for the psychophysiological detection of deception that are being pursued (see U.S. Department of Defense, 2000; U.S. General Accounting Office, 2001) rest on similar theoretical foundations and are subject to the same theoretical limitations. This statement holds both for measures of brain function and for peripheral measures of autonomic activity. The underlying assumption remains that someone who is trying to hide something will respond differently (i.e., show “leakage,” physiological arousal, or orienting responses to specific questions) than someone who is not trying to hide something. The objective of the new approaches, therefore, continues to be to measure a naturally occurring physiological response or profile of responses that not only differentiates known deceptive from truthful answers but also allows accurate classification of answers as deceptive or truthful. Improvements have been and continue to be made in the design of transducers, amplifiers, data recording, and display techniques, and in the standardization of procedures and data reduction. Data interpretation, however, still depends on the validity of the assumption that relevant, in contrast to comparison, questions are more evocative to those giving deceptive answers and equally or less evocative to those giving true answers.
Screening uses of polygraph testing raise particular theoretical issues because when the examiner does not have a specific event to ask about, the relevant questions must be generic. If a comparison question testing format can meet the challenge of calibrating questions to elicit the desired level of response in a specific-incident test, it does not follow that the same format will meet the challenge in a screening application because the relevant questions do not refer to a specific event. It is reasonable to hypothesize that autonomic reactions are more intense, at least for guilty individuals, when a target event is described concretely than when it is merely implied by mention of a generic category of events. Nothing in current knowledge of psychophysiology gives confidence that a test format will work at the same level of accuracy in a screening setting that requires generic questioning as it does in a specific-incident application.
The theory of comparison question polygraph techniques as currently used for screening can be summarized as follows:
An examinee will respond differently when trying to hide something (i.e., show leakage or greater physiological arousal or orienting responses to relevant questions) than when not trying to hide something.4
Those who have nothing to hide will be less reactive to key (relevant) questions than they are when lying on personally relevant (comparison) questions.5
Examinees will not respond more strongly to the relevant than comparison questions based on chance alone.
An examiner’s pursuit of an explanation of an anomalous response and the consequent activation of social norms and fear of having been detected will lead to explanations, admissions, or confessions one otherwise might not obtain but will not produce false confessions or a specific fear or anxiety in response to relevant questions on a follow-up test.
To the extent that these principles do not hold universally, an examiner’s rapport with the examinee, the desired understanding of the polygraph examination and questions, and the clinical skill in determining the person’s veracity (i.e., detection of deception from demeanor) are all important in distinguishing among individuals who have physiological responses not indicative of deception (e.g., anxiety or anger regarding relevant questions, insufficient emotionality about the comparison questions), those who have physiological responses indicative of relatively innocuous transgressions, and those who have physiological responses indicative of significant transgressions. These distinctions are made on the basis of clinical judgment, which, though sometimes accurate, does not stand on a good foundation of theory or empirical evidence. There is little basis for relying on the accuracy of clinical judgments, especially in individual cases, without such a foundation.
The scientific basis for polygraph testing rests in part on what is known about the physiological responses the polygraph measures—particularly, knowledge about how they relate to psychological states that may be associated with contemplating and responding to test questions and how they might be affected by other psychological phenomena, including conscious efforts at control. The polygraph machine usually measures three or four responses. Relative blood pressure is measured by a blood pressure cuff positioned over the biceps. Electrodermal activity (a measure of the activity of the eccrine sweat glands) is measured by electrodes placed on two fingers or the palm of the hand (Orne, Thackray, and Paskewitz, 1972). The rate and depth of respiration are measured by pneumographs positioned around the chest and abdomen. The contemporary scoring methods in most common use combine information from all these response systems under the assumption that each may provide a sensitive index of fear, arousal, or orienting response to a particular question in a given individual.6
The justification of these physiological measures was originally derived from arousal theory, which holds that the stronger the stimulus or event, the stronger the psychological reaction, and the more pronounced these particular physiological responses. In studies of the influence of emotional disturbances on what he termed the “emergency reaction,” Cannon (1929) advanced the hypothesis that there is a diffuse, nonspecific sympathetic outflow through the interconnections in the sympathetic ganglia during emergency states and that this sympathetic discharge is integrated with behavioral states—the so-called “fight-or-flight” reaction. In Cannon’s formulation, autonomic and neuroendocrine activation associated with emotional disturbances serves to mobilize metabolic resources to support the requirements of fight or flight, thereby promoting the protection and survival of the organism.7
Although the intensity of autonomic, electrocortical, and behavioral reactions does tend to covary with the intensity of the evocative stimulus, the prediction of a general and diffuse physiological activation has failed empirical tests. Correlations among autonomic measures both within and between individuals are commonly found to be weak. Moreover, negative correlations have been found to occur within individuals during some tasks (e.g., between heart rate and skin conductance responses; see Lacey et al., 1963). Negative correlations have also been reported between electrocortical and autonomic measures of activation and between facial expressiveness and autonomic responses. Contrary to the notion that sympathetic nervous activation is global and diffuse, highly specific regional sympathetic activation has been observed in response to stressors (Johnson and Anderson, 1990), even in extreme conditions such as panic attacks (Wilkinson et al., 1998). Research also shows that the same excitatory stimulus (e.g., stressor) can have profoundly different effects on physiological activation across individuals or circumstances (Cacioppo et al., 2000; Kosslyn et al., 2002).
Cardiovascular, electrodermal, and respiratory activity respond in different ways to various psychological states and behaviors. The cardiovascular system responds to stimuli that may be considered arousing, and even to the anticipation of such stimuli. The responses are multiply determined, however, and there are individual differences in the direction and extent of cardiovascular response. For example, active coping tasks (i.e., those that require cognitive responses, such as test taking or interrogation) tend to increase blood pressure, but through different mechanisms (i.e., cardiac activation or vasoconstriction) for different kinds of tasks; moreover, individuals differ in the reactivity of these mechanisms. The evidence does not support the assumption that cardiovascular signals of arousal are consistent across individuals.
Electrodermal activity can be measured by skin conductance between two electrodes on the fingers or palm (skin resistance measurements can give misleading indications of magnitudes of response). Skin conductance responses can be elicited by so many stimuli that it is difficult to isolate specific psychological antecedents. Respiration is easily brought under voluntary control, so it is unlikely by itself to be a robust indicator of any psychological state an examinee is trying to conceal. Variations in respiration can produce changes in heart rate and electrodermal activity. Therefore, respiration needs to be monitored to determine whether cardiovascular and electrodermal responses to relevant and comparison questions are artifacts of other changes. (Appendix D provides more detail about current knowledge of cardiovascular, electrodermal, and respiratory response systems.)
The physiological responses measured by the polygraph do not all reflect a single underlying process such as arousal. Similarly, arousing stimuli do not produce consistent responses across these physiological indicators or across individuals. This knowledge implies that there is considerable lack of correspondence between the physiological data the polygraph provides and the underlying constructs that polygraph examiners believe them to measure. On theoretical grounds, it is therefore probable that any standard transformation of polygraph outputs (that is, scoring method) will correspond imperfectly with an underlying psychological state such as arousal and that the degree of correspondence will vary considerably across individuals. Little is known from basic physiological research about whether there are certain types of individuals for whom detection of arousal from polygraph measures is likely to be especially accurate—or especially inaccurate.
Polygraph theories assume that differences in physiological responses are closely correlated with psychological differences between examinees’ responses to relevant and comparison questions on the polygraph test. This assumption will be less plausible to the extent that a polygraph testing procedure gives an examiner discretion in selecting the relevant and comparison questions for each examinee. It is reasonable to expect that if a polygraph test procedure gives examiners more latitude in this respect, the results are likely to be less reliable across examiners, and more susceptible to examiner expectancies and influences in the examiner-examinee interaction.
INFERENCES FROM POLYGRAPH TESTS
Given the imperfect correspondence that can be expected between polygraph test results and the underlying state the test is intended to measure, inferences from polygraph tests confront both logical and empirical issues.
The Logic of Inference
When theory does not establish a tight link from the physiological responses to the psychological states presumably tied to deception, and particularly when theory raises the possibility that states other than deception may generate physiological responses from which deception is inferred, inference faces a major logical problem.8 This problem is not obviated by advances in neural and physiological measurement, which is now often highly sophisticated and precise. The logical problem is generic to inferences about psychological states from physiological indicators.
Inference commonly follows the subtractive method, in which experimental and control or contrast conditions differ by one element, stage, or process (Strube, 1990; Cacioppo, Tassinary, and Berntson, 2000b). Outcome differences between the experimental and control conditions are then considered to reflect the effect of that single component. This method allows the construction of physiological indices of the psychological phenomena that have been varied in experiments, which are then used to develop concepts and test theories about those phenomena.
The subtractive method underlies the interpretation of the polygraph chart and of other indicators used for the psychophysiological detection of deception. If there are sufficiently more or stronger “arousal” responses to relevant than control questions, the polygraph chart is interpreted as “deception indicated” or as showing “significant response.” This approach does not allow a strong inference (Cacioppo and Tassinary, 1990a).9 The confidence in such an interpretation would be enhanced if the particular result (e.g., relatively large skin conductance responses) could be shown to arise consistently under a wide range of conditions of deception, and if the result could not be attributable to some other aspect of the stimulus or context (e.g., fear of being suspected or anxiety over trivial or irrelevant transgressions). Even then, however, the autonomic responses could not be used definitively to infer the presence of deception, as other antecedent conditions (e.g., emotional reactions) may yield the same result.10
In most polygraph research, a psychological factor (deception) serves as the independent variable and a physiological factor serves as the dependent variable. This format provides information about the likelihood of a physiological response given a person who is being deceptive. Such evidence is commonly offered to address the question of how good the polygraph test is as a diagnostic of lying. However, a polygraph test, like other diagnostic instruments, is actually used to make the reverse inference: about the likelihood of deception given the physiological response that is observed. The conditional probabilities in these two situations are not necessarily or typically equal; they are related as follows:
P(physiological activity given deception) × P(deception) = P(deception given physiological activity) × P(physiological activity).11
A strong ability to distinguish deception from truthfulness on the basis of a positive polygraph result requires that the polygraph test have high specificity (a probability of physiological response given nondeception close to zero). For example, a positive result from a test with 50 percent sensitivity and 100 percent specificity implies the subject is deceptive, but 50 percent of deceptive subjects will not be caught. A strong inference of innocence from a negative polygraph result requires that the sensitivity of the test be very high. In that case, all the deceptive subjects are caught, but unless the specificity is also high, many nondeceptive subjects will also be “caught.” Only with a test with an accuracy similar to that of DNA matching—which has both very high sensitivity and very high specificity—could one be confident that the test results correspond closely to truth.12 However, as we have shown, the physiological measures used in polygraph testing do not have such close correspondence with deception or any other single psychological state (Davis, 1961; Orne, Thackray, and Paskewitz, 1972). Lacking a one-to-one correspondence between the psychological and physiological states, empirical evidence at the aggregate level showing that deception produces larger physiological responses than honest responding does not adequately address the validity of the reverse inference, that larger physiological responses can be caused only by deception. This misinterpretation of the import of the empirical evidence has been called the “fallacy of the transposed conditional” in the literature on legal decision making (the attribution is usually to the statistician Dennis Lindley; see, e.g., Balding and Donnelley, 1995; Fienberg and Finkelstein, 1996). It is also known as the prosecutor’s fallacy because of the way it can arise in the courts. 
A prosecutor may offer forensic evidence that establishes the probability that a positive test result (a DNA match or a polygraph test indicating deception) would be observed if the defendant is innocent, but a jury’s task is to determine the probability that the defendant is innocent, given a positive test result.13 At least one jury decision has been overturned because of the confusion between these two probabilities (see Pringle, 1994).
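The gap between the two conditional probabilities can be made concrete with a short calculation. The sketch below applies the relation stated earlier in this section (Bayes' rule); the function name `posterior_deception` is hypothetical, and the sensitivity, specificity, and base-rate values in the usage lines are illustrative numbers chosen for the example, not estimates of polygraph accuracy.

```python
def posterior_deception(sensitivity, specificity, base_rate):
    """P(deception | positive test result), by Bayes' rule.

    sensitivity = P(positive | deception)
    specificity = P(negative | no deception)
    base_rate   = P(deception) in the population tested
    All inputs are hypothetical illustration values.
    """
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    p_positive = true_pos + false_pos  # P(positive test result)
    if p_positive == 0:
        raise ValueError("a positive result has zero probability")
    return true_pos / p_positive


# Perfect specificity: a positive result implies deception, even though
# half of the deceptive examinees are missed (sensitivity 0.5).
print(f"{posterior_deception(0.5, 1.0, 0.10):.3f}")  # 1.000

# Illustrative imperfect test applied where deception is rare:
# P(positive | deception) is 0.9, yet P(deception | positive) is far lower.
print(f"{posterior_deception(0.9, 0.9, 0.01):.3f}")  # 0.083
```

Treating the 0.9 in the second case as if it were the probability of deception given a positive result is exactly the transposition the text describes.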
Empirical Sources of Error
Compounding the logical problems, many factors associated with polygraph testing itself may introduce substantial error, both random and systematic, into the results of polygraph examinations. The implications of these errors for polygraph test interpretation depend on the nature of the error. If errors were known to be randomly distributed across individuals and physiological indicators, they would be reduced by multiple measurement across multiple channels, an approach commonly used in polygraph testing.
Of more serious concern are sources of error that may reflect consistent rather than random causes and that may lead guilty individuals to appear truthful on the test or innocent ones to appear deceptive, thus reducing the accuracy of the test. We have noted that one cannot rule out, on theoretical grounds, the possibility that polygraph responses vary systematically with characteristics of examiners, examinees, the test situation, the interview process, and so forth.14 Such factors may cause systematic error in polygraph interpretation and need careful consideration, especially if basic scientific knowledge suggests that a particular factor might systematically affect polygraph test results. It is convenient to distinguish two classes of potential sources of systematic error: those that derive from stable or transient characteristics of examinees or examiners (endogenous factors) and those that derive from factors in the social context of the polygraph examination.
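The contrast between random and systematic error can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a model of any real polygraph channel: the function `averaged_score_stats`, the Gaussian noise model, the arbitrary units, and the bias value of 0.8 are all hypothetical choices made for the example.

```python
import random
import statistics


def averaged_score_stats(n_channels, bias, noise_sd=1.0, n_trials=20000, seed=0):
    """Simulate averaging several noisy channel readings per examination.

    Each reading = bias (systematic error shared by every channel)
                 + Gaussian noise (random error, independent per channel).
    Returns (mean, standard deviation) of the averaged error across trials.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(n_trials):
        readings = [bias + rng.gauss(0.0, noise_sd) for _ in range(n_channels)]
        scores.append(sum(readings) / n_channels)
    return sum(scores) / len(scores), statistics.stdev(scores)


for channels in (1, 4, 16):
    mean_error, spread = averaged_score_stats(channels, bias=0.8)
    print(f"{channels:2d} channels: mean error {mean_error:.2f}, spread {spread:.2f}")
```

In this toy setting the spread of the averaged score shrinks roughly with the square root of the number of channels, while the mean error stays near the assumed bias: adding measurements reduces random error but leaves a shared systematic error untouched.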
Among the characteristics of examinees and examiners that could threaten the validity of the polygraph are personality differences affecting physiological responsiveness; temporary physiological conditions, such as sleeplessness or the effects of legal or illegal drug use; individual differences between examiners in the ways they conduct tests; and countermeasures. For such conditions to threaten the validity of the test, they would have to differentially affect responsiveness to relevant and comparison questions (e.g., by reducing a guilty examinee’s responsiveness to relevant questions). Although there have been studies of the effects of some personality variables and some drugs on polygraph detection of deception (see Chapter 5), there have been few systematic efforts to ascertain whether and how any such relationships might vary across the particular indicators used in polygraph testing. We have not seen persuasive scientific arguments that any specific personality variable would influence polygraph accuracy. If such effects were found to exist, however, it would be possible in principle to use information on the personality variable to adjust polygraph test scores.
An example of an endogenous factor that could be imagined to decrease the specificity of the polygraph, mentioned at our visit to the U.S. Department of Energy (DOE), is what was termed the “guilty complex,” an individual attribute that may lead innocent people to respond physiologically as do guilty people. Certain chronic medical conditions (e.g., tachycardia) could be imagined to have similar effects. We have not found scientific studies investigating the effects of these factors on polygraph test performance. In general, too little attention has been paid to the factors that may reduce the specificity of the polygraph (i.e., produce false positive results). Research has been done on one endogenous factor that may reduce the sensitivity of the polygraph: the use of countermeasures. The empirical evidence from studies of countermeasures is discussed in Chapter 5.
Factors in the social context of the polygraph examination may also threaten the validity of the test and lower its sensitivity and specificity. The possibility of systematic physiological effects from the examiner-examinee interaction is particularly troublesome for two reasons: the effects would be hard to control or correct, and there are plausible psychophysiological mechanisms by which this interaction could degrade polygraph test validity. Social interaction effects would be hard to correct because manipulation of the examiner-examinee social interaction is an integral part of the polygraph test, particularly in the relevant-irrelevant and some control question test formats, and is normally done in a clinical manner that relies heavily on examiner judgment. Examiners are instructed to create emotional conditions designed to lead to differential levels of arousal and physiological responsiveness in innocent and guilty examinees. How this is done is not standardized in polygraph practice nor measured in polygraph research. This uncontrolled variation is likely to reduce the test-retest reliability of polygraph tests when different examiners are used for different tests and to make the accuracy of test results more variable in test formats that depend on creating an emotional climate based on the examiner’s judgment. It also creates extreme difficulty in correcting for the effects of social interaction factors on polygraph test results. Eliminating an examiner entirely from the polygraph test is likely to reduce some but not all of these effects.
Moreover, basic research in social psychophysiology gives reason for concern about important sources of systematic error that could arise in polygraph tests from social interactions in the examination situation. Over the past three decades or so, this research has demonstrated that individuals are quite autonomically sensitive to the characteristics of those with whom they interact (Cacioppo and Petty, 1983; Wagner, 1988; Gardner, Gabriel, and Diekman, 2000), especially in potentially threatening situations (e.g., Cacioppo and Petty, 1986; Hinton, 1988; Blascovich, 2000). This research suggests that at least two interpersonal phenomena might affect the sensitivity and specificity of polygraph tests: stigma and expectancies.
Stigmas mark individuals who are members of socially devalued groups. Stigmas may be easily visible (e.g., gender, skin color, deformations of the body); not necessarily visible (e.g., socioeconomic status, religion); or usually invisible (e.g., sexual orientation, metaphysical beliefs, having been suspected of espionage). Many theorists have argued that stigmas cause perceivers to feel a sense of uncertainty, discomfort, anxiety, or even danger during social interactions (Crocker, Major, and Steele, 1998). Much recent physiological work also suggests that bearers of stigma are threatened during interactions with members of nonstigmatized groups. Recently, research has confirmed experimentally that both stigma bearers and perceivers exhibit cardiovascular patterns of response associated with threat during performance situations that are not metabolically demanding (e.g., Mendes, Seery, and Blascovich, 2000; Blascovich et al., 2001b). This research typically demonstrates these effects during task performance but not during baseline or resting periods, suggesting the possibility that physiological responses to relevant and comparison questions might be differentially affected on polygraph tests.
Research on members of racially stigmatized groups (particularly, African Americans) suggests that such individuals exhibit heightened cardiovascular threat responses in situations in which negative stereotypes about racially stigmatized groups are likely to exist (Blascovich et al., 2001a). For example, members of racially stigmatized groups exhibit increased blood pressure reactivity during testing that requires their cognitive responses to difficult test items.
The experimental situations in which these stigma studies have occurred bear a striking resemblance to polygraph testing situations, particularly employee screening tests. Participants are told the kind of tasks that they will undertake. Their written consent is obtained. Participants are given physiological tests in recording rooms. In most of these studies, participants are asked to cooperate with each other. Autonomic physiological sensors, including blood pressure cuffs, are attached to participants, and so forth.
One important difference between the testing situations in these studies and polygraph testing situations is that participants are not asked to lie. Neither are they told that the purpose of the physiological recording equipment is to detect lying (which it is not). Nonetheless, both perceivers and bearers of stigma, including visible and nonvisible stigmas, have been shown to exhibit cardiovascular patterns associated with threat, including increased myocardial contractility, decreased cardiac output, increased total peripheral resistance, and increases in blood pressure (Blascovich, 2000; Blascovich et al., 2001b).
These studies suggest that stigma may affect polygraph test accuracy. Specifically, they suggest that if either the examiner or the examinee bears a stigma, the examinee may exhibit heightened cardiovascular responses during the polygraph testing situation, particularly during difficult aspects of that situation such as answering relevant questions, independently of whether he or she is answering truthfully. Such responses would be likely to increase the rate of false positive results among examinees who are members of stigmatized groups, at least on relevant-irrelevant and comparison question tests.15 (In Chapter 4, we discuss the very limited empirical research examining the effects of stigma-related characteristics of examiners and examinees, such as race and gender, on the accuracy of polygraph diagnoses of deception.)
Expectancies have been a subject of social-psychological research for the past 40 years. In the early 1960s, Robert Rosenthal began one major line of research, examining the social psychology of the research situation; he hypothesized and verified the so-called experimenter expectancy effects. He demonstrated that experimenter biases affected the results of experimental psychological studies in many situations, even when the experimenters had no intention to do so. Expectancy effects have been tested outside the research situation hundreds of times in a variety of settings (e.g., Rosenthal and Jacobson, 1968; Rosenthal and Rubin, 1978; Harris and Rosenthal, 1985; Rosenthal, 1994; McNatt, 2000; Kierein and Gold, 2000). The most familiar example of expectancy effects is the so-called “Pygmalion effect,” in which teachers’ initial expectancies about specific students’ potential can affect the students’ future performance in the classroom and on standardized tests.
Expectancies in the polygraph testing situation have the potential to affect the validity of such testing.16 It is reasonable to assume, for instance, that an examiner’s belief, or expectancy, about examinees’ guilt or innocence in a criminal investigation setting may cause the examiner to behave differentially—for instance, in a more hostile manner—toward examinees believed to be guilty or deceptive. Such behavior would plausibly create differential emotional reactions in examinees that could affect physiological responses that are detected by the polygraph. These emotional reactions would plausibly be strongest in response to questions about which the examiner expects deceptive responses, thus possibly
causing physiological responses to those questions, regardless of the examinee’s truthfulness. It is also possible for an examiner’s expectancy to influence the way questions are selected, explained, or asked, to the extent that the test format is not standardized (Honts and Perry, 1992; Abrams, 1999). Basic research shows that expectancies can affect responses even when the responder does not know which responses are expected (e.g., Rosenthal and Fode, 1963). Consequently, examiner expectancies might influence responses even among innocent examinees on concealed information tests.
In employee screening, examiners may have expectancies not only about the truthfulness of individual examinees, but also about the base rates of true positives and true negatives in the population tested. In the DOE security screening program, for example, examiners reasonably believe that the likelihood of any individual examinee being a spy is very low. Their interactions with examinees might therefore be relatively low-key and unlikely to generate differential responses to relevant questions.
In both event-specific and screening applications, it is also quite plausible that examinees may vary in their expectancies about how the test will be used or about the particular examiner’s attitudes about them. Such responses, especially when specific to individuals, are very difficult to assess and take into account in interpreting polygraph charts.
It is easy to infer hypotheses from basic research in social psychology about the ways expectancies might affect polygraph test results. For example, examiners who have high expectancies of deceptive individuals among those they test may act in ways that elicit strong physiological responsiveness to relevant questions in their examinees, resulting in a high rate of false positives (lower specificity). Similarly, examiners with high expectancies of truthfulness might elicit weaker physiological responses, resulting in a high rate of false negatives (lower sensitivity). Or examiners who think an examinee is probably guilty can be hypothesized to elicit stronger emotional responses from the examinee than they would from the same examinee if they believed the person to be innocent. Expectancy research, as well as related research on behavioral confirmation (Snyder, Tanke, and Berscheid, 1977; Snyder, 1992; Snyder and Haugen, 1994), makes such hypotheses plausible, and polygraph theory provides no reasons to discount them as unreasonable. It therefore remains an empirical question whether polygraph test results and interpretations support such hypotheses and whether, in fact, test validity is diminished to any significant degree by examiner or examinee expectancies. (We discuss the limited empirical research on this question in Chapter 5.)
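The link drawn above between false positives and specificity, and between false negatives and sensitivity, can be made concrete with a small calculation. The following is a minimal sketch only: the function name and all of the counts are hypothetical, chosen to illustrate the direction of the hypothesized effects, and are not drawn from any polygraph study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = share of deceptive examinees judged deceptive
    specificity = share of truthful examinees judged truthful
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for 100 deceptive and 100 truthful examinees.
# Baseline: examiner holds no strong expectancy.
neutral = sensitivity_specificity(tp=80, fn=20, tn=80, fp=20)

# Hypothesis 1: an examiner expecting deception elicits stronger responses,
# so more truthful examinees are judged deceptive (more false positives).
expects_deception = sensitivity_specificity(tp=85, fn=15, tn=65, fp=35)

# Hypothesis 2: an examiner expecting truthfulness elicits weaker responses,
# so more deceptive examinees are judged truthful (more false negatives).
expects_truth = sensitivity_specificity(tp=65, fn=35, tn=85, fp=15)

for label, (sens, spec) in [
    ("neutral", neutral),
    ("expects deception", expects_deception),
    ("expects truthfulness", expects_truth),
]:
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Whether shifts of this kind actually occur, and how large they are, is the empirical question the text leaves open.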
An important and somewhat special case of expectancies with great relevance to polygraph testing involves examinees’ expectancies regarding the validity of the polygraph test itself. Indeed, much of the utility
claimed for polygraph testing can be ascribed to the strength of the expectancy on the part of the examinee that any deception will be revealed by the polygraph. This expectancy can become so strong that it motivates the examinee to admit or confess to crimes or other transgressions. Such admissions are often counted as true positive results of polygraph examinations, even in the complete absence of physiological data or independent confirmation of the admissions. It seems plausible that a belief that is nearly strong enough to lead to a confession may lead to physiological response patterns indicative of deception if the examinee does not confess. If this hypothesis is correct, the polygraph would perform better with examinees who believe it is effective than with those who do not. This hypothesis is, in fact, the rationale for using stimulation tests during the pretest phase of the polygraph examination. Research on the effect of stimulation tests on polygraph accuracy gives mixed results, as is noted in Chapter 5.
Current knowledge about physiological responses to social interaction is consistent with the idea that certain aspects of the interaction in the polygraph testing context may constitute significant sources of systematic error in polygraph interpretation that can affect the specificity as well as the sensitivity of the test, reducing the test’s validity. The usual strategy for addressing systematic error resulting from a testing interaction is to standardize the interaction, perhaps by automating it. However, this strategy might be very difficult to implement effectively, especially with comparison question polygraph testing, because elements of the interaction are integral to creating the expectations and emotional states in the examinee that are said to be necessary for accurate comparison of responses to relevant and comparison questions. Some standardization can be achieved within the comparison question test format—for example, by limiting the examiner’s choice of questions, as is done in the Test of Espionage and Sabotage.
Although much of the knowledge relevant to expectancy effects is decades old, polygraph theory and practice have changed little in terms of their sensitivity to issues of social interaction in the examination setting. Polygraph theory does not give reason to discount the contextual hypotheses concerning possible systematic error.
THE STATE OF POLYGRAPH RESEARCH
Psychophysiological detection of deception is one of the oldest branches of applied psychology, with roots going back to the work of
Lombroso (1882, 1895) and with systematic applied research occurring at least since Marston’s (1917) efforts in support of the U.S. war effort in World War I. (Appendix E summarizes the history of Marston’s work, including his relationship to the National Research Council, as well as providing some historical context related to the use of polygraph tests in security screening.) Over more than a century of research, major advances have been made in fields of basic psychology, physiology, and measurement that are relevant to the psychophysiological detection of deception and have the potential to transform the field, possibly improving practice. Some of these advances have found their way into polygraph research. The applied field as a whole, however, has been affected relatively little by these advances.
A solid theoretical base is necessary to have confidence in tests for the psychophysiological detection of deception, particularly for security screening. This is the case, as we have noted, because theory suggests that polygraph tests may give systematically erroneous results in certain situations and with certain populations (e.g., expectancy and stigma effects); because purely empirical assessment of the accuracy of test procedures cannot be conducted in important target populations such as spies and terrorists; and because of the need to have tests that are robust against a variety of countermeasures, some of them unanticipated. A research effort appropriate to these challenges would have been characterized by a set of research programs, each of which would have attempted to build and test a theoretical base and to develop an associated set of empirically supported measures and procedures that could guide research and practice. It would have focused on the psychophysiology and neuroscience of deception and sought the best physiological indicators of deception and the best ways to measure each one.
There are a few research programs that exhibit some of these characteristics. However, for the most part, polygraph research has focused on a few physiological responses for which measures have been available since at least the 1920s and tried to make the best of them by testing variations of them in practice, without doing much to develop the underlying science. The research has tended to focus on the application without advancing the basic science. In recent years, the same sort of approach has been tried with newer measures (see Chapter 6). There has been no systematic effort to identify the best potential physiological indicators on theoretical grounds or to update theory on the basis of emerging knowledge in psychology or physiology.
There has not even been any systematic effort to develop theoretical clarity regarding the mechanisms purported to cause differential responses to relevant and comparison questions in relevant-irrelevant or comparison question polygraph tests. Various theoretical accounts have been advanced to explain differential psychological responses to relevant and comparison questions (differential arousal, stress, anxiety, fear, attention, or orienting). Although these theories all concur that a guilty individual responding to a relevant question should evince a different psychological state than when responding to a comparison question, these theories differ with respect to the variety of psychological states that an innocent individual might experience in responding to relevant and comparison questions. Although these differences are important for understanding the possibilities for false positive test results, we have found no studies reporting tests among the theories. Relatedly, various theories have been proposed to map the diverse psychological states presumed to be associated with deception to peripheral physiological responses. We found no tests among these theories, either. Indeed, most research on the comparison question polygraph has been atheoretical about the underlying mechanisms.
The situation is somewhat different with research on concealed information polygraph testing, which has consistently drawn on the theory of the orienting response. This research has emphasized developing and testing procedures that are resistant to threats to validity that can arise from differential reactions to relevant and comparison questions among examinees who have no event-related information to conceal. It uses the same physiological measures as other polygraph research, however, and in this respect shares the limitations of other polygraph test formats.
Polygraph research has not made adequate use of well-developed theoretical models of the physiological processes underlying the peripheral measurements taken by the polygraph. Those models are not reflected in the instruments or measurement procedures used in polygraph testing. Theoretical developments about the separable neurophysiological control of peripheral responses that appear similar (e.g., Dienstbier, 1989; Berntson, Cacioppo, and Quigley, 1991, 1993; Cacioppo, 1994) have seldom been considered in polygraph research, nor do the physiological measurement procedures and devices used in polygraph tests conform to the standards established by the scientific research community (e.g., Dawson, Schell, and Filion, 1990; Dawson, 2000). There is now an extensive body of literature on the sympathetic and parasympathetic influences on many organs that are in turn reflected in psychophysiological measures. Many of the measures used in polygraph testing, such as heart rate, reflect both sympathetic and parasympathetic influences. Several very different physiological mechanisms can result in identical changes in heart rate. There are now measures available that allow for the disentangling of these separate contributions; however, few of these concepts and methods have been used in polygraph research. Moreover, applied polygraph research has not for the most part taken advantage of advances in the psychophysiology and neuroscience of emotion, motivation, attention, and other processes that can affect the measures taken in polygraph testing (see, e.g., Coles, Donchin, and Porges, 1986; Cacioppo and Tassinary, 1990b; Cacioppo et al., 2000).
Polygraph research has not paid sufficient attention to advances in inductive inference in psychophysiology that have underscored the need to examine the specificity as well as the sensitivity of the mapping between a psychological state and a physiological manifestation (Strube, 1990; Cacioppo and Tassinary, 1990a; Sarter, Berntson, and Cacioppo, 1996). Specificity of the polygraph is threatened by any physiological process unrelated to deception that can systematically affect polygraph test scores.17 We have found very little research on ways that conditions other than deceptiveness might produce records that are judged deceptive and no evidence of any systematic attention to threats to specificity. As discussed in more detail in Chapter 5, empirical validation studies of the polygraph continue to emphasize the ability to make physiological differentiation between known lying and known truth-telling.
A particularly important gap is the absence of any theoretical consideration of the social (e.g., interpersonal) and physical context of the polygraph test. As already noted, an extensive basic scientific literature in social psychology and sociology details the myriad effects of perceptible personal features (e.g., status, race, gender), dispositions (e.g., traits), and histories (e.g., examinee expectancies, cultural norms, and values) on social perception (e.g., examiner expectancies) and on psychological and physiological processes within individuals (e.g., Shapiro and Crider, 1969; Waid, 1983; Cacioppo and Petty, 1983; Gardner, Gabriel, and Diekman, 2000; Hicks, Keller, and Miller, 2000; Blascovich et al., 2001b). We found no study of the mechanisms by which such variables might affect polygraph test outcomes: for instance, of the effects they might have on the selection of comparison questions, on the examinee’s understanding of the questions and the examination, or on the examiner’s behavior, subtle and otherwise, during the examination.
In short, the bulk of polygraph research, including almost all the research conducted by federal agencies that use the polygraph, can be accurately characterized as atheoretical. Studies report on efforts to improve accuracy by changing methods of test administration, physiological measurement, data transformation, and the like, but they rarely address the underlying psychological and physiological processes and mechanisms that determine how much accuracy might be achieved. Thus,
for example, the field includes little or no research on the emotional correlates of deception; the psychological determinants of the physiological measures used in the polygraph; the robustness of these measures to demographic differences, individual differences, intra-individual variability, question selection, attempted countermeasures, or social interaction variables in the interview context; or the best ways of measuring and scoring each physiological response for tapping the underlying emotional states to be measured. Because empirical evidence of accuracy does not exist for polygraph testing on important target populations, particularly for security screening, the absence of answers to such theoretical questions leaves important questions open about the likely accuracy of polygraph testing with target populations of interest.
Relationships to Other Scientific Fields
Polygraph research has not been adequately connected to at least two major scientific literatures, other than basic psychophysiology, that are also of direct relevance to improving the psychophysiological detection of deception. One of these is the research on diagnostic testing. As noted in Chapter 2, polygraph researchers and practitioners do not generally conceive of the polygraph as a diagnostic test, nor does most of the field recognize the concept of decision thresholds that is central to the science of diagnostic testing. Researchers and practitioners rarely recognize that the tradeoff between false positives and false negatives can be made as a matter of policy by setting decision thresholds. As a result, practitioners seem to make this tradeoff implicitly, sometimes in the choice of which polygraph testing procedure to use and sometimes, perhaps, in judging the likelihood that a particular examinee will be deceptive. Polygraph research also does not consider systematically the possible use of the polygraph as part of a sequence of diagnostic tests, in the manner of medical testing, with tests given in a standard order according to their specificity, their invasiveness, or related characteristics. (This approach to interpreting information from polygraph tests is discussed further in Chapter 7.)
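The idea that the balance between false positives and false negatives is set by the decision threshold can be illustrated with a short sketch. Everything here is an assumption made for illustration: the scores are invented, the rule "score at or above the threshold means deceptive" is a generic diagnostic-testing convention, and none of it represents an actual polygraph scoring system.

```python
def error_rates(deceptive_scores, truthful_scores, threshold):
    """Judge a score >= threshold as 'deceptive'.

    Returns (false_positive_rate, false_negative_rate).
    """
    false_positives = sum(s >= threshold for s in truthful_scores)
    false_negatives = sum(s < threshold for s in deceptive_scores)
    return (
        false_positives / len(truthful_scores),
        false_negatives / len(deceptive_scores),
    )


# Hypothetical test scores (higher = stronger response to relevant questions).
truthful_scores = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
deceptive_scores = [3, 4, 5, 5, 6, 6, 7, 8, 8, 9]

# Sweep three candidate thresholds on the same data.
tradeoff = {
    t: error_rates(deceptive_scores, truthful_scores, t) for t in (3, 5, 7)
}

for t, (fpr, fnr) in tradeoff.items():
    print(f"threshold={t}: false positive rate={fpr:.1f}, "
          f"false negative rate={fnr:.1f}")
```

With the same underlying scores, raising the threshold lowers the false positive rate and raises the false negative rate. The test itself does not change; only the policy choice about which error to tolerate does.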
The other field that polygraph research has not for the most part benefited from is the science of psychological measurement. Psychological testing and measurement draws on nearly a century of well-developed research and theory (Nunnally and Bernstein, 1994), which has led to the development of reliable and valid measures of a wide range of abilities, personality characteristics, and other human attributes. There is substantial research dealing with the evaluation of objective tests, personality inventories, interviews, and other assessment methods, and clear
standards for assessing and interpreting the reliability, validity, and utility of tests and assessments have been articulated and adopted by test developers and users (see Society for Industrial and Organizational Psychology, 1987; American Psychological Association, 1999). The goal of virtually all evaluations of psychological tests and assessments is to provide evidence about their construct validity. A wide range of methods (e.g., factor analyses, correlations, laboratory experiments) and types of evidence are used in investigating construct validity.
Polygraph research and practice typically have not drawn on established psychometric theory or on current methods for developing and evaluating tests and measures. Some polygraph studies report inter-rater agreement in assessing charts and others report other types of reliability information, but there has been little serious effort to investigate the construct validity of the polygraph. Indeed, as already noted, it is rarely clear exactly what polygraph tests are designed to measure, or how the various pieces of data obtained from polygraph tests are thought to be linked to states or attributes of the examinee, making it difficult to even initiate the process of construct validation (Fiedler et al., in press). Despite several decades of polygraph research and practice, it is still difficult to determine the relationship, if any, between attributes of the examinee (e.g., deceptiveness, use of countermeasures) and the outcomes of a polygraph examination.
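To show what an inter-rater agreement figure does and does not convey, the sketch below computes Cohen's kappa, one widely used chance-corrected agreement statistic. Its use here is an assumption for illustration; the text does not say which statistic polygraph studies report. The chart judgments and category labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments by two scorers of the same ten charts.
scorer_1 = ["deceptive", "deceptive", "truthful", "truthful", "truthful",
            "inconclusive", "deceptive", "truthful", "truthful", "deceptive"]
scorer_2 = ["deceptive", "truthful", "truthful", "truthful", "truthful",
            "inconclusive", "deceptive", "truthful", "deceptive", "deceptive"]

kappa = cohens_kappa(scorer_1, scorer_2)
print(f"kappa = {kappa:.3f}")
```

A high value would show only that the two scorers read the charts consistently. It says nothing about whether the charts measure deception, which is the construct validity question raised above.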
There has been substantial progress in the development of psychometric methods and theory in the last 30 years. Cronbach et al. (1972) developed generalizability theory, which provides a framework for assessing measurement methods that involve multiple components or facets (polygraph outcomes might be affected by the types of questions used, by the examiner, by the context in which the examination is carried out, and so forth). Item response theory (for an overview, see Hambleton, Swaminathan, and Rogers, 1991), the method of choice for modern psychometric theory and research, provides detailed information about the relationship between the attribute or construct a test is designed to measure and responses to items and tests. McDonald (1999) has proposed a unified test theory that links traditional psychometric approaches, item response theory, and factor analytic methods. Unfortunately, none of these developments has had a substantial effect on the administration, scoring, interpretation, or evaluation of the polygraph. Modern psychometric methods are rarely if ever cited or recognized in papers and reports dealing with the polygraph, and while some studies do attempt to estimate some aspects of the reliability of polygraph examinations, none focuses on the cornerstone of modern psychometric theory and practice— the assessment of construct validity.
Consequences for Practice
Partly as a consequence of the isolation of polygraph research from related fields, polygraph practice has been very slow to adopt new technologies and methods. For example, some polygraph equipment still displays electrodermal activity as skin resistance rather than conductance, despite the fact that it has been known for decades that the latter gives a more useful measure of electrodermal response (see Fowles, 1986; Dawson, Schell, and Filion, 1990).18 There has been no systematic effort to address the basic question of how best to detect deception in criminal investigation or national security contexts. Such an effort would have led to earlier and more serious investigation of emerging physiological and neurological measurement techniques that might be expected on theoretical grounds to have potential for lie detection, particularly measurements of brain activity. Instead, there appears to be inertia among practitioners about using the familiar equipment and techniques that rely on 1920s-era science and a lack of impetus from national security or criminal justice agencies, until quite recently, to develop methods and measures that might have a stronger base in modern psychophysiology and neuroscience.
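Skin resistance and skin conductance are reciprocals of one another, so the choice of display unit changes how a given response looks on the chart. The sketch below is a minimal illustration of that reciprocal relation only; it does not reproduce the arguments in the cited sources for preferring conductance. The baseline values and function names are hypothetical.

```python
def conductance_microsiemens(resistance_kilohms):
    """Convert skin resistance (kilohms) to conductance (microsiemens): G = 1/R."""
    return 1000.0 / resistance_kilohms


def resistance_kilohms(conductance_us):
    """Convert skin conductance (microsiemens) back to resistance (kilohms)."""
    return 1000.0 / conductance_us


# The same 1-microsiemens response at two hypothetical baseline levels.
resistance_drops = {}
for baseline_us in (5.0, 20.0):
    before = resistance_kilohms(baseline_us)
    after = resistance_kilohms(baseline_us + 1.0)
    resistance_drops[baseline_us] = before - after
    print(f"baseline {baseline_us:.0f} uS: resistance falls "
          f"{before:.1f} -> {after:.1f} kilohms "
          f"(drop of {before - after:.1f})")
```

An identical change in conductance appears as a much larger resistance change at the low-conductance baseline than at the high one, so response amplitudes read in resistance units are not directly comparable across baselines.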
The field has also failed so far to make the best of knowledge about new and promising methods of data analysis that might do a better job of linking theory to measurement, for example, research on computer-based models for scoring polygraph charts. Early efforts, such as those reported by Kircher and Raskin (1988), focused on statistical discriminant analysis and used general notions (such as latency, rise, and duration) and other measures for each channel, drawing on general constructs that underlie psychophysiological detection of deception in the psychophysiology literature. But there appears to be limited justification for most specific choices of key parameters used in the formal models, and the operational measures one finds in this work often closely resemble what polygraph examiners claim to do in practice. This work was followed in the 1980s and 1990s by government-funded studies aimed at developing computer-based polygraph scoring systems that take advantage of advances in statistical and machine-learning algorithms capable of making the most of polygraph data (e.g., see Raskin et al., 1988; Raskin, Horowitz, and Kircher, 1989; Olsen et al., 1997). Those studies have not led to significant changes in practice. To the extent that the polygraph instrument measures physiological responses relevant to deception, this approach holds promise, but much of that promise has yet to be realized (see Appendix F). Unfortunately, the most recent and complex studies of this type, conducted at the Applied Physics Laboratory at Johns Hopkins University, appear to have taken a largely atheoretical approach, aiming to build a
logistic regression detection algorithm by purely empirical means from a subset of 10,000 features extracted from physiological signals. Those efforts have not apparently built on advances in psychophysiology that might have helped in selecting features with theoretical or empirical rationales for their relevance.
The above discussion might easily be read as a broad indictment of polygraph researchers; we do not intend that interpretation. Polygraph research has attracted and continues to attract well-trained and qualified scientists. We believe that the lack of progress in polygraph research is attributable not so much to the researchers as to the social context and structure of the work.
Polygraph research has been guided, for the most part, by the perceived needs of law enforcement and national security agencies and the demands of the courts, rather than by basic scientific approaches to research. In this respect, polygraph research is like many other fields of forensic science. The 1923 decision in Frye v. United States (293 F.1013) did not support work on validity issues in forensic science because under Frye, courts accepted the judgment of communities of presumed experts. After Frye, the courts did not demand validation research or efforts to find the most scientifically defensible methods for the psychophysiological detection of deception. Not until the 1993 Daubert decision were courts asked to judge the admissibility of expert testimony on the basis of the scientific validity of the expert opinion. That decision brought validity issues to the fore and is likely to increase the demand for solid scientific validation. So far, however, the overall enterprise of forensic science and the subfield of polygraph research have not changed much.
Meanwhile, promising young scientists from a number of relevant fields have not flocked to forensic science to make their careers. The questions being pursued have seemed far from the cutting edge of the fields in which those scientists were trained and unrelated to the major theoretical issues in those fields. Consequently, advisers in those fields have not steered their best students into forensic science, and a career in the area does not confer academic prestige. Psychophysiology and its relation to polygraph research is a case in point. Polygraph research, which has focused mainly on making incremental improvements in the way 1920s technology is used, would seem particularly unattractive to any young scientist wanting to advance understanding of modern psychology or physiology. As a result, there have been few new ideas for the research on the psychophysiological detection of deception.
Polygraph and related research has been supported primarily by law enforcement and national security agencies whose concerns have been with practical detection of deception, not with advancing science. These concerns are perfectly valid, but they have impeded scientific progress. The fact that polygraph testing combines a diagnostic test and an interrogation practice in an almost inextricable way would be a major concern for any scientist seeking to validate the diagnostic test. The cultures of those parts of the agencies that deal with law enforcement and counterintelligence do not include traditions of scientific peer review, open exchange of information, and open critical debate that are common in scientific work. (The U.S. Department of Defense Polygraph Institute has, in the past few years, shown signs of becoming an exception to this generalization.) The culture of practice in security agencies, combined with the strong belief of practitioners in the utility of the polygraph, has made it easy for those agencies to continue their old practices. Thus, research has until quite recently focused almost exclusively on the polygraph and has been conducted within agencies that are committed to using the polygraph, believe strongly in its utility, and have seen little need to seek alternative techniques.
Our conversations with practitioners at several national security agencies indicate that there is now an openness to finding techniques for the psychophysiological detection of deception that might supplement or replace the polygraph. However, both these conversations and the recent research that these agencies have sponsored on alternatives to the polygraph show a continuing atheoretical approach that does not build on or connect with the relevant scientific research in other fields.
Criticisms of the scientific basis of polygraph testing have been raised since the earliest days of the polygraph. An indication of the state of the field is the fact that the validity questions that scientists raise today include many of the same ones that were first articulated in criticisms of Marston’s original work in 1917:19
My greatest reason for persistent skepticism as to the real use of the test, however, arises from the history of the subject. . . . The net result has been, I think to show that organic changes are an index of activity, of “something doing,” but not of any particular kind of activity . . . but the same results would be caused by so many different circumstances, anything demanding equal activity (intelligence or emotional) that it would be impossible to divide any individual case.
Another assessment remains as true today as when it was written a half century ago (Guertin and Wilhelm, 1954:153): “There has been relatively little theoretical evaluation of the processes underlying the responses to lie detector procedure since lie detection instruments and techniques have been developed empirically in the field.”
That assessment appeared in the introduction to a study that used factor analysis to examine the relationships among ten indices of electrodermal response and reduced them to two factors believed to have different psychological significance: one related to deception and the other to “test fright” and adaptation. The authors’ research goal, as appropriate now as then, was to reveal basic links between psychological and physiological processes and thereby build scientific support for the choice of particular indicators of deception. This style of research, aimed at building a theory of the psychophysiological detection of deception by careful evaluation of empirical associations, has been little pursued. The same can be said of other strategies of theory building that draw on direct measurement of physiological phenomena, the techniques for which have been revolutionized over the past several decades.
Essentially the same criticism was voiced two decades ago by the U.S. Office of Technology Assessment (1983:6):
The basic theory of polygraph testing is only partially developed and researched. . . . A stronger theoretical base is needed for the entire range of polygraph applications. Basic polygraph research should consider the latest research from the fields of psychology, physiology, psychiatry, neuroscience, and medicine; comparison among question techniques; and measures of physiological research.
More intensive efforts to develop the basic science in the 1920s would have produced a more favorable assessment in the 1950s; more intensive efforts in the 1950s would have produced a more favorable assessment in the 1980s; more intensive efforts in the 1980s would have produced a more favorable assessment now. A research strategy with better grounding in basic science might have led to answers to some of the key validity questions raised by earlier generations of scientists. Polygraph techniques might have been modified to incorporate new knowledge, or the polygraph might have been abandoned in favor of more valid techniques for detecting deception. As we have suggested, the failure to make progress seems to be structural, rather than a failure of individuals. We return to this issue in Chapter 8, where we offer some recommendations for redesigning the research enterprise that might address the structural impediments to progress.
One cannot have strong confidence in polygraph testing or any other technique for the physiological detection of deception without an adequate theoretical and scientific base. A solid theoretical and scientific base can give confidence about the robustness of a test across examinees and settings and against the threat of countermeasures and can lead to its improvement over time. The evidence and analysis presented in this chapter lead to several conclusions:
The scientific base for polygraph testing is far from what one would like for a test that carries considerable weight in national security decision making. Basic scientific knowledge of psychophysiology offers support for expecting polygraph testing to have some diagnostic value, at least among naive examinees. However, the science indicates that there is only limited correspondence between the physiological responses measured by the polygraph and the attendant psychological brain states believed to be associated with deception—in particular, that responses typically taken as indicating deception can have other causes.
The accuracy of polygraph tests can be expected to vary across situations because physiological responses vary systematically across examinees and social contexts in ways that are not yet well understood and that can be very difficult to control. Basic research in social psychophysiology suggests, for example, that the accuracy of polygraph tests may be affected when examiners or examinees are members of socially stigmatized groups and may be diminished when an examiner has incorrect expectations about an examinee’s likely innocence or guilt. In addition, accuracy can be expected to differ between event-specific and screening applications of the same test format because the relevant questions must be asked in generic form in the screening applications. Accuracy can also be expected to vary because different examiners have different ways to create the desired emotional climate for a polygraph examination, including using different questions, with the result that examinees’ physiological responses may vary with the way the same test is administered. This variation may be random, or it may be a systematic function of the examiner’s expectancies or aspects of the examiner-examinee interaction. In either case, it places limits on the accuracy that can be consistently expected from polygraph testing.
Basic psychophysiology gives reason for concern that effective countermeasures to the polygraph may be possible. All of the physiological indicators measured by the polygraph can be altered by conscious efforts through cognitive or physical means, and all the physiological responses believed to be associated with deception can also have other causes. As a consequence, it is possible that examinees could take conscious actions that create false polygraph readings.
Available knowledge about the physiological responses measured by the polygraph suggests that there are serious upper limits in principle to the diagnostic accuracy of polygraph testing, even with advances in measurement and scoring techniques. Polygraph accuracy may be reaching a point of diminishing returns. There is only limited room to improve the detection of deception from the physiological responses the polygraph measures.
Although the basic science indicates that polygraph testing has inherent limits regarding its potential accuracy, it is possible for a test with such limits to attain sufficient accuracy to be useful in practical situations, and it is possible to improve accuracy within the test’s inherent limits. These possibilities must be examined empirically with regard to particular applications. We examine the evidence on polygraph test performance in Chapters 4 and 5.
The bulk of polygraph research can accurately be characterized as atheoretical. The field includes little or no research on a variety of variables and mechanisms that link deception or other phenomena to the physiological responses measured in polygraph tests.
Research on the polygraph has not progressed over time in the manner of a typical scientific field. Polygraph research has failed to build and refine its theoretical base, has proceeded in relative isolation from related fields of basic science, and has not made use of many conceptual, theoretical, and technological advances in basic science that are relevant to the physiological detection of deception. As a consequence, the field has not accumulated knowledge over time or strengthened its scientific underpinnings in any significant manner.
There has been no serious effort in the U.S. government to develop the scientific base for the psychophysiological detection of deception by the polygraph or any other technique, even though criticisms of the polygraph’s scientific foundation have been raised prominently for decades. The reason for this failure is primarily structural. Polygraph and related research is managed and supported by national security and law enforcement agencies that do not operate in a culture of science when meeting their needs for detecting deception and that also believe in and are committed to the polygraph. As a result, this research is not structured within these agencies to give basic science its appropriate place in the development of techniques for the physiological detection of deception.
validity of inferences of deception with certain populations and in certain situations that have not been resolved by empirical research. These issues are raised later in the chapter; the relevant empirical data are discussed in Chapter 5. The other is that in the case of polygraph security screening, the empirical record necessary for an atheoretical justification of the test does not exist, and is unlikely to be developed, because of the difficulty of building a large database of test results on active spies, saboteurs, or terrorists.
This is the case even when the response reflects a change in the activation of a specific region of cortical tissue (see Sarter, Berntson, and Cacioppo, 1996).
Converging evidence is always important in making inferences using the subtractive method because this method assumes that components or processes can be inserted or deleted without altering other components or processes (e.g., relevant and control questions differ only because the relevant questions have special meaning to deceptive individuals). This may not be true in relevant-irrelevant and comparison question polygraph tests. In concealed information tests, when only those with the information can identify the relevant items, a differential physiological response provides the basis for a stronger inference.
Both terms are equal to P(deception AND physiological activity). Conditional probabilities show what proportion of a restricted sample have a certain property; thus they are ratios. The two conditional probabilities have the same numerator, P(deception AND physiological activity), but different denominators, P(deception) and P(physiological activity). With low base rates of deception and somewhat inaccurate tests, P(deception) can be orders of magnitude smaller than P(physiological activity), and so P(deception given physiological activity) can be orders of magnitude smaller than P(physiological activity given deception).
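The gap between the two conditional probabilities can be illustrated with a short calculation. The sketch below is only an illustration: the function name and the three input values (a base rate of 1 in 1,000, a response rate of 0.80 among deceptive examinees, and a response rate of 0.10 among truthful examinees) are hypothetical assumptions chosen for the example, not figures drawn from polygraph research.

```python
def posterior_deception(base_rate, sensitivity, false_positive_rate):
    """Return P(deception given physiological activity) by Bayes' rule.

    base_rate            -- P(deception), hypothetical
    sensitivity          -- P(physiological activity given deception)
    false_positive_rate  -- P(physiological activity given no deception)
    """
    # Shared numerator: P(deception AND physiological activity)
    joint = sensitivity * base_rate
    # Denominator: P(physiological activity), summed over both groups
    p_activity = joint + false_positive_rate * (1.0 - base_rate)
    return joint / p_activity


# Hypothetical inputs, assumed purely for illustration
base_rate = 0.001
sensitivity = 0.80
false_positive_rate = 0.10

p_d_given_a = posterior_deception(base_rate, sensitivity, false_positive_rate)
print(f"P(activity given deception) = {sensitivity:.4f}")
print(f"P(deception given activity) = {p_d_given_a:.4f}")
```

Under these assumed inputs, the probability of deception given a physiological response comes out at roughly 0.008, about one hundredth of the 0.80 probability of a response given deception. Setting the false positive rate to zero makes the two denominators coincide and the result becomes 1.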
Tests that are less accurate than DNA matching can have diagnostic value for detecting deception even though they are imperfect. Chapter 7 discusses the policy issues raised by using such tests, either alone or in combination with other sources of information, in security screening and other applications.
If a test is 100 percent specific, the prosecutor’s fallacy is not a fallacy. For example, given the current state of DNA matching, finding blood with DNA that matches the defendant’s on the victim means it is virtually certain that the defendant was there and constitutes strong evidence against the defendant unless the defense has another reasonable explanation of how the blood got there.
Some of these threats to validity can be ruled out if the test design provides adequate standardization or other controls. Efforts to standardize the interview process and the specific relevant and comparison questions across examinations can be helpful in this regard, and there is some such standardization in some tests, such as the Test of Espionage and Sabotage, that are used in federal employee screening programs. In addition, the concealed knowledge test approach rules out the possibility that extraneous factors may elicit differential responses to relevant and comparison questions by innocent examinees because they have no way of knowing which are the relevant questions.
The effect might be different on concealed information tests. Examinees who do not have concealed information would not be able to respond differentially to relevant questions on these tests because they do not have the information needed to recognize those questions. Examinees who have concealed information, however, might respond differentially to relevant questions, with the possible result that the rate of false negative errors would be lower for stigmatized than unstigmatized groups.
According to signal detection theory, it would be appropriate for expectancies about the probability that an examinee is deceptive to be reflected in the decision about what