Alternative Techniques and Technologies
Public officials responsible for maintaining national security should consider polygraph policies in relation to other policy options that rely on alternative means of detecting deception and deterring violations of security. Their decisions must weigh the net benefits and costs of a range of options for achieving these objectives, including techniques for detecting deception that may supplement or substitute for the polygraph.
This chapter considers some of those alternative techniques. It focuses in particular on the potential of recently emerging technologies, including those that measure brain activity, some of which have recently received considerable attention, and those that rely on measures of externally observable behaviors. In Chapter 7 we take up issues involved in making policy decisions about the use of these techniques, including ways of assessing the costs and benefits of using particular techniques and ways of combining techniques.
Techniques for detecting real and potential violations of security can be roughly divided into four classes. The first class includes, but is not restricted to, the polygraph itself. This class comprises measures of physiological indicators of autonomic and somatic activity that are not detectable without special sensing equipment. In this chapter we discuss some of the members of this class other than the polygraph. The second class includes techniques involving observations of brain function. This class is attractive on grounds of basic psychophysiology because of the possibility that appropriately selected brain measures might get closer than any autonomic measures to psychological processes that are closely tied to deception. Brain activity can be measured with modern functional imaging techniques such as positron emission tomography (PET) and magnetic resonance imaging (MRI, often referred to as functional MRI or fMRI when used to relate brain function to behavior), as well as by recording event-related potentials, characteristics of brain electrical activity following specific discrete stimuli or “events.” The third class of techniques attempts to detect deception from demeanor: these techniques usually involve careful observation of specific behaviors of examinees (e.g., voice, facial expression, body movements, choice of words) that can be observed with human sense organs but may also be measured with scientific equipment. The fourth class is based on overt, direct investigations and includes employment questionnaires; background checks; and employee surveys, questionnaires, and paper-and-pencil tests. We consider each of these in turn.
AUTONOMIC INDICATORS
The polygraph is the best-known technique for psychophysiological detection of deception. The goal of all of these techniques is to detect deception by analyzing signals of changes in the body that cannot normally be detected by human observation. The physiological phenomena recorded by the polygraph are only a few of the many physiological phenomena that have been characterized since the polygraph was first introduced and that might, in principle, yield signals of deception.
The polygraph relies on measurements of autonomic and somatic activity. That is, it analyzes signals of peripheral physiological activities associated with arousal and emotion. The traditional measures used in polygraph testing are cardiovascular (i.e., changes in heart rate and blood pressure), electrodermal (i.e., changes in the electrical properties of the skin that vary with the activity of the eccrine sweat gland), and respiratory (see Chapter 3). These are among the oldest measures used by psychophysiologists.
A wider variety of visceral events can now be recorded noninvasively, including myocardial contractility, cardiac output, total peripheral resistance, skin temperature (thermography), and vascular perfusion in various cutaneous tissue beds (Blascovich, 2000; Cacioppo, Tassinary, and Berntson, 2000a). Several of these measures provide clearer information than traditional polygraph measurements about the underlying neurophysiological events that produce visceral adjustments. Given appropriate measurement contexts and controls, for instance, respiratory sinus arrhythmia can be used to reflect cardiac vagal activation, and myocardial contractility (e.g., as assessed by pre-ejection period) can be used to
measure cardiac sympathetic activation (e.g., Berntson et al., 1994; Cacioppo et al., 1994).
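To make concrete how a measure such as respiratory sinus arrhythmia can be derived from recorded data, the sketch below computes the power of an evenly resampled heart-period series within the respiratory frequency band. The band limits (0.15 to 0.40 Hz), the sampling rate, and the synthetic heart-period series are illustrative assumptions only, not the acquisition or quantification protocols of the studies cited above.

```python
import cmath
import math

def band_power(signal, fs, lo, hi):
    """Power of an evenly sampled series within the frequency band
    [lo, hi] Hz, computed with a plain discrete Fourier transform."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if lo <= freq <= hi:
            coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
            power += 2 * abs(coeff) ** 2 / (n * n)
    return power

# Hypothetical heart-period series: a constant level plus a 0.05-second
# oscillation at a typical respiratory frequency (0.25 Hz), resampled at 4 Hz.
fs = 4.0
ibi = [0.85 + 0.05 * math.sin(2 * math.pi * 0.25 * t / fs) for t in range(240)]

# High-frequency (0.15-0.40 Hz) heart-period power, a common RSA index.
rsa_power = band_power(ibi, fs, 0.15, 0.40)
```

In this synthetic series essentially all of the variability lies in the respiratory band, which is why high-frequency heart-period power is taken, under appropriate measurement conditions, as an index of cardiac vagal activation.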
Because some of these measures are closer than polygraph-based measures to the specific physiological processes associated with arousal, there are theoretical reasons to expect that they might offer better indicators of arousal than those used in polygraph testing. However, although some of these measures have advantages over polygraph measures on grounds of theoretical psychophysiology, they may not actually map more closely to psychological variables. Like the polygraph indicators, measures such as myocardial contractility and respiratory sinus arrhythmia are influenced by sundry social and psychological factors (e.g., Berntson et al., 1997; Gardner, Gabriel, and Diekman, 2000). These factors might result in false positive test results if an examinee is aroused by something other than deception (e.g., a concern about false accusations) or might provide a basis for countermeasures.
Despite these caveats, various researchers have proposed the use of some of these autonomic measurements as alternatives or adjuncts to the four basic channels that are part of the standard polygraph measurement instrument. The limited research on these measures does not offer any basis for determining where they may fit in the array of possible physiological measurements. The studies generally report on the accuracy of tests using a particular measure in small samples or in uncontrolled settings.
A recent report on thermal imaging illustrates the difficulties we have had in assessing whether these peripheral measures are promising and precisely how research on them should be pursued. In 2001, investigators at the U.S. Department of Defense Polygraph Institute (DoDPI), collaborating with outside researchers, carried out a pilot study (Pollina and Ryan, 2002) using a comparison question format polygraph for a mock crime scenario with 30 examinees who were trainees at an army base. Their goal was to investigate the possible utility of a new device for thermography that measures the radiant energy emitted from examinees’ faces, as an adjunct or alternative to the traditional polygraph measurements. Thermography has an important potential advantage over the polygraph in that it does not require an examinee to be hooked up to a machine.
Five of the original examinees in the study were dropped because they were uncooperative or had other problematic behavior. Of the remaining 25, 12 were programmed to be deceptive and 13 were programmed to be nondeceptive. The outside researchers published a report (Pavlidis, Eberhardt, and Levine, 2002) claiming that the thermal imaging results alone achieved higher accuracy than the polygraph on nondeceptive examinees (11 of 12 subjects correct for thermal imaging compared
with 8 of 12 for the polygraph) and equivalent accuracy on deceptive ones (6 of 8 correct). Unfortunately, the published report uses only a subset of the examinees and offers no information on the selection process. It also gives no information on the decision criteria used for judging deceptiveness from the thermographic data.
The DoDPI researchers were interested in the possibility of combining the new information with that from the traditional polygraph channels. This required a new effort at computer scoring, as well as an explicit effort at extracting statistical information from the thermal recordings. The DoDPI report indicates moderately high correspondence with experimental conditions for polygraph testing (an accuracy index [A] of 0.88), relatively low correspondence with thermal signals alone (A of 0.70), and some incremental information when the two sets of information are combined (A of 0.92). Despite the public attention focused on the published version of this study in Nature (Pavlidis, Eberhardt, and Levine, 2002), it remains a flawed and incomplete evaluation based on a small sample, with no cross-validation of measurements and no blind evaluation. It does not provide acceptable scientific evidence to support the use of facial thermography in the detection of deception.
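The accuracy index (A) quoted above is an area-under-the-ROC-curve type of measure. The sketch below, written under the assumption that each channel yields a simple numerical score per examinee (the values are fabricated for illustration and are not the DoDPI data), shows one way such an index can be computed and how pooling two channels can, in principle, yield incremental information.

```python
def auc(deceptive, nondeceptive):
    """Area under the ROC curve, computed directly as the probability that
    a randomly chosen deceptive examinee scores higher than a randomly
    chosen nondeceptive one (ties count one-half)."""
    pairs = [(d, n) for d in deceptive for n in nondeceptive]
    wins = sum(1.0 if d > n else 0.5 if d == n else 0.0 for d, n in pairs)
    return wins / len(pairs)

# Fabricated per-examinee scores for two channels (higher = more deceptive).
poly_d, poly_n = [3.1, 2.4, 2.9, 1.8], [1.2, 2.5, 0.9, 1.5]
therm_d, therm_n = [0.9, 1.7, 0.6, 1.4], [0.8, 0.7, 1.1, 0.4]

a_poly = auc(poly_d, poly_n)
a_therm = auc(therm_d, therm_n)

# Naive fusion: sum the channel scores. This assumes comparable scales;
# a fitted weighting would be used in any serious analysis.
a_comb = auc([p + t for p, t in zip(poly_d, therm_d)],
             [p + t for p, t in zip(poly_n, therm_n)])
```

With these invented scores, each channel alone separates the groups imperfectly, while the combined score separates them better than either alone, which is the pattern of incremental validity the DoDPI report describes.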
MEASUREMENTS OF BRAIN FUNCTION
The polygraph and other measures of autonomic and somatic activity reflect the peripheral manifestations of very complex cognitive and affective operations that occur when people give deceptive or nondeceptive answers to questions. By their very nature, polygraph measurements provide an extremely limited and indirect view of the complex underlying brain processes. A reasonable hypothesis is that by looking at brain function more directly, it might be possible to understand and ultimately detect deception. This section discusses some brain measurement technologies that are beginning to be explored for their ability to yield techniques for the psychophysiological detection of deception.
Functional Brain Imaging
Over the past 15 years, the field of cognitive neuroscience has grown significantly. Cognitive neuroscience combines the experimental strategies of cognitive psychology with various techniques to examine directly how brain function supports mental activities. Leading this research are two new techniques of functional brain imaging: positron emission tomography (PET) and magnetic resonance imaging (MRI) (see Buxton and Carson, Daube-Witherspoon, and Herscovitch for comprehensive general reviews). Over the past 5 years, these techniques have
been used to study affective processes (see Davidson and Irwin, 1999), and there is a burgeoning literature on the neural correlates of cognitive and affective processes that is potentially relevant to psychophysiological detection of deception. Their use to study brain activity associated with deception is only beginning.
PET relies on a measure of local blood flow, changes in which invariably accompany changes in the cellular activity of the brain of normal, awake humans and unanesthetized laboratory animals (for a review, see Raichle, 1987). More recently it has been appreciated that these changes in blood flow are accompanied by much smaller changes in oxygen consumption (Fox and Raichle, 1986; Fox et al., 1988). These changes lead to changes in the actual amount of oxygen remaining in blood vessels at the site of brain activation (i.e., the supply of oxygen is not matched precisely with the demand). Because MRI signal intensity is sensitive to the amount of oxygen carried by hemoglobin (Ogawa et al., 1990), this change in blood oxygen content at the site of changes in brain activity can be detected with MRI (Bandettini et al., 1992; Frahm et al., 1992; Kwong et al., 1992; Ogawa et al., 1992). The detection of these blood-oxygen-level-dependent (BOLD) signals with MRI has become known as functional magnetic resonance imaging, or fMRI. Research with fMRI is now providing increasingly detailed maps of human brain function.
Several recent studies provide the beginnings of a scientific underpinning for using fMRI measures for detecting deception. These studies include research on knowledge and emotion. For example, some recent work (e.g., Shah et al., 2001; Tsivilis, Otten, and Rugg, 2001) suggests that seeing familiar names or faces produces distinctively different areas of brain activation than unfamiliar names or faces. In addition, to the extent that deception is associated with increased activation of circuitry associated with anxiety, activation of the amygdala and regions of the prefrontal cortex both reliably accompany certain forms of anxiety (Davidson, 2002). Such studies can help build a theory linking deception to psychological states and specific physiological correlates that might be applied in the future to develop neuroimaging methods for the detection of deception.
Other research is examining the connections between brain activity and phenomena that the polygraph measures. For example, at least five studies combining functional imaging (both PET and fMRI) with simultaneous measurements of the skin conductance response have investigated the brain basis of the conductance response (Critchley et al., 2000; Fredrikson et al., 1998; Raine, Reynolds, and Sheard, 1991; Williams et al., 2000, 2001). These studies show that it reflects a complex interplay in areas of the brain implicated in both emotion regulation and attention. These studies are complemented by parallel studies in patients with well-
characterized lesions (Tranel and Damasio, 1994; Zahn, Grafman, and Tranel, 1999). The results of these studies underscore the complexity of the circuitry involved and also illustrate how the relationship between brain function and behavior can be understood in more detail when information on the former is directly available.
More immediately relevant to the use of fMRI for the detection of deception are the very few recent studies that use fMRI to identify associations between deception and specific brain activity. One recent study adapted the guilty knowledge test format for use with fMRI (Langleben et al., 2001). In 23 normal subjects, it was possible to detect localized activity changes in the brain that were uniquely associated with deception. Remarkably, these changes occurred in areas of the brain known to participate in situations involving response conflict (Miller and Cohen, 2001). In the study, the conflict involved overriding one (correct) response and providing a second (false or deceptive) response to a question.
Another study (Spence et al., 2001) used fMRI to study deception in an autobiographical memory task in which examinees were instructed to be truthful or to lie. The findings from this experiment indicated that during lying, compared with truthful responding, examinees exhibited significantly greater activation in the ventrolateral prefrontal cortex and the medial prefrontal cortex. Activation in several additional regions differentiated less strongly between the experimental conditions. In yet another recent study, Lee and colleagues (2002) instructed some subjects to feign a memory problem and deliberately do poorly on two memory tasks. One involved memorizing a three-digit number and reporting its correspondence with another number presented 2.25 seconds later; the other involved memory for the answers to autobiographical questions such as “Where were you born?” The researchers reported differential patterns of activation that held across the two tasks when feigned memory impairment was compared with control conditions. The findings from this study revealed a distributed set of activations that included several regions of the prefrontal, parietal, and temporal cortices, the caudate nucleus, and the posterior cingulate gyrus.
The above studies suggest what might in principle be achieved by using a technique such as fMRI for the detection of deception. They also suggest the kinds of information needed in brain-based studies of detecting deception. These investigations seek to identify signatures of particular kinds of cognitive activity in brain processes. Yet even if fMRI studies could eventually identify signatures of acts of deception, it would be premature to conclude that fMRI techniques would be useful in practice for lie detection. Applied fMRI studies of the kinds done so far have similar limitations to those of typical laboratory polygraph research. They have limited external validity: the experimental lies were not high-stakes
ones, and no penalty was presented for a failure to successfully deceive. They also have some similar limitations at the level of the basic science. For example, the brain regions activated by deception in the research on feigned memory impairment are activated not only during deception. Their activation probably reflects the very complicated constellation of cognitive and affective processes that are involved in a particular kind of task. Identifying areas of brain activation that are specific to deception is not on the horizon, and it is by no means clear that such areas will ever be identified.
There are also several major methodological obstacles to be overcome in the use of fMRI for the detection of deception. First, studies with fMRI, including those mentioned here, involve the averaging of information over examinees. While such a strategy is enormously powerful for understanding general processes within the human brain, it ignores the need to obtain information on particular individuals that is central to the use of fMRI in the detection of deception. Only recently has work begun on the study of individual differences with fMRI, and much more will need to be done to optimize signal and reduce noise in such images so as to take individual differences into account. While this is very likely to be achieved in time, fMRI analysis is expensive and time-consuming (sometimes as long as 2 to 3 hours per examinee), and the analysis of these data is likely to remain complex for the foreseeable future. For these reasons, fMRI is not presently useful for the psychophysiological detection of deception in many applied settings, and the complexity of analysis may be a prohibitive factor for all applications, for quite some time. Nonetheless, much valuable new information can be learned from research using this powerful technique to advance theoretical understanding of the kinds of cognitive processes involved in deception and perhaps to identify the brain mechanisms underlying countermeasures designed to prevent its detection. Acquisition of such information will be important if new and more effective techniques for detecting deception are to be developed.
EEG and Event-Related Potentials
Caton (1875) was the first to show that electrical activity of the human brain can be detected from electrodes placed on the scalp. It was Berger’s invention of the electroencephalogram (EEG) some years later (Berger, 1929) that made recording of these signals a practical reality. Since then they have been successfully exploited for diagnostic as well as research purposes. Davis (1939) was the first to notice event-related changes in the EEG that have subsequently become known as event-related potentials.
He observed a large negative response in the EEG about 100 to 200 milliseconds after each presentation of an auditory stimulus.
Brain electrical activity is typically measured in terms of either frequency or time. In frequency analyses, the complex waveforms recorded from the scalp are decomposed into underlying frequencies (using a mathematical transformation, such as the Fourier transformation). Time analyses are often referred to as event-related potentials, which are averages of the brain electrical signal over a fixed interval time-locked to an external stimulus or a subject’s response. This method of measuring human brain function has many advantages and a number of distinct disadvantages. One of the key advantages is that brain electrical activity measures have excellent time resolution, allowing researchers to resolve changes that occur in milliseconds. Another distinct advantage is that measurement is completely noninvasive, so it can be used repeatedly in an individual, and the equipment can be made relatively portable. The major disadvantage is that event-related potentials provide only coarse information about the neural sources of the activity that is measured at the scalp.
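The time-domain analysis described above can be sketched in a few lines: average the samples in a fixed window after each event, so that activity unrelated to the event tends to cancel across trials. The response shape, noise level, and event spacing below are assumptions for illustration only.

```python
import random

def event_related_potential(eeg, event_indices, window):
    """Average the EEG samples in a fixed window after each event; activity
    that is not time-locked to the events tends to cancel in the average."""
    epochs = [eeg[i:i + window] for i in event_indices]
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]

random.seed(0)
window = 10                                  # samples per post-event epoch
response = [0, 1, 3, 6, 4, 2, 1, 0, 0, 0]   # assumed event-locked waveform
events = list(range(0, 4000, 20))           # 200 event onsets

# Background "EEG": Gaussian noise, with the response added after each event.
eeg = [random.gauss(0, 5) for _ in range(4000)]
for i in events:
    for t, v in enumerate(response):
        eeg[i + t] += v

erp = event_related_potential(eeg, events, window)
```

Although the response is invisible in any single epoch (the noise here is several times its amplitude), averaging 200 time-locked epochs recovers its shape, including the peak a few samples after stimulus onset.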
There is an established tradition of using measures of brain electrical activity to make inferences about neural correlates of cognitive and affective processes (see Hugdahl, 1995, for review). The fact that brain electrical activity can be clearly connected in time to the occurrence of discrete external events provides a potentially powerful tool for investigating the neural correlates of deception.
A number of studies have attempted to use event-related potentials to examine different aspects of deception. In one of the earliest applications of this methodology, Rosenfeld and his colleagues (1987) allowed examinees to choose an item to keep from a box that contained nine items and used a form of the guilty knowledge test to determine which item was selected. Examinees were instructed not to react as the items were named, to try to defeat this test of deception. A large positive component was present in the event-related potentials between 400 and 700 milliseconds after the presentation of the chosen item but not after the other items. In another study, Rosenfeld and colleagues (Rosenfeld et al., 1991) investigated the modulation of the P300 component of the event-related potential during deception (P300 is a positive wave of the event-related potential that occurs approximately 300 milliseconds after a stimulus). There is a very large literature on the psychological significance of the P300; it appears to reflect task relevance, stimulus probability, or the information-processing resources being used (see Donchin and Coles, 1988, for a review). Rosenfeld et al. (1991) used a hybrid test format that they characterized as a control question test to ask about a series of antisocial acts, one of which
the guilty examinees had committed in a simulation. When the acts were reviewed and rehearsed on the day of the study, 12 of 13 guilty subjects and 13 of 15 innocent subjects were correctly classified on the basis of the P300 amplitude. However, when evaluation of the event-related potentials was conducted on a separate day from the review and rehearsal of the target acts, only 3 of 8 subjects were correctly classified.
Variants of these studies using concealed information formats have since appeared. They typically indicate that the P300 component of the event-related potential, when examined under specific restricted laboratory conditions, can accurately classify approximately 85 percent of examinees in simulation experiments (e.g., Farwell and Donchin, 1991; Johnson and Rosenfeld, 1992; Allen and Iacono, 1997). This level of accuracy is roughly the same as that reported for simple electrodermal measures (see MacLaren, 2001, for review).
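A crude version of the amplitude-comparison logic behind such concealed information classifications can be sketched as follows. The amplitudes and the one-standard-deviation decision rule are invented for illustration; the published studies used more elaborate procedures, such as bootstrapped comparisons of probe, target, and irrelevant waveforms.

```python
from statistics import mean, stdev

def knows_item(probe_amps, irrelevant_amps, threshold=1.0):
    """Flag concealed knowledge when the mean P300 amplitude to the probe
    item exceeds the irrelevant-item mean by more than `threshold` standard
    deviations of the irrelevant amplitudes. A crude illustrative rule, not
    the procedure of any published study."""
    m, s = mean(irrelevant_amps), stdev(irrelevant_amps)
    return mean(probe_amps) > m + threshold * s

# Hypothetical averaged amplitudes (in microvolts) over repeated trials.
irrelevants = [6.0, 7.2, 5.5, 6.8, 7.0, 5.9]
guilty_probe = [12.0, 10.5, 11.8, 13.1]    # probe stands out from irrelevants
innocent_probe = [6.2, 5.9, 7.1, 6.5]      # probe looks like any other item
```

The logic rests on the same premise as the guilty knowledge test itself: only an examinee who recognizes the probe item should show a differentially large response to it.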
In a recent study, Farwell and Smith (2001) used a composite measure of brain electrical activity, including the P300 and other metrics, to examine reactivity to autobiographical information. They report extremely high accuracies of classifying examinees according to the knowledge they possess. However, the range of stimuli to which examinees were exposed was small, and the sample size was very small (only three examinees per condition). Whether these findings generalize to other, more complex contexts in larger groups is not known.
Three recent unpublished studies (Johnson et al., 2002a, b, c) further explore the role of event-related potentials (the P300, the N100, and related measures) and behavioral measurements in understanding the underlying mechanisms involved in making deceptive responses. This work deals with issues such as response conflict and the conscious regulation of actions; it is similar to work in cognitive neuroscience using fMRI techniques. Both approaches emphasize the importance of specific control processes in the mental activities that must underlie deception. They also have similar shortcomings in terms of their applicability to the psychophysiological detection of deception. As with the fMRI studies, this research has not yet included controlled trials that allow assessment of regularities within individual examinees.
These studies have not systematically investigated the incremental validity of event-related potential measures in comparison with what might be achieved with the indicators traditionally used in the polygraph or the possibility that combining the polygraph with P300 might yield better classification than either approach alone. In addition, it is not known whether simple countermeasures could potentially defeat this approach by generating brain electrical responses to comparison questions that mimic those that occur with relevant questions.
DETECTION OF DECEPTION FROM DEMEANOR
Some techniques for detecting deception are based on the interpretation of subtle signals in behavior or demeanor, defined here as activities of an individual that can be observed with the usual human senses, without physical contact with the individual and therefore, potentially, without the individual’s knowledge. Demeanor includes, among other things, gaze, posture, facial expressions, body movements, sound of the voice, and the patterns and content of speech when one person talks to another during an interview, interrogation, or any other conversation. We use the term detection of deception from demeanor to refer to efforts to discriminate lying from truth-telling on the basis of such cues. There can be a fine line between such detection and peripheral measurement of autonomic responses, as suggested, for example, by thermal imaging techniques. These techniques can detect both phenomena that a trained observer can learn to discriminate (such as blushing) and others that are beyond the capabilities of human senses because they involve infrared emissions. Because thermal imaging primarily measures infrared emissions, we classify it with techniques for the psychophysiological detection of deception.
Several authors have reviewed the large body of research connecting lying or truth-telling to cues from demeanor (Zuckerman, DePaulo, and Rosenthal, 1981, 1986; Zuckerman and Driver, 1985; DePaulo, Stone, and Lassiter, 1985; DePaulo et al., 2001; Ekman, 2001). Because this research is rooted in social psychology more than in law enforcement or counterintelligence practice, it has a somewhat different flavor and focus than the polygraph research (reviewed in Chapters 4 and 5). Many of the studies, for example, concern everyday “white lies” and other deliberate untruths that may be quite different psychologically from serious lies or truth-telling, such as occur about suspected criminal activity or espionage. Their findings may not transfer to such practical settings. Some of the reviews do not analyze results in a way that shows how many subjects were correctly or incorrectly classified as liars or truth-tellers and how many could not be classified. Also, many of the studies focus on specific demeanor cues or classes of cues, rather than on building a full capability for detecting deception from demeanor by combining information on any aspects of demeanor that might provide useful information. For such reasons, large segments of the research have very limited practical relevance for criminal or security investigation contexts. In addition, most of the research has limitations in terms of external validity, as does most polygraph research: for example, the stakes are almost always low, and there are no negative consequences for being judged to be lying. In this context, it is worth noting the results from one meta-analytic study (DePaulo et al., 2001) indicating that the associations of demeanor indicators with deception tended to increase when people were more highly motivated to succeed and when lies were about transgressions rather than about less personally significant matters. If these findings are robust, the accuracy of detection of deception from demeanor may be expected to increase with the stakes, so that it would perform better in real criminal or security investigation settings than in much of the research literature.
This section is organized by classes of demeanor cues for which there has been significant research attention to the detection of deception. We conclude with an assessment of the potential of cues from demeanor as tools in criminal and security screening.
Facial and Body Movement
Some studies of demeanor and deception show internally reliable associations of facial or movement cues with deception. Few of these associations, however, have appeared consistently across large numbers of studies, and some cues associated with deception in conventional wisdom, such as avoidance of eye contact, have not shown reliable associations with deception in research studies (DePaulo et al., 2001). The meta-analytic literature fails to identify any pattern of facial or body movement that generally signals deception. However, some studies designed to develop indicators based on these movements show some ability to discriminate lying from truth-telling. For example, Ekman and his colleagues studied lying or truth-telling under fairly strong motivational conditions about three different matters: emotions felt at the moment (Ekman et al., 1991), a strongly held belief, and whether money was taken (Frank and Ekman, 1997). The motivation for the liar was either career success (for lies about emotions) or $100 (for the belief and money scenarios). Punishment involving loss of money and isolation in a dark room was also threatened for anyone, truthful or not, who was judged to be lying. An index based on precise measurement of all facial movement, voice pitch, and one body movement yielded a hit rate of 58 percent of all examinees lying about their emotions (86 percent of those who could be classified by the index). The study of lies about actions and beliefs with only measures of facial behavior yielded a hit rate of 75 percent for the lie about beliefs and 70 percent for the lie about actions. These studies suggest that the right measures of facial and motion features can offer accuracy better than chance for the detection of deception from demeanor in somewhat realistic situations.
At present, the measurement of facial behavior and body movement is very labor intensive; recent work suggests, however, that it will be possible to automate the measurement of facial movements (Bartlett et al., 1999; Cohn et al., 1999).
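The logic of an index of this kind, which combines several measured channels, classifies an examinee only when the combined index is clearly high or clearly low, and reports a hit rate among those classified, can be sketched as follows. The channel names, scores, and cutoffs are invented for illustration and do not correspond to the measures or data of the studies above.

```python
def classify(cue_scores, lie_cut=2.0, truth_cut=1.0):
    """Sum the per-channel cue scores into one index; call 'lie' above
    lie_cut, 'truth' below truth_cut, and decline to classify in between."""
    index = sum(cue_scores.values())
    if index >= lie_cut:
        return "lie"
    if index <= truth_cut:
        return "truth"
    return None  # unclassifiable

# Invented channel scores (facial movement, voice pitch, body movement)
# paired with ground truth for five hypothetical examinees.
examinees = [
    ({"face": 1.2, "pitch": 0.9, "body": 0.4}, "lie"),
    ({"face": 0.2, "pitch": 0.3, "body": 0.1}, "truth"),
    ({"face": 0.7, "pitch": 0.6, "body": 0.2}, "lie"),    # falls in gray zone
    ({"face": 0.9, "pitch": 0.8, "body": 0.5}, "truth"),  # misclassified
    ({"face": 0.1, "pitch": 0.2, "body": 0.3}, "truth"),
]

decisions = [(classify(scores), truth) for scores, truth in examinees]
classified = [(d, t) for d, t in decisions if d is not None]
hit_rate = sum(d == t for d, t in classified) / len(classified)
```

Leaving a middle band unclassified is one reason a hit rate among classified examinees can exceed the hit rate computed over all examinees, the distinction drawn in the Ekman results above.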
Several different aspects of language use seem to be consistently associated with deception. For some of the strongest associations, such as with immediacy of expression (e.g., using active or passive voice, affirmations or negations), observers’ subjective impressions have been more strongly correlated with deception than the objective measures that have been tested (DePaulo et al., in press). This finding suggests that efforts to design measures for the detection of deception based on language use may have untapped potential.
There have been a few efforts to develop such techniques. For example, one field study (Smith, 2001) evaluated scientific content analysis, developed by Sapir (1987), using statements made by criminal suspects who were later confirmed to be either lying or truthful. This approach can only be applied to written statements made by the suspect without assistance. Trained policemen correctly detected 80 percent of truthful statements and 75 percent of deceptive statements, but experienced policemen not trained in the technique were just as accurate. The study design did not make it possible to tell whether the examiners might have been making judgments based on their own experience rather than by applying the principles of the technique. In either case, the study strongly suggests that close examination of how a suspect describes an incident of interest is likely to be fruitful. Pennebaker, Francis, and Booth (2001) and Newman and colleagues (2002) applied a computer program for analyzing five different aspects of language usage (e.g., first-person or third-person pronouns) to interviews about laboratory lies when the stakes were minimal. The program accurately classified 68 percent of those who lied and 66 percent of those who were truthful.
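A toy version of this kind of word-category counting is sketched below. The category lists are invented for illustration; the program cited above relies on large validated dictionaries with many more categories, and the categories shown are not necessarily the five it used.

```python
import re

# Invented category word lists for illustration only.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine", "we", "us", "our"},
    "third_person": {"he", "she", "him", "her", "his", "they", "them", "their"},
    "negations": {"no", "not", "never", "nothing"},
    "exclusive": {"but", "except", "without", "although"},
}

def language_profile(text):
    """Rate of each word category per 100 words of `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {cat: 100.0 * sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

profile = language_profile("I never took the money, but they say I did.")
```

Profiles of this kind, computed over many statements of known veracity, supply the features on which a statistical classifier of the sort described above can then be trained.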
Another technique for analyzing cues in language is statement validity analysis (Horowitz, 1991; Lamb et al., 1997; Porter and Yuille, 1996; Steller and Koehnken, 1989). This technique, which involves content analysis of in-depth accounts of alleged events, has been used primarily to assess statements of victims or witnesses. There is evidence that credible accounts are more likely to contain an appropriate amount of detail about the alleged event (e.g., Steller and Koehnken, 1989; Porter and Yuille, 1996). Very little research has been done, however, on the technique’s applicability to statements by criminal suspects, some of whom may be unwilling or unable to provide detailed accounts (Porter and Yuille, 1995).
In sum, the available evidence suggests that analysis of language usage and of facial and body movement might be useful in distinguishing lies from truth. It is reasonable to expect that accuracy can be improved by using measures that combine information from several channels (e.g.,
facial expression, various body movements, posture, and various measures of speech). The evidence suggests that such measures are likely to have the greatest success when lies have high personal relevance, when the stakes are high, when the liar knows he or she is telling a lie when it is being told, and before there has been opportunity to practice and rehearse the lie (Ekman, 2001; DePaulo et al., 2001). So far, however, no research has been done combining all of the behavior measures and testing their accuracy under the appropriate circumstances.
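A combination of channel scores of the kind contemplated above could, in principle, take the form of a weighted logistic pooling of per-channel indicators. The sketch below is purely illustrative: the channel names and weights are hypothetical placeholders, not empirically estimated values, since the text notes that the needed validation research has not been done.

```python
import math

# Hypothetical per-channel scores in [0, 1] (higher = more deception-like)
# and illustrative weights; real weights would have to be estimated from
# validation data that does not yet exist.
WEIGHTS = {"face": 1.2, "body": 0.6, "posture": 0.4, "speech": 1.0}

def combined_score(channels: dict) -> float:
    """Pool per-channel scores into one probability-like value.

    Each score is centered at 0.5 (the 'uninformative' point), weighted,
    summed, and passed through a logistic squashing function.
    """
    z = sum(WEIGHTS[c] * (s - 0.5) for c, s in channels.items())
    return 1.0 / (1.0 + math.exp(-z))

p = combined_score({"face": 0.9, "body": 0.7, "posture": 0.5, "speech": 0.8})
```

The design choice here, combining weak signals from several channels rather than relying on any single one, mirrors the text's suggestion that multichannel measures are likelier to succeed than single-channel ones.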
Given the apparent potential for the detection of deception from demeanor and the difficulty and limited effectiveness of objective measurement so far, the question arises whether it might be possible to train observers to make accurate judgments from demeanor without formal measurements. Without training, most observers, even experienced law enforcement personnel or security officers, cannot do much better than chance, and their confidence in their judgment is unrelated to accuracy (Ekman and O’Sullivan, 1991; Ekman, O’Sullivan, and Frank, 1999). Some groups, however, do perform better than chance in detecting lies from demeanor just by viewing videotapes. A group of U.S. Secret Service agents averaged 64 percent correct judgments when chance performance was 50 percent, with about half of them achieving an accuracy level of 70 percent or more (Ekman and O’Sullivan, 1991). No studies have yet been done to determine if those who do poorly in detecting deception from demeanor can be trained to become very accurate. However, a review of the research on training effects in deception studies showed a moderate improvement (Frank and Feeley, 2002).
Voice Stress Analysis
The research on the detection of deception from demeanor includes the presumption that liars experience more stress than truth-tellers, especially in high-stakes circumstances, and that this stress shows in various channels, including in the voice. Recent meta-analytic evidence shows consistent associations of lying with vocal tension and high pitch (DePaulo et al., in press). Applied efforts to develop measures of voice stress for the detection of deception have not been very successful, however.
As early as 1941, Fay and Middleton attempted to use human judgment of voice responses to determine deceptions of subjects told to answer a series of questions either truthfully or untruthfully. Their methodology yielded correct judgments for truthful responses at essentially chance levels and slightly higher rates of correct judgments for untruthful
responses. Other studies, for example by Motley (1974), Horvath (1978, 1979), Lynch and Henry (1979) and Brenner, Branscomb, and Schwartz (1979), have attempted, with limited success at best, to extract information from recorded voice signals to measure stress in analogue studies and then to use the resulting determination as an indirect indicator of deception in much the same way as is done in polygraph research.
Various instruments have been developed over the past 20 years or more that purport to detect deception by means of signals of “voice stress” as reflected in intensity, frequency, pitch, harmonics, and even microtremors. One of the more widely used devices is the computer voice stress analyzer, manufactured by the National Institute for Truth Verification (NITV), which is now used by a number of law enforcement agencies. The underlying theory for the analyzer and some of its predecessor instruments is that the instrument detects physiological microtremors in muscles in the voice mechanism that are associated with deception.
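The microtremor theory at least implies a measurable quantity: the fraction of a voice signal's energy that falls in a low-frequency tremor band (often said to lie near 8 to 12 Hz). A minimal sketch of such a band-energy measure, using a direct discrete Fourier transform with no external libraries, might look like the following. Nothing in the sketch should be read as evidence that the quantity it computes has anything to do with deception; it merely makes the purported mechanism concrete.

```python
import math

def band_energy(signal, sample_rate, f_lo=8.0, f_hi=12.0):
    """Fraction of spectral energy in a low-frequency band.

    Computed with a direct DFT over the positive-frequency bins
    (excluding DC); adequate for illustration, not for real-time use.
    """
    n = len(signal)
    total = 0.0
    band = 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(-signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        total += power
        if f_lo <= freq <= f_hi:
            band += power
    return band / total if total else 0.0

# A synthetic 10 Hz tone sampled at 100 Hz: its energy falls squarely
# inside the hypothesized tremor band.
tone = [math.sin(2 * math.pi * 10.0 * t / 100.0) for t in range(100)]
share = band_energy(tone, sample_rate=100.0)
```

In practice such a measure would be applied to an amplitude envelope extracted from the speech waveform, a step omitted here for brevity.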
In addition to manufacturing the computer voice stress analyzer, NITV publishes its own journal reporting on the ease of use of the analyzer and its utility in obtaining confessions. NITV also trains and certifies voice stress analysts using protocols for question format and sequences of relevant and irrelevant questions that are remarkably like those used for polygraph testing. The polygraph seems to be the reference point and the target of marketing for NITV and the analyzer. For example, Tippett (1995), writing in the NITV journal, argues that earlier failures to obtain high accuracy rates with the analyzer and similar devices were largely due to the low levels of jeopardy involved in the analog studies. He reports on a study of 54 subjects undergoing mandatory therapy as a condition of probation for past sex offenses and claims to have found “100 percent agreement between the [computer voice stress analyzer] and the polygraph” in the judgments of examiners for the respective techniques. The article does not report on the methods used for scoring or for determining truth, so it is not usable for judging the accuracy of the analyzer.
Although proponents of voice stress analysis claim high levels of accuracy, empirical research on the validity of the technique has been far from encouraging. First, the reliability of this method is highly suspect (Horvath, 1978; Waln and Downey, 1987). The agreement between readings of the same voice stress charts by independent analysts is generally low, and correlations of test results between interviews in their original form and recordings of the same interviews transmitted over the telephone are also low (Waln and Downey, 1987). Second, the validity of judgments made on the basis of voice stress analysis appears to be questionable (Lykken, 1981). For example, Horvath (1979) showed approximately chance level of success in identifying deception in mock crime
situations, and O’Hair and Cody (1987) found voice stress analyses to be unsuccessful in detecting spontaneous lies in a simulated job interview. Voice stress analysis may be more successful in detecting real crimes or other nontrivial deceptions, when the level of stress is presumably higher, but even in these cases the evidence of accuracy is rather slim.
During the 1990s, the U.S. Department of Defense Polygraph Institute (DoDPI) carried out a series of laboratory tests comparing the use of the computer voice stress analyzer and the polygraph using peak of tension and control question test formats. Cestaro and Dollins (1994) used a peak of tension test to compare with the analyzer in a standard laboratory comparison, and Cestaro (1996) and Janniro and Cestaro (1996) carried out comparisons with control question test formats for mock crime scenarios. These studies, which suffer from the same methodological deficiencies as most polygraph research, found that the computer voice stress analyzer was never significantly superior in its detection accuracy to the polygraph and that neither had exceptionally high correct detection rates. Palmatier (1996) conducted the only field test comparison, in collaboration with the Michigan Department of Police, using a group of confirmed guilty examinees and a group of presumably truthful examinees. Again, the analyzer results were close to chance levels (polygraph results were not reported). The detailed administration of the analyzer tests was severely criticized by the NITV, and the details of these criticisms are appended to the report. The most recently completed DoDPI study (Meyerhoff et al., 2000) compared the computer voice stress analyzer with biochemical and direct physiological measures of stress and concluded that the analyzer scores did not reflect the acute stress observed by more traditional stress measurements.
Overall, this research and the few controlled tests conducted over the past decade offer little or no scientific basis for the use of the computer voice stress analyzer or similar voice measurement instruments as an alternative to the polygraph for the detection of deception. The practical performance of voice stress analysis for detecting deception has not been impressive. It is possible that research conducted in high-stakes situations would give better results, but we have not found reports of the accuracy of voice stress analysis in such situations.
Graphology

Handwriting analysis, or graphology, is sometimes used to make inferences about honesty, integrity, or dependability. The underlying theory is that various characteristics of a person’s handwriting provide information about his or her personality, including such traits as honesty or loyalty. Although there are serious questions regarding the validity of
assessments provided by this technique (Bar-Hillel and Ben-Shakhar, 1986; Ben-Shakhar, 1989), it is widely used, especially in Israel (Ben-Shakhar et al., 1986) and Europe (Ben-Shakhar and Furedy, 1990). In the United States, more than 2,000 employers were thought to be using graphology in preemployment screening in the 1980s (Sinai, 1988).
Graphologists examine a number of specific structural characteristics of a handwriting sample (e.g., letter shapes and sizes) to make inferences about the writer. Graphologists typically insist that the sample must be spontaneous and that handwriting samples that involve copying text from a book or writing a passage from memory will not yield a valid reading. Graphologists often request a brief autobiographical sketch or some other sort of self-description (Ben-Shakhar, 1989; Ben-Shakhar et al., 1986).
Although there is some evidence of temporal stability and interrater agreement in graphological analyses (Tziner, Chantale, and Cusson, 1993), evidence regarding validity is limited, at best. Graphologists claim that their assessments and evaluations are the result only of close examination of the features of letters, words, and lines in the sample and are not influenced by the content or the quality of the writing sample (e.g., fluency, clarity of expression). This claim is called into question by two lines of evidence. First, when the same biographical passages are examined by graphologists and other analysts, their assessments of individual examinees tend to agree, and graphologists are no more accurate in their assessments than the other analysts (Ben-Shakhar et al., 1986; Ben-Shakhar, 1989). Indeed, predictions based solely on the content of writing samples, using a simple unweighted linear model based on information from the passages, were more accurate than those obtained from professional graphologists (Ben-Shakhar et al., 1986). Second, when the content of passages is not biographical in nature (e.g., meaningless text or text copied from some standard source), graphologists seldom make valid predictions. These findings strongly suggest that the graphological features of the writing do not increase the ability to make assessments of the writer.
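The "simple unweighted linear model" mentioned above is a standard device in judgment research: each predictor is standardized across candidates and the resulting z-scores are summed with equal (unit) weights, so no predictor is privileged. A sketch, with hypothetical biographical predictors standing in for whatever information Ben-Shakhar and colleagues coded from the passages:

```python
from statistics import mean, stdev

def unweighted_linear_scores(candidates):
    """Unit-weighted linear model.

    For each predictor, compute z-scores across candidates, then sum the
    z-scores per candidate with all weights equal to 1.
    """
    names = list(candidates)
    predictors = list(next(iter(candidates.values())))
    z = {n: 0.0 for n in names}
    for p in predictors:
        vals = [candidates[n][p] for n in names]
        m, s = mean(vals), stdev(vals)
        for n in names:
            z[n] += (candidates[n][p] - m) / s if s else 0.0
    return z

# Hypothetical predictors coded from autobiographical passages.
candidates = {
    "applicant_a": {"tenure_years": 1.0, "evaluations": 2.0},
    "applicant_b": {"tenure_years": 2.0, "evaluations": 4.0},
    "applicant_c": {"tenure_years": 3.0, "evaluations": 6.0},
}
scores = unweighted_linear_scores(candidates)
```

The robustness of such unit-weighted models, which here outperformed professional graphologists, is a well-documented finding in the judgment and decision-making literature.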
The available evidence also casts doubt on graphologists’ ability to make even the most general assessments of individuals more accurately than others given the same materials (Ben-Shakhar, 1989; Jansen, 1973; Murphy, 1993; Neter and Ben-Shakhar, 1989; Rafaeli and Klimoski, 1983). This research suggests that assessments of specific characteristics, such as honesty and integrity, by graphology will not be successful. There is little, if any, empirical research that adequately assesses the accuracy of specific assessments made by graphologists (e.g., assessments of a candidate’s honesty), but given the generally dismal track record of graphologists in making global predictions, there is very little reason to believe that their more specific predictions will be any better.
Theoretically, it should be possible to detect deception from demeanor with some skill. And evidence from experimental and field studies has identified some cues emitted by people who are deceptive, particularly in high-stakes situations, that can be observed with human sense organs. Moreover, a small proportion of experienced interviewers exhibit skill in detecting deception from such cues. However, attempts to systematize such skill have so far been disappointing. Voice stress analysis and graphology, two commonly used techniques, have not convincingly demonstrated accuracy for detecting deception.
The gap between the promise and the practice of the detection of deception from demeanor has several possible explanations. It may be that different liars emit different cues, so that any standard protocol would have only limited accuracy. It may also be that research has not yet identified the most valid behavioral indicators of deception. The research to date has focused mostly on particular channels (e.g., facial expression, voice quality) rather than on developing an underlying theory of behavioral indicators or searching for several indicators, possibly including disparate channels, that have high accuracy in situations of interest. It seems possible that such approaches could lead to methods of detecting deception from demeanor with practical value. It is also possible that such methods might add information to what can be achieved by physiological indicators—though that possibility has not to our knowledge been investigated. In our judgment, the search for useful methods of detecting deception should not exclude efforts to find valid indicators in the subtleties of behavior.
Direct Investigation

Methods of direct investigation, such as background checks, interviews, and the like, are already used for making personnel decisions, both with and without the polygraph as accompaniment. This section reviews what is known about the ability of these techniques to detect individuals who pose risks to their employers’ objectives.
Little scientific evidence is available about the validity of the background checks and other investigative methods that have been used to identify individuals who create threats to national security. There is some anecdotal evidence, however, on the value of these methods. Publicly available reports indicate that the spies who have been detected within
the U.S. government have been detected by normal investigative techniques. This track record supports the validity of investigations; it does not provide scientific evidence on their incremental value over polygraph testing or the incremental value of polygraph testing over background checks.
Some scientific evidence exists on reference checks and background investigations as used in the private sector for preemployment screening. Schmidt and Hunter’s (1999) meta-analysis on preemployment reference checks suggests that the information gained has at best only a modest correlation with performance on the job and in training.1 Background investigations are used by almost all police departments as part of their personnel selection processes (Decicco, 2000; Nelson, 1999). Researchers have advocated the development of structured protocols for investigating previous behaviors of job applicants (e.g., Dwyer, Prien, and Bernard, 1990), but there is little evidence of scientifically based approaches to background checks. On the contrary, background investigators are often untrained (Fuss, McSheey, and Snowden, 1998), and their investigations are rarely standardized. Background investigations might include obtaining photographs and fingerprints; conducting in-depth personal interviews and drug screens; compiling and assessing criminal history, employment history, military service, and driving records; and interviewing family members and persons familiar with the candidate (Harvey and Ward, 1996; Kirksey and Smith, 1998; Wright, 1991). These investigations often take 40 or more hours to complete (Harvey and Ward, 1996).
Empirical assessments of the validity of background investigations are rare. As with polygraph tests, the fact that background checks often yield derogatory or disqualifying information about those being evaluated is taken as prima facie evidence of their value. However, there have been instances in which so much derogatory information was obtained that it became impossible to fill positions. Dickson (1986) described a program combining polygraphs with background investigations used in screening police applicants. Of the 2,711 applicants screened with this program, 1,626 (60 percent) were rejected, many of whom had committed serious felony crimes. Because a majority of applicants had used illegal drugs at some time, rejection standards had to be amended.
There are two factors that limit the utility of background checks as a general screening tool. First, they are time-consuming and expensive, and in most police departments and many other security-sensitive employers, staffing and budgetary constraints make it impossible to carry out background checks for most or all candidates. Second, these investigations can be intrusive, and applicants and the general public may regard the invasions of privacy that accompany background examinations
as unwarranted unless the candidate is under serious consideration for hiring. Most agencies that use background checks do so late in the selection process, after most applicants have been screened out and the applicant pool has been narrowed down to qualified candidates who have a reasonable chance of being considered for the job.
Standardized tests, though not commonly used to assess deceptiveness, are widely used by employers to assess conscientiousness, dependability, and integrity. These techniques have improved over time as a result of refinements and learning from research.2
An example is integrity testing. Such tests were used by 10 to 15 percent of all U.S. employers in the 1980s, concentrated in the retail sales, banking, and food service industries, and over 2.5 million tests were given by over 5,000 employers each year (O’Bannon, Goldinger, and Appleby, 1989). Current figures for integrity test use are probably even higher because of increasing awareness of the cost and extent of employee theft and increasing evidence of the validity of several widely distributed tests.
Virtually all integrity tests include items that refer to one or more of the following areas: (a) direct admissions of illegal or questionable activities, (b) opinions regarding illegal or questionable behavior, (c) general personality traits and thought patterns believed to be related to dishonesty (e.g., the tendency to constantly think about illegal activities), and (d) reactions to hypothetical situations that may or may not feature dishonest behavior.
Several reviews of research are available on the reliability, validity, and usefulness of integrity tests (Sackett, Burris, and Callahan, 1989; Goldberg et al., 1991; U.S. Office of Technology Assessment, 1983). The early reviews of research on integrity tests were sharply critical, but both the research and the tests themselves appear to have improved, partly as a result of the earlier criticism. There is now a substantial body of evidence showing that integrity tests have some validity for predicting a variety of criteria that are relevant to organizations. This research does not say that tests of this sort will eliminate theft or dishonesty at work, but it does suggest that individuals who receive poor scores on these tests tend to be less desirable employees.
Although the reviews all raise concerns and several lament the shortcomings of research on the validity of integrity tests, the general conclusion of the more recent reviews is positive. A large-scale meta-analysis that quantitatively summarized the outcomes of multiple validity studies (Ones, Viswesvaran, and Schmidt, 1993) found that scores on integrity tests were related to measures of job performance and counterproductivity.3 Different specific criteria have been used to assess validity in different studies: some studies have validated integrity tests against measures of counterproductive behavior; others have validated the tests against measures of general job performance. These two criteria are clearly not independent: employees who engage in a wide variety of counterproductive behavior are unlikely to be good performers. Nevertheless, there are important differences between the two criteria, and more important, differences in the validity of integrity tests for predicting the two. There is no literature correlating the results of these tests with indicators of the more specific kinds of counterproductive behavior of interest in national security settings.
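The kind of quantitative summary such meta-analyses begin with is simple: pool the observed validity coefficients from individual studies, weighting each by its sample size (the first, "bare bones" step of a Hunter-Schmidt analysis, before any corrections for artifacts). The sketch below uses invented study values purely for illustration; it does not reproduce any figures from Ones, Viswesvaran, and Schmidt.

```python
def weighted_mean_r(studies):
    """Sample-size-weighted mean correlation across validity studies.

    `studies` is a list of (sample_size, observed_r) pairs; larger
    studies contribute proportionally more to the pooled estimate.
    """
    total_n = sum(n for n, _ in studies)
    return sum(n * r for n, r in studies) / total_n

# Illustrative, made-up studies: (sample size, observed validity r).
r_bar = weighted_mean_r([(200, 0.20), (50, 0.35), (150, 0.10)])
```

A full meta-analysis would go on to correct the pooled estimate for sampling error, range restriction, and measurement unreliability; this sketch shows only the pooling step.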
Early research on the validity of employment interviews portrayed a consistently negative picture, with correlations to job performance often embarrassingly close to zero (Arvey and Campion, 1982; Hunter and Hunter, 1984; Reilly and Chao, 1982). More recent research suggests that structured interviews—for example, those that include questions about past and potential job situations—can be a useful and valid method of selecting employees (Campion, Pursell, and Brown, 1988; Wiesner and Cronshaw, 1988; Campion, Palmer, and Campion, 1997).
The applicability of these employee screening techniques to the national security context is unclear. The correlations alone do not suggest that they are likely to provide reasonable and valid alternatives to the polygraph. The evidence does suggest, however, that more focused questioning in an interview or testing format is likely to have greater predictive value than unfocused questioning and that standardized measures with acceptable reliability do better than unstandardized methods.
Various techniques for detecting deception have been suggested or might be used as substitutes for or supplements to the polygraph. None of them has received as much research attention as the polygraph in the context of detecting deception, so evidence on accuracy is only minimal for most of the techniques. Some of the potential alternatives show promise, but none has yet been shown to outperform the polygraph. None shows any promise of supplanting the polygraph for screening purposes in the near term. Our conclusions are based on basic scientific knowledge and available information about accuracy.
Autonomic Measures

Some new or additional autonomic measures for detecting deception seem, on theoretical grounds, to be closer than polygraph measures to the psychological phenomena believed to be signals of deception. Some of them, such as facial thermography, may have practical advantages over the polygraph because they may be quicker, easier, or less invasive. Members of this class of measures that have any of these advantages may be promising alternatives to the polygraph that may be worthy of further investigation. They may have only limited value as supplements, however, if in fact they are measuring the same underlying phenomena. If so, their only potential value as supplements would be to help correct for error in polygraph-based estimates of those phenomena.
Measurements of Brain Function
Functional brain imaging techniques have important advantages over the polygraph, in theory, because they examine directly what the brain is doing. However, they are far from providing a practical alternative or supplement to the polygraph. Part of the limitation is theoretical. Not enough is yet known about the specific cognitive or emotional processes that accompany deception, about their localization in the brain, or about whether imaging signals can differentiate the brain activity associated with these processes from brain activity associated with other processes to make an assessment of the potential validity of these techniques on the grounds of the basic science. Further research with fMRI, coupled with a scientifically based cognitive psychological approach to deception, will be needed to determine if these issues can be addressed. Such research is likely to identify some signals of deception and localize some relevant processes, but not enough is known yet to guess whether the signals will be specific to deception. Functional imaging might also be used in efforts to identify brain signatures of mental activities that might be used as countermeasures to the psychophysiological detection of deception. If a research effort is undertaken to find improved scientific techniques for the detection of deception, basic research on brain imaging would be a top candidate for the research agenda.
There are also major practical problems at present with using brain imaging techniques for the psychophysiological detection of deception. The most likely technique to be used, fMRI, is both time consuming and expensive to perform. A typical research study with fMRI presently takes 2 to 3 hours to perform and many hours thereafter to analyze. Furthermore, almost all research to date has focused on results averaged over groups of individuals. While such an averaging approach is important
for understanding basic brain processes, it is antithetical to the use of imaging for detecting deception in individuals. Some recent fMRI studies on individual differences do suggest the possibility of a future role for brain imaging in detecting deception, but much additional research must be done to move that prospect beyond mere possibility.
Measurement of event-related potentials has shown some promise as a way to assess orienting responses that are believed to signal the presentation of material that is familiar to the examinee. If this theory is accurate, such measurements would be appropriate for lie detection in settings where questions can be asked about concealed information. The mechanisms linking deception to event-related potentials have not been clearly elucidated. In fact, it will be difficult to establish the mechanisms because measurement of the potentials is too diffuse to localize the underlying brain activity. Nevertheless, the basis for the orienting response is plausible and the very limited data on accuracy suggest a level similar to that of the polygraph. It seems plausible that event-related potentials tap different underlying phenomena than the polygraph measures, so that combining the two techniques might provide some added validity. This possibility is worth investigating. Some believe that event-related potentials are less vulnerable to countermeasures than the polygraph, which, if true, would make them useful as a substitute for the polygraph when questions about concealed information can be asked. The basic science, however, is unclear on whether or not people can learn to manipulate event-related potentials. There are as yet no empirical data on countermeasures and event-related potentials. In sum, the limited available knowledge justifies further research investigation of measurement of event-related potentials as an alternative or supplement to the polygraph.
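The signal-processing step at the core of event-related potential research is time-locked averaging: activity synchronized to the stimulus survives the average across trials, while activity unrelated to the stimulus tends to cancel. A minimal sketch with synthetic numbers (not real EEG, and not any particular laboratory's protocol):

```python
def average_epochs(epochs):
    """Point-by-point average of time-locked EEG epochs.

    Averaging across trials preserves the stimulus-locked response and
    attenuates activity that is not synchronized to the stimulus.
    """
    n = len(epochs)
    return [sum(e[t] for e in epochs) / n for t in range(len(epochs[0]))]

def peak_in_window(erp, t_lo, t_hi):
    """Maximum amplitude in a post-stimulus window (e.g., around 300 ms,
    where an orienting response such as the P300 would be expected)."""
    return max(erp[t_lo:t_hi])

# Synthetic epochs: a stimulus-locked bump at samples 2-3, plus an
# alternating-sign offset standing in for unsynchronized noise, which
# cancels exactly in the average over an even number of trials.
epochs = []
for i in range(10):
    noise = 0.5 if i % 2 == 0 else -0.5
    epochs.append([noise, noise, 1.0 + noise, 1.0 + noise, noise, noise])
erp = average_epochs(epochs)
```

In a concealed information application, the averaged response to probe items would be compared with the averaged response to irrelevant items; a reliably larger peak to probes is what the orienting-response theory predicts.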
Detection of Deception from Demeanor
Although there is considerable research on cues to deception in demeanor, there is relatively little on any one cue and much less on finding combinations of cues that might accurately discriminate lying from truth-telling. Most of the research on deception and demeanor has not been seriously applied to criminal or security investigation contexts. The evidence indicates that the right measure or measures might achieve a useful level of accuracy in those contexts, even though some techniques on the market, such as voice stress analysis, have not demonstrated such accuracy. It is unclear whether accurate demeanor measures would provide information different from the polygraph in terms of the underlying processes assessed: the theory of demeanor indicators is not well enough developed to judge.
Valid demeanor measures would have a significant practical advantage over the polygraph because tests could be conducted noninvasively and even without the examinee’s knowledge. We note but do not judge the significant ethical and legal issues raised by this practical advantage. There is also the potential that interrogators might be taught to improve their skills by becoming more sensitive to demeanor indicators. In our judgment, any systematic effort to improve techniques for detecting deception should include attention to measures of demeanor.
Direct Investigation

Available evidence does not suggest that any direct investigation method is likely to provide a reasonable and valid alternative to the polygraph. The evidence does suggest ways to improve these techniques. Studies assessing whether they provide incremental accuracy over the polygraph, or whether the polygraph provides incremental accuracy over direct investigation, have not been done.
Need for Evaluation
Our conclusions about specific potential alternatives or supplements to the polygraph are all tentative and made with limited confidence because of the limited base these techniques now have in either basic science or empirical criterion validation. We have much greater confidence in concluding that security and law enforcement agencies need to improve their capability to independently evaluate claims proffered by advocates of new techniques for detecting deception. The history of the polygraph makes clear that such agencies typically let clinical judgment outweigh scientific evidence in their assessment of the validity of techniques for the psychophysiological detection of deception or the detection of deception from demeanor. Although it is probable that belief in a technique can go a long way in making it useful for deterrence and for eliciting admissions, overconfidence does not in the long run contribute positively to national security or law enforcement goals. Agencies that use such techniques should support independent scientific evaluation so that they can be fully informed when making decisions on whether and how to use the techniques and on how to use the test results they produce. We return to this issue in Chapter 8.