The committee was tasked with (1) critically assessing the existing body of scientific research on eyewitness identification; (2) identifying gaps in the literature; and (3) suggesting other research that would further the understanding of eyewitness identification and improve law enforcement and courtroom practice. Eyewitness identification research resides in both the scientific literature and the law and justice-related scholarly literature. Although experiential, anecdotal, and some administrative records from law enforcement and the judiciary could contribute to a better understanding of eyewitness identification, the committee did not comprehensively review this more qualitative material. The committee did, however, examine select examples of law enforcement policies and influential judicial rulings.
In late 2013, the committee compiled a comprehensive bibliography from the following nine electronic databases, with the search limited to publications over the past two decades (i.e., since 1993): Academic Search Premier (EBSCO), Embase (Elsevier), MEDLINE (National Library of Medicine), NCJRS Abstracts Database (U.S. Department of Justice), PsycINFO (American Psychological Association), PubMed (National Institutes of Health), Scopus (Elsevier), Web of Science (Thomson Reuters), and LexisNexis.1 Papers were drawn from such fields as social science, cognitive science, behavioral science, neuroscience, criminology, and law, using Boolean-logic-based search strategies designed to identify empirical research reports, review articles, systematic reviews, meta-analyses, and articles in law reviews and legal journals.
1The law review literature was represented by the citations from the LexisNexis search. While these materials were not all reviewed in detail, several of the documents informed Chapter 3 of this report (The Legal Framework for Assessment of Eyewitness Identification).
The committee concentrated its review on the subset of the bibliography deemed most important to its task, focusing more on the scientific literature than on the law review literature. These materials included meta-analyses, systematic reviews, and primary research in neuroscience, statistics, and eyewitness identification. This report was also informed by several early foundational papers and by written comments from, and presentations to, the committee by representatives from science, law enforcement, state courts and government, private organizations, and other interested parties. The comments and presentations revealed additional highly relevant findings, some recently published or in press and others in submission. The agenda for each committee meeting is available in Appendix B. All materials submitted to the committee are retained in the Academies’ public access file and are available upon request.
Many factors affect eyewitness accuracy. Some factors are related to protocols within the law enforcement and legal systems, while others are related to characteristics associated with the crime scene, perpetrator, and witness.
System variables are those that the criminal justice system can influence through the enforcement of standards and through education and training of law enforcement personnel in the use of best practices2 and procedures (e.g., by specifying the content and nature of instructions given to witnesses prior to a lineup identification). Estimator variables include factors operating either at the time of the criminal event (relating to visual experience or memory encoding) or during the retention interval (the time between witnessing an event and the identification process). Specific examples include the eyewitness’ level of stress or trauma at the time of the incident, the light level and nature of the visual conditions that affect visibility and clarity of a perpetrator’s features, similarity of age and race of the witness and perpetrator, presence or absence of a weapon during the incident, and the physical distance separating the witness from the perpetrator.
2As noted in Chapter 1, for the purposes of this report, the committee characterizes best practice as the adoption of standardized procedures based on scientific principles. The committee does not make any endorsement of practices designated as best practices by other bodies.
A scientific consensus about the effects of some factors has emerged, but no such consensus exists for many other factors. One method of assessing scientific consensus is to survey experts. A 2001 survey collected responses from 64 psychologists about their courtroom experiences and their opinions on 30 eyewitness-related phenomena to determine the “general acceptance” of these phenomena within the eyewitness identification research community.3 General acceptance is relevant to whether scientific testimony is admissible as evidence in court (see Chapter 3). The survey revealed substantial agreement about which findings these experts felt were sufficiently reliable to present in court.4
The committee examined the scientific literature on eyewitness identification, focusing first on quantitative syntheses, largely systematic reviews and meta-analyses, which were identified in a comprehensive search of electronic databases designed to locate research on both estimator and system variables. In addition, primary research studies were identified in this database search, many of which were also highlighted in the relevant systematic reviews and meta-analyses. Finally, some researchers forwarded manuscripts to the committee that had been submitted for peer review or were in press. In its examination of this body of literature, the committee assessed the quality of the identified research and, where possible, worked to derive summary empirical generalizations related to variables of interest.
Quantitative Syntheses of Eyewitness Identification Research
The committee first evaluated the consistency of research findings across studies for system and estimator variables by studying published quantitative reviews of empirical research. Systematic reviews, which collect and appraise available research on specific hypotheses or research questions, are efforts to synthesize the effects of variables across studies. Within systematic reviews, meta-analysis is often, but not always, used to compute the effects of variables as well as to identify factors that explain differences across studies. When assumptions about consistency of data collected across studies are met, meta-analysis provides a quantitative summary of empirical findings by statistically averaging effect sizes across individual studies, thereby increasing the precision of the effect size estimate as well as the statistical power to detect effects. Done well, systematic reviews with or without meta-analysis provide evidence for practice and policy for such fields as health care,5 crime and justice, social welfare, and education.6 The utility of systematic reviews for informing practice and policy is predicated on the included studies being transparently reported, conducted so as to minimize risk of bias, and representing as complete a sample as possible of research conducted on the central question, including both published and unpublished studies. In turn, systematic reviews should specify inclusion criteria and data extraction procedures a priori, use independent and duplicate procedures for study selection and data extraction, rigorously evaluate potential biases in included studies, and interpret results of meta-analyses in terms that are useful to decision-makers. Further, meta-analyses should not be conducted outside the context of systematic reviews. In short, both systematic reviews and the studies they include need to be transparent and reproducible in order to best inform practice and policy decisions about eyewitness identification.
3S. M. Kassin et al., “On the ‘General Acceptance’ of Eyewitness Testimony Research: A New Survey of the Experts,” American Psychologist 56(5): 405–416 (2001).
4Kassin et al. also compared the reliability assessments of the 2001 survey to assessments from a similar 1989 survey and noted that, for the 17 propositions retested, there was a remarkable degree of consistency: “most experts saw as sufficiently reliable expert testimony on the wording of questions,” lineup instructions, attitudes and expectations, the accuracy-confidence correlation, the forgetting curve, exposure time, and unconscious transference. “There was less, if any, consensus on the effects of color perception in monochromatic light,” “observer training, high levels of stress, the accuracy of hypnotically refreshed testimony, and event violence.” The authors observed that two phenomena were seen as significantly more reliable than had been the case when the initial survey was conducted: weapon focus effect and hypnotic suggestibility effects. See p. 410.
The committee examined quantitative reviews that covered decades of research on both estimator variables (exposure duration,7 retention interval,8 stress,9 weapon focus,10 own-race bias,11 and own-age bias12) and system variables (identification test medium, i.e., live lineup versus photo array,13
7B. H. Bornstein et al., “Effects of Exposure Time and Cognitive Operations on Facial Identification Accuracy: A Meta-Analysis of Two Variables Associated with Initial Memory Strength,” Psychology, Crime and Law 18(5): 473–490 (2012).
8K. A. Deffenbacher et al., “Forgetting the Once-Seen Face: Estimating the Strength of an Eyewitness’s Memory Representation,” Journal of Experimental Psychology: Applied 14(2): 139–150 (2008).
9K. A. Deffenbacher et al., “A Meta-Analytic Review of the Effects of High Stress on Eyewitness Memory,” Law and Human Behavior 28(6): 687–706 (2004).
10J. M. Fawcett et al., “Of Guns and Geese: A Meta-Analytic Review of the ‘Weapon Focus’ Literature,” Psychology, Crime and Law 19(1): 35–66 (2013).
11C. A. Meissner and J. C. Brigham, “Thirty Years of Investigating the Own-Race Bias in Memory for Faces—A Meta-Analytic Review,” Psychology, Public Policy, and Law 7(1): 3–35 (2001).
12M. G. Rhodes and J. S. Anastasi, “The Own-Age Bias in Face Recognition: A Meta-Analytic and Theoretical Review,” Psychological Bulletin 138(1): 146–174 (2012).
13B. L. Cutler et al., “Conceptual, Practical, and Empirical Issues Associated with Eyewitness Identification Test Media,” in Adult Eyewitness Testimony: Current Trends and Developments, ed. D. F. Ross (New York: Press Syndicate of the University of Cambridge, 1994), 163–181.
biased and unbiased lineup instructions,14 post-identification feedback,15 simultaneous versus sequential lineup presentation,16 target absent versus target present lineups,17 foil similarity,18 blinding,19 showup versus lineup,20 prior mug shot exposure,21 verbal description and identification,22 and the cognitive interview23). Many of these quantitative reviews were published recently, with more than one-third published since 2010. However, none of the reviews met all current standards for conducting and reporting systematic reviews,24 and few met even a majority of these standards, making assessment of the credibility of their findings problematic.
14S. E. Clark, “A Re-Examination of the Effects of Biased Lineup Instructions in Eyewitness Identification,” Law and Human Behavior 29(4): 395–424 (2005). S. E. Clark, “Costs and Benefits of Eyewitness Identification Reform: Psychological Science and Public Policy,” Perspectives on Psychological Science 7(3): 238–259 (2012). N. K. Steblay, “Social Influence in Eyewitness Recall: A Meta-Analytic Review of Lineup Instruction Effects,” Law and Human Behavior 21(3): 283–297 (1997). N. K. Steblay, G. L. Wells, and A. B. Douglass, “The Eyewitness Post Identification Feedback Effect 15 Years Later: Theoretical and Policy Implications,” Psychology, Public Policy, and Law 20(1): 1–18 (2014).
15S. E. Clark and R. D. Godfrey, “Eyewitness Identification Evidence and Innocence Risk,” Psychonomic Bulletin and Review 16(1): 22–42 (2009). A. B. Douglass and N. K. Steblay, “Memory Distortion in Eyewitnesses: A Meta-Analysis of the Post-Identification Feedback Effect,” Applied Cognitive Psychology 20(7): 859–869 (2006).
16Clark, “Costs and Benefits of Eyewitness Identification Reform.” S. E. Clark, R. T. Howell, and S. L. Davey, “Regularities in Eyewitness Identification,” Law and Human Behavior 32(3): 187–218 (2008). N. K. Steblay et al., “Eyewitness Accuracy Rates In Sequential and Simultaneous Lineup Presentations: A Meta-Analytic Comparison,” Law and Human Behavior 25(5): 459–473 (2001). N. K. Steblay et al., “Seventy-two Tests of the Sequential Lineup Superiority Effect: A Meta-Analysis and Policy Discussion,” Psychology, Public Policy, and Law 17(1): 99–139 (2011).
17Clark, “A Re-Examination of the Effects of Biased Lineup Instructions in Eyewitness Identification.” Clark, Howell, and Davey, “Regularities in Eyewitness Identification.” Clark and Godfrey, “Eyewitness Identification Evidence and Innocence Risk.”
18Clark, “Costs and Benefits of Eyewitness Identification Reform.” Clark and Godfrey, “Eyewitness Identification Evidence and Innocence Risk.” Clark, Howell, and Davey, “Regularities in Eyewitness Identification.” R. J. Fitzgerald et al., “The Effect of Suspect-Filler Similarity on Eyewitness Identification Decisions: A Meta-Analysis,” Psychology, Public Policy, and Law 19(2): 151–164 (2013). S. L. Sporer et al., “Choosing, Confidence, and Accuracy: A Meta-Analysis of the Confidence-Accuracy Relation in Eyewitness Identification Studies,” Psychological Bulletin 118(3): 315–327 (1995).
19Clark, “Costs and Benefits of Eyewitness Identification Reform.”
20Clark, “Costs and Benefits of Eyewitness Identification Reform.” N. K. Steblay et al., “Eyewitness Accuracy Rates in Police Showup and Lineup Presentations: A Meta-Analytic Comparison,” Law and Human Behavior 27(5): 523–540 (2003).
21K. A. Deffenbacher et al., “Mugshot Exposure Effects: Retroactive Interference, Mugshot Commitment, Source Confusion, and Unconscious Transference,” Law and Human Behavior 30(3): 287–307 (2006).
22C. A. Meissner, S. L Sporer, and K. J. Susa, “A Theoretical Review and Meta-Analysis of the Description-Identification Relationship in Memory for Faces,” European Journal of Cognitive Psychology 20(3): 414–455 (2008).
23A. Memon et al., “The Cognitive Interview: A Meta-Analytic Review and Study Space Analysis of the Past 25 Years,” Psychology, Public Policy, and Law 16(4): 340–372 (2010).
After examining the reviews, the committee concluded that the findings may be subject to unintended biases and that the conclusions are less credible than was hoped. In many cases, the data from the studies cited were not readily available or were not clearly presented. Nevertheless, these reviews were helpful in highlighting some of the issues associated with specific research questions and in identifying primary studies that might be both credible and important.
RESEARCH STUDIES ON SYSTEM VARIABLES
After its assessment of the systematic reviews and meta-analytic studies, the committee focused its review on the most-studied system variables. Key system variables, such as lineup procedures (e.g., simultaneous vs. sequential lineups, blinded vs. non-blinded lineup administration) and the collection and use of witness confidence statements, can have a marked influence on the validity of eyewitness identifications. The following section addresses one of the most important practical issues raised by this influence: What is the best way to evaluate the effects of system variables on the diagnostic accuracy of eyewitness reports, and how might we use the results of such an evaluation to optimize the states of key system variables and thus maximize the performance of an eyewitness? This question is, in principle, relevant to all system variables, but we address it first in the timely and controversial context of simultaneous versus sequential lineup presentations and the role of eyewitness confidence judgments in the evaluation of identification performance. This examination of lineup procedures and confidence reports is followed by a brief discussion of the effects on eyewitness performance of another important system variable: the extent and content of communications between the witness and the larger community (law enforcement, legal defense, the press, family and friends, etc.).
Evaluating Eyewitness Performance
Perhaps the most important empirical question that can be asked about eyewitness identification is: How well do witnesses perform as a function of different system and estimator variables? For example, do factors such as the structure of a lineup, stress, or weapon focus affect the ability of
24See, e.g., Institute of Medicine, Finding What Works in Health Care: Standards for Systematic Reviews (Washington, DC: The National Academies Press, 2011) and B. J. Shea et al., Development of AMSTAR: A Measurement Tool to Assess the Methodological Quality of Systematic Reviews, BMC Medical Research Methodology 2007, 7:10 doi:10.1186/1471-2288-7-10.
a witness to provide reliable information? If so, what practices will yield the best performance? The issues are multifaceted, and the answers likely depend upon many factors. Given the complexity of these issues, the experimental literature to date has focused largely on one of the more tractable problems: How do different lineup identification procedures affect witness identifications? The committee will use this focus (and its eminent practical relevance) to illustrate how one might go about evaluating eyewitness performance generally.
Most lineup identification procedures take one of two forms: simultaneous or sequential. In a simultaneous procedure, the witness views all individuals in the lineup at the same time and either identifies one (or more) as the perpetrator or reports that the person she or he saw at the crime scene was not in the lineup. In a sequential procedure, the witness views individuals one at a time and reports whether or not each one is the person from the crime scene. Rigorous evaluation of eyewitness identification performance as a function of these two procedures requires a formal understanding of the task that the witness confronts, and it requires criteria for assessing the outcome.
The task of a witness viewing a lineup is an example of what is known as a binary classification problem.25 Each eyewitness faces two possible (binary) states associated with each person in the lineup (guilt or innocence), and the witness must assign each person to one of two classes (guilty or innocent). For each decision, the witness can be correct or incorrect, yielding four possible outcomes: a correct classification as guilty (“hit”), an incorrect classification as guilty (“false alarm”), a correct classification as innocent (“correct rejection”), and an incorrect classification as innocent (“miss”). These outcomes are commonly presented in a contingency table26 (see Figure 5-1), and the frequencies in each part of that table are the raw data used to evaluate performance on a binary classification task, such as eyewitness identification.27
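The four outcomes of this binary classification can be tallied directly from decision records. The following is a minimal sketch in Python; the function name and the sample data are illustrative, not drawn from the report:

```python
from collections import Counter

def tally_outcomes(decisions):
    """Tally the four binary-classification outcomes from
    (true_state, witness_call) pairs, where each element is
    either 'guilty' or 'innocent'."""
    counts = Counter()
    for truth, call in decisions:
        if truth == "guilty" and call == "guilty":
            counts["hit"] += 1                  # correct classification as guilty
        elif truth == "innocent" and call == "guilty":
            counts["false alarm"] += 1          # incorrect classification as guilty
        elif truth == "innocent" and call == "innocent":
            counts["correct rejection"] += 1    # correct classification as innocent
        else:
            counts["miss"] += 1                 # incorrect classification as innocent
    return counts

# Hypothetical data: ten witness decisions with known ground truth.
data = ([("guilty", "guilty")] * 4 + [("guilty", "innocent")] * 1 +
        [("innocent", "guilty")] * 2 + [("innocent", "innocent")] * 3)
print(tally_outcomes(data))
```

The resulting counts are exactly the cells of the contingency table in Figure 5-1.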
There are many different performance measures that can be derived from data of this sort—indeed, the fields of statistical classification and machine learning are replete with tools for the evaluation of binary classifiers.28
25The binary classifier in this context is defined as the witness operating under a specific set of conditions, such as lineup procedures.
26Also termed “confusion matrix.”
27The prevalence or “base rate,” i.e., the fraction of individuals in each category (guilty or innocent, in the eyewitness problem) in the population, is also a factor that may come into play when evaluating binary classification performance.
28See, e.g., T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (New York: Springer, 2009) and A. Smola and S. V. N. Vishwanathan, Introduction to Machine Learning (Cambridge: Cambridge University Press, 2008).
FIGURE 5-1 Contingency table for possible eyewitness identification outcomes.
SOURCE: Courtesy of Thomas D. Albright.
The preferred measure will depend to a large degree upon the criteria one adopts for performance evaluation.
Perhaps the simplest measure of binary classification performance is the ratio of hit rates (HR) to false alarm rates (FAR), i.e., HR/FAR.29 The magnitude of this measure, which is known in the eyewitness identification literature as the “diagnosticity ratio,” is proportional to the likelihood that a classification is correct, i.e., that the person identified as guilty is actually guilty.30 The diagnosticity ratio is appealing if the most critical criterion is avoiding erroneous identifications.
29The “rate” associated with each cell of the contingency table is computed as the number of counts within that cell (e.g., number of people correctly classified as guilty) divided by the number of instances that are truly in that class (e.g., total number of guilty people being classified). Thus, hit rate (HR) = number of hits / (number of hits + number of misses), and false alarm rate (FAR) = number of false alarms / (number of false alarms + number of correct rejections).
30The “diagnosticity ratio” is also known in other disciplines by other names; e.g., “positive likelihood ratio” or “LR+ = Likelihood Ratio of a Positive Call;” see Peter Lee, Bayesian Statistics: An Introduction (Chichester: Wiley, 2012), Sec 4.1.
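In code, the rate definitions of footnote 29 and the diagnosticity ratio reduce to a few lines. This is a sketch with hypothetical outcome counts, chosen only for illustration:

```python
def hit_rate(hits, misses):
    # HR = hits / (hits + misses), per the definition in footnote 29
    return hits / (hits + misses)

def false_alarm_rate(false_alarms, correct_rejections):
    # FAR = false alarms / (false alarms + correct rejections)
    return false_alarms / (false_alarms + correct_rejections)

def diagnosticity_ratio(hits, misses, false_alarms, correct_rejections):
    # Diagnosticity ratio = HR / FAR
    return (hit_rate(hits, misses)
            / false_alarm_rate(false_alarms, correct_rejections))

# Hypothetical counts for 80 witness decisions:
print(diagnosticity_ratio(hits=30, misses=10,
                          false_alarms=10, correct_rejections=30))
# → 3.0: a "guilty" classification is three times as likely to be correct as not
```
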
Not surprisingly, the diagnosticity ratio was adopted in pioneering efforts to identify lineup conditions that would yield better witness identification performance.31 Most laboratory-based studies and meta-analyses of the effects of lineup procedures on eyewitness identification performance show that, with standard lineup instructions informing the witness that the perpetrator may or may not be present, the sequential procedure produces a higher diagnosticity ratio.32 That is, when considering only those cases in which a witness actually selects someone from a lineup, the ratio of correct to false identifications is commonly higher with the sequential than with the simultaneous procedure.33
A higher diagnosticity ratio could result from a higher hit rate, a lower false alarm rate, or some combination of the two. Some early reports suggested that sequential procedures (relative to simultaneous) lead to fewer false alarms without changing the frequency of hits, which would result in a higher diagnosticity ratio.34 More recent laboratory-based studies and meta-analyses typically show that sequential procedures (relative to simultaneous) are associated with a somewhat reduced hit rate accompanied by a larger reduction in the false alarm rate, thereby resulting in diagnosticity ratios higher than those yielded by simultaneous procedures.35 In other
31R. C. L. Lindsay and G. L. Wells, “Improving Eyewitness Identifications from Lineups: Simultaneous Versus Sequential Lineup Presentation,” Journal of Applied Psychology 70(3), 556–564 (1985).
32Steblay et al. “Eyewitness Accuracy Rates in Sequential and Simultaneous Lineup Presentations.” Steblay, et al., “Seventy-two Tests of the Sequential Lineup Superiority Effect.” S. D. Gronlund et al., “Robustness of the Sequential Lineup Advantage,” Journal of Experimental Psychology: Applied 15(2): 140–152 (2009). S. D. Gronlund, J. T. Wixted, and L. Mickes, “Evaluating Eyewitness Identification Procedures Using ROC Analysis,” Current Directions in Psychological Science 23(1): 3–10 (2014).
33But see C. A. Carlson, S. D. Gronlund, and S. E. Clark, “Lineup Composition, Suspect Position, and the Sequential Lineup Advantage,” Journal of Experimental Psychology-Applied 14(2): 118-128 (2008), for a counterexample. Also, Clark, Moreland, and Gronlund have demonstrated that the accuracy advantage of sequential lineups as measured by diagnosticity ratios has decreased over time since the original report. Reanalysis of diagnosticity data for sequential studies showed slight, non-significant decreases in correct identification effects and increases in false identification effects, which together combine to produce a significant decrease in the advantage of sequential over simultaneous lineup methods. See S. E. Clark, M. B. Moreland, and S. D. Gronlund, “Evolution of the Empirical and Theoretical Foundations of Eyewitness Identification Reform,” Psychonomic Bulletin and Review 21(2): 251–267 (2014).
34R. C. L. Lindsay, “Applying Applied Research: Selling the Sequential Lineup,” Applied Cognitive Psychology 13(3): 219–225 (1999). G. L. Wells, S. M. Rydell, and E. P. Seelau, “The Selection of Distractors for Eyewitness Lineups,” Journal of Applied Psychology 78(5): 835–844 (1993).
35A recent field-based study comparing sequential to simultaneous procedures in a limited number of jurisdictions computed the diagnosticity ratio using filler identifications as the false alarm rate (because the innocence or guilt of the suspect is unknown in such situations). See G. L. Wells, N. K. Steblay, J. E. Dysart, “Double-Blind Photo-Lineups Using Actual Eyewitnesses: An Experimental Test of a Sequential versus Simultaneous Lineup Procedure,” Law and Human Behavior, 15 June 2014, doi: 10.1037/lhb0000096. When computed in this manner, the data revealed a modest diagnosticity ratio advantage for the sequential procedure. However, Amendola and Wixted re-analyzed a subset of the data for which proxy measures of ground truth were available [K. Amendola and J. T. Wixted, “Comparing the Diagnostic Accuracy of Suspect Identifications Made by Actual Eyewitnesses from Simultaneous and Sequential Lineups,” accepted by Journal of Experimental Criminology (2014)]. Their analyses suggested that identification of innocent suspects is less likely and identification of guilty suspects is more likely when using the simultaneous procedures. While future field studies are needed, these latter findings raise the possibility that diagnosticity is higher for the simultaneous procedure. See also Clark, Moreland, and Gronlund, who report that published diagnosticity ratios have changed over time, reflecting a significant decrease in the advantage of sequential over simultaneous lineup procedures. (Clark, Moreland, and Gronlund, “Evolution of the Empirical and Theoretical Foundations of Eyewitness Identification Reform.”)
words, when using a single diagnosticity ratio as a measure of eyewitness performance, the sequential procedure (relative to simultaneous) comes closer to satisfying the popular criterion that those identified as guilty are actually guilty. In light of these findings, many policy makers have advocated sequential procedures, and those procedures have been adopted by law enforcement in many jurisdictions.
While policy decisions and practice have been influenced by the aforementioned studies, there are other criteria worthy of consideration when evaluating eyewitness performance. One alternative is revealed by asking why the diagnosticity ratio changes across lineup conditions. This question can be addressed given a plausible model of the mechanisms underlying human recognition memory. Most models of recognition memory are based on the idea that a cue (e.g., a face in a lineup) results in the retrieval of information stored in memory (see Chapter 4). When the retrieved information provides enough evidence to satisfy the observer, he or she makes an identification—that is, decides that the stimulus is “recognized.” Explicit in this model are two important parameters: the observer’s memory sensitivity (that is, the “discriminability” between the strength of memory evidence elicited by a previously encountered stimulus and that elicited by novel stimuli), and the degree of evidence that the observer requires to make an identification (“response criterion” or “bias”) (see Box 5-1).
All human decisions about the classification of objects based on memory—including a witness’ classifications of guilt or innocence for faces in a lineup, an individual’s decision as to whether a piece of luggage is his or her own, a botanist’s recognition of a specific type of fern, a radiologist’s detection of a tumor in a mammogram, or the determination of the sex of a newly-hatched chicken—can be distilled down to the influence of two factors that are rooted in causal models of recognition memory: the degree to which the relevant objects are discriminable by the decider (the decider’s sensitivity to the difference between them), and the decider’s criterion for making a decision (response bias, or the decider’s degree of specificity in making choices).a There are, of course, many other variables that will affect the outcome (e.g., levels of stress, attentional focus, potential rewards or expectations), but all of these are believed to exert their influence over memory-based classification decisions by affecting discriminability and/or response bias.
To illustrate the distinction between discrimination and response bias as applied to a real-world decision problem, consider how an audiologist conducts a hearing test. In a hearing test, an individual might be asked to detect sounds along a continuum of loudness and to indicate when a sound is present. The audiologist wants to know how well someone can discriminate presence versus absence of a sound, but that assessment is complicated by the criterion people use when deciding to say that they heard a sound (response bias). Some people are hesitant to respond positively, saying “I hear it” only when they are absolutely certain (“conservative” responders). Others are more willing to respond positively, saying “I hear it” with less information and greater uncertainty (“liberal” responders). Those with a conservative bias are less likely to report hearing a sound in general, so they will have both fewer correct detections (“hits”) and fewer overt mistakes (“false alarms”). By contrast, those with a liberal bias are more likely to say that they heard a sound, so they will have more hits but also more false alarms. Importantly, this can occur even if the conservative and liberal responders do not differ in their ability to discriminate the presence or absence of sound.
aSee, e.g., W. P. Banks, “Signal Detection Theory and Human Memory,” Psychological Bulletin 74(2): 81–99 (1970); J. P. Egan, Recognition Memory and the Operating Characteristic (Bloomington: Indiana University Hearing and Communication Laboratory, 1958); D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics (New York: Wiley, 1966).
The first of these two parameters—discriminability—is important for evaluating eyewitness performance. It tells whether a difference in performance under different task conditions reflects a true improvement in memory-based discrimination, i.e., an improvement in the strength of the observer’s retrieved memory evidence of the perpetrator.
The fact that these two measures (the likelihood that an identified person is guilty vs. discriminability) do not assess the same thing is counterintuitive, and it has generated controversy in the field of eyewitness identification research.36 Intuitively, if sequential lineups yield a higher likelihood that an identified person is guilty (as quantified by a higher diagnosticity ratio), then it seems as if that procedure yields objectively better performance. The problem with this intuition is that it fails to take into account the second of the two parameters of recognition memory models—the response bias or degree of evidence that the observer finds acceptable to make an identification. This parameter, which is distinct from discriminability, reflects the witness’ tendency to pick or not to pick someone from the lineup. If a witness sets a high bar for acceptable evidence—a conservative bias—then he or she will be unlikely to select anyone from the lineup (low pick frequency) and will therefore have more misses (being less likely to make a selection at all, he or she is more likely to fail to select the suspect) and fewer false alarms.
Conversely, if a witness sets a low bar for acceptable evidence—a liberal bias—then she or he will be more likely to make a selection from the lineup (a high pick frequency), meaning he or she will have more hits and will make more false identifications. Differences in pick frequency can, and generally do, lead to differences in the ratio of hit rates to false alarm rates; all else being equal, the diagnosticity ratio will be higher for a conservative bias than for a liberal bias.37 In other words, simply by inducing a witness to adopt a more conservative bias, it is possible to increase the likelihood that an identified person is actually guilty. Importantly, this may be true even if the procedure yields no better, or potentially worse, discriminability.38
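The claim that a conservative criterion alone inflates the diagnosticity ratio can be illustrated with a toy equal-variance signal detection calculation (a sketch with assumed values, not data from the studies cited in this section).

```python
# Toy calculation showing that, with discriminability held fixed, a more
# conservative criterion alone raises the diagnosticity ratio (HR/FAR).
# Values are assumptions under an equal-variance Gaussian model.
from statistics import NormalDist

norm = NormalDist()
D_PRIME = 1.0  # same discriminability for both response styles

def diagnosticity_ratio(criterion):
    hit_rate = 1 - norm.cdf(criterion - D_PRIME)
    false_alarm_rate = 1 - norm.cdf(criterion)
    return hit_rate / false_alarm_rate

# The conservative (higher) criterion yields the larger ratio, i.e., a higher
# apparent likelihood of guilt given an identification, with no change in
# the witness's underlying ability to discriminate:
assert diagnosticity_ratio(1.5) > diagnosticity_ratio(0.0)
```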
A single diagnosticity ratio thus conflates the influences of discriminability and response bias on binary classification, which muddies the determination of which procedure, if any, yields objectively better discriminability in eyewitness performance. To overcome this problem, some investigators have recently adopted a technique from signal detection
36See, e.g., J. T. Wixted and L. Mickes, “The Field of Eyewitness Memory Should Abandon Probative Value and Embrace Receiver Operating Characteristic Analysis,” Perspectives on Psychological Science 7(3): 275–278 (2012); Clark, “Costs and Benefits of Eyewitness Identification Reform”; G. L. Wells, “Eyewitness Identification: Probative Value, Criterion Shifts, and Policy Regarding the Sequential Lineup,” Current Directions in Psychological Science 23(1): 11–16 (2014); and Steblay et al., “Seventy-two Tests of the Sequential Lineup Superiority Effect.”
37The sole exception to this rule is the case in which classifications are made at chance level of performance, i.e., when the observer exhibits no ability to discriminate.
38L. Mickes, H. D. Flowe, and J. T. Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory: Comparing the Diagnostic Accuracy of Simultaneous vs. Sequential Lineups,” Journal of Experimental Psychology: Applied 18(4): 361–376 (2012). C. A. Meissner et al., “Eyewitness Decisions in Simultaneous and Sequential Lineups: A Dual Process Signal Detection Theory Analysis,” Memory and Cognition 33(5): 783–792 (2005). M. A. Palmer and N. Brewer, “Sequential Lineup Presentation Promotes Less-Biased Criterion Setting but Does Not Improve Discriminability,” Law and Human Behavior 36(3): 247–255 (2012).
theory, which distinguishes the relative influences of discriminability and bias on binary classification.39 This technique involves analysis of Receiver Operating Characteristics (see Box 5-2). ROC analysis has been used extensively in multiple contexts of human decision-making, notably in basic research on visual perception and memory and applied studies of medical diagnostic procedures.40 In essence, ROC analysis examines diagnosticity ratios integrated over different response biases. This approach to eyewitness research has been promoted based on the claim that it can enable lineup procedures to be evaluated by their effect on discrimination, separate from response bias, and—importantly—because the dimensions of analysis (discriminability and response bias) correspond to the mechanistic parameters of causal models of human recognition memory.
Use of ROC analysis to evaluate eyewitness performance requires calculating the diagnosticity ratio for different response bias conditions (see Box 5-2). Using expressed confidence level (ECL) as a proxy for response bias (see below), a small set of recent studies using ROC analysis has reported that discriminability (area under the ROC curve) for simultaneous lineups is as high as, or higher than, that for sequential lineups.41 In other words, when eyewitness identification performance is evaluated based on a criterion of bias-free discriminability, the results differ from those based on a single diagnosticity ratio, and they do so because the latter fails to account for response bias.
Looking broadly at the many empirical studies that have used a single diagnosticity ratio to evaluate eyewitness performance, as well as the more recent findings using ROC analysis, it appears that the practical advantage of one lineup procedure over another depends to a large degree upon the performance criterion that one adopts. From the perspective of many, the ideal lineup procedure would elicit a conservative bias (thus reducing false identifications) and high discriminability (that is, optimizing memory sensitivity). If there exists no discriminability advantage for one lineup
39D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics (New York: Wiley, 1966); D. McNicol, A Primer of Signal Detection Theory (London: George Allen and Unwin, 1972).
40J. A. Swets, “ROC Analysis Applied to the Evaluation of Medical Imaging Techniques,” Investigative Radiology 14(2): 109–121 (1979).
41 Mickes, Flowe, and Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory.” C. A. Carlson and M. A. Carlson, “An Evaluation of Lineup Presentation, Weapon Presence, and a Distinctive Feature Using ROC Analysis,” Journal of Applied Research in Memory and Cognition 3(2): 45–53 (2014). D. G. Dobolyi and C. S. Dodson, “Eyewitness Confidence in Simultaneous and Sequential Lineups: A Criterion Shift Account for Sequential Mistaken Identification Overconfidence,” Journal of Experimental Psychology: Applied 19 (4): 345–357 (2013). S. D. Gronlund et al., “Showups Versus Lineups: An Evaluation Using ROC Analysis,” Journal of Applied Research in Memory and Cognition 1(4): 221–228 (2012).
Binary classification decisions by human observers are affected by both discriminability (the observer’s sensitivity to the difference between target and non-targets) and response bias (the observer’s degree of specificity in making a response). Analysis of Receiver Operating Characteristics (ROCs) is a method from signal detection theory that enables one to distinguish the relative influences of discriminability and response bias on binary classification decisions. ROC analysis is performed by plotting the frequency of decisions that are hits (correctly detecting a target) versus the frequency of decisions that are false alarms (incorrectly classifying a non-target as a target).
The positive diagonal in an ROC plot (see figure next page) corresponds to response bias, moving from high specificity at the lower left corner [no detection of targets (hit rate = 0) and no incorrect attribution of non-targets as targets (false alarm rate = 0)], to low specificity at the upper right corner [all targets detected (hit rate = 1.0) and all non-targets attributed as targets (false alarm rate = 1.0)]. Because all points along this positive diagonal reflect equal ratios of hits to false alarms, they vary in response bias (i.e., the frequency of lineup picks, or “pick frequency”), but they do not manifest differences in discriminability. The negative diagonal in an ROC plot corresponds, by contrast, to discriminability, moving from chance discriminability at the intersection with the positive diagonal, where hits and false alarms are equally likely, to the highest discriminability in the upper left corner, where all targets are detected (hit rate = 1.0), but no non-targets are attributed as targets (false alarm rate = 0).
To see how measured hit and false alarm rates vary over different conditions of discriminability and response bias in laboratory experiments, one can manipulate or estimate these conditions and record a diagnosticity ratio (HR/FAR) for each condition. The typical result is a set of diagnosticity ratios that, when plotted in the ROC space (represented by the dots in the figure at right), form a curve spanning from lower left to upper right. The extent to which that curve deviates (bows above and away) from the positive diagonal is a quantitative measure of discriminability (assessed as the area under the curve) for which response bias has been factored out.
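As a concrete sketch of the construction just described, the following toy calculation builds ROC points from hypothetical confidence-binned counts and measures the area under the resulting (partial) curve by the trapezoid rule; all counts are invented for illustration.

```python
# Hypothetical sketch: building ROC points from confidence-binned decision
# counts (highest confidence first) and measuring area under the resulting
# partial curve by the trapezoid rule. All counts are invented.
def roc_points(hits_by_conf, fas_by_conf, n_targets, n_lures):
    """Cumulating from highest to lowest confidence, each threshold yields
    one (false alarm rate, hit rate) point in ROC space."""
    points, h, f = [(0.0, 0.0)], 0, 0
    for hits, fas in zip(hits_by_conf, fas_by_conf):
        h, f = h + hits, f + fas
        points.append((f / n_lures, h / n_targets))
    return points

def area_under_curve(points):
    """Trapezoidal area under the curve, up to the largest observed FAR."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Counts of identifications at high, medium, and low confidence:
pts = roc_points([30, 25, 15], [2, 8, 20], n_targets=100, n_lures=100)
auc = area_under_curve(pts)  # larger area = better discriminability
```

The curve bows above the positive diagonal to the extent that the observer can discriminate targets from non-targets, and the area measure is unaffected by where along the curve any single response criterion happens to sit.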
ROC analysis has been used extensively in basic and applied research on recognition memory. In these experiments, response bias is sometimes manipulated explicitly by encouraging observers to be more or less selective in
their responses. Frequently, however, “expressed confidence level” (ECL)—the confidence that an observer holds in his or her classification—is used as a proxy for response bias, based on the assumption that more confident observers are likely to be more specific (conservative) in their responses, whereas less confident observers are likely to be less specific (liberal) in their responses.
Receiver Operating Characteristic (ROC) curve.
SOURCE: Courtesy of Thomas D. Albright.
procedure over another,42 then eyewitness performance may benefit from any procedure (such as sequential) that elicits a more conservative response bias.43 But one can only make that judgment after having applied an empirical test to determine whether a procedure offers a discriminability advantage. Future research might explore the possibility that other methods of inducing a conservative response bias (such as verbal instructions to the witness to be cautious in making an identification) might be combined with procedures that improve discriminability in order to optimize eyewitness identification performance.
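The expected-value logic summarized in footnote 43 can be illustrated numerically. In this toy model (all prevalence, value, and cost figures are assumptions chosen for illustration), the criterion that maximizes expected value shifts in the conservative direction as the prevalence of guilty suspects falls or as the cost of a false identification rises.

```python
# Toy illustration of footnote 43: the response criterion that maximizes
# expected value depends on the prevalence of guilty suspects and on the
# values/costs assigned to outcomes. All numbers are assumptions.
from statistics import NormalDist

norm = NormalDist()
D_PRIME = 1.5  # assumed discriminability of the witness

def expected_value(criterion, prevalence, value_hit, cost_false_alarm):
    """Expected value per decision under an equal-variance Gaussian model.
    Misses and correct rejections are assigned zero value for simplicity."""
    hit_rate = 1 - norm.cdf(criterion - D_PRIME)
    false_alarm_rate = 1 - norm.cdf(criterion)
    return (prevalence * hit_rate * value_hit
            + (1 - prevalence) * false_alarm_rate * cost_false_alarm)

def best_criterion(prevalence, value_hit, cost_false_alarm):
    """Grid search for the criterion maximizing expected value."""
    grid = [i / 100 for i in range(-300, 301)]
    return max(grid, key=lambda c: expected_value(c, prevalence,
                                                  value_hit, cost_false_alarm))

# Lower prevalence of guilty suspects favors a more conservative criterion:
assert best_criterion(0.2, 1.0, -1.0) > best_criterion(0.8, 1.0, -1.0)
# A heavier cost on false identifications also pushes the criterion upward:
assert best_criterion(0.5, 1.0, -5.0) > best_criterion(0.5, 1.0, -1.0)
```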
Perhaps the greatest practical benefit of recent debate over the utility of different lineup procedures is that it has opened the door to a broader consideration of methods for evaluating and enhancing eyewitness identification performance. ROC analysis is a positive and promising step with numerous advantages. For example, the area under the ROC curve is a single-number index of discriminability. Moreover, this index reflects a parameter-free approach to binary classification performance; the outcome is entirely data-dependent and thus identical across all users drawing from
42The committee notes that some of the few recent reports using ROC analysis indeed claim improved discriminability for simultaneous lineup conditions, but the reported discriminability improvements are small.
43In reality, a more conservative bias may not always be beneficial, and whether it is or not depends upon a number of factors that have an impact distinct from diagnostic accuracy and are difficult to quantify. All else being equal, the “best” response bias will be one that maximizes the “expected value” of the outcome (Green and Swets, Signal Detection Theory and Psychophysics; Swets, “ROC Analysis Applied to the Evaluation of Medical Imaging Techniques”). For the problem of eyewitness identification, the response bias that maximizes expected value can be computed from the prevalence of guilty suspects in lineups and from societal values or costs associated with each of the possible eyewitness decisions (errors and correct assignments). Reliable data on prevalence are difficult to come by, and value/cost quantities are difficult to assign and likely to vary significantly across crimes and cultures. One can nonetheless gain an intuition for how these factors might define the best response bias conditions. Consider, for example, the consequences of decreasing the prevalence of guilty suspects in lineups. In this case, expected value can be maximized by inducing a conservative bias—i.e., if innocence is a priori likely, then there is value gained by being more selective in your response. Similarly, the optimal response bias will depend upon normative costs associated with different types of eyewitness errors. Generally speaking, if a society places greater emphasis on not identifying the innocent, relative to failing to identify the guilty, then expected value can be increased by inducing a more conservative response bias. But the opposite would be true if there were greater societal pressures for identifying the guilty, relative to protecting the innocent. Although an understanding of the relationship between response bias and expected value is important, expected value in this case has little to do with the diagnostic accuracy of an eyewitness report. But it does nonetheless bear on decisions about which lineup procedure should be employed.
the same data set.44 Most importantly for its application to the problem of evaluating eyewitness performance, the ROC approach possesses a distinct advantage because the dimensions of analysis—discriminability and response bias—map directly onto the mechanistic parameters of causal models of human recognition memory (see Chapter 4). In other words, the approach affords insight into and quantification of the sensory and cognitive processes that are believed to underlie memory-based classification decisions (see Box 5-1), such as eyewitness identifications.
Despite these merits, as a general statistical procedure for evaluation of binary classification performance and as a tool for evaluation of eyewitness performance, the ROC approach has some well-documented quantitative shortcomings. For example, ROC analysis depends on the ability to manipulate response bias or to estimate it from some other variable, and in the case of eyewitness identification that ability has been the subject of some debate. Recent studies have used expressed confidence level (ECL)—a measure of a witness’ confidence in his or her selection—as a proxy for response bias,45 based on the common-sense logic that a witness who has high confidence in his or her lineup selection should manifest a more conservative response bias than a witness who selected someone from the lineup despite lacking confidence in that selection (i.e., someone who made a selection even though he or she was not certain—a liberal response bias). This proxy relationship is inherently noisy within individuals, and the noise is exacerbated by the fact that the eyewitness identification ROC is population-based; individual data points are obtained from different people who may scale their confidence reports differently.46 On the other hand, it is empirically clear that, when scaled appropriately (within and across individuals), different levels of expressed confidence do, in fact, correspond to different pick frequencies and response biases.47
44Green and Swets, Signal Detection Theory and Psychophysics. D. J. Hand, “Measuring Classifier Performance: A Coherent Alternative to the Area under the ROC Curve,” Machine Learning 77, 103–123 (2009).
45See, e.g., N. Brewer and G. L. Wells, “The Confidence-Accuracy Relationship in Eyewitness Identification: Effects of Lineup Instructions, Foil Similarity, and Target-Absent Base Rates,” Journal of Experimental Psychology: Applied 12(1): 11–30 (2012); Mickes, Flowe, and Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory”; and Carlson and Carlson, “An Evaluation of Lineup Presentation.”
46ECL is affected by over-confidence and under-confidence at the individual level, and the current implementation of the ROC approach, combining results across subjects, does not build this measurement error into the analysis or the comparison of empirical ROC curves. See Appendix C.
47See, e.g., Table 1 of Mickes, Flowe, and Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory,” which summarizes confidence ratings, hit rates, false alarm rates, and diagnosticity ratios (HR/FAR) derived from data published in Brewer and Wells, “The Confidence-Accuracy Relationship in Eyewitness Identification.” Brewer and Wells employed
An additional prerequisite for the use of ECL as a measure of response bias is that an orderly relationship exists between confidence and accuracy—that witnesses expressing greater confidence are more likely to be accurate in their identifications. Although this hypothesis conforms to intuition,48 the existence of a significant confidence–accuracy relationship has been challenged repeatedly over the years.49 Recent evidence, however, suggests ways of improving the confidence–accuracy relationship (and obtaining more reliable measurements of it).50 While the ECL measure thus has potential, more research on this and other possible methods of estimating or controlling response bias is warranted to support efforts to extract a bias-free measure of discriminability.
Another technical concern raised by the use of ROC analysis to evaluate eyewitness identification performance is that it relies on a partial, rather than full, area under the ROC curve measure (see Box 5-2) as an index of discriminability that is separate from response bias. This is necessitated by the fact that the highest false alarm rates in eyewitness identification data are commonly well below 1.0, even under the most liberal response bias
a “confidence calibration” technique to normalize scaling of expressed confidence across witnesses. Both hit rates and false alarm rates declined steeply—implying an increasingly conservative response bias—as confidence levels increased. Diagnosticity ratios increased monotonically with increasing confidence. An identical pattern can be seen in Table 3 of Mickes, Flowe, and Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory.” See also H. L. Roediger III, J. T. Wixted, and K. A. DeSoto, “The Curious Complexity Between Confidence and Accuracy in Reports from Memory,” in Memory and Law, ed. L. Nadel and W. Sinnott-Armstrong (Oxford: Oxford University Press, 2012), 97.
48K. A. Deffenbacher and E. F. Loftus, “Do Jurors Share a Common Understanding Concerning Eyewitness Behavior?,” Law and Human Behavior 6: 15–30 (1982); and G. L. Wells, T. J. Ferguson, and R. C. L. Lindsay, “The Tractability of Eyewitness Confidence and Its Implication for Triers of Fact,” Journal of Applied Psychology 66: 688–696 (1981).
49G. L. Wells and D. M. Murray, “Eyewitness Confidence,” in Eyewitness Testimony: Psychological Perspectives, ed. G. L. Wells and E. F. Loftus (New York: Cambridge University Press, 1984). B. L. Cutler and S. D. Penrod, Mistaken Identification: The Eyewitness, Psychology, and the Law (Cambridge: Cambridge University Press, 1995). R. K. Bothwell, K. A. Deffenbacher, and J.C. Brigham, “Correlation of Eyewitness Accuracy and Confidence: Optimality Hypothesis Revisited,” Journal of Applied Psychology 72:691–695 (1987). S. L. Sporer et al., “Choosing, Confidence, and Accuracy: A Meta-Analysis of the Confidence-Accuracy Relation in Eyewitness Identification Studies,” Psychological Bulletin 118(3): 315–327 (1995). T. A. Busey et al., “Accounts of the Confidence-Accuracy Relation in Recognition Memory,” Psychonomic Bulletin and Review 7(1): 26-48 (2000).
50N. Brewer and G. L. Wells, “The Confidence-Accuracy Relationship in Eyewitness Identification.” P. Juslin, N. Olsson, and A. Winman, “Calibration and Diagnosticity of Confidence in Eyewitness Identification: Comments on What Can Be Inferred From the Low Confidence-Accuracy Correlation,” Journal of Experimental Psychology: Learning, Memory, and Cognition 22(5): 1304–1316 (September 1996). Roediger, Wixted, and DeSoto, “The Curious Complexity between Confidence and Accuracy.” Mickes, Flowe, and Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory.”
conditions.51 In practice, partial area under the curve is computed by truncating the ROC curve at the highest false alarm rate obtained. Because the standard error of the partial area under the curve measure depends upon the degree of truncation, accuracy of this discriminability measure can easily vary across conditions and across studies, making the interpretation difficult.52
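The truncation issue can be made concrete with a small sketch: computing the area under the same hypothetical ROC curve truncated at two different maximum false alarm rates yields different partial areas, which is why comparisons across conditions or studies require a common false alarm range (all numbers below are invented for illustration).

```python
# Sketch of the truncation problem: the same hypothetical ROC curve yields
# different partial areas depending on where it is cut off, so partial-AUC
# values are only comparable over a common false alarm range. Data invented.
def partial_auc(points, far_max):
    """Trapezoidal area under an ROC (list of (FAR, HR) points, ascending
    in FAR), truncated at far_max, interpolating the hit rate at the cut."""
    area, (px, py) = 0.0, points[0]
    for x, y in points[1:]:
        if x >= far_max:
            y_cut = py + (y - py) * (far_max - px) / (x - px)
            area += (far_max - px) * (py + y_cut) / 2
            return area
        area += (x - px) * (py + y) / 2
        px, py = x, y
    return area  # far_max lies beyond the last observed point

roc = [(0.0, 0.0), (0.05, 0.40), (0.15, 0.60), (0.30, 0.70)]
# Cutting the same curve at FAR = 0.15 vs. 0.30 gives different areas:
assert partial_auc(roc, 0.15) < partial_auc(roc, 0.30)
```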
While ROC analysis has many recognized merits for the evaluation of binary classification, the residual concerns associated with its typical use for evaluating eyewitness performance merit consideration of other statistical approaches to this problem. As noted above, many methods have been proposed—and adopted in specific applications—for evaluation of binary classification performance.53 The committee knows of no instance in which any of these alternative methods has been applied to the problem of eyewitness identification. Because these methods have not been vetted, the committee is not in a position to endorse any specific statistical tool; it nevertheless encourages a general exploration of these alternatives. These alternatives may have their own share of unforeseen problems, and/or the performance criteria they employ may bear no meaningful relationship to the sensory and cognitive processes involved in eyewitness identification. Nonetheless, some of these methods may provide greater insight into the factors that affect eyewitness identification performance and may, in turn, suggest ways of improving performance. To illustrate this opportunity by example, we consider the following possibilities.
It has been argued that a basic weakness of the existing ROC approach to binary classification performance results from the fact that, in principle
51Carlson and Carlson, “An Evaluation of Lineup Presentation.” Mickes, Flowe, and Wixted, “Receiver Operating Characteristic Analysis of Eyewitness Memory.”
52Along the same lines, accuracy of discriminability measures derived from ROC studies may be called into question when those studies do not take into account uncertainty in the data used to construct the ROC curves; see Appendix C. An argument has also been made that the area under the ROC curve can be a flawed metric for comparing binary classification conditions when the costs of classification errors are not precisely known and are different for different conditions (Hand, “Measuring Classifier Performance”). The costs of classification errors may be similar across some lineup comparisons and across some conditions of other systems variables, and for others they may be different. But for the most part they are not precisely known, and this is thus a topic that deserves greater attention given the growing use of ROC-based evaluation of eyewitness identification performance.
53Numerous methods for the evaluation of binary classifiers have been developed and applied in the field of machine learning, which seeks to optimize autonomous classification devices (for example, the fingerprint lock on a smartphone, which must quickly and reliably distinguish the owner’s finger from any other). This field has a long and rich history, and candidate methods are summarized in several texts on statistical classification and machine learning, such as Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, and A. Smola and S. V. N. Vishwanathan, Introduction to Machine Learning (Cambridge: Cambridge University Press, 2009).
(and in practice under certain commonly unrecognized conditions), the area under the ROC curve is dependent on imprecise assumptions about the costs of classification errors across different classification conditions.54 One might suppose, for example, that the cost of a miss for a crime of murder is greater than the cost of a miss for a stolen car. But without a precise understanding of these relative decision costs, the area under the ROC curve measure can be incoherent, in that it depends as much on the classification conditions as it does on the sensitivity of the classifier. An alternative method has been proposed to address this problem—derivation of the “H measure”—that enables the performance of binary classifiers to be compared using a common metric that is independent of the cost distributions for different types of classification errors.55 The committee supports exploration of this alternative.
Another avenue for exploration emerges from the fact that the literature evaluating eyewitness identification performance has focused exclusively on the positive predictive value (PPV) of a witness’ classification as guilty. For a given response bias, PPV is related to the diagnosticity ratio, in that, given equal prevalence of the culprit in two conditions (e.g., lineup procedures) being compared, a higher diagnosticity ratio leads to a higher PPV. As discussed above, the diagnosticity ratio is a critical piece of information in efforts to evaluate eyewitness performance. As for any binary classification, however, there is also information associated with a negative response, which is the predictive value of a classifier’s assertion that a target is not present (in the eyewitness case, the witness’ assertion of innocence). This negative predictive value (NPV) is related to a different ratio of decisions, namely (1-FAR)/(1-HR),56 in that, given equal prevalence of the target in the two procedures being compared, higher values of this ratio correspond to higher values of NPV.
While NPV is commonly used to evaluate the accuracy of human classification decisions, such as in medical diagnosis, and is a source of information that may similarly be of additional value in efforts to evaluate lineup procedures, it has been largely neglected in the field of eyewitness identification.57 One might hold the intuition that PPV and NPV are monotonically related to one another—believing that the likelihood that the
54See Hand, “Measuring Classifier Performance.”
56The reciprocal of this ratio is called the “negative likelihood ratio.” See, e.g., T. Hoffmann, S. Bennett, and C. del Mar, Evidence-Based Practice Across the Health Professions (Chatswood: Elsevier Australia, 2009).
57It seems likely that this neglect stems from the fact that the primary concern in eyewitness identification has been on incorrect assertions of guilt (i.e., false identifications) rather than incorrect assertions of innocence. There are normative values in society that reinforce this concern (as exemplified, for example, by Blackstone’s formulation: “Better that 10 guilty persons escape than that one innocent suffer.”)
witness will correctly identify the culprit is proportional to the likelihood that the witness will correctly identify lineup candidates as innocent—and thus conclude that evaluation of PPV alone is sufficient. Contrary to that intuition, however, evidence from studies of analogous binary classification problems reveals that these two predictive probabilities can vary with respect to one another in complex ways.58
In practice, NPV-related measures (quantified as negative likelihood ratios) can be subjected to ROC analysis to account for the effects of response bias in the same manner as PPV-related measures (quantified as positive likelihood ratios, i.e., diagnosticity ratios)—the ROC axes in the NPV case corresponding to 1-HR and 1-FAR. Consideration of NPV and its relationship to PPV, by this and other means, may provide additional insight into the ways in which estimator and system variables (such as lineup procedures) influence eyewitness identification performance.59
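The relationship among hit rate, false alarm rate, prevalence, PPV, and NPV follows from Bayes’ rule and can be sketched as follows; the rates and prevalence values are illustrative assumptions, not eyewitness data.

```python
# Sketch of PPV and NPV via Bayes' rule from hit rate (HR), false alarm
# rate (FAR), and target prevalence. All values are illustrative.
def ppv(hr, far, prevalence):
    """P(target present | positive identification)."""
    return hr * prevalence / (hr * prevalence + far * (1 - prevalence))

def npv(hr, far, prevalence):
    """P(target absent | assertion of absence)."""
    miss, correct_rejection = 1 - hr, 1 - far
    return (correct_rejection * (1 - prevalence)
            / (correct_rejection * (1 - prevalence) + miss * prevalence))

# At matched prevalence, a condition with higher HR and lower FAR improves
# both predictive values, though in general the two need not move together:
assert ppv(0.7, 0.1, 0.5) > ppv(0.6, 0.2, 0.5)
assert npv(0.7, 0.1, 0.5) > npv(0.6, 0.2, 0.5)
```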
In sum, a formal understanding of the task facing an eyewitness, in conjunction with an appreciation of causal models of human recognition memory, has led to a potentially more comprehensive method—ROC analysis—for evaluating eyewitness identification performance. Despite these advances, it is important that practitioners in this field broadly explore the large and rich field of statistical tools for evaluation of binary classifiers. While the committee recognizes that these tools are uninvestigated for this application and may possess their own share of unforeseen problems or disadvantages, a move in this direction may be of great value for improving the validity of eyewitness identification.
Interactions with Eyewitnesses (Feedback)
The nature of law enforcement interactions with the eyewitness before, during, and after the identification plays a role in the accuracy of eyewitness identifications and in the confidence expressed in the accuracy of those identifications by witnesses.60 Law enforcement’s maintenance of neutral pre-identification communications—relative to the identification of a suspect—is seen as vital to ensuring that the eyewitness is not subjected to conscious or unconscious verbal or behavioral cues that could influence the
58S-Y Shiu and C. Gatsonis, “The Predictive Receiver Operating Characteristic Curve for the Joint Assessment of the Positive and Negative Predictive Values,” Philosophical Transactions, Series A, Mathematical, Physical and Engineering Sciences 366 (1874): 2313–2333 (2008).
59Another potentially informative analysis that combines PPV and NPV measures is known as a PROC (predictive ROC), which affords the opportunity to see how a given system or estimator variable may have interacting—synergistic or antagonistic—effects on assertions of guilt and innocence. See Shiu and Gatsonis, “The Predictive Receiver Operating Characteristic Curve.”
60S. E. Clark, T. E. Marshall, and R. Rosenthal, “Lineup Administrator Influences on Eyewitness Identification Decisions,” Journal of Experimental Psychology: Applied 15(1): 63 (2009).
eyewitness’ identification (see Box 2-1).61 If a witness happened to overhear an officer say, “We’ve got him, but before we finalize the arrest, let’s have the witness confirm it,” the witness might be biased to confirm the suspect’s identity in a showup. Furthermore, some types of law enforcement communication with a witness, after the witness has made an identification (e.g., “Good work! You picked the right guy…”), can increase confidence in an identification, regardless of whether the identification is correct.62
As discussed in Chapter 2, use of “blinded” or “double-blind” lineup identification procedures is an effective strategy for reducing the likelihood that a witness will be exposed to cues from interactions with law enforcement (such as feedback) that could influence identifications and/or confidence in those identifications. More generally, efforts to maintain objectivity and eliminate potentially informative communication will help ensure that eyewitness reports are not contaminated by knowledge or opinions held by others.
RESEARCH STUDIES ON ESTIMATOR VARIABLES
The impact of estimator variables on eyewitness accuracy is harder to measure in the field than the impact of system variables.63 Consequently, estimator variables have been studied nearly exclusively in laboratory settings. The committee’s review revealed the need for further empirical research on these factors, in both individual studies and systematic reviews.
The committee’s review focused on the most-studied estimator variables: weapon focus, stress and fear, own-race bias, exposure, and retention interval. It is important to emphasize, however, that numerous other estimator variables may affect both the reliability and the accuracy of eyewitness identifications. Research has shown that the physical distance between the witness and the perpetrator is an important estimator variable, as it directly affects the ability of the eyewitness to discern visual details,64 including features of the perpetrator65 (see discussion of vision in Chapter 4). Re-
61Clark, Moreland, and Gronlund, “Evolution of the Empirical and Theoretical Foundations of Eyewitness Identification Reform”: “…the performance advantage for unbiased instructions has decreased only slightly over the past 32 years. However, none of the correlations approached statistical significance.” p. 258.
62Douglas and Steblay, “Memory Distortion in Eyewitnesses.”
63G. L. Wells, “What Do We Know about Eyewitness Identification?” American Psychologist (May 1993): 553, 555.
64B. Uttl, P. Graf, and A. L. Siegenthaler, “Influence of Object Size on Baseline Identification, Priming, and Explicit Memory: Cognition and Neurosciences,” Scandinavian Journal of Psychology 48(4): 281–288 (2007).
65C. L. Maclean et al., “Post-Identification Feedback Effects: Investigators and Evaluators,” Applied Cognitive Psychology 25(5): 739–752 (2011).
search has also shown that an appearance change can greatly diminish the eyewitness’ ability to recognize the perpetrator; the eyewitness’ ability to remember faces of his or her own age group is often superior to his or her ability to remember faces of another age group (own-age bias); and if an eyewitness hears information or misinformation from another person before law enforcement involvement, his or her recollection of the event and confidence in the identification can be altered (co-witness contamination).66 Interactions between and among these variables have not been addressed systematically by researchers.
Weapon Focus

The presence of an unusual object at the scene of a crime can impair visual perception and memory of key features of the crime event. Research suggests that the presence of a weapon at the scene of a crime captures the visual attention of the witness and impedes the ability of the witness to attend to other important features of the visual scene, such as the face of the perpetrator (see also discussion of visual attention in Chapter 4). The ensuing lack of memory of these other key features may impair recognition of a perpetrator in a subsequent lineup.
A 1992 analysis of weapon focus studies found that the presence of a weapon reduced both identification accuracy and feature accuracy (e.g., the eyewitness’ ability to recall clothing and facial features).67 A more recent analysis of the weapon focus literature concluded that the effect of a weapon on identification accuracy is inconsistent: larger effect sizes were observed in threatening scenarios than in non-threatening ones.68 As the retention interval increased, the weapon focus effect size decreased. The analysis further indicated that the effect of a weapon on accuracy is slight in actual crimes, slightly larger in laboratory studies, and largest for simulations.
One possible cause of the inconsistent effects of the presence of a weapon is suggested by a recent laboratory-based study that exposed participants to crime videos.69 These investigators used ROC analysis to investigate discriminability as a function of (1) sequential versus simultaneous lineups; (2) the presence of a weapon; and (3) the presence of a distinctive facial feature. Importantly for the present discussion, discriminability was reduced when the perpetrator possessed a weapon, but only when no distinctive facial feature was present. This interaction between weapon focus and distinctive feature highlights the importance of exploring the effects of interactions between different estimator variables on eyewitness identification performance.
66R. Zajac and N. Henderson, “Don’t It Make My Brown Eyes Blue: Co-Witness Misinformation about a Target’s Appearance Can Impair Target-Absent Lineup Performance,” Memory 17(3): 266–278 (2009).
67N. K. Steblay, “A Meta-analytic Review of the Weapon Focus Effect,” Law and Human Behavior 16(4): 413, 415–417 (1992).
68Fawcett et al., “Of Guns and Geese.”
69Carlson and Carlson, “An Evaluation of Lineup Presentation.”
Additional questions remain as to the cause of reduced eyewitness performance in cases where a weapon is present. Is the effect caused by a diversion of selective attention, as is suggested by basic research on the phenomenon of inattentional blindness (see Chapter 4)? Is stress a significant factor; that is, does anxiety cause the witness to focus less on the features of a person’s face? To what extent is the prominence of the issue an artifact of the particular studies included in the meta-analysis? Is it possible, for example, that the magnitude of the weapon effect depends on whether the data are collected in a laboratory setting versus the real world? To this latter point, some analyses of weapon focus have been conducted using archival records of crimes involving weapons.70 Unfortunately, such efforts often encounter serious methodological difficulties, including a lack of information about the crime (e.g., exposure duration) and the general lack of “ground truth” regarding the accuracy of any identification, among other problems.
Stress and Fear
High levels of stress or fear can affect eyewitness identification.71,72,73 This finding is not surprising, given the known effects of fear and stress on vision and memory (see Chapter 4). Under conditions of high stress, a witness’ ability to identify key characteristics of an individual’s face (e.g., hair length, hair color, eye color, shape of face, presence of facial hair) may be significantly impaired.74
In the particular case of weapon focus, it may not be possible to sufficiently test the effects of stress and heightened stress in the laboratory because of limitations on human participant research that uses realistic and heightened threats. A meta-analysis of the effect of high stress on eyewitness memory nonetheless found some support for the notion that stress impairs both eyewitness recall and identification accuracy.75 The study authors noted that lineup type “moderated the effect of heightened stress on the false alarm rate.”76 They also suggested that the modest effect of stress may be caused by the fact that the analysis included many studies that involved modest stress-induction.77
70See, e.g., Fawcett et al., “Of Guns and Geese.”
71Deffenbacher et al., “A Meta-Analytic Review of the Effects of High Stress.”
72C. A. Morgan III et al., “Accuracy of Eyewitness Memory for Persons Encountered During Exposure to Highly Intense Stress,” International Journal of Law and Psychiatry 27(3): 265–279 (2004).
73C. A. Morgan III et al., “Accuracy of Eyewitness Identification Is Significantly Associated with Performance on a Standardized Test of Recognition,” International Journal of Law and Psychiatry 30(3): 213–223 (2007).
74C. A. Morgan III et al., “Misinformation Can Influence Memory for Recently Experienced, Highly Stressful Events,” International Journal of Law and Psychiatry 36(1): 11–17 (2013).
Earlier studies were more mixed but with clearer results at “high levels of cognitive anxiety.”78 The findings of an earlier study “provide a concrete illustration of catastrophic decline” of eyewitness identification performance at high anxiety levels.79 The correct identification rate went from 75 percent for those with low-state anxiety to 18 percent for those with high-state anxiety.80
The effects of suggestion may be particularly important when the original memory is of a highly stressful event. A recent study looked at more than 850 active-duty military personnel participating in a mock POW camp phase of U.S. military survival school training, which included aggressive interrogation and physical isolation-related stress.81 The study found that misinformative details of the interrogation event (e.g., regarding the identity of the interrogator), which were introduced after the event had been encoded into long-term memory, affected identification accuracy. The study also found that memories acquired during stressful events are highly vulnerable to modification by exposure to post-event misinformation, even in individuals whose level of training and experience might be considered relatively immune to such influences.
Another recent study comparing the eyewitness accuracy of officers and citizens concentrated on the effects of stress and weapon focus.82 The results of this study showed that officers were less stressed and aroused than citizens, but that both police and citizens made more errors when a weapon was inferred or present.
75Deffenbacher et al., “A Meta-Analytic Review of the Effects of High Stress.” It should be noted that the effect sizes for stress-induced support were small with wide confidence intervals, indicating considerable heterogeneity across studies. Although the authors assert that 300 studies with null findings would be required to negate the small effects found in this meta-analysis, fewer studies might be needed if they resulted in opposite effects.
79T. Valentine and J. Mesout, “Eyewitness Identification Under Stress in the London Dungeon,” Applied Cognitive Psychology 23(2): 151–161 (2009).
80K. A. Deffenbacher, “Estimating the Impact of Estimator Variables on Eyewitness Identification: A Fruitful Marriage of Practical Problem Solving and Psychological Theorizing,” Applied Cognitive Psychology 22(6): 822 (2008).
81Morgan et al., “Misinformation Can Influence Memory.”
82J. C. DeCarlo, “A Study Comparing the Eyewitness Accuracy of Police Officers and Citizens” (PhD diss., City University of New York, 2010).
Own-Race Bias

The race and ethnicity of a witness as it relates to that of the perpetrator is another important estimator variable. In eyewitness identification, own-race bias describes the phenomenon in which faces of people of races different from that of the eyewitness are harder to discriminate (and thus harder to identify accurately) than are faces of people of the same race as the eyewitness.83 In the laboratory, this effect is manifested by higher hit rates and lower false alarm rates (a higher diagnosticity ratio) in the recognition of an observer’s own race relative to hits and false alarms for recognition of other races.84 Own-race bias occurs in both visual discrimination and memory tasks, in laboratory and field studies, and across a range of races, ethnicities, and ages. Recent analyses revealed that cross-racial (mis)identification was present in 42 percent of the cases in which an erroneous eyewitness identification was made.85
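The hit rate, false alarm rate, and diagnosticity ratio referred to above are simple arithmetic quantities. The following sketch is illustrative only; the function name and the example counts are hypothetical, not data from any study discussed in this report:

```python
def diagnosticity(hits, target_present, false_alarms, target_absent):
    """Compute hit rate, false alarm rate, and their ratio (diagnosticity).

    hits: suspect identifications made in target-present lineups
    target_present: number of target-present lineups administered
    false_alarms: innocent-suspect identifications in target-absent lineups
    target_absent: number of target-absent lineups administered
    """
    hit_rate = hits / target_present
    fa_rate = false_alarms / target_absent
    return hit_rate, fa_rate, hit_rate / fa_rate

# Hypothetical example: 60 correct identifications in 100 target-present
# lineups, 10 innocent-suspect identifications in 100 target-absent lineups.
hr, fa, d = diagnosticity(60, 100, 10, 100)
# hit rate 0.60, false alarm rate 0.10, diagnosticity ratio 6.0
```

A higher diagnosticity ratio thus reflects exactly the pattern described for own-race recognition: more hits per false alarm.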
A recent meta-analysis of own-race bias found an interaction between own-race bias and the duration of viewing exposure: reducing the amount of time allowed for viewing of each face significantly increased the magnitude of the bias, largely manifested as an increase in the proportion of false alarm responses to other-race faces.86 Own-race bias also interacts with the memory retention interval; cross-race errors of identification were greater when there were longer periods of time between the initial exposure and the memory retrieval.87 A recent study found that “context reinstatement,” wherein a researcher asks an individual to mentally re-create the context in which an incident occurred, failed to influence the identification of other-race faces.88
Although the existence of own-race bias is generally accepted, the causes for this effect are not fully understood. Some possible explanations are rooted in in-group/out-group models of human behavior (e.g., favoritism in which decisions regarding members of one’s own “group” are regarded as having greater importance than decisions regarding members of a different “group”) and differential perceptual expertise that results from different degrees of exposure to and familiarity with same versus other races.
83R. S. Malpass and J. Kravitz, “Recognition for Faces of Own and Other Race,” Journal of Personality and Social Psychology 13(4): 330–334 (1969).
84Meissner and Brigham, “Thirty Years of Investigating the Own-Race Bias.”
85The Innocence Project, “What Wrongful Convictions Teach Us About Racial Inequality,” available at: http://www.innocenceproject.org/Content/What_Wrongful_Convictions_Teach_Us_About_Racial_Inequality.php.
86Meissner and Brigham, “Thirty Years of Investigating the Own-Race Bias.”
88J. R. Evans, J. L. Marcon, and C. A. Meissner, “Cross-Racial Lineup Identification: Assessing the Potential Benefits of Context Reinstatement,” Psychology, Crime, and Law 15(1): 19–28 (2009).
Recent work has examined the role that stereotyping might play.89 One study suggests that, in general, cross-race identification is further impaired when faces are presented in a group (as opposed to one at a time).90 Additional research is needed to identify procedures that may help estimate the degree of own-race bias in individual eyewitnesses following an identification procedure. Until the scientific basis for these effects is better understood, great care may be warranted when constructing lineups in instances where the race of the suspect differs from that of the eyewitness.
Exposure Duration

Eyewitness identification researchers have long believed that longer exposure duration (e.g., more time spent observing a perpetrator’s face during a crime) is associated with greater accuracy of eyewitness identification. The courts also have assumed that exposure duration has an effect on identification accuracy.91 Meta-analyses of the effects of exposure time have found that relatively long exposure durations produce greater accuracy92 and a larger and more stable effect size for exposure duration on eyewitness identification accuracy.93 Longer exposures were associated with higher rates of correct identifications and lower false alarm rates. Exposure duration may affect, or interact with, other variables, including own-race bias and the confidence–accuracy relationship assessed immediately after the lineup decision.94
89H. M. Kleider, S. E. Cavrak, and L. R. Knuycky, “Looking Like a Criminal: Stereotypical Black Facial Features Promote Face Source Memory Error,” Memory and Cognition 40(8): 1200–1213 (2012).
90K. Pezdek, M. O’Brien, and C. Wasson, “Cross-Race (but Not Same-Race) Face Identification Is Impaired by Presenting Faces in a Group Rather Than Individually,” Law and Human Behavior 36(6): 488–495 (2012).
91Manson v. Brathwaite, 432 U.S. 98, 114 (1977), for example, included as a factor for assessing the reliability and admissibility of an identification, “the opportunity of the witness to view the criminal at the time of the crime” and explained that this factor includes both the length of time and the viewing conditions.
92B. H. Bornstein et al., “Effects of Exposure Time and Cognitive Operations on Facial Identification Accuracy: A Meta-Analysis of Two Variables Associated with Initial Memory Strength,” Psychology, Crime, and Law 18(5): 473–490 (2012). The authors state, “We used z as the primary effect size measure for differences between proportions correct, but we also converted z to Pearson’s r for comparability to other meta-analyses (see Tables 1 and 2). The rs were then normalized and averaged to obtain the overall mean effect sizes. We also report the value of Cohen’s d associated with each mean effect size” (Bornstein et al., “Effects of Exposure Time and Cognitive Operations”). Although not defined, presumably z refers to the usual difference in means divided by its standard error, and, from their tables, their r was calculated as z divided by the square root of the reported sample size.
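The effect size conversions described in footnote 92 can be made explicit. The sketch below assumes, as that footnote infers, that r was computed as z divided by the square root of the sample size; it also shows the standard conversion from r to Cohen’s d. The function names are ours, not the meta-analysts’:

```python
import math

def z_to_r(z, n):
    """Convert a z statistic to an effect-size r, assuming r = z / sqrt(n)."""
    return z / math.sqrt(n)

def r_to_d(r):
    """Standard conversion from Pearson's r to Cohen's d:
    d = 2r / sqrt(1 - r^2)."""
    return 2.0 * r / math.sqrt(1.0 - r ** 2)

# Illustrative values only: z = 2.5 from a hypothetical study of n = 100
# participants gives r = 0.25, a small-to-medium effect.
r = z_to_r(2.5, 100)   # 0.25
d = r_to_d(r)          # approximately 0.516
```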
The findings and conclusions from eyewitness identification studies of exposure duration are in keeping with much of the basic research on visual system function (reviewed in Chapter 4). This basic research indicates that the additional information available from longer viewing times reduces uncertainty and enables better detection and discrimination of visual stimuli.
Retention Interval

Retention interval, the amount of time that passes between the initial observation and encoding of a memory and the later point at which that memory must be recalled, can affect identification accuracy. Laboratory studies have demonstrated that stored memories are more likely to be forgotten with the increasing passage of time and can easily become “enhanced” or distorted by events that take place during this retention interval (see discussion of memory in Chapter 4). The amount of time between viewing a crime and the subsequent identification procedure can be expected to similarly affect the accuracy of the eyewitness identification, either independently or in combination with other variables.95
It is difficult to specify the precise relationship between retention interval and the accuracy of eyewitness identification testimony and to estimate when a lengthy retention interval will significantly impair the accuracy of identification. Although, in general, it appears that longer retention intervals are associated with poorer eyewitness identification performance, the strength of this association appears to vary greatly across the circumstances of the initial encounter, identification procedures, and research methodologies.96 A meta-analysis of published facial recognition and eyewitness identification studies found, for example, that an increase in the retention interval was associated with a decreased probability of an accurate identification of a previously seen but otherwise unfamiliar face.97 This same study also found that the rate of forgetting for an unfamiliar face is greatest soon after the initial observation and tends to level off over time, but was unable to specify the shape of this function.
93B. H. Bornstein, K. A. Deffenbacher, E. K. McGorty, and S. D. Penrod, “The Effect of Cognitive Processing on Facial Identification Accuracy: A Meta-Analysis” (Unpublished manuscript, University of Nebraska-Lincoln, 2007).
94M. A. Palmer et al., “The Confidence–Accuracy Relationship for Eyewitness Identification Decisions: Effects of Exposure Duration, Retention Interval, and Divided Attention,” Journal of Experimental Psychology: Applied 19(1): 55–71 (2013).
95One month is the most commonly encountered delay by British police. G. Pike, N. Brace, and S. Kynan, The Visual Identification of Suspects: Procedures and Practice (London: Policing and Reducing Crime Unit, 2002), cited by Deffenbacher et al., “Forgetting the Once-Seen Face.” Law enforcement authorities may have little control over the time required to identify a suspect and obtain the cooperation of the eyewitness to participate in an identification procedure. Thus, retention interval has commonly been considered an estimator variable in eyewitness identification studies.
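The “steep early decline that levels off” pattern is consistent with more than one mathematical forgetting function, which is one reason the meta-analysis could not specify the shape of the curve. The sketch below contrasts exponential and power-law decay; the parameter values are invented for demonstration and are not estimates from any study:

```python
import math

def exponential_retention(t, a=1.0, b=0.5):
    """Proportion retained after time t under exponential decay: a * exp(-b t)."""
    return a * math.exp(-b * t)

def power_retention(t, a=1.0, b=0.5):
    """Proportion retained after time t under power-law decay: a * (1 + t) ** -b."""
    return a * (1.0 + t) ** -b

# Both candidate curves drop fastest immediately after encoding and
# flatten later, matching the qualitative pattern in the meta-analysis:
early_drop = power_retention(0) - power_retention(1)   # loss in the first time unit
late_drop = power_retention(10) - power_retention(11)  # loss in a later time unit
# early_drop is much larger than late_drop
```

Distinguishing between such functions empirically would require identification data at many retention intervals, which few eyewitness studies collect.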
The effect of the retention interval also is influenced by the strength and quality of the initial memory that is encoded, which, in turn, may be influenced by other estimator variables associated with witnessing the crime (such as the degree of visual attention) and viewing factors (such as distance, lighting, and exposure duration). As the retention interval becomes longer, the opportunity for intervening events to alter the memory also becomes greater, and other variables may interact with the retention interval to impair performance (see also discussion of memory in Chapter 4). During the retention interval, the ability to accurately identify faces of other races drops off especially quickly, relative to same-race accuracy.98 Also, for those eyewitnesses who initially express less confidence in their identification, there is a greater decrease in accuracy of identification when the retention interval is longer.99
Research on eyewitness identification has appropriately identified the variables that may affect an individual’s ability to make an accurate identification. Early research findings played an important role in alerting law enforcement, prosecutors, defense counsel, and the judiciary to factors that might influence the accuracy of identifications. In some jurisdictions, eyewitness identification research was used to improve policies and procedures and to educate and train officers. However, much remains unsettled in many areas of eyewitness identification research.
96See J. Dysart and R. C. L. Lindsay, “The Effects of Delay on Eyewitness Identification Accuracy: Should We Be Concerned?” in The Handbook of Eyewitness Psychology: Volume II: Memory for People, ed. R. C. L. Lindsay, D. F. Ross, J. D. Read, and M. P. Toglia (Mahwah: Lawrence Erlbaum and Associates, 2006), 361–373.
97Deffenbacher et al., “Forgetting the Once-Seen Face.” More than 20 of the published studies included in the meta-analysis found no significant effect of retention interval.
98J. L. Marcon et al., “Perceptual Identification and the Cross-Race Effect,” Visual Cognition 18(5): 767–779 (2010) (finding that the cross-race effect was more pronounced when the retention interval was lengthened). Meissner and Brigham, “Thirty Years of Investigating the Own-Race Bias” [meta-analysis finding that as retention time increased “participants increasingly adopted a more liberal response criterion when responding to other-race faces. This liberal response criterion indicated that participants required less evidence from memory (e.g., familiarity or memorability of the face) to respond that they had previously seen an other-race face.”].
99J. Sauer et al., “The Effect of Retention Interval on the Confidence–Accuracy Relationship for Eyewitness Identification,” Law and Human Behavior 34: 337–347 (2010) (finding greater overconfidence at lengthy retention intervals).
While past research appropriately identified system and estimator variables that may affect an individual’s ability to make an accurate identification, this research might be strengthened in several ways. Greater collaboration between the police, courts, and researchers might lead to increased consensus on research agendas and the conceptualization of variables to be examined. More attention to reproducibility and transparency is needed in the selection of data collection strategies and reporting of data. Analyses need to be reported completely, including estimates of effects, confidence intervals, and significance levels. Further, in order to be useful to stakeholders, the statistical findings of this research need to be translated back into terms that can be readily understood by practice and policy decision-makers.
Further, our understanding of errors in eyewitness identification will benefit from more effective research designs, more informative statistical measures and analyses, more probing analyses of research findings, and more sophisticated systematic reviews and meta-analyses. In view of the complexity of the effects of both system and estimator variables, and their interactions, on eyewitness identification accuracy, better experimental designs that incorporate selected combinations of these variables (e.g., presence or absence of a weapon, lighting conditions, etc.) would help elucidate which variables meaningfully influence eyewitness performance, which in turn can inform law enforcement’s eyewitness identification procedures. To date, the eyewitness literature has evaluated procedures mostly in terms of a single diagnosticity ratio or an ROC curve; even if uncertainty is incorporated into the analysis, many other powerful tools for evaluating a “binary classifier” are worthy of consideration.100
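To illustrate the ROC framing used in this literature, the sketch below traces an empirical ROC curve by sweeping a confidence threshold over identification decisions and computes its area by the trapezoidal rule. The function names and confidence ratings are hypothetical, constructed purely for demonstration:

```python
def roc_points(target_present_conf, target_absent_conf, thresholds):
    """Empirical ROC: for each confidence threshold, the pair
    (false alarm rate, hit rate) from counting identifications made
    at or above that threshold."""
    points = []
    for t in thresholds:
        hit = sum(c >= t for c in target_present_conf) / len(target_present_conf)
        fa = sum(c >= t for c in target_absent_conf) / len(target_absent_conf)
        points.append((fa, hit))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule; the curve is
    anchored at (0, 0) and (1, 1)."""
    pts = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical 0-100 confidence ratings for identifications of guilty
# versus innocent suspects:
guilty = [90, 80, 70, 60, 40]
innocent = [50, 30, 20, 10, 10]
curve = roc_points(guilty, innocent, thresholds=[20, 40, 60, 80])
```

Unlike a single diagnosticity ratio, the full curve shows how hit and false alarm rates trade off as the witness’s decision criterion varies, which is the core of the “binary classifier” point above.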
When primary studies such as those described above are available in sufficient quantities, it is important that their results are synthesized using systematic reviews that conform to current best standards.101 These quantitative reviews would necessarily employ transparent, reproducible procedures for locating all relevant published and unpublished research; employ independent, duplicate procedures for selection of studies, extraction of data, and assessment of risk of bias; use meta-analytic procedures that account for the heterogeneity of outcomes both within and across studies; and interpret confidence intervals around pooled effects in a way that is readily understandable by stakeholders. These systematic reviews (which would be regularly updated as new studies are conducted) can be used to further refine the research agenda in eyewitness identification research and to establish priorities for funding of additional primary research.
100Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning.
101See, e.g., A. Liberati et al., “The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration,” PLoS Medicine 6(7): e1000100. doi:10.1371/journal.pmed.1000100 (2009) and Institute of Medicine, Finding What Works in Health Care: Standards for Systematic Reviews (Washington, DC: The National Academies Press, 2011).
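One standard way to “account for the heterogeneity of outcomes across studies” is a random-effects pooled estimate. The sketch below implements the DerSimonian–Laird approach, a common choice in the systematic-review literature, offered here only as an illustration rather than a procedure prescribed by this report:

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate via the DerSimonian-Laird method.

    effects: per-study effect sizes
    variances: their within-study variances
    Returns (pooled effect, between-study variance tau^2).
    """
    w = [1.0 / v for v in variances]
    # Fixed-effect (inverse-variance weighted) estimate and Q statistic.
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    # Between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Re-weight each study by total (within + between) variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# With perfectly homogeneous hypothetical studies, tau^2 is zero and the
# pooled estimate equals the common effect:
pooled, tau2 = dersimonian_laird([0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
# pooled = 0.5, tau2 = 0.0
```

When the per-study effects disagree more than their within-study variances predict, tau² becomes positive, widening the confidence interval around the pooled effect, which is exactly the behavior heterogeneity-aware synthesis requires.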