Measurement Problems in Criminal Justice Research: Workshop Summary

2 Measuring Crime and Crime Victimization: Methodological Issues

Roger Tourangeau and Madeline E. McNeeley

All surveys face measurement challenges, but few topics raise problems of the variety or seriousness of those involved in measuring crime and crime victimization. As Skogan (1981) points out in his thoughtful monograph, Issues in the Measurement of Victimization, the nature of crime and crime victimization adds wrinkles to virtually every standard source of error in surveys. For example, even in our relatively crime-ridden times, crimes remain a rare event and, as a result, survey estimates are subject to large sampling errors. One national survey (National Victims Center, 1992) estimated that 0.7 percent of American women had experienced a completed rape during the prior year. This estimate was based on a sample of 3,220 responding women, implying that the estimate reflected positive answers to the relevant survey items from about 23 women. Skogan details the large margins of sampling error for many key estimates from the National Crime Survey (NCS), a survey that used a very large sample (and which later evolved into the National Crime Victimization Survey).

The clandestine nature of many crimes means that the victim may be unable to provide key details about the victimization and may not even be aware that a crime has been committed at all. Certain incidents that are supposed to be reported in an interview may seem irrelevant to respondents, since they do not think of these incidents as involving crimes. For example, victims of domestic violence or of sexual harassment may not think of these as discrete criminal incidents but as chronic family or interpersonal problems. It may be difficult to prompt the recall of such incidents with the short, concrete items typically used in surveys (for a fuller discussion, see Skogan, 1981:7-10).

NATIONAL CRIME VICTIMIZATION SURVEY

The National Crime Survey and its successor, the National Crime Victimization Survey (NCVS), underwent lengthy development periods featuring record check studies and split-ballot experiments to determine the best way to measure crime victimization. In the records check studies, the samples included known crime victims selected from police records. In survey parlance, these were reverse records check studies: the records had been “checked” before the survey reports were ever elicited. The studies were done in Washington, D.C., Akron, Cleveland, Dayton, San Jose, and Baltimore (see Lehnen and Skogan, 1981, for a summary). A key objective of these early studies was to determine the best length for the reporting period for a survey, balancing the need to increase the number of crime reports with the need to reduce memory errors.

A second wave of studies informing the NCS design was carried out in the early 1980s by researchers at the Bureau of Social Science Research and the Survey Research Center at the University of Michigan (summarized by Martin et al., 1986). This second wave of developmental studies mainly involved split-ballot comparisons (in which random portions of the sample were assigned to different versions of the questionnaire) focusing on the “screening” items, in which respondents first indicate they have relevant incidents to report. Some of these studies were inspired by a conference (described in Biderman, 1980) that brought cognitive psychologists and survey researchers together to examine the memory issues raised by the NCS. Unfortunately, some of the most intriguing findings from the resulting experiments were never published and are buried in hard-to-find memoranda.
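The sampling-error point made in the chapter's opening paragraph (0.7 percent of 3,220 responding women, or about 23 positive answers) can be made concrete with a short calculation. The sketch below is only illustrative: it assumes simple random sampling and a normal approximation, neither of which the text confirms for the survey in question, and the function name `proportion_summary` is hypothetical.

```python
import math


def proportion_summary(p, n, z=1.96):
    """Expected count, standard error, and normal-approximation interval
    for a sample proportion p observed in a sample of size n.

    Assumes simple random sampling (an illustrative assumption; real
    surveys usually have design effects that widen the interval).
    """
    count = p * n
    se = math.sqrt(p * (1 - p) / n)
    return count, se, (p - z * se, p + z * se)


# Figures quoted in the text: 0.7 percent of 3,220 responding women.
count, se, (low, high) = proportion_summary(p=0.007, n=3220)

print(f"expected positive reports: {count:.1f}")
print(f"standard error: {se * 100:.2f} percentage points")
print(f"approximate 95% interval: {low * 100:.2f}% to {high * 100:.2f}%")
```

Under these assumptions the interval spans a range more than twice as wide at the top as at the bottom relative to zero, which is the sense in which estimates of rare events carry large sampling errors.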
For several reasons, the NCVS results are widely used as benchmarks to which statistics from other surveys on crime and crime victimization are compared. Conducted by the Bureau of the Census, the NCVS is the largest and oldest of the crime victimization studies. It uses a rotating panel design in which respondents are interviewed several times before they are “retired” from the sample, a design that greatly improves the precision of sample estimates. It uses a relatively short, six-month reporting period and “bounded” interviewing, in which respondents are instructed to report only incidents that have occurred since the previous interview and are reminded
OCR for page 12
of the incidents they reported then. (Results of the first interview, which is necessarily unbounded, are discarded.) The initial interview is done face to face to ensure maximum coverage of the population; if necessary, subsequent interviews are also conducted in person.

Examples of Measurement Problems

Despite these impressive design features and the large body of methodological work that shaped it, the NCVS is not without its critics. Two recent controversies illustrate the problems of the NCVS and of crime surveys more generally. One controversy centers on the number of incidents of defensive gun use in the United States; the other concerns the number of incidents of rape. In both cases, seemingly similar surveys yield widely discrepant results; the ensuing methodological controversies point to unresolved issues in how to collect data on crime, gun use, and crime victimization in surveys.

Defensive Gun Use

In 1994, McDowall and Wiersema published an estimate of the number of incidents over a four-year period in which potential crime victims had used guns to protect themselves during an actual or attempted crime. Their estimate was based on data from the NCVS, which gathers information about several classes of crime: rape, assault, burglary, personal and household larceny, and car theft. When respondents report an incident in which they were victimized, they are asked several follow-up questions, including, “Was there anything you did or tried to do about the incident while it was going on?” and, “Did you do anything (else) with the idea of protecting yourself or your property while the incident was going on?” Responses to these follow-up probes are coded into a number of categories, several of which capture defensive gun use (e.g., “attacked offender with gun”).
The key estimates McDowall and Wiersema presented were that between 1987 and 1990 there were some 260,000 incidents of defensive gun use in the United States, roughly 65,000 per year. Although big numbers, they pale by comparison with the total number of crimes reported during the same period—guns were used defensively in fewer than one in 500 victimizations reported in the NCVS; moreover, criminal offenders were armed about 10 times more often than their victims. These are just the sort of statistics dear to gun control advocates. McDowall and
OCR for page 13
Wiersema conclude that “criminals face little threat from armed citizens” (1994:1984).

McDowall and Wiersema note, however, that their estimates of defensive gun use differ markedly from those based on an earlier survey by Kleck (1991; see also Kleck and Gertz, 1995). Kleck’s results indicated 800,000 to 1 million incidents of defensive gun use annually. These numbers were derived from a national telephone survey of 1,228 registered voters who were asked: “Within the past five years have you, yourself, or another member of your household used a handgun, even if it was not fired, for self-protection or for the protection of property at home, work, or elsewhere, excluding military service and police security work?”

There are so many differences between the Kleck survey and the NCVS that it should come as no surprise that the results do not line up very well. The two surveys covered different populations (the civilian noninstitutional population in the NCVS versus registered voters with a telephone in the Kleck survey), interviewed respondents by different methods (in-person versus telephone), covered different recall periods (six months in the NCVS versus five years in the Kleck study), and asked their respondents markedly different questions. The NCVS uses a bounded interview, the Kleck survey an unbounded interview. Still, the difference between 65,000 incidents a year and some 900,000 is quite dramatic and would seem to demand a less mundane explanation than one involving routine methodological differences. A later telephone survey by Kleck and Gertz (1995) yielded an even higher estimate: 2.5 million incidents of defensive gun use.

McDowall and Wiersema (1994) cite two other possible explanations of the differences between the results of the NCVS and the earlier Kleck study. First, they note that Kleck’s estimates rest on the reports of a mere 49 respondents.
(The later Kleck and Gertz estimates also rest on a similarly small base of positive reports—66 out of nearly 5,000 completed interviews.) Even a few mistaken respondents could have a large impact on the results. In addition, the Kleck item covers a much broader range of situations than does the NCVS. The NCVS excludes preemptive use of firearms (e.g., motorists who keep a gun in their car “for protection” but never take it out of the glove compartment), focusing more narrowly on gun use during actual or attempted crimes. It is possible that much of the disparity between the NCVS estimates and those derived from the two Kleck studies reflects the broader net cast in the latter surveys. A methodological experiment by McDowall, Loftin, and Presser (2000) compared questions modeled on the ones used in the NCVS with ones like
OCR for page 14
those used in the Kleck surveys. Respondents were asked both sets of questions (both written to cover a one-year recall period), and the experiment varied which ones came first in the interview. The sample included 3,006 respondents, selected from a list of likely gun owners. Overall, the Kleck items yielded three times more reports of defensive gun use than the NCVS-style items. What was particularly interesting in the results was that the two sets of items appeared to yield virtually nonoverlapping sets of incidents; of the 89 reports of defensive gun use, only 9 were mentioned in response to both sets of items.

Prevalence of Rape

An even more disparate set of figures surrounds the issue of the number of women in the United States who have been the victims of attempted or completed rapes. Once again, the studies from which the estimates are drawn differ in many crucial particulars: they sample different populations, ask different questions that are based on different definitions of rape, conduct data collection via different methods, and cover different recall periods. As with the estimates of defensive gun use, what is surprising is not that the estimates differ from each other but that they differ so widely. Several studies converge on the estimate that about one-quarter of American women have been victims of completed or attempted rape at some time in their lives (see, for example, Koss, 1993:Table 1). Most of these figures do not accord well with the rape estimates from the NCVS; the NCVS covers a more limited period (six months) and does not produce estimates of lifetime victimization. But the NCVS’s annual estimates (for example, fewer than 1 woman or girl in 1,000 experienced a rape or attempted rape in 1992; see Koss, 1996) imply that rape is much less common than indicated by most of the other surveys.
Koss (1992, 1993, 1996) has been an energetic critic of the NCVS procedures for assessing the prevalence of rape, but at least two other papers have presented careful comparisons between the NCVS procedures and those of other surveys (Fisher and Cullen, 2000; Lynch, 1996) and support Koss’s contention that methodological details matter a great deal in assessing the prevalence of rape victimization. Lynch, for example, reports about a twofold difference between annual estimates for 1992 from the NCVS and the National Women’s Study (NWS) for the previous year. For 1992 the NCVS estimated 355,000 incidents of rape; by contrast, the NWS estimated that 680,000 women
OCR for page 15
were rape victims. (The NCVS figure translates into fewer than 355,000 victims since the same person may have experienced multiple victimizations.) Lynch explores a number of differences between the two studies, including: the age range covered by the surveys (18 and older for the NWS, 12 and older for the NCVS); the sample sizes (4,008 for the NWS, 100,000+ for the NCVS); the length of the recall period (one year for the NWS, six months for the NCVS); the schedule of interviewing (the NWS estimates are based on data from the second interview from a three-wave longitudinal study, whereas the NCVS estimates reflect data from the second through last interviews from a seven-wave panel); and the questions used (brief yes-no items in the NWS versus detailed incident reports in the NCVS). Despite the methodological differences between the two surveys, the difference between the two estimates is probably not significant. The estimates from both surveys have large standard errors (approximately 190,000 for the NWS estimate and approximately 32,000 for the NCVS), and the standard error of the difference is on the order of 200,000.

One major difference between the NCVS and most of the other surveys assessing the frequency of rape involves the basic strategy used to elicit reports about rapes and other crimes. The NCVS begins with a battery of yes-no items designed to prompt reports about a broad array of completed or attempted crimes. Only one of these initial screening items directly mentions rape (“Has anyone attacked or threatened you in any of these ways…any rape, attempted rape, or other type of sexual attack?”), although several other questions concern actual or threatened violence.
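The claim above that the NWS and NCVS estimates probably do not differ significantly can be checked with a few lines of arithmetic. The sketch below is a rough illustration, not a reconstruction of Lynch's analysis: it assumes the two estimates are independent and approximately normal, and it treats the NWS count of victims and the NCVS count of incidents as directly comparable, which the text itself notes is not strictly true. The function name `difference_z` is hypothetical.

```python
import math

# Figures quoted in the text (approximate).
nws_estimate, nws_se = 680_000, 190_000    # National Women's Study
ncvs_estimate, ncvs_se = 355_000, 32_000   # National Crime Victimization Survey


def difference_z(est_a, se_a, est_b, se_b):
    """Difference, its standard error, and a z statistic.

    Assumes the two estimates are independent, so their variances add.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff


diff, se_diff, z = difference_z(nws_estimate, nws_se, ncvs_estimate, ncvs_se)

print(f"difference: {diff:,.0f}")
print(f"standard error of the difference: {se_diff:,.0f}")
print(f"z statistic: {z:.2f} (below the conventional 1.96 cutoff: {abs(z) < 1.96})")
```

Because the NWS standard error dominates, the standard error of the difference is close to 190,000, consistent with the text's "on the order of 200,000."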
Once the respondent completes these initial screening items, further questions gather more detailed information about each incident; the final classification of an incident in the NCVS reflects these detailed reports rather than the answers to the initial screening questions. Most of the other surveys on rape differ from this procedure in two key ways—first, they ask multiple screening questions specifically crafted to elicit reports about rape and, second, they omit the detailed follow-up questions. For example, a survey by Koss, Gidycz, and Wisniewski (1987) included five items designed to elicit reports of attempted or completed rape. The items are quite specific. For
OCR for page 16
example, one asks, “Have you had a man attempt sexual intercourse (get on top of you, attempt to insert his penis) when you didn’t want to by threatening or using some degree of force (twisting your arm, holding you down, etc.) but intercourse did not occur?” The NWS adopted this same approach, employing five quite explicit items to elicit reports of attempted or completed rape.

There is little doubt that including multiple concrete items will clarify the exact concepts involved and prompt fuller recall. Multiple items provide more memory cues and probably trigger more attempts at retrieval; both the added cues and the added time on task are likely to improve recall (Bradburn and Sudman, 1979; Burton and Blair, 1991; Cannell et al., 1981; Means et al., 1994; Wagenaar, 1986; Williams and Hollan, 1981). The NCVS is a general-purpose crime survey, and its probes cover a broad array of crimes. The NWS and the Koss surveys use much more detailed probes that focus on a narrower range of crimes.

At the same time, the absence of detailed information about each incident could easily lead to classification errors. A study by Fisher and Cullen (2000) included both yes-no screening items of the type used by Koss and colleagues, the NWS, and many other studies of rape and the more detailed questions about each incident featured by the NCVS. They compared responses to the screening questions with the final classifications of the incidents based on the detailed reports. There were twice as many positive answers to the rape screening questions as there were incidents ultimately classified as rapes based on the detailed reports. (The rape screening items also captured many incidents involving some other type of sexual victimization.) In addition, some incidents classified as rapes on the basis of the detailed information were initially elicited by screening items designed to tap other forms of sexual victimization.
The results suggest that, even when the wording of screening items is quite explicit, respondents can still misclassify incidents.

Factors Affecting Reporting in Crime Surveys

Many surveys on sensitive subjects adopt methods primarily designed to reduce underreporting, that is, the omission of events that should, in principle, be reported. And it is certainly plausible that women would be reluctant to report extremely painful and personal incidents such as attempted or completed rapes. Even with less sensitive topics, such as burglary or car theft, a variety of processes (lack of awareness that a crime has been committed, forgetting, unwillingness to work hard at answering) can lead to systematic underreporting.

There are also reasons to believe that crime surveys, like other surveys that depend on recall, may be prone to errors in the opposite direction as well. Because crime is a relatively rare event, most respondents are not in the position to omit eligible incidents; they do not have any to report. The vast majority of respondents can only overreport defensive gun use, rapes, or crime victimization more generally. In his discussion of the controversy over estimates of defensive gun use, Hemenway (1997) makes the same point. All survey questions are prone to errors, including essentially random reporting errors. For the moment, let us accept the view that 1 percent of all adults used a gun to defend themselves against a crime over the past year. If the sample accurately reflects this underlying distribution, then only 1 percent of respondents are in the position to underreport defensive gun use; the remaining 99 percent can only overreport it. Even if we suppose that an underreport is, say, 10 times more likely than an overreport, the overwhelming majority of errors will still be in the direction of overreporting. If, for example, one out of every four respondents who actually used a gun to defend himself denies it while only 1 in 40 respondents who did not use a gun in self-defense claims in error to have done so, the resulting estimate will nonetheless be sharply biased upward (1% × 75% + 99% × 2.5% ≈ 3.2%). It is not hard to imagine an error rate of the magnitude of 1 in 40 arising from respondent inattention, misunderstanding of the questions, interviewer errors in recording the answers, and other essentially random factors.
Even the simplest survey items (for instance, those asking about sex and age) yield less than perfectly reliable answers. Random errors can, in the aggregate, yield systematic biases when most of the respondents are in the position to make errors in only one direction.

Aside from sheer unreliability, though, reporting in crime surveys may be affected by several systematic factors that can introduce additional distortions of their own. We focus on two of these systematic factors here. First, we address the potentially sensitive nature of the questions on many crime surveys and the impact of the mode of data collection on the answers to such questions. This is followed by an examination of the effects of the context in which survey items are presented, including the physical setting of the interview, the perceived purpose and sponsorship of the study, and prior questions in the interview.
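The overreporting argument attributed to Hemenway (1997) above is a small piece of arithmetic, and it may help to see it written out. The sketch below simply recomputes the example from the text (1 percent true prevalence, one in four true users denying it, 1 in 40 non-users claiming use in error); the function and parameter names are illustrative and do not come from the source.

```python
def reported_prevalence(true_rate, false_negative_rate, false_positive_rate):
    """Share of respondents who report the behavior, given misreporting.

    true_rate            -- share who actually did it
    false_negative_rate  -- share of those who did it but deny it
    false_positive_rate  -- share of those who did not but claim it
    """
    true_reports = true_rate * (1 - false_negative_rate)
    false_reports = (1 - true_rate) * false_positive_rate
    return true_reports + false_reports


# Values from the worked example in the text.
estimate = reported_prevalence(
    true_rate=0.01,
    false_negative_rate=0.25,   # one in four true users denies it
    false_positive_rate=1 / 40, # 1 in 40 non-users claims use in error
)

print(f"reported prevalence: {estimate:.3%}")
print(f"ratio to true prevalence: {estimate / 0.01:.2f}")
```

With these inputs the reported rate comes out at roughly 3.2 percent, more than three times the assumed true rate, even though each individual underreport is ten times more likely than each individual overreport. Lowering `true_rate` makes the upward bias proportionally larger, which is the point about rare events.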
OCR for page 18
IMPACT OF THE MODE OF DATA COLLECTION

Most of the surveys that have produced the widely varying estimates of defensive gun use and rape incidence use some form of interviewer administration of the questions. For example, Koss (1993) lists 20 surveys on sexual victimization of women; only 4 (all of them involving local samples from individual communities) appear to use self-administered questionnaires. The remainder rely on interviewers to collect the data, in either face-to-face or telephone interviews. (The NCVS uses both methods; the initial interview is done face to face, but later interviews are, to the extent possible, done by telephone.)

The last decade has seen dramatic changes in the methods used to collect survey data, including the introduction of several new methods of computerized self-administration. For example, the National Household Survey on Drug Abuse, a large survey sponsored by the Substance Abuse and Mental Health Services Administration, has adopted audio computer-assisted self-interviewing (ACASI). With ACASI a computer simultaneously displays the item on screen and plays a recording of it to the survey respondent via earphones. The respondent enters an answer directly into the computer using the keypad. Other new methods for administering questions include computer-assisted self-interviewing without the audio (CASI), e-mail surveys, telephone ACASI, and World Wide Web surveys.

Two trends have spurred the development and rapid adoption of new methods of computerized self-administration of surveys. First, various technological changes (such as the introduction of lighter, more powerful laptop computers, development of the World Wide Web, widespread adoption of e-mail, and improvements in sound card technology) have made the new methods possible.
Second, the need for survey data on sensitive topics, such as illicit drug use and sexual behaviors related to the spread of AIDS, has made the new methods highly desirable, since they combine the privacy of self-administration with the power and flexibility of computer administration. Widespread interest in the new methods has spurred survey methodologists to reexamine the value of self-administration for collecting survey data on sensitive topics.

Gains from Self-Administration

There is strong evidence to support the value of self-administration for eliciting reports about sensitive behaviors. To illustrate the gains from self-administration, Figure 2-1 plots the ratio of the level of illicit drug use reported when survey questions are self-administered to the level reported when interviewers administer the questions. For example, if 6 percent of respondents report using cocaine during the previous year under self-administration but only 4 percent report using cocaine under interviewer administration, the ratio would be 1.5. The data are from two of the largest mode comparisons done to date: one carried out by Turner, Lessler, and Devore (1992) and the other by Schober, Caces, Pergamit, and Branden (1992). The ratios range from a little over 1.0 to as high as 2.5, which is to say that self-administration more than doubled the reported rate of illicit drug use. Tourangeau, Rips, and Rasinski (2000:Table 10.2) reviewed a number of similar mode comparisons; they found a median increase of 30 percent in the reported prevalence of marijuana use with self-administration and similar gains for cocaine use. They also summarize the evidence that self-administration improves reporting about other sensitive topics, including sexual partners, abortion, smoking, and church attendance.

FIGURE 2-1 Drug reporting.

Mode, Privacy, and the Presence of Third Parties

It is natural to think that at least some of the gains from self-administration result from the reduced risk of disclosure to other household members, but there is surprisingly little evidence that the presence of other household members has much effect on what respondents report during interviews. Interviews are often conducted under less than ideal conditions, and, although most survey organizations train their interviewers to try to find private settings for the interviews, other household members are often present. For example, Silver and colleagues examined the proportion of interviews done for the American National Election Studies (ANES) in which other household members were present. The ANES is a series of surveys funded by the National Science Foundation and carried out by the University of Michigan’s Survey Research Center. The proportion varied somewhat from one survey to the next, but roughly half of all interviews conducted between 1966 and 1982 were done in the presence of another household member (Silver et al., 1986). Similarly, Martin and colleagues (1986) noted that some 58 percent of NCS interviews were conducted within earshot of someone other than the interviewer and respondent (for a more recent estimate, see Coker and Stasny, 1995).

Silver and colleagues looked at whether the presence of other people during interviews affected the overreporting of voting. In many jurisdictions, whether someone voted is a matter of public record, so it is a relatively easy matter to determine the accuracy of reports about voting. Voting is a socially desirable behavior, and many nonvoters (roughly a quarter, according to Silver and company) nonetheless report that they voted during the most recent election. What is somewhat surprising is that the rate of overreporting did not vary as a function of the privacy of the interview.
A number of other national surveys have also recorded whether other people are present during the interviews, but researchers who have examined these data have found little evidence that the presence of others affects reports about such potentially sensitive topics as sexual behavior (Laumann et al., 1994) or illicit drug use (Schober et al., 1992; Turner et al., 1992). Smith’s (1997) review concludes that in general the effects of the presence of third parties during interviews are minimal. There are several possible explanations for the absence of third-party effects. As Martin and colleagues note, other household members may remember relevant incidents that a respondent has forgotten, offsetting any inhibiting effect their presence has. When another household member already knows the sensitive information, his or her presence may make it more difficult for the respondent to withhold it from the interviewer. In addition, interviewers are probably more likely to do interviews with other
These effects of prior items on the interpretation and retrieval process for subsequent questions mean that asking the “same” question in two different questionnaires will not necessarily yield the same answers. As Koss (1993) argues, the focus of the NCVS on criminal victimizations (along with other cues in that study) may promote a narrow interpretation of the type of incidents to be reported; in addition, the screening items in the NCVS may serve as relatively poor retrieval cues for incidents the respondents do not necessarily think of as crimes. On the other hand, some of the items used in the Sexual Experiences Survey (Koss et al., 1987) and later rape surveys (for example, “Have you given in to sex play (fondling, kissing, or petting, but not intercourse) when you didn’t want to because you were overwhelmed by a man’s continual arguments and pressure?”) may help prompt fuller recall of more serious incidents but may also suggest that almost any unwanted sexual experience is of interest, encouraging overreporting.

Bounding

Another procedure used in some crime surveys may also help frame later questions for respondents and trigger the recall of relevant events; this is the review of incidents reported in the previous round as part of the bounding procedure. The purpose of bounding is to prevent respondents from reporting incidents that actually occurred before the start of the recall period; this type of error is known as telescoping in the survey literature. Although more sophisticated theories of telescoping have been proposed, it mostly appears to reflect our poor memory for dates (see, for example, Baddeley et al., 1978; Thompson et al., 1996). Because telescoping errors are common, bounding can have a big effect on the level of reporting in a survey.
In fact, Neter and Waksberg’s (1964) initial demonstration of the benefits of bounding indicated that about 40 percent of all household repairs and more than half of related expenditures were reported in error in unbounded interviews. Table 2-1 summarizes the results of the Neter and Waksberg (1964) study, along with a series of studies by Loftus and Marburger (1983) that explored alternative procedures for bounding the recall period. Loftus and Marburger used several procedures to define the beginning of the recall period. They carried out two of their experiments exactly six months after the eruption of Mt. Saint Helens; in those studies, they compared answers to questions that began “Since the eruption of Mt. St. Helens
…” with parallel items that began “During the last six months….” The eruption of Mt. St. Helens served as what Loftus and Marburger called a temporal landmark. In subsequent experiments, they used New Year’s Day to mark off the boundary of the recall period or asked respondents to generate their own personally significant landmark event near the beginning of the recall period. As Table 2-1 indicates, whether bounding takes the form of reviewing with the respondents what they already reported in the previous interview (as in Neter and Waksberg, 1964), providing them with a public landmark event, like the eruption of Mt. St. Helens or New Year’s Day (Loftus and Marburger, 1983), or asking them to generate their own personal landmark events (Loftus and Marburger, 1983:Experiment 3), bounding sharply reduces the level of reporting.

Bounding probably has several useful effects. First, as Biderman and Cantor (1984) noted, it helps communicate the importance of precision; just mentioning the specific date that marked the beginning of the recall period had a noticeable impact on the number of victimizations reported in Loftus and Marburger’s final experiment. It is quite likely that many respondents begin survey interviews thinking that it will be cooperative for them to mention incidents related to the topic of the interview, even if those incidents do not meet all the requirements set forth in the questions. Bounding procedures help alter this expectation.

But the impact of bounding seems to go beyond its role in socializing the respondent to the task’s requirements. A variety of evidence suggests that people are much better at reconstructing the relative order of different events than recalling their exact dates (e.g., Friedman, 1993). Bounding converts the temporal judgment respondents have to make from an absolute one (Does the event fall in this period?)
to a relative one (Did the event occur before or after the bounding event?); relative judgment is a lot more accurate. Bounding procedures can serve still another function—both previously reported incidents and landmark events can serve as powerful retrieval cues. When people are asked to remember events from a given period, they tend to recall incidents that fall near temporal boundaries, such as the beginning of the school year or major holidays (Kurbat et al., 1998; Robinson, 1986). Major temporal periods are an important organizing principle for our autobiographical memories; if our memories were written autobiographies, they would be made up of chapters corresponding to each of the major periods of our lives. The boundaries that separate these periods are powerful retrieval cues. Similarly, the review of events reported in an earlier interview
may trigger the recall of similar or related incidents since then. Bounding procedures improve the accuracy of recall, helping respondents weed out ineligible events and remember eligible ones. As Table 2-1 suggests, the net effect can be dramatic.

TABLE 2-1 Impact of Bounding Procedures

Study                          Bounding Procedure   Events Reported      Ratio: Unbounded over Bounded
Neter and Waksberg (1964)      Prior interview      Expenditures         1.40
                                                    Jobs                 1.55
Loftus and Marburger (1983)
  Experiment 1                 Landmark event       Any victimizations   6.15
                                                    Victim of theft      1.51
  Experiment 2                 Landmark event       Victim of assault    1.52
                                                    Reported crime       1.22
  Experiment 3                 Personal landmark    Any victimizations   5.50
  Experiment 4                 New Year’s Day       Any victimizations   2.00
  Experiment 5                 New Year’s Day       Any victimizations   2.52
                               Specific date        Any victimizations   1.32

CONCLUSIONS: SOURCES OF DIFFERENCES ACROSS SURVEYS

Most papers that examine discrepancies across surveys are limited to speculating about the sources of the differences in the results, and this paper is no exception. (McDowall et al., 2000, and Fisher and Cullen, 2000, are exceptions; they present evidence testing specific hypotheses about why different procedures gave different results.) Throughout, we have offered conjectures about the variables that affect reporting in crime surveys. In this final section we try to be a little more explicit about the variables we think are the key ones.

One theme that runs through our discussion is that both overreporting and underreporting are possible; it simply cannot be
taken for granted that relatively rare events, like defensive gun use, or even very sensitive ones, like rape victimizations, will necessarily be underreported in surveys. Respondents can only make the errors it is logically possible for them to make; if most of them have not in fact experienced the target events, they can only overreport them. Moreover, as the work of Loftus reminds us, forgetting does not necessarily make us underreport events. Forgetting when something happened or what exactly took place can lead us to report events that do not really count. And the same cues that can help us remember an event can also encourage us to report incidents that do not meet the requirements of a survey’s questions.

A variable that has been relatively neglected in discussions of crime reporting has been the mode of data collection. There is strong evidence that self-administration produces fuller reporting of sensitive behaviors, sometimes dramatically so (as in Figure 2-1). Several new methods of computerized self-administration have become available over the past 10 years; these new methods have greatly extended the range of situations in which self-administration can be used and in some cases have sharply increased levels of reporting (e.g., see Figure 2-2). The new technologies can be used in conjunction with face-to-face (Tourangeau and Smith, 1996; Turner et al., 1998) or telephone interviews (Phipps and Tupek, 1990; Turner et al., 1998), or they can be used to administer stand-alone surveys via e-mail (e.g., Kiesler and Sproull, 1986) or the Internet (Dillman et al., 1998).
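The point made earlier in this section, that rare events can be overreported on balance, is at bottom an arithmetic one, and a small sketch may make it concrete. The Python below is an illustration only: the function names and every rate in it are hypothetical, chosen for the example rather than taken from any of the surveys discussed here. It assumes that each true case goes unreported with some probability and that each non-case is wrongly reported with some much smaller probability.

```python
def expected_reported_rate(true_rate, miss_rate, false_report_rate):
    """Expected share of respondents who report the event.

    true_rate:         share of respondents who really experienced the event
    miss_rate:         share of true cases that go unreported (hypothetical)
    false_report_rate: share of non-cases wrongly reported (hypothetical)
    """
    reported_true_cases = true_rate * (1.0 - miss_rate)
    reported_false_cases = (1.0 - true_rate) * false_report_rate
    return reported_true_cases + reported_false_cases


def break_even_false_report_rate(true_rate, miss_rate):
    """False-report rate at which overreports exactly offset underreports."""
    return true_rate * miss_rate / (1.0 - true_rate)


# Hypothetical rare event: 1 percent true prevalence, 30 percent of real
# cases missed, and only 1 percent of non-cases wrongly reported.
true_rate = 0.01
reported = expected_reported_rate(true_rate, miss_rate=0.30, false_report_rate=0.01)
threshold = break_even_false_report_rate(true_rate, miss_rate=0.30)

print(f"true rate:                    {true_rate:.4f}")
print(f"expected reported rate:       {reported:.4f}")
print(f"break-even false-report rate: {threshold:.4f}")
```

Under these made-up rates the expected reported rate (about 1.7 percent) exceeds the true rate (1 percent) even though nearly a third of real cases are missed, because the non-cases so heavily outnumber the cases. Any false-report rate above roughly 0.3 percent would produce net overreporting in this example.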
Our first hypothesis, then, is that self-administration will dramatically increase reports of some types of crime, particularly those that carry stigma and those perpetrated by other household members; self-administration will reduce reports of incidents that put the respondent in a favorable light, including perhaps defensive gun use. A related hypothesis involves the presence of other household members; for the topics raised in crime surveys, we believe that the presence of other household members must make a difference (at least for crimes involving domestic violence). Crime surveys may prove to be a notable exception to the rule that the presence of third parties during an interview does not have much effect on a respondent’s answers (see Coker and Stasny, 1995, for some evidence supporting this conjecture).

We also offer several hypotheses about the effects of the context of a survey, construing context broadly to include not only the previous items in the questionnaire but also the packaging of the survey to the respondent and the procedures used to bound the recall period. Our first hypothesis about context is that the apparent topic of the survey, the survey’s sponsorship, the organization responsible for collecting the data, the letterhead used on advance letters, and similar procedural details will affect respondents’ views about the need for accuracy in reporting and about the type of incidents they are supposed to report. It is easy to imagine an experiment that administers the same questions to all respondents but varies the framing of the survey. Such an experiment would examine how reports were affected by the framing of the survey; in addition, it might also compare respondents’ judgments as to whether they were supposed to report hypothetical incidents in vignettes describing borderline cases. Our guess is that the packaging of the survey will have a big impact on how respondents classify the incidents depicted in the vignettes.

Our next hypothesis is that the context provided by earlier questions will have effects similar to those of the context provided by the external trappings of the survey. A rape survey loaded with crime items is likely to lead respondents to omit sexual victimizations that do not seem crimelike; a survey loaded with items on sexual victimizations will lead respondents to report incidents that are not, strictly speaking, rapes. Respondents want to help out by providing relevant information, but they are accustomed to the looser standards of conversation and take cognitive shortcuts to reduce the demands of the questions. As a result, it is important to gather detailed information about each incident that respondents report. Even when elaborate and explicit screening items are used, the researchers’ classification of an incident will not necessarily agree with the respondent’s (Fisher and Cullen, 2000).

We discussed one additional contextual variable: the bounding procedure used to frame the recall period for the survey items. Our final hypothesis is that the exact bounding procedure a survey uses will sharply affect the final estimates.
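The ratios in Table 2-1 can be restated as the share of unbounded reports that drop out under bounding, which some readers may find easier to interpret. The sketch below is a minimal illustration in Python; the helper name is hypothetical, and treating the bounded count as the benchmark is an assumption made for the arithmetic, not a claim that bounded reports are free of error.

```python
# Ratios of events reported (unbounded over bounded), copied from Table 2-1.
TABLE_2_1 = [
    ("Neter and Waksberg (1964)", "Prior interview", "Expenditures", 1.40),
    ("Neter and Waksberg (1964)", "Prior interview", "Jobs", 1.55),
    ("Loftus and Marburger (1983), Exp. 1", "Landmark event", "Any victimizations", 6.15),
    ("Loftus and Marburger (1983), Exp. 1", "Landmark event", "Victim of theft", 1.51),
    ("Loftus and Marburger (1983), Exp. 2", "Landmark event", "Victim of assault", 1.52),
    ("Loftus and Marburger (1983), Exp. 2", "Landmark event", "Reported crime", 1.22),
    ("Loftus and Marburger (1983), Exp. 3", "Personal landmark", "Any victimizations", 5.50),
    ("Loftus and Marburger (1983), Exp. 4", "New Year's Day", "Any victimizations", 2.00),
    ("Loftus and Marburger (1983), Exp. 5", "New Year's Day", "Any victimizations", 2.52),
    ("Loftus and Marburger (1983), Exp. 5", "Specific date", "Any victimizations", 1.32),
]


def share_removed_by_bounding(ratio):
    """Fraction of unbounded reports absent under bounding: 1 - 1/ratio.

    Assumes the bounded count is the reference level (an assumption
    made for this illustration only).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


for study, procedure, measure, ratio in TABLE_2_1:
    share = share_removed_by_bounding(ratio)
    print(f"{study:38s} {procedure:18s} {measure:20s} {ratio:5.2f} {share:6.1%}")
```

A ratio of 2.00, for example, corresponds to half of the unbounded reports dropping out under bounding, and a ratio of 1.40 to a little under 29 percent.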
Surveys that ask about events during a vaguely defined recall period (e.g., the Kleck surveys on defensive gun use) will yield more reports than surveys that take the trouble to bound the recall period more sharply. An exact date is good and a landmark event is better. By itself a prior interview may not be all that effective as a bounding event; we further hypothesize that a full bounding procedure that includes a review of the incidents reported in the previous interview will reduce reporting relative to the more truncated procedure used in many surveys, which simply instructs respondents to report incidents that occurred since the last interview. Compared to temporal or personal landmarks, the prior interview may not mark off the relevant time period very clearly.

Of course, many things affect reporting in surveys. Crime surveys are
at an added disadvantage because many of their questions involve particularly stigmatized or traumatic events, such as rape, that respondents may simply not want to discuss. This is why it is especially important to do as much as possible to uncover the effects of those factors that are within the control of the researchers. We have tried to focus on a few of the variables (the mode of interviewing, the setting of the interview, the framing of the survey, and the context of the key items) that we think may have a big impact on reporting in crime surveys. These are variables that have been shown to have effects large enough to account for the very large differences in results across different surveys. Unfortunately, we will not know whether these are the culprits until someone does the right experiments.

REFERENCES

Anderson, J.R. 1983 The Architecture of Cognition. Cambridge, MA: Harvard University Press.
Baddeley, A.D., V. Lewis, and I. Nimmo-Smith 1978 When did you last…? In Practical Aspects of Memory, M.M. Gruneberg, P.E. Morris, and R.N. Sykes, eds. London: London Academic Press.
Beebe, T.J., P.A. Harrison, J.A. McRae, R.E. Anderson, and J.A. Fulkerson 1998 An evaluation of computer-assisted self-interviews in a school setting. Public Opinion Quarterly 62:623-632.
Biderman, A. 1980 Report of a Workshop on Applying Cognitive Psychology to Recall Problems of the National Crime Survey. Washington, DC: Bureau of Social Science Research.
Biderman, A., and D. Cantor 1984 A longitudinal analysis of bounding, respondent conditioning, and mobility as sources of panel bias in the National Crime Survey. In Proceedings of the American Statistical Association, Survey Research Methods Section. Washington, DC: American Statistical Association.
Bradburn, N.M., and S. Sudman 1979 Improving Interview Method and Questionnaire Design. San Francisco: Jossey-Bass.
Burton, S., and E. Blair 1991 Task conditions, response formulation processes, and response accuracy for behavioral frequency questions in surveys. Public Opinion Quarterly 55:50-79.
Cannell, C., F.J. Fowler, and K. Marquis 1968 The influence of interviewer and respondent psychological and behavioral variables on the reporting in household interviews. Vital and Health Statistics 2:26.
Cannell, C., P. Miller, and L. Oksenberg 1981 Research on interviewing techniques. Pp. 389-437 in Sociological Methodology, S. Leinhardt, ed. San Francisco: Jossey-Bass.
Clark, H.H., and E.F. Schaefer 1989 Contributing to discourse. Cognitive Science 13:259-294.
Coker, A.L., and E.A. Stasny 1995 Adjusting the National Crime Victimization Survey’s Estimates of Rape and Domestic Violence for “Gag” Factors. Washington, DC: U.S. Department of Justice, National Institute of Justice.
Dillman, D.A., R.D. Tortora, J. Conradt, and D. Bowker 1998 Influence of plain vs. fancy design on response rates for Web surveys. In Proceedings of the Survey Research Methods Section, American Statistical Association. Washington, DC: American Statistical Association.
Fisher, B.S., and F.T. Cullen 2000 Measuring the Sexual Victimization of Women: Evolution, Current Controversies, and Future Research.
Friedman, W.J. 1993 Memory for the time of past events. Psychological Bulletin 113:44-66.
Grice, H.P. 1989 Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Hemenway, D. 1997 The myth of millions of annual self-defense gun uses: A case study of survey overestimates of rare events. Chance 10:6-10.
Jobe, J.B., W.F. Pratt, R. Tourangeau, A. Baldwin, and K. Rasinski 1997 Effects of interview mode on sensitive questions in a fertility survey. Pp. 311-329 in Survey Measurement and Process Quality, L. Lyberg, P. Biemer, M. Collins, E. de Leeuw, C. Dippo, N. Schwarz, and D. Trewin, eds. New York: Wiley.
Just, M.A., and P.A. Carpenter 1992 A capacity theory of comprehension. Psychological Review 99:122-149.
Kiesler, S., and L. Sproull 1986 Response effects in the electronic survey. Public Opinion Quarterly 50:402-413.
Kleck, G. 1991 Point Blank: Guns and Violence in America. Hawthorne, NY: Aldine de Gruyter.
Kleck, G., and M. Gertz 1995 Armed resistance to crime: The prevalence and nature of self-defense with a gun. Journal of Criminal Law and Criminology 86:150-187.
Koss, M. 1992 The underdetection of rape. Journal of Social Issues 48:63-75.
Koss, M. 1993 Detecting the scope of rape: A review of prevalence research methods. Journal of Interpersonal Violence 8:198-222.
Koss, M. 1996 The measurement of rape victimization in crime surveys. Criminal Justice and Behavior 23:55-69.
Koss, M., C.A. Gidycz, and N. Wisniewski 1987 The scope of rape: Incidence and prevalence of sexual aggression and victimization in a national sample of higher education students. Journal of Consulting and Clinical Psychology 55:162-170.
Krosnick, J.A. 1991 Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology 5:213-236.
Krosnick, J.A., and D. Alwin 1987 An evaluation of a cognitive theory of response-order effects in survey measurement. Public Opinion Quarterly 51:201-219.
Kurbat, M.A., S.K. Shevell, and L.J. Rips 1998 A year’s memories: The calendar effect in autobiographical recall. Memory and Cognition 26:532-552.
Laumann, E., J. Gagnon, R. Michael, and S. Michaels 1994 The Social Organization of Sexuality: Sexual Practices in the United States. Chicago: University of Chicago Press.
Lehnen, R.G., and W. Skogan 1981 The National Crime Survey: Working Papers. Volume 1: Current and Historical Perspectives. Washington, DC: Bureau of Justice Statistics.
Lessler, J.T., and J.M. O’Reilly 1997 Mode of interview and reporting of sensitive issues: Design and implementation of audio computer-assisted self-interviewing. Pp. 366-382 in The Validity of Self-Reported Drug Use: Improving the Accuracy of Survey Estimates, L. Harrison and A. Hughes, eds. Rockville, MD: National Institute on Drug Abuse.
Loftus, E.F., and W. Marburger 1983 Since the eruption of Mt. St. Helens, has anyone beaten you up? Improving the accuracy of retrospective reports with landmark events. Memory and Cognition 11:114-120.
Lynch, J.P. 1996 Clarifying divergent estimates of rape from two national surveys. Public Opinion Quarterly 60:410-430.
Martin, E., R.M. Groves, J. Matlin, and C. Miller 1986 Report on the Development of Alternative Screening Procedures for the National Crime Survey. Washington, DC: Bureau of Social Science Research.
Mason, R., J. Carlson, and R. Tourangeau 1995 Contrast effects and subtraction in part-whole questions. Public Opinion Quarterly 58:569-578.
McDowall, D., and B. Wiersema 1994 The incidence of defensive firearm use by U.S. crime victims: 1987 through 1990. American Journal of Public Health 84:1982-1984.
McDowall, D., C. Loftin, and S. Presser 2000 Measuring civilian defensive firearm use: A methodological experiment. Journal of Quantitative Criminology 16:1-19.
Means, B., G.E. Swan, J.B. Jobe, and J.L. Esposito 1994 An alternative approach to obtaining personal history data. Pp. 167-184 in Measurement Errors in Surveys, P. Biemer, R. Groves, L. Lyberg, N. Mathiowetz, and S. Sudman, eds. New York: Wiley.
Moon, Y. 1998 Impression management in computer-based interviews: The effects of input modality, output modality, and distance. Public Opinion Quarterly 62:610-622.
Mosher, W.D., and A.P. Duffer, Jr. 1994 Experiments in Survey Data Collection: The National Survey of Family Growth Pretest. Paper presented at the annual meeting of the Population Association of America, Miami, FL.
Murphy, L. 1976 The Effects of the Attitude Supplement on NCS City Sample Victimization Data. Unpublished internal document, Bureau of the Census, Washington, DC.
National Victims Center 1992 Rape in America: A Report to the Nation. Arlington, VA: National Victims Center.
Neter, J., and J. Waksberg 1964 A study of response errors in expenditures data from household interviews. Journal of the American Statistical Association 59:17-55.
Phipps, P., and A. Tupek 1990 Assessing Measurement Errors in a Touchtone Recognition Survey. Paper presented at the International Conference on Measurement Errors in Surveys, Tucson, AZ.
Robinson, J.A. 1986 Temporal reference systems and autobiographical memory. Pp. 159-188 in Autobiographical Memory, D.C. Rubin, ed. Cambridge, England: Cambridge University Press.
Russell, D.E.H. 1982 The prevalence and incidence of forcible rape and attempted rape of females. Victimology 7:81-93.
Schaeffer, N.C. 1991 Conversation with a purpose—or conversation? Interaction in the standardized interview. Pp. 367-391 in Measurement Errors in Surveys, P.P. Biemer, R.M. Groves, L.E. Lyberg, N.A. Mathiowetz, and S. Sudman, eds. New York: Wiley.
Schober, M. 1999 Making sense of questions: An interactional approach. Pp. 77-93 in Cognition and Survey Research, M.G. Sirken, D.J. Herrmann, S. Schechter, N. Schwarz, J.M. Tanur, and R. Tourangeau, eds. New York: Wiley.
Schober, S., M.F. Caces, M. Pergamit, and L. Branden 1992 Effects of mode of administration on reporting of drug use in the National Longitudinal Survey. Pp. 267-276 in Survey Measurement of Drug Use: Methodological Studies, C. Turner, J. Lessler, and J. Gfroerer, eds. Rockville, MD: National Institute on Drug Abuse.
Schwarz, N., and G.L. Clore 1983 Mood, misattribution, and judgments of well-being: Informative and directive functions of affective states. Journal of Personality and Social Psychology 45:513-523.
Schwarz, N., H.J. Hippler, B. Deutsch, and F. Strack 1985 Response categories: Effects on behavioral reports and comparative judgments. Public Opinion Quarterly 49:388-395.
Schwarz, N., B. Knauper, H.J. Hippler, E. Noelle-Neumann, and F. Clark 1991 Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly 55:618-630.
Schwarz, N., F. Strack, and H. Mai 1991 Assimilation and contrast effects in part-whole question sequences: A conversational logic analysis. Public Opinion Quarterly 55:3-23.
Silver, B.D., P.R. Abramson, and B.A. Anderson 1986 The presence of others and overreporting of voting in American national elections. Public Opinion Quarterly 50:228-239.
Sirken, M.G., D.J. Herrmann, S. Schechter, N. Schwarz, J. Tanur, and R. Tourangeau 1999 Cognition and Survey Research. New York: Wiley.
Skogan, W. 1981 Issues in the Measurement of Victimization. Washington, DC: Bureau of Justice Statistics.
Smith, T.W. 1997 The impact of the presence of others on a respondent’s answers to questions. International Journal of Public Opinion Research 9:33-47.
Strack, F., N. Schwarz, and M. Wänke 1991 Semantic and pragmatic aspects of context effects in social and psychological research. Social Cognition 9:111-125.
Suchman, L., and B. Jordan 1990 Interactional troubles in face-to-face survey interviews. Journal of the American Statistical Association 85:232-241.
Sudman, S., A. Finn, and L. Lannom 1984 The use of bounded recall procedures in single interviews. Public Opinion Quarterly 48:520-524.
Sudman, S., N. Bradburn, and N. Schwarz 1996 Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.
Tangney, J.P., R.W. Miller, L. Flicker, and D.H. Barlow 1996 Are shame, guilt, and embarrassment distinct emotions? Journal of Personality and Social Psychology 70:1256-1269.
Thompson, C.P., J.J. Skowronski, S.F. Larsen, and A.L. Betz 1996 Autobiographical Memory. Mahwah, NJ: Erlbaum.
Tourangeau, R. 1999 Context effects on answers to attitude questions. Pp. 111-131 in Cognition and Survey Research, M.G. Sirken, D.J. Herrmann, S. Schechter, N. Schwarz, J. Tanur, and R. Tourangeau, eds. New York: Wiley.
Tourangeau, R., and T.W. Smith 1996 Asking sensitive questions: The impact of data collection mode, question format, and question context. Public Opinion Quarterly 60:275-304.
Tourangeau, R., K. Rasinski, and N. Bradburn 1991 Measuring happiness in surveys: A test of the subtraction hypothesis. Public Opinion Quarterly 55:255-266.
Tourangeau, R., L.J. Rips, and K.A. Rasinski 2000 The Psychology of Survey Response. New York: Cambridge University Press.
Turner, C.F., J.T. Lessler, and J. Devore 1992 Effects of mode of administration and wording on reporting of drug use. Pp. 177-220 in Survey Measurement of Drug Use: Methodological Studies, C. Turner, J. Lessler, and J. Gfroerer, eds. Rockville, MD: National Institute on Drug Abuse.
Turner, C.F., B.H. Forsyth, J.M. O’Reilly, P.C. Cooley, T.K. Smith, S.M. Rogers, and H.G. Miller 1998 Automated self-interviewing and the survey measurement of sensitive behaviors. In Computer-Assisted Survey Information Collection, M.P. Couper, R.P. Baker, J. Bethlehem, C.Z.F. Clark, J. Martin, W.L. Nicholls II, and J. O’Reilly, eds. New York: Wiley.
Turner, C.F., L. Ku, S.M. Rogers, L.D. Lindberg, J.H. Pleck, and F.L. Sonenstein 1998 Adolescent sexual behavior, drug use, and violence: Increased reporting with computer survey technology. Science 280:867-873.
Wagenaar, W.A. 1986 My memory: A study of autobiographical memory over six years. Cognitive Psychology 18:225-252.
Williams, M.D., and J.D. Hollan 1981 The process of retrieval from very long-term memory. Cognitive Science 5:87-119.