Measuring Crime and Crime Victimization: Methodological Issues
Roger Tourangeau and Madeline E. McNeeley
All surveys face measurement challenges, but few topics raise problems of the variety or seriousness of those involved in measuring crime and crime victimization. As Skogan (1981) points out in his thoughtful monograph, Issues in the Measurement of Victimization, the nature of crime and crime victimization adds wrinkles to virtually every standard source of error in surveys. For example, even in our relatively crime-ridden times, crime remains a rare event and, as a result, survey estimates are subject to large sampling errors. One national survey (National Victims Center, 1992) estimated that 0.7 percent of American women had experienced a completed rape during the prior year. This estimate was based on a sample of 3,220 responding women, implying that the estimate reflected positive answers to the relevant survey items from about 23 women. Skogan details the large margins of sampling error for many key estimates from the National Crime Survey (NCS), a survey that used a very large sample (and which later evolved into the National Crime Victimization Survey).
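The scale of the sampling-error problem can be seen in a quick calculation. This is only a sketch: it uses the published figures (0.7 percent of 3,220 respondents) with a simple-random-sampling formula, whereas the actual survey's complex design would inflate the standard error further:

```python
import math

# Published estimate: 0.7% of 3,220 responding women reported a completed rape
p = 0.007
n = 3220

positives = p * n                      # about 23 women drive the estimate
se = math.sqrt(p * (1 - p) / n)        # standard error under simple random sampling
ci_low, ci_high = p - 1.96 * se, p + 1.96 * se

print(f"positive reports: ~{positives:.0f}")
print(f"estimate: {p:.1%}, 95% CI roughly {ci_low:.1%} to {ci_high:.1%}")
```

The confidence interval runs from roughly 0.4 to 1.0 percent, a margin of error that is nearly half the size of the estimate itself; this is why estimates of rare events are so unstable.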
The clandestine nature of many crimes means that the victim may be unable to provide key details about the victimization and may not even be aware that a crime has been committed at all. Certain incidents that are supposed to be reported in an interview may seem irrelevant to respondents, since they do not think of these incidents as involving crimes. For example, victims of domestic violence or of sexual harassment may not think of these as discrete criminal incidents but as chronic family or interpersonal problems. It may be difficult to prompt the recall of such incidents with the short, concrete items typically used in surveys (for a fuller discussion, see Skogan, 1981:7-10).
NATIONAL CRIME VICTIMIZATION SURVEY
The National Crime Survey and its successor, the National Crime Victimization Survey (NCVS), underwent lengthy development periods featuring record check studies and split-ballot experiments to determine the best way to measure crime victimization. In the record check studies, the samples included known crime victims selected from police records. In survey parlance, these were reverse record check studies—the records had been “checked” before the survey reports were ever elicited. The studies were done in Washington, D.C., Akron, Cleveland, Dayton, San Jose, and Baltimore (see Lehnen and Skogan, 1981, for a summary). A key objective of these early studies was to determine the best length for the survey’s reporting period, balancing the need to increase the number of crime reports with the need to reduce memory errors.
A second wave of studies informing the NCS design was carried out in the early 1980s by researchers at the Bureau of Social Science Research and the Survey Research Center at the University of Michigan (summarized by Martin et al., 1986). This second wave of developmental studies mainly involved split-ballot comparisons (in which random portions of the sample were assigned to different versions of the questionnaire) focusing on the “screening” items, in which respondents first indicate they have relevant incidents to report. Some of these studies were inspired by a conference (described in Biderman, 1980) that brought cognitive psychologists and survey researchers together to examine the memory issues raised by the NCS. Unfortunately, some of the most intriguing findings from the resulting experiments were never published and are buried in hard-to-find memoranda.
For several reasons, the NCVS results are widely used as benchmarks to which statistics from other surveys on crime and crime victimization are compared. Conducted by the Bureau of the Census, the NCVS is the largest and oldest of the crime victimization studies. It uses a rotating panel design in which respondents are interviewed several times before they are “retired” from the sample, a design that greatly improves the precision of sample estimates. It uses a relatively short, six-month reporting period and “bounded” interviewing, in which respondents are instructed to report only incidents that have occurred since the previous interview and are reminded
of the incidents they reported then. (Results of the first interview, which is necessarily unbounded, are discarded.) The initial interview is done face to face to ensure maximum coverage of the population; if necessary, subsequent interviews are also conducted in person.
Examples of Measurement Problems
Despite these impressive design features and the large body of methodological work that shaped it, the NCVS is not without its critics. Two recent controversies illustrate the problems of the NCVS and of crime surveys more generally. One controversy centers on the number of incidents of defensive gun use in the United States; the other concerns the number of incidents of rape. In both cases, seemingly similar surveys yield widely discrepant results; the ensuing methodological controversies point to unresolved issues in how to collect data on crime, gun use, and crime victimization in surveys.
Defensive Gun Use
In 1994, McDowall and Wiersema published an estimate of the number of incidents over a four-year period in which potential crime victims had used guns to protect themselves during an actual or attempted crime. Their estimate was based on data from the NCVS, which gathers information about several classes of crime—rape, assault, burglary, personal and household larceny, and car theft. When respondents report an incident in which they were victimized, they are asked several follow-up questions, including, “Was there anything you did or tried to do about the incident while it was going on?” and, “Did you do anything (else) with the idea of protecting yourself or your property while the incident was going on?” Responses to these follow-up probes are coded into a number of categories, several of which capture defensive gun use (e.g., “attacked offender with gun”). The key estimates McDowall and Wiersema presented were that between 1987 and 1990 there were some 260,000 incidents of defensive gun use in the United States, roughly 65,000 per year. Although these are large numbers, they pale in comparison with the total number of crimes reported during the same period—guns were used defensively in fewer than one in 500 victimizations reported in the NCVS; moreover, criminal offenders were armed about 10 times more often than their victims. These are just the sort of statistics dear to gun control advocates. McDowall and
Wiersema conclude that “criminals face little threat from armed citizens” (1994:1984).
McDowall and Wiersema note, however, that their estimates of defensive gun use differ markedly from those based on an earlier survey by Kleck (1991; see also Kleck and Gertz, 1995). Kleck’s results indicated 800,000 to 1 million incidents of defensive gun use annually. These numbers were derived from a national telephone survey of 1,228 registered voters who were asked: “Within the past five years have you, yourself, or another member of your household used a handgun, even if it was not fired, for self-protection or for the protection of property at home, work, or elsewhere, excluding military service and police security work?”
There are so many differences between the Kleck survey and the NCVS that it should come as no surprise that the results do not line up very well. The two surveys covered different populations (the civilian noninstitutional population in the NCVS versus registered voters with a telephone in the Kleck survey), interviewed respondents by different methods (in-person versus telephone), covered different recall periods (six months in NCVS versus five years in the Kleck study), and asked their respondents markedly different questions. The NCVS uses a bounded interview, the Kleck survey an unbounded interview. Still, the difference between 65,000 incidents a year and some 900,000 is quite dramatic and would seem to demand a less mundane explanation than one involving routine methodological differences. A later telephone survey by Kleck and Gertz (1995) yielded an even higher estimate—2.5 million incidents of defensive gun use.
McDowall and Wiersema (1994) cite two other possible explanations of the differences between the results of the NCVS and the earlier Kleck study. First, they note that Kleck’s estimates rest on the reports of a mere 49 respondents. (The later Kleck and Gertz estimates also rest on a similarly small base of positive reports—66 out of nearly 5,000 completed interviews.) Even a few mistaken respondents could have a large impact on the results. In addition, the Kleck item covers a much broader range of situations than does the NCVS. The NCVS excludes preemptive use of firearms (e.g., motorists who keep a gun in their car “for protection” but never take it out of the glove compartment), focusing more narrowly on gun use during actual or attempted crimes. It is possible that much of the disparity between the NCVS estimates and those derived from the two Kleck studies reflects the broader net cast in the latter surveys.
A methodological experiment by McDowall, Loftin, and Presser (2000) compared questions modeled on the ones used in the NCVS with ones like
those used in the Kleck surveys. Respondents were asked both sets of questions—both written to cover a one-year recall period—and the experiment varied which ones came first in the interview. The sample included 3,006 respondents, selected from a list of likely gun owners. Overall, the Kleck items yielded three times more reports of defensive gun use than the NCVS-style items. What was particularly interesting in the results was that the two sets of items appeared to yield virtually nonoverlapping sets of incidents; of the 89 reports of defensive gun use, only 9 were mentioned in response to both sets of items.
Prevalence of Rape
An even more disparate set of figures surrounds the issue of the number of women in the United States who have been the victim of attempted or completed rapes. Once again, the studies from which the estimates are drawn differ in many crucial particulars—they sample different populations, ask different questions that are based on different definitions of rape, conduct data collection via different methods, and cover different recall periods. As with the estimates of defensive gun use, what is surprising is not that the estimates differ from each other but that they differ so widely.
Several studies converge on the estimate that about one-quarter of American women have been victims of completed or attempted rape at some time in their lives (see, for example, Koss, 1993:Table 1). Most of these figures do not accord well with the rape estimates from the NCVS; the NCVS covers a more limited period—six months—and does not produce estimates of lifetime victimization. But the NCVS’s annual estimates—for example, fewer than 1 woman or girl in 1,000 experienced a rape or attempted rape in 1992 (see Koss, 1996)—imply that rape is much less common than indicated by most of the other surveys. Koss (1992, 1993, 1996) has been an energetic critic of the NCVS procedures for assessing the prevalence of rape, but at least two other papers have presented careful comparisons between the NCVS procedures and those of other surveys (Fisher and Cullen, 2000; Lynch, 1996) and support Koss’s contention that methodological details matter a great deal in assessing the prevalence of rape victimization.
Lynch, for example, reports about a twofold difference between annual estimates for 1992 from the NCVS and the National Women’s Study (NWS) for the previous year. For 1992 the NCVS estimated 355,000 incidents of rape; by contrast, the NWS estimated that 680,000 women
were rape victims. (The NCVS figure translates into fewer than 355,000 victims since the same person may have experienced multiple victimizations.) Lynch explores a number of differences between the two studies, including:
the age range covered by the surveys (18 and older for the NWS, 12 and older for the NCVS);
the sample sizes (4,008 for the NWS, 100,000+ for the NCVS);
the length of the recall period (one year for the NWS, six months for the NCVS);
the schedule of interviewing (the NWS estimates are based on data from the second interview from a three-wave longitudinal study, whereas the NCVS estimates reflect data from the second through last interviews from a seven-wave panel); and
the questions used (brief yes-no items in the NWS versus detailed incident reports in the NCVS).
Despite the methodological differences between the two surveys, the difference between the two estimates is probably not significant. The estimates from both surveys have large standard errors (approximately 190,000 for the NWS estimate and approximately 32,000 for the NCVS), and the standard error of the difference is on the order of 200,000.
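The significance claim follows from the usual rule for combining the standard errors of independent estimates. A quick sketch, using the rounded figures quoted above:

```python
import math

# Approximate standard errors of the two annual rape estimates
se_nws = 190_000    # National Women's Study
se_ncvs = 32_000    # NCVS

diff = 680_000 - 355_000                       # gap between the two estimates
se_diff = math.sqrt(se_nws**2 + se_ncvs**2)    # SE of a difference of independent estimates
z = diff / se_diff

print(f"SE of difference: ~{se_diff:,.0f}")
print(f"z-statistic: {z:.2f}")   # below the conventional 1.96 cutoff
```

The z-statistic of about 1.7 falls short of the conventional 1.96 threshold, which is why the 325,000-incident gap, large as it looks, cannot be distinguished from sampling noise.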
One major difference between the NCVS and most of the other surveys assessing the frequency of rape involves the basic strategy used to elicit reports about rapes and other crimes. The NCVS begins with a battery of yes-no items designed to prompt reports about a broad array of completed or attempted crimes. Only one of these initial screening items directly mentions rape (“Has anyone attacked or threatened you in any of these ways…any rape, attempted rape, or other type of sexual attack?”), although several other questions concern actual or threatened violence. Once the respondent completes these initial screening items, further questions gather more detailed information about each incident; the final classification of an incident in the NCVS reflects these detailed reports rather than the answers to the initial screening questions. Most of the other surveys on rape differ from this procedure in two key ways—first, they ask multiple screening questions specifically crafted to elicit reports about rape and, second, they omit the detailed follow-up questions. For example, a survey by Koss, Gidycz, and Wisniewski (1987) included five items designed to elicit reports of attempted or completed rape. The items are quite specific. For
example, one asks, “Have you had a man attempt sexual intercourse (get on top of you, attempt to insert his penis) when you didn’t want to by threatening or using some degree of force (twisting your arm, holding you down, etc.) but intercourse did not occur?” The NWS adopted this same approach, employing five quite explicit items to elicit reports of attempted or completed rape.
There is little doubt that including multiple concrete items will clarify the exact concepts involved and prompt fuller recall. Multiple items provide more memory cues and probably trigger more attempts at retrieval; both the added cues and the added time on task are likely to improve recall (Bradburn and Sudman, 1979; Burton and Blair, 1991; Cannell et al., 1981; Means et al., 1994; Wagenaar, 1986; Williams and Hollan, 1981). The NCVS is a general-purpose crime survey, and its probes cover a broad array of crimes. The NWS and the Koss surveys use much more detailed probes that focus on a narrower range of crimes. At the same time, the absence of detailed information about each incident could easily lead to classification errors.
A study by Fisher and Cullen (2000) included both yes-no screening items of the type used by Koss and colleagues, the NWS, and many other studies of rape and the more detailed questions about each incident featured by the NCVS. They compared responses to the screening questions with the final classifications of the incidents based on the detailed reports. There were twice as many positive answers to the rape screening questions as there were incidents ultimately classified as rapes based on the detailed reports. (The rape screening items also captured many incidents involving some other type of sexual victimization.) In addition, some incidents classified as rapes on the basis of the detailed information were initially elicited by screening items designed to tap other forms of sexual victimization. The results suggest that, even when the wording of screening items is quite explicit, respondents can still misclassify incidents.
Factors Affecting Reporting in Crime Surveys
Many surveys on sensitive subjects adopt methods primarily designed to reduce underreporting—that is, the omission of events that should, in principle, be reported. And it is certainly plausible that women would be reluctant to report extremely painful and personal incidents such as attempted or completed rapes. Even with less sensitive topics, such as burglary or car theft, a variety of processes—lack of awareness that a crime has been committed, forgetting, unwillingness to work hard at answering—can lead to systematic underreporting. There are also reasons to believe that crime surveys, like other surveys that depend on recall, may be prone to errors in the opposite direction as well. Because crime is a relatively rare event, most respondents are not in a position to omit eligible incidents; they do not have any to report. The vast majority of respondents can only overreport defensive gun use, rapes, or crime victimization more generally.
In his discussion of the controversy over estimates of defensive gun use, Hemenway (1997) makes the same point. All survey questions are prone to errors, including essentially random reporting errors. For the moment, let us accept the view that 1 percent of all adults used a gun to defend themselves against a crime over the past year. If the sample accurately reflects this underlying distribution, then only 1 percent of respondents are in a position to underreport defensive gun use; the remaining 99 percent can only overreport it. Even if we suppose that an underreport is, say, 10 times more likely than an overreport, the overwhelming majority of errors will still be in the direction of overreporting. If, for example, one out of every four respondents who actually used a gun to defend himself denies it while only 1 in 40 respondents who did not use a gun in self-defense claim in error to have done so, the resulting estimate will nonetheless be sharply biased upward (1% × 75% + 99% × 2.5% ≈ 3.2%). It is not hard to imagine an error rate of the magnitude of 1 in 40 arising from respondent inattention, misunderstanding of the questions, interviewer errors in recording the answers, and other essentially random factors. Even the simplest survey items—for instance, those asking about sex and age—yield less than perfectly reliable answers. Random errors can, in the aggregate, yield systematic biases when most of the respondents are in a position to make errors in only one direction.
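Hemenway's argument can be verified with a short calculation, using the hypothetical rates in the paragraph above (a 1 percent true prevalence, a 25 percent false-negative rate, and a 2.5 percent false-positive rate):

```python
# Hypothetical rates from the example above
true_prevalence = 0.01    # share of adults who actually used a gun defensively
false_negative = 0.25     # true users who deny it (underreports)
false_positive = 0.025    # non-users who mistakenly claim it (overreports)

# Expected share of the sample reporting defensive gun use
reported = (true_prevalence * (1 - false_negative)
            + (1 - true_prevalence) * false_positive)

bias_factor = reported / true_prevalence
print(f"reported prevalence: {reported:.2%}")        # about 3.2%
print(f"upward bias: {bias_factor:.1f}x the true rate")
```

Even with underreporting ten times likelier than overreporting at the individual level, the sheer number of respondents who can only err upward yields an estimate more than three times the true rate.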
Aside from sheer unreliability, though, reporting in crime surveys may be affected by several systematic factors that can introduce additional distortions of their own. We focus on two of these systematic factors here. First, we address the potentially sensitive nature of the questions on many crime surveys and the impact of the mode of data collection on the answers to such questions. This is followed by an examination of the effects of the context in which survey items are presented, including the physical setting of the interview, the perceived purpose and sponsorship of the study, and prior questions in the interview.
IMPACT OF THE MODE OF DATA COLLECTION
Most of the surveys that have produced the widely varying estimates of defensive gun use and rape incidence use some form of interviewer administration of the questions. For example, Koss (1993) lists 20 surveys on sexual victimization of women; only 4 (all of them involving local samples from individual communities) appear to use self-administered questionnaires. The remainder rely on interviewers to collect the data, in either face-to-face or telephone interviews. (The NCVS uses both methods; the initial interview is done face to face, but later interviews are, to the extent possible, done by telephone.) The last decade has seen dramatic changes in the methods used to collect survey data, including the introduction of several new methods of computerized self-administration. For example, the National Household Survey of Drug Abuse, a large survey sponsored by the Substance Abuse and Mental Health Services Administration, has adopted audio computer-assisted self-interviewing (ACASI). With ACASI a computer simultaneously displays the item on screen and plays a recording of it to the survey respondent via earphones. The respondent enters an answer directly into the computer using the keypad. Other new methods for administering questions include computer-assisted self-interviewing without the audio (CASI), e-mail surveys, telephone ACASI, and World Wide Web surveys.
Two trends have spurred the development and rapid adoption of new methods of computerized self-administration of surveys. First, various technological changes—such as the introduction of lighter, more powerful laptop computers, development of the World Wide Web, widespread adoption of e-mail, and improvements in sound card technology—have made the new methods possible. Second, the need for survey data on sensitive topics, such as illicit drug use and sexual behaviors related to the spread of AIDS, has made the new methods highly desirable, since they combine the privacy of self-administration with the power and flexibility of computer administration. Widespread interest in the new methods has spurred survey methodologists to reexamine the value of self-administration for collecting survey data on sensitive topics.
Gains from Self-Administration
There is strong evidence to support the value of self-administration for eliciting reports about sensitive behaviors. To illustrate the gains from self-administration, Figure 2-1 plots the ratio of the level of illicit drug use reported when survey questions are self-administered to the level reported when interviewers administer the questions. For example, if 6 percent of respondents report using cocaine during the previous year under self-administration but only 4 percent report using cocaine under interviewer administration, the ratio would be 1.5. The data are from two of the largest mode comparisons done to date: one carried out by Turner, Lessler, and Devore (1992) and the other by Schober, Caces, Pergamit, and Branden (1992). The ratios range from a little over 1.0 to as high as 2.5—which is to say that self-administration more than doubled the reported rate of illicit drug use.
Tourangeau, Rips, and Rasinski (2000:Table 10.2) reviewed a number of similar mode comparisons; they found a median increase of 30 percent in the reported prevalence of marijuana use with self-administration and similar gains for cocaine use. They also summarize the evidence that self-administration improves reporting about other sensitive topics, including sexual partners, abortion, smoking, and church attendance.
Mode, Privacy, and the Presence of Third Parties
It is natural to think that at least some of the gains from self-administration result from the reduced risk of disclosure to other household members, but there is surprisingly little evidence that the presence of other household members has much effect on what respondents report during interviews. Interviews are often conducted under less than ideal conditions, and, although most survey organizations train their interviewers to try to find private settings for the interviews, other household members are often present. For example, Silver and colleagues examined the proportion of interviews done for the American National Election Studies (ANES) in which other household members were present. The ANES is a series of surveys funded by the National Science Foundation and carried out by the University of Michigan’s Survey Research Center. The proportion varied somewhat from one survey to the next, but roughly half of all interviews conducted between 1966 and 1982 were done in the presence of another household member (Silver et al., 1986). Similarly, Martin and colleagues (1986) noted that some 58 percent of NCS interviews were conducted within earshot of someone other than the interviewer and respondent (for a more recent estimate, see Coker and Stasny, 1995).
Silver and colleagues looked at whether the presence of other people during interviews affected the overreporting of voting. In many jurisdictions, whether someone voted is a matter of public record, so it is a relatively easy matter to determine the accuracy of reports about voting. Voting is a socially desirable behavior, and many nonvoters—roughly a quarter, according to Silver and colleagues—nonetheless report that they voted during the most recent election. What is somewhat surprising is that the rate of overreporting did not vary as a function of the privacy of the interview. A number of other national surveys have also recorded whether other people are present during the interviews, but researchers who have examined these data have found little evidence that the presence of others affects reports about such potentially sensitive topics as sexual behavior (Laumann et al., 1994) or illicit drug use (Schober et al., 1992; Turner et al., 1992). Smith’s (1997) review concludes that in general the effects of the presence of third parties during interviews are minimal.
There are several possible explanations for the absence of third-party effects. As Martin and colleagues note, other household members may remember relevant incidents that a respondent has forgotten, offsetting any inhibiting effect their presence has. When another household member already knows the sensitive information, his or her presence may make it more difficult for the respondent to withhold it from the interviewer. In addition, interviewers are probably more likely to do interviews with other
household members who are around when the respondent seems unconcerned about their presence.
Few of the studies examining the impact of third parties have used experimental designs that systematically varied the privacy of the interview. Still, in crime surveys one would expect the presence of family members to have an impact, particularly on reports involving domestic violence; similarly, in surveys on rape the presence of family members is likely to inhibit reports of spousal rape, if not rape in general. In fact, there is some recent evidence that the presence of a spouse during an interview is associated with reduced reporting of rape and domestic violence (Coker and Stasny, 1995). Since more than half of the NCVS interviews are conducted with at least one other person present, it is likely that estimates of certain victimizations are affected by the presence of a third party.
Variations Across Methods of Self-Administration
The lack of evidence of third-party effects—in contrast to the large and consistent effects of self-administration—suggests that respondents are less concerned about the reactions of other household members than about those of the interviewer. Tourangeau, Rips, and Rasinski (2000) argue that what survey respondents worry about is that they will be embarrassed during an interview; the prototypical embarrassment situation involves disclosure to strangers rather than friends or family members (Tangney et al., 1996). Despite interviewers’ efforts to establish rapport, having them ask the questions and record the answers raises the specter that they will react negatively to what the respondent tells them and embarrass the respondent. That risk is reduced when respondents can answer on paper or interact directly with computers.
It is possible that some of the newer computer-assisted forms of self-administration confer an even greater sense of privacy than traditional paper-and-pencil self-administered questionnaires (SAQs). Turner and colleagues compared data collected via ACASI with data from a traditional SAQ; their experiment involved random subgroups of participants from the National Survey of Adolescent Males (Turner et al., 1998). On some of the most sensitive questions (such as one asking about receptive oral sex), ACASI yielded four times as many reports as the paper SAQ (see Figure 2-2). Tourangeau and Smith (1996) also found increased reporting for some items under ACASI relative to CASI.
What all of these findings suggest is that crime reports—especially reports about victimizations involving crimes that carry stigma—could be dramatically affected by the mode of data collection. Self-administration of the questions is likely to increase the number of rape victimizations reported; on the other hand, it may sharply reduce reports of defensive gun use, since defensive gun use is likely to be seen as a positive or socially desirable response to crime (as Hemenway, 1997, argues).
IMPACT OF CONTEXT
The context of a survey question encompasses a wide range of potential influences, including the perceived climate of opinion as the survey is being done, the auspices under which the survey is conducted, the purpose of the survey (as it is presented to the respondent), the topics preceding a given question, characteristics of the interviewers related to the survey topic, the physical setting for the interview, and even the weather (see Schwarz and Clore, 1983). Most of the research on context effects has focused more narrowly on the impact of prior items on answers to later questions, but even under this restricted definition of survey context, context has a range of effects on subsequent items, altering the overall direction of answers to specific questions or changing the relationship among survey items.
Setting of the Interview
Partly because of concerns about the inhibiting effects of the presence of third parties during interviews, some survey researchers have experimented with changing the overall context by conducting survey interviews in settings outside respondents’ homes. For example, Jobe and colleagues did an experiment in which half of the sample of women in their study were interviewed at a neutral site and the other half were interviewed at home (Jobe et al., 1997). The interview touched on a number of sensitive topics, including sexual partners, sexually transmitted diseases, illicit drug use, and abortion. No consistent impact on reporting by interview site was found. A second study by Mosher and Duffer (1994; see also Lessler and O’Reilly, 1997) found an increase in the number of abortions reported when interviews were conducted outside the home; unfortunately, the outside-the-home group showed no significant increase over an in-home group given the same incentive to take part in the study.
Still, a couple of more recent studies suggest that answers to survey questions can be affected by the physical setting in which the data are collected. In one study (Beebe et al., 1998), students were more likely to admit illicit drug use, fighting, and other sensitive behaviors on a paper SAQ than on a computer-administered version of the same questions. Both versions were administered in a school setting. As noted earlier, computerized self-administration generally preserves or increases the gains from paper self-administration, but Beebe and colleagues argue that various features of the survey setting can reduce the sense of privacy that computers usually confer and, as a result, affect the answers. In their study the computerized questions were administered in the school’s computer lab on networked computers via terminals that were located next to one another. Any of these features may have reduced the apparent privacy of the data collection process.
In another study (Moon, 1998), respondents were sensitive to the description of the computer that administered the questions. When the respondents thought a distant computer was administering the questions to them over a network, they gave more socially desirable answers than when they thought the computer directly in front of them was administering the questions. Even when respondents do not have to interact with an interviewer, the characteristics of the data collection setting can enhance or reduce the sense of privacy and affect the answers that respondents give.
Context and the Inferred Purpose of the Question
Survey items, like everyday utterances, may convey a lot more to the respondent than they literally state. Consider this item from the General Social Survey (GSS), a national attitude survey carried out by the National Opinion Research Center since 1972: “Are there any situations you can imagine in which you would approve of a policeman striking an adult male citizen?” Taken literally, the item invites a positive answer. After all, anyone who has ever seen a cop show on television would have little trouble conjuring up situations in which a police officer would be fully justified in using force—say, in subduing an escaping criminal. Yet in recent years, fewer than 10 percent of respondents have answered “yes” to this GSS question. Clearly, respondents—a national cross-section of adults in the United States—do not take the question literally but base their answers on their reading of the intent behind the question. They seem to infer that the item is intended to tap their feelings about police brutality or police violence, not to test their imaginative powers. In the same way, in everyday life, what is literally a question may in fact convey a request (“Can you pass the salt?”) or a command (“Will you close the door on your way out?”). Part of the task of the survey respondent is to determine the intent of the question, to ferret out the information it is really seeking.
Interviews as Conversations
Unfortunately, in the process of interpreting survey questions, respondents often draw unintended inferences about their meaning. They draw these incorrect inferences because they carry over into the survey interview interpretive habits developed over a lifetime in other settings, such as conversations (Schaeffer, 1991). In a conversation, usage is often loose—words and phrases may be used in a nonstandard way or intended nonliterally—but little harm is done because the participants can interact freely to make sure they understand each other. Speakers pause to be sure their listeners are still with them; listeners use phrases like “okay” and “uh-huh” to signal that they are following (Clark and Schaefer, 1989; Schober, 1999). The resources available in conversation to forestall or repair misunderstandings are sharply curtailed in survey interviews, which are typically standardized to eliminate much of this unscripted interaction (Suchman and Jordan, 1990).
Based on their experiences in other settings, respondents may have incorrect expectations about survey interviews. In a conversation a question such as “Has anyone attacked or threatened you in the last year?” might be intended simply to determine the general topic for the next few exchanges. It would be quite reasonable for the listener to respond, “Well, a few years ago someone threw a beer bottle at my car.” The incident may not really constitute an attack and it falls outside the time frame set by the question, but it is in the ballpark, which is all that is required for everyday conversation. In a survey, though, such an item is generally intended as a request for exact and accurate information; “last year” is typically intended to refer to the period beginning exactly one year ago. Many survey respondents do not seem to realize that surveys are actually seeking this sort of precise information (Cannell et al., 1968).
Grice’s Conversational Maxims
Conversations are partly guided by what the philosopher Paul Grice called the cooperative principle—the unstated assumption that the participants in a conversation are trying to make their contributions useful. Grice (1989) distinguished four specific manifestations of the cooperative principle—four conversational maxims that shape everyday conversations and serve as a kind of etiquette for them:
the maxim of quantity, which requires us to make our contribution as informative as necessary, but not more informative;
the maxim of quality, which says to tell the truth and to avoid statements we cannot support;
the maxim of relation, which enjoins us to stick to the topic; and
the maxim of manner, which tells us to be clear and to avoid obscurity.
When a speaker seems to violate one of these norms, the listener generally cuts him or her some interpretive slack, drawing the inferences that are needed to continue to see the speaker as cooperative. For example, if the speaker seems to have changed the topic, the listener will assume there is some connection with what came before and will attempt to identify the thread.
Surveys violate the Gricean maxims all the time. They flit from one subject to another without any of the transitional devices that would mark a change of subject in ordinary conversation. Sometimes respondents make the adjustment and behave as though Grice’s maxims do not apply to surveys, but researchers have marshaled a large number of examples in which survey respondents apparently apply the Gricean maxims anyway, drawing unintended inferences from incidental features of the survey questions or their order. A study by Schwarz and colleagues illustrates the problem (Schwarz et al., 1991). The study compared two items asking respondents how successful they had been in life; the only difference between the items was in the labeling of the scale points. One group got an 11-point scale in which the scale values ranged from 0 to 10; the other group got scale values that ranged from -5 to +5. These values were displayed to the respondents on a show card. Figure 2-3 presents the key results—a sharp change in the distribution of the answers depending on the numerical labels on the scale points. Respondents who got the scale labeled with negative values shied away from that end of the scale. If “0” conveys essentially no success in life, “-5” suggests catastrophic failure.
Respondents may also draw unintended inferences based on other features of the questions or their order. When the questionnaire juxtaposes two similar items (“How much did you like the food on your most recent visit to McDonald’s?” “Overall, how satisfied were you with your meal at McDonald’s?”), respondents may think they are intended to convey quite different questions (Strack et al., 1991). After all, according to the maxim of quantity, each contribution to an ongoing conversation is supposed to convey new information. As a result, the correlation between the two items can be sharply reduced as compared to when they are separated in the interview (see also Schwarz et al., 1991; Tourangeau et al., 1991). Surveys often use closed questions that offer respondents a fixed set of answer categories. Respondents may infer that the options offered by the researchers reflect their knowledge about the distribution of the answers; the middle option, they may reason, must represent the typical answer (Schwarz et al., 1985). Tourangeau and Smith (1996) found that the ranges used in the answer categories even affected respondents’ reports about the number of sexual partners they had in the past five years.
Cooperativeness, Satisficing, and Survey Auspices
Respondents do not necessarily interpret the questions in surveys in the way the surveys’ designers had hoped. They sometimes lean on relatively subtle cues—the numbers assigned to the scale points, the order of the questions, the arrangement of the response categories—to make inferences about the intent behind the questions and about their job as respondents. Interpretive habits acquired over a lifetime are applied—not always appropriately—to the task of understanding and answering survey questions. A key point left out of the discussions of the defensive gun use and rape prevalence controversies is how the presentation of a survey to the respondents may have affected their understanding of the survey task and their interpretation of specific questions.
Over the past 15 years or so, survey researchers have begun to apply findings from the cognitive sciences, particularly cognitive psychology, in a systematic program to understand reporting errors in surveys (see Sirken et al., 1999; Sudman et al., 1996; Tourangeau et al., 2000, for recent reviews of this work). One theme that clearly emerges from this literature is that respondents take whatever shortcuts they can to reduce the cognitive effort needed to answer the questions (see, for example, Krosnick and Alwin, 1987; Krosnick, 1991). There are probably several reasons for this.
In the first place, as Cannell, Fowler, and Marquis (1968) noted more than 30 years ago, respondents may simply not realize they are supposed to work hard and provide exact, accurate information. One of the reasons that bounding procedures, like the one used in the NCVS, are thought to be effective is that they convey to respondents the need for precision (cf. Biderman and Cantor, 1984). Some of the gains from bounding can be realized simply by using an exact date in the question (e.g., “Since June 4 …” as opposed to “During the past month…”; see Loftus and Marburger, 1983) or by dividing the recall period into shorter periods and asking about each one separately (Sudman et al., 1984). Each of these methods of defining the recall period implicitly conveys the need for precision. In everyday conversation, of course, participants typically adopt a much looser criterion in framing answers to questions.
Even if respondents do realize they are supposed to be exact and accurate, they may be unwilling or unable to oblige. Consider the questions used by Kleck and colleagues, cited earlier, to measure defensive gun use. The key question asks about a rather long time period—five years. It covers both the respondents themselves and other members of their households; it lists several possible relevant scenarios (“for self-protection or for the protection of property at home, work, or elsewhere”); and it notes several exclusions (incidents involving “military service and police security work”). The demands of interpreting this item, mentally implementing its complicated logical requirements, and searching memory for relevant incidents over such a long period are likely to exceed the working memory capacity of many respondents, even well-motivated ones (Just and Carpenter, 1992).
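One way to appreciate the item’s logical burden is to write its requirements out explicitly. The sketch below is ours, not Kleck’s, and the field names are hypothetical; it simply encodes the inclusions and exclusions described above as a predicate that a respondent would, in effect, have to evaluate over five years’ worth of remembered incidents:

```python
from dataclasses import dataclass

# Hypothetical encoding of the defensive-gun-use item's stated logic.
# The field names are ours, chosen for illustration; the actual survey
# packs all of these conditions into a single spoken question.

@dataclass
class Incident:
    years_ago: float          # elapsed time before the interview
    actor_in_household: bool  # respondent or another household member
    purpose_defensive: bool   # self-protection or protection of property
    military_or_police: bool  # excluded contexts per the question

def counts_as_defensive_gun_use(inc: Incident) -> bool:
    """Apply the item's inclusions (time frame, actors, purposes)
    and its exclusions (military service, police security work)."""
    return (inc.years_ago <= 5.0
            and inc.actor_in_household
            and inc.purpose_defensive
            and not inc.military_or_police)

# A borderline case: defensive use, but during police security work,
# so the question's exclusion rule applies.
example = Incident(years_ago=2.0, actor_in_household=True,
                   purpose_defensive=True, military_or_police=True)
print(counts_as_defensive_gun_use(example))  # False
```

Even written out this way, the predicate has four jointly necessary conditions; evaluating it mentally, without notes, against a five-year span of one’s own and one’s household members’ experiences is exactly the kind of load the working memory argument concerns.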
It is not hard to believe that many respondents do not take the question literally. After all, taking the question literally might entail putting the telephone down and canvassing other household members; just reconstructing what you were doing five years ago might require more thought and a longer pause than is usually considered appropriate for a telephone conversation. Chances are that many respondents opt out of the literal requirements of the question and adopt a less exacting, “satisficing” criterion; they try to formulate an answer that is good enough to satisfy the spirit of the question rather than the letter. Of course, a less than perfect answer will not always yield a false positive response. Some respondents may be unaware of or overlook gun use by other household members. But as we argued earlier, even random errors are likely to yield an upward bias in this case because, according to all the estimates, gun use is still very rare.
In determining the real intent behind survey questions, respondents may rely on a variety of cues, but the apparent purpose and auspices of a survey are likely to be among the most important. What are respondents likely to conclude about the intent of a survey like the NCVS? The study is conducted by the Bureau of the Census, the source of many important official statistics, on behalf of another federal agency, the Bureau of Justice Statistics. The interviewers wear identification badges and carry an advance letter printed on government stationery that explains the purpose of the survey. All of these cues—even the very name of the survey—are likely to convey to the respondents that the topic is crimes, narrowly construed. Even when the questionnaire probes for incidents that might not ordinarily be seen as crimes—say, domestic violence or unwanted sex when the victim was drunk—respondents may not take the items literally, inferring a narrower scope to the questions than is actually intended. Respondents want to cooperate with the perceived demands of the interview, but they do not want to work hard at it. When construing the topic narrowly fits their impression of the intent of the study and allows them to get through the interview without working very hard, that is what most respondents are likely to do.
If the easy way to meet the apparent demands of the NCVS is to construe the questions narrowly, omitting borderline incidents, atypical victimizations, and incidents that may fall outside the time frame for the survey, respondents may adopt the opposite approach in many other surveys. For example, many of the rape surveys cited by Koss and by Fisher and Cullen (2000) are local in scope, involving a single college or community (see, e.g., Table 1 in Koss, 1993); they generally do not have federal sponsorship and are likely to appear rather informal to the respondents, at least as compared to the NCVS. Many of the surveys are not bounded and cover very long time periods (e.g., the respondent’s entire life). The names of these surveys (e.g., Russell, 1982, called her study the Sexual Victimization Survey; Koss’s questionnaire is called the Sexual Experiences Survey), their sponsorship, their informal trappings, their content (numerous items on sexual assault and abuse), and their long time frame are likely to induce quite a different mindset among respondents than that induced by the NCVS.
Many of the rape studies seem to invite a positive response; indeed, their designs seem predicated on the assumption that rape is generally underreported. It seems likely that many respondents in these surveys infer that the intent is to broadly document female victimizations, even though the items used are very explicit. The surveys and the respondents both seem to cast a wide net. When Fisher and Cullen (2000) compared detailed reports about incidents with responses to the rape screening items in the National Violence Against College Women Study, they classified only about a quarter of the incidents mentioned in response to the rape screening items as actually involving rapes. (Additional incidents that qualified as rapes were reported in response to some of the other screening items as well.) Respondents want to help; they have volunteered to take part in the survey and are probably generally sympathetic to the aims of the survey sponsors. When being helpful seems to require reporting relevant incidents, they report whatever events seem most relevant, even if they do not quite meet the literal demands of the question. When the surveys do not include detailed follow-up items, there is no way to weed out reports that meet the perceived intent but not the literal meaning of the questions.
Of course, we do not know how the sponsorship and other trappings of these surveys affect reporting, but it seems to be well worth exploring. Other aspects of survey context are known to have a large impact on reporting; it would be easy to find out how variations in the way a survey is presented to respondents affect their perceptions of the task and the answers they ultimately provide.
Impact of Prior Questions
The presentation of a survey—in the survey introduction, the advance letter, even the name of the study—can shape respondents’ understanding of their task and their interpretation of individual questions in the interview. Prior questions in the survey can have a similar impact, affecting what respondents think about as they answer later questions and how they understand the questions (see Tourangeau, 1999, for a review). The impact of prior items on answers to later ones reflects several distinct mechanisms, two of which are especially relevant here: The earlier items sometimes provide an interpretive context for later questions, and sometimes they trigger the recall of information useful in answering later items.
According to Grice’s maxim of relation, the participants in a conversation are supposed to stick to the topic; they are not supposed to shift gears without giving proper warning. For the most part, survey questions follow these rules, signaling shifts in topics with introductions or transitional phrases (“The next few items are about…”). Relying on the maxim of relation, survey respondents may look to prior items to clarify the meaning of a new question. For example, a study by Strack and his colleagues (1991) asked German students if they would support an “educational contribution.” German universities are tuition free, and it was not clear from the question whether the educational contribution was to be given to the students in the form of financial aid or taken from them in the form of tuition. When the item came after a question on college tuition in the United States, support for the “educational contribution” dropped; when it followed a question on government aid to students in Sweden, support increased. Respondents apparently looked to the previous question to decide what “educational contribution” meant. In this case the maxim of relation led respondents to see more similarity than they should have between adjacent items.
Alternatively, the maxim of quantity (“Be informative”) may lead them to make the opposite error—to exaggerate the differences among successive questions. For instance, in one study, respondents were asked to evaluate the economy in their state and the economy in their local communities (Mason et al., 1995). They tended to cite different reasons in explaining their evaluations of the local economy when that item came first than when it followed the question on the state economy. Apparently, they were reluctant to give the same rationale (“good job prospects” or “growth in local industries”) to explain both answers and so were forced to come up with new considerations to justify their views about the local economy.
Another way that prior items can affect answers to later questions is by reminding respondents of things they would not have otherwise recalled. The process of retrieving information from memory is partly deliberate and partly automatic. The deliberate part consists of generating incomplete descriptions of the incident in question. In the case of survey items, we might start with the description given in the question itself (“Let’s see, a time when I was attacked or threatened”) and supplement it with inferences or guesses about the sought-after incidents (“Didn’t something happen on Gough Street a couple of months ago?”). The automatic component consists of our unconsciously tracing links between associated ideas in memory; thinking about one incident makes it easier for us to retrieve other events that are associated with it.
In the jargon of cognitive psychology, “activation” spreads automatically from a single concept to related concepts (e.g., Anderson, 1983). This component is automatic in the sense that it operates without our willing it, indeed without our being aware of it. Because of the spread of activation, prior questions can set retrieval processes in motion that alter our answers to later questions. One example of this kind of context effect was apparently found during one of the developmental studies for the NCS. Respondents who had first answered a series of questions designed to assess their fear of crime reported more victimizations than their counterparts who answered the victimization items first (Murphy, 1976). The fear of crime items apparently facilitated recall of victimizations.
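The mechanism can be sketched in a few lines of code. This is a deliberately minimal model in the spirit of Anderson’s (1983) account, not his actual implementation; the association graph, the decay parameter, and the concept labels are all invented for illustration:

```python
# Minimal spreading-activation sketch: activating one concept passes a
# decayed share of its activation along association links, so related
# memories become easier to retrieve even though they were never cued
# directly. The graph and decay value are illustrative assumptions.

graph = {
    "fear of crime": ["burglary", "assault"],
    "burglary": ["broken window", "stolen TV"],
    "assault": ["Gough Street incident"],
}

def spread_activation(graph, source, decay=0.5, steps=2):
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor in graph.get(node, []):
                gain = act * decay          # activation decays per link
                activation[neighbor] = activation.get(neighbor, 0.0) + gain
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + gain
        frontier = next_frontier
    return activation

act = spread_activation(graph, "fear of crime")
# Concepts two links away (e.g., "Gough Street incident") receive some
# activation even though only "fear of crime" was activated directly.
```

In this toy version, asking fear-of-crime items first corresponds to activating the source node: concepts one and two links away are partially activated before the victimization questions are ever asked, which is the pattern Murphy (1976) observed.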
These effects of prior items on the interpretation and retrieval processes for subsequent questions mean that asking the “same” question in two different questionnaires will not necessarily yield the same answers. As Koss (1993) argues, the focus of the NCVS on criminal victimizations (along with other cues in that study) may promote a narrow interpretation of the type of incidents to be reported; in addition, the screening items in the NCVS may serve as relatively poor retrieval cues for incidents the respondents do not necessarily think of as crimes. On the other hand, some of the items used in the Sexual Experiences Survey (Koss et al., 1987) and later rape surveys—“Have you given in to sex play (fondling, kissing, or petting, but not intercourse) when you didn’t want to because you were overwhelmed by a man’s continual arguments and pressure?”—may help prompt fuller recall of more serious incidents but may also suggest that almost any unwanted sexual experience is of interest, encouraging overreporting.
Another procedure used in some crime surveys may also help frame later questions for respondents and trigger the recall of relevant events; this is the review of incidents reported in the previous round as part of the bounding procedure. The purpose of bounding is to prevent respondents from reporting incidents that actually occurred before the start of the recall period; this type of error is known as telescoping in the survey literature. Although more sophisticated theories of telescoping have been proposed, it mostly appears to reflect our poor memory for dates (see, for example, Baddeley et al., 1978; Thompson et al., 1996). Because telescoping errors are common, bounding can have a big effect on the level of reporting in a survey. In fact, Neter and Waksberg’s (1964) initial demonstration of the benefits of bounding indicated that about 40 percent of all household repairs and more than half of related expenditures were reported in error in unbounded interviews. Table 2-1 summarizes the results of the Neter and Waksberg (1964) study, along with a series of studies by Loftus and Marburger (1983) that explored alternative procedures for bounding the recall period.
Loftus and Marburger used several procedures to define the beginning of the recall period. They carried out two of their experiments exactly six months after the eruption of Mt. St. Helens; in those studies, they compared answers to questions that began “Since the eruption of Mt. St. Helens …” with parallel items that began “During the last six months….” The eruption of Mt. St. Helens served as what Loftus and Marburger called a temporal landmark. In subsequent experiments, they used New Year’s Day to mark off the boundary of the recall period or asked respondents to generate their own personally significant landmark event near the beginning of the recall period. As Table 2-1 indicates, whether bounding takes the form of reviewing with the respondents what they already reported in the previous interview (as in Neter and Waksberg, 1964), providing them with a public landmark event, like the eruption of Mt. St. Helens or New Year’s Day (Loftus and Marburger, 1983), or asking them to generate their own personal landmark events (Loftus and Marburger, 1983:Experiment 3), bounding sharply reduces the level of reporting.
Bounding probably has several useful effects. First, as Biderman and Cantor (1984) noted, it helps communicate the importance of precision; just mentioning the specific date that marked the beginning of the recall period had a noticeable impact on the number of victimizations reported in Loftus and Marburger’s final experiment. It is quite likely that many respondents begin survey interviews thinking that it will be cooperative for them to mention incidents related to the topic of the interview, even if those incidents do not meet all the requirements set forth in the questions. Bounding procedures help alter this expectation. But the impact of bounding seems to go beyond its role in socializing the respondent to the task’s requirements. A variety of evidence suggests that people are much better at reconstructing the relative order of different events than recalling their exact dates (e.g., Friedman, 1993). Bounding converts the temporal judgment respondents have to make from an absolute one (Does the event fall in this period?) to a relative one (Did the event occur before or after the bounding event?); relative judgment is a lot more accurate.
Bounding procedures can serve still another function—both previously reported incidents and landmark events can serve as powerful retrieval cues. When people are asked to remember events from a given period, they tend to recall incidents that fall near temporal boundaries, such as the beginning of the school year or major holidays (Kurbat et al., 1998; Robinson, 1986). Major temporal periods are an important organizing principle for our autobiographical memories; if our memories were written autobiographies, they would be made up of chapters corresponding to each of the major periods of our lives. The boundaries that separate these periods are powerful retrieval cues. Similarly, the review of events reported in an earlier interview may trigger the recall of similar or related incidents since then. Bounding procedures improve the accuracy of recall, helping respondents weed out ineligible events and remember eligible ones. As Table 2-1 suggests, the net effect can be dramatic.

TABLE 2-1 Impact of Bounding Procedures (Ratio of Events Reported: Unbounded over Bounded)
[The table compares results from Neter and Waksberg (1964) and Loftus and Marburger (1983); its conditions include victims of theft, victims of assault, and New Year’s Day bounding, but the ratio values themselves are not legible in this copy.]
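The size of the effect implied by Neter and Waksberg’s figure for household repairs is easy to work out. If a fraction of the reports in an unbounded interview are telescoped in from outside the recall period, the bounded interview should yield roughly the remaining fraction, so the unbounded-to-bounded ratio is the reciprocal of that remainder. The short calculation below is our illustrative arithmetic, not a computation Neter and Waksberg themselves present:

```python
# Back-of-the-envelope implication of telescoping. If a share
# `error_share` of the reports in an unbounded interview are in error
# (telescoped in from before the recall period), a bounded interview
# should yield roughly (1 - error_share) as many reports, so the
# unbounded/bounded ratio is 1 / (1 - error_share). The 0.40 value is
# the paper's figure for household repairs reported in error.

def unbounded_to_bounded_ratio(error_share: float) -> float:
    """Implied ratio of unbounded to bounded report counts."""
    return 1.0 / (1.0 - error_share)

ratio = unbounded_to_bounded_ratio(0.40)
print(round(ratio, 2))  # 1.67
```

On this simple accounting, a 40 percent error rate implies that unbounded interviews yield about two-thirds more reports of repairs than bounded ones, which is why bounding can change survey estimates so sharply.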
CONCLUSIONS: SOURCES OF DIFFERENCES ACROSS SURVEYS
Most papers that examine discrepancies across surveys are limited to speculating about the sources of the differences in the results, and this paper is no exception. (McDowall et al., 2000, and Fisher and Cullen, 2000, are exceptions—they present evidence testing specific hypotheses about why different procedures gave different results.) Throughout, we have offered conjectures about the variables that affect reporting in crime surveys. In this final section we try to be a little more explicit about the variables we think are the key ones. One theme that runs through our discussion is that both overreporting and underreporting are possible; it simply cannot be taken for granted that relatively rare events, like defensive gun use—or even very sensitive ones, like rape victimizations—will necessarily be underreported in surveys. Respondents can only make the errors it is logically possible for them to make; if most of them have not in fact experienced the target events, they can only overreport them. Moreover, as the work of Loftus reminds us, forgetting does not necessarily make us underreport events. Forgetting when something happened or what exactly took place can lead us to report events that do not really count. And the same cues that can help us remember an event can also encourage us to report incidents that do not meet the requirements of a survey’s questions.
A variable that has been relatively neglected in discussions of crime reporting has been the mode of data collection. There is strong evidence that self-administration produces fuller reporting of sensitive behaviors, sometimes dramatically so (as in Figure 2-1). Several new methods of computerized self-administration have become available over the past 10 years; these new methods have greatly extended the range of situations in which self-administration can be used and in some cases have sharply increased levels of reporting (e.g., see Figure 2-2). The new technologies can be used in conjunction with face-to-face (Tourangeau and Smith, 1996; Turner et al., 1998) or telephone interviews (Phipps and Tupek, 1990; Turner et al., 1998), or they can be used to administer stand-alone surveys via e-mail (e.g., Kiesler and Sproull, 1986) or the Internet (Dillman et al., 1998).
Our first hypothesis, then, is that self-administration will dramatically increase reports of some types of crime, particularly those that carry stigma and those perpetrated by other household members; self-administration will reduce reports of incidents that put the respondent in a favorable light, including perhaps defensive gun use. A related hypothesis involves the presence of other household members; for the topics raised in crime surveys, we believe that the presence of other household members must make a difference (at least for crimes involving domestic violence). Crime surveys may prove to be a notable exception to the rule that the presence of third parties during an interview does not have much effect on a respondent’s answers (see Coker and Stasny, 1995, for some evidence supporting this conjecture).
We also offer several hypotheses about the effects of the context of a survey, construing context broadly to include not only the previous items in the questionnaire but also the packaging of the survey to the respondent and the procedures used to bound the recall period. Our first hypothesis is that the apparent topic of the survey, the survey’s sponsorship, the organization responsible for collecting the data, the letterhead used on advance letters, and similar procedural details will affect respondents’ views about the need for accuracy in reporting and about the type of incidents they are supposed to report. It is easy to imagine an experiment that administers the same questions to all respondents but varies the framing of the survey. Such an experiment would examine how reports were affected by the framing of the survey; in addition, it might compare respondents’ judgments as to whether they were supposed to report hypothetical incidents in vignettes describing borderline cases. Our guess is that the packaging of the survey will have a big impact on how respondents classify the incidents depicted in the vignettes.
Our next hypothesis is that the context provided by earlier questions will have effects similar to those of the context provided by the external trappings of the survey. A rape survey loaded with crime items is likely to lead respondents to omit sexual victimizations that do not seem crimelike; a survey loaded with items on sexual victimizations will lead respondents to report incidents that are not, strictly speaking, rapes. Respondents want to help out by providing relevant information, but they are accustomed to the looser standards of conversation and take cognitive shortcuts to reduce the demands of the questions. As a result, it is important to gather detailed information about each incident that respondents report. Even when elaborate and explicit screening items are used, the researchers’ classification of an incident will not necessarily agree with the respondent’s (Fisher and Cullen, 2000).
We discussed one additional contextual variable—the bounding procedure used to frame the recall period for the survey items. Our final hypothesis is that the exact bounding procedure a survey uses will sharply affect the final estimates. Surveys that ask about events during a vaguely defined recall period (e.g., the Kleck surveys on defensive gun use) will yield more reports than surveys that take the trouble to bound the recall period more sharply. An exact date is good and a landmark event is better. By itself a prior interview may not be all that effective as a bounding event; we suspect that a full bounding procedure that includes a review of the incidents reported in the previous interview will reduce reporting relative to the more truncated procedure, used in many surveys, that simply instructs respondents to report incidents that occurred since the last interview. Compared to temporal or personal landmarks, the prior interview may not mark off the relevant time period very clearly.
Of course, lots of things affect reporting in surveys. Crime surveys are at an added disadvantage because many of their questions involve particularly stigmatized or traumatic events, such as rape, that respondents may simply not want to discuss. This is why it is especially important to do as much as possible to uncover the effects of those factors that are within the control of the researchers. We have tried to focus on a few of the variables—the mode of interviewing, the setting of the interview, the framing of the survey, and the context of the key items—that we think may have a big impact on reporting in crime surveys. These are variables that have been shown to have effects large enough to account for the very large differences in results across different surveys. Unfortunately, we will not know whether these are the culprits until someone does the right experiments.
Anderson, J.R. 1983 The Architecture of Cognition. Cambridge, MA: Harvard University Press.
Baddeley, A.D., V. Lewis, and I. Nimmo-Smith 1978 When did you last…? In Practical Aspects of Memory, M.M. Gruneberg, P.E. Morris, and R.N. Sykes, eds. London: Academic Press.
Beebe, T.J., P.A. Harrison, J.A. McRae, R.E. Anderson, and J.A. Fulkerson 1998 An evaluation of computer-assisted self-interviews in a school setting. Public Opinion Quarterly 62:623-632.
Biderman, A. 1980 Report of a Workshop on Applying Cognitive Psychology to Recall Problems of the National Crime Survey. Washington, DC: Bureau of Social Science Research.
Biderman, A., and D. Cantor 1984 A longitudinal analysis of bounding, respondent conditioning, and mobility as sources of panel bias in the National Crime Survey. In Proceedings of the American Statistical Association, Survey Research Methods Section. Washington, DC: American Statistical Association.
Bradburn, N.M., and S. Sudman 1979 Improving Interview Method and Questionnaire Design. San Francisco: Jossey-Bass.
Burton, S., and E. Blair 1991 Task conditions, response formulation processes, and response accuracy for behavioral frequency questions in surveys. Public Opinion Quarterly 55:50-79.
Cannell, C., F.J. Fowler, and K. Marquis 1968 The influence of interviewer and respondent psychological and behavioral variables on the reporting in household interviews. Vital and Health Statistics, Series 2, No. 26.
Cannell, C., P. Miller, and L. Oksenberg 1981 Research on interviewing techniques. Pp. 389-437 in Sociological Methodology, S. Leinhardt ed. San Francisco: Jossey-Bass.
Clark, H.H., and E.F. Schaefer 1989 Contributing to discourse. Cognitive Science 13:259-294.
Coker, A.L., and E.A. Stasny 1995 Adjusting the National Crime Victimization Survey’s Estimates of Rape and Domestic Violence for “Gag” Factors. Washington, DC: U.S. Department of Justice, National Institute of Justice.
Dillman, D.A., R.D. Tortora, J. Conradt, and D. Bowker 1998 Influence of plain vs. fancy design on response rates for Web surveys. In Proceedings of the Survey Research Methods Section, American Statistical Association. Washington, DC: American Statistical Association.
Fisher, B.S., and F.T. Cullen 2000 Measuring the sexual victimization of women: Evolution, current controversies, and future research. In Criminal Justice 2000, Volume 4: Measurement and Analysis of Crime and Justice. Washington, DC: National Institute of Justice.
Friedman, W.J. 1993 Memory for the time of past events. Psychological Bulletin 113:44-66.
Grice, H.P. 1989 Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Hemenway, D. 1997 The myth of millions of annual self-defense gun uses: A case study of survey overestimates of rare events. Chance 10:6-10.
Jobe, J.B., W.F. Pratt, R. Tourangeau, A. Baldwin, and K. Rasinski 1997 Effects of interview mode on sensitive questions in a fertility survey. Pp. 311-329 in Survey Measurement and Process Quality, L. Lyberg, P. Biemer, M. Collins, E. de Leeuw, C. Dippo, N. Schwarz, and D. Trewin, eds. New York: Wiley.
Just, M.A., and P.A. Carpenter 1992 A capacity theory of comprehension. Psychological Review 99:122-149.
Kiesler, S., and L. Sproull 1986 Response effects in the electronic survey. Public Opinion Quarterly 50:402-413.
Kleck, G. 1991 Point Blank: Guns and Violence in America. Hawthorne, NY: Aldine de Gruyter.
Kleck, G., and M. Gertz 1995 Armed resistance to crime: The prevalence and nature of self-defense with a gun. Journal of Criminal Law and Criminology 86:150-187.
Koss, M. 1992 The underdetection of rape. Journal of Social Issues 48:63-75.
1993 Detecting the scope of rape: A review of prevalence research methods. Journal of Interpersonal Violence 8:198-222.
1996 The measurement of rape victimization in crime surveys. Criminal Justice and Behavior 23:55-69.
Koss, M., C.A. Gidycz, and N. Wisniewski 1987 The scope of rape: Incidence and prevalence of sexual aggression and victimization in a national sample of higher education students. Journal of Consulting and Clinical Psychology 55:162-170.
Krosnick, J.A. 1991 Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology 5:213-236.
Krosnick, J.A., and D. Alwin 1987 An evaluation of a cognitive theory of response-order effects in survey measurement. Public Opinion Quarterly 51:201-219.
Kurbat, M.A., S.K. Shevell, and L.J. Rips 1998 A year’s memories: The calendar effect in autobiographical recall. Memory and Cognition 26:532-552.
Laumann, E., J. Gagnon, R. Michael, and S. Michaels 1994 The Social Organization of Sexuality: Sexual Practices in the United States. Chicago: University of Chicago Press.
Lehnen, R.G., and W. Skogan 1981 The National Crime Survey: Working Papers. Volume 1: Current and Historical Perspectives. Washington, DC: Bureau of Justice Statistics.
Lessler, J.T., and J.M. O’Reilly 1997 Mode of interview and reporting of sensitive issues: Design and implementation of audio computer-assisted self-interviewing. Pp. 366-382 in The Validity of Self-Reported Drug Use: Improving the Accuracy of Survey Estimates, L. Harrison and A. Hughes, eds. Rockville, MD: National Institute on Drug Abuse.
Loftus, E.F., and W. Marburger 1983 Since the eruption of Mt. St. Helens, has anyone beaten you up? Improving the accuracy of retrospective reports with landmark events. Memory and Cognition 11:114-120.
Lynch, J.P. 1996 Clarifying divergent estimates of rape from two national surveys. Public Opinion Quarterly 60:410-430.
Martin, E., R.M. Groves, J. Matlin, and C. Miller 1986 Report on the Development of Alternative Screening Procedures for the National Crime Survey. Washington, DC: Bureau of Social Science Research.
Mason, R., J. Carlson, and R. Tourangeau 1995 Contrast effects and subtraction in part-whole questions. Public Opinion Quarterly 58:569-578.
McDowall, D., and B. Wiersema 1994 The incidence of defensive firearm use by U.S. crime victims: 1987 through 1990. American Journal of Public Health 84:1982-1984.
McDowall, D., C. Loftin, and S. Presser 2000 Measuring civilian defensive firearm use: A methodological experiment. Journal of Quantitative Criminology 16:1-19.
Means, B., G.E. Swan, J.B. Jobe, and J.L. Esposito 1994 An alternative approach to obtaining personal history data. Pp. 167-184 in Measurement Errors in Surveys, P. Biemer, R. Groves, L. Lyberg, N. Mathiowetz, and S. Sudman, eds. New York: Wiley.
Moon, Y. 1998 Impression management in computer-based interviews: The effects of input modality, output modality, and distance. Public Opinion Quarterly 62:610-622.
Mosher, W.D., and A.P. Duffer, Jr. 1994 Experiments in Survey Data Collection: The National Survey of Family Growth Pretest. Paper presented at the annual meeting of the Population Association of America, Miami, FL.
Murphy, L. 1976 The Effects of the Attitude Supplement on NCS City Sample Victimization Data. Unpublished internal document, Bureau of the Census, Washington, DC.
National Victims Center 1992 Rape in America: A Report to the Nation. Arlington, VA: National Victims Center.
Neter, J., and J. Waksberg 1964 A study of response errors in expenditures data from household interviews. Journal of the American Statistical Association 59:17-55.
Phipps, P., and A. Tupek 1990 Assessing Measurement Errors in a Touchtone Recognition Survey. Paper presented at the International Conference on Measurement Errors in Surveys, Tucson, AZ.
Robinson, J.A. 1986 Temporal reference systems and autobiographical memory. Pp. 159-188 in Autobiographical Memory, D.C. Rubin, ed. Cambridge, England: Cambridge University Press.
Russell, D.E.H. 1982 The prevalence and incidence of forcible rape and attempted rape of females. Victimology 7:81-93.
Schaeffer, N.C. 1991 Conversation with a purpose—or conversation? Interaction in the standardized interview. Pp. 367-391 in Measurement Errors in Surveys, P.P. Biemer, R.M. Groves, L.E. Lyberg, N.A. Mathiowetz, and S. Sudman, eds. New York: Wiley.
Schober, M. 1999 Making sense of questions: An interactional approach. Pp. 77-93 in Cognition and Survey Research, M.G. Sirken, D.J. Herrmann, S. Schechter, N. Schwarz, J.M. Tanur, and R. Tourangeau, eds. New York: Wiley.
Schober, S., M.F. Caces, M. Pergamit, and L. Branden 1992 Effects of mode of administration on reporting of drug use in the National Longitudinal Survey. Pp. 267-276 in Survey Measurement of Drug Use: Methodological Studies, C. Turner, J. Lessler, and J. Gfroerer, eds. Rockville, MD: National Institute on Drug Abuse.
Schwarz, N., and G.L. Clore 1983 Mood, misattribution, and judgments of well-being: Informative and directive functions of affective states. Journal of Personality and Social Psychology 45:513-523.
Schwarz, N., H.J. Hippler, B. Deutsch, and F. Strack 1985 Response categories: Effects on behavioral reports and comparative judgments. Public Opinion Quarterly 49:388-395.
Schwarz, N., B. Knauper, H.J. Hippler, E. Noelle-Neumann, and F. Clark 1991 Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly 55:618-630.
Schwarz, N., F. Strack, and H. Mai 1991 Assimilation and contrast effects in part-whole question sequences: A conversational logic analysis. Public Opinion Quarterly 55:3-23.
Silver, B.D., P.R. Abramson, and B.A. Anderson 1986 The presence of others and overreporting of voting in American national elections. Public Opinion Quarterly 50:228-239.
Sirken, M.G., D.J. Herrmann, S. Schechter, N. Schwarz, J. Tanur, and R. Tourangeau 1999 Cognition and Survey Research. New York: Wiley.
Skogan, W. 1981 Issues in the Measurement of Victimization. Washington, DC: Bureau of Justice Statistics.
Smith, T.W. 1997 The impact of the presence of others on a respondent’s answers to questions. International Journal of Public Opinion Research 9:33-47.
Strack, F., N. Schwarz, and M. Wänke 1991 Semantic and pragmatic aspects of context effects in social and psychological research. Social Cognition 9:111-125.
Suchman, L., and B. Jordan 1990 Interactional troubles in face-to-face survey interviews. Journal of the American Statistical Association 85:232-241.
Sudman, S., A. Finn, and L. Lannom 1984 The use of bounded recall procedures in single interviews. Public Opinion Quarterly 48:520-524.
Sudman, S., N. Bradburn, and N. Schwarz 1996 Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.
Tangney, J.P., R.W. Miller, L. Flicker, and D.H. Barlow 1996 Are shame, guilt, and embarrassment distinct emotions? Journal of Personality and Social Psychology 70:1256-1269.
Thompson, C.P., J.J. Skowronski, S.F. Larsen, and A.L. Betz 1996 Autobiographical Memory. Mahwah, NJ: Erlbaum.
Tourangeau, R. 1999 Context effects on answers to attitude questions. Pp. 111-131 in Cognition and Survey Research, M.G. Sirken, D.J. Herrmann, S. Schechter, N. Schwarz, J. Tanur, and R. Tourangeau, eds. New York: Wiley.
Tourangeau, R., and T.W. Smith 1996 Asking sensitive questions: The impact of data collection mode, question format, and question context. Public Opinion Quarterly 60:275-304.
Tourangeau, R., K. Rasinski, and N. Bradburn 1991 Measuring happiness in surveys: A test of the subtraction hypothesis. Public Opinion Quarterly 55:255-266.
Tourangeau, R., L.J. Rips, and K.A. Rasinski 2000 The Psychology of Survey Response. New York: Cambridge University Press.
Turner, C.F., J.T. Lessler, and J. Devore 1992 Effects of mode of administration and wording on reporting of drug use. Pp. 177-220 in Survey Measurement of Drug Use: Methodological Studies, C. Turner, J. Lessler, and J. Gfroerer, eds. Rockville, MD: National Institute on Drug Abuse.
Turner, C.F., B.H. Forsyth, J.M. O’Reilly, P.C. Cooley, T.K. Smith, S.M. Rogers, and H.G. Miller 1998 Automated self-interviewing and the survey measurement of sensitive behaviors. In Computer-Assisted Survey Information Collection, M.P. Couper, R.P. Baker, J. Bethlehem, C.Z.F. Clark, J. Martin, W.L. Nicholls II, and J. O’Reilly, eds. New York: Wiley.
Turner, C.F., L. Ku, S.M. Rogers, L.D. Lindberg, J.H. Pleck, and F.L. Sonenstein 1998 Adolescent sexual behavior, drug use, and violence: Increased reporting with computer survey technology. Science 280:867-873.
Wagenaar, W.A. 1986 My memory: A study of autobiographical memory over six years. Cognitive Psychology 18:225-252.
Williams, M.D., and J.D. Hollan 1981 The process of retrieval from very long-term memory. Cognitive Science 5:87-119.