4

Approaches to Improving Survey Response

In previous chapters, we have summarized evidence that survey nonresponse is a growing problem. In a paper that has been cited often in this report, Brick and Williams (2013) raised the disturbing possibility, based on their analyses, that the intrinsic rate of increase in nonresponse in U.S. household surveys might be 0.5 percentage points or so per year. We have provided evidence that survey nonresponse is more prevalent with some modes of data collection than others, that it can produce errors in survey estimates, and that sophisticated adjustment techniques are required to ameliorate the impact it has on estimates.

In Chapter 1, we laid out many potential reasons for the growth in nonresponse, concluding that the decision of a person to respond or not respond to a survey involves several key factors. The elaboration of these factors provides a convenient conceptual point of departure for the review in this chapter of approaches to improving response.

Ultimately, responding or not responding to a survey is a decision made by a sample member (or a proxy). These decisions are informed by social factors (e.g., social disorganization, crime); membership in a social category or group (e.g., age, gender, political party); the survey setting (e.g., interviewer-mediated or self-administered); the social climate (e.g., time pressures, general concerns about privacy); the proliferation of surveys; and so on. Approaches to improving survey response must take these factors into account.

One possibility suggested by researchers is that the decline in response rates reflects a corresponding increase in the overall level of burden that surveys place on sample populations. Thus, this chapter begins with a discussion of respondent burden and the relationship of real and perceived burden with the willingness to take part in surveys. Several of the methods we discuss in detail, such as matrix sampling or greater reliance on administrative records, represent attempts to greatly reduce the burden on respondents.

We then discuss several approaches that are being taken or have been proposed to increase survey response rates. The first group of approaches involves sampling procedures—respondent-driven sampling (RDS), matrix sampling, and address-based sampling (ABS)—that may have implications for response rates. Other approaches are aimed at increasing our understanding of the conditions and motivations underlying nonresponse; changing the interaction of interviewer and respondent; making better use of information collected in the survey process to adjust the collection strategy in an attempt to achieve higher response rates, lower costs, or both; using other data sources (e.g., transaction data and administrative data) as strategies to reduce burden; and using mixed-mode methods of data collection.

UNDERSTANDING AND REDUCING RESPONDENT BURDEN

It is widely accepted that nonresponse is, at least in part, related to the perceived burden of taking part in a survey. It is less clear how to define and measure burden. Two flawed but widely used indicators of burden are the number of questions in the survey and the average time taken by respondents to complete those questions. The notion that the time used in responding is directly related to burden seems to be the working principle behind the federal government's Paperwork Reduction Act. This act requires the computation of burden hours for proposed federal data collections and has provisions that encourage limiting those burden hours. The use of a time-to-complete measure (in hours) for response burden is fairly widespread among the national statistical agencies (Hedlin et al., 2005, pp. 3–7).

The factors to be taken into account in the calculation of burden hours are important considerations. Burden could relate only to the actual time spent on completing the instrument, but it also could take into account the time respondents need to collect relevant information before the interviewer arrives (for example, in keeping diaries) and any time after the interview is completed. For example, the time incurred when respondents are recontacted to validate data could also be taken into account. Without these additions, a measure that uses administration time or total respondent time per interview as a metric for burden is clearly problematic.
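
To make the time-based metric concrete, the short sketch below computes burden hours under a narrow definition (administration time only) and under a broader definition that also counts preparation and validation-recontact time. The figures are invented and the calculation is purely illustrative of the trade-off discussed above; it is not the official Paperwork Reduction Act formula.

```python
# Hypothetical illustration of a time-based burden metric.
# All figures are invented; the broader definition simply adds preparation
# and validation-recontact time to instrument administration time.

respondents = 10_000
minutes_admin = 25        # time to complete the instrument itself
minutes_preparation = 10  # e.g., assembling records or keeping a diary beforehand
minutes_recontact = 5     # e.g., a later callback to validate responses

def burden_hours(n_respondents, *minutes_per_respondent):
    """Total burden in hours: respondents times summed per-respondent minutes."""
    return n_respondents * sum(minutes_per_respondent) / 60

narrow = burden_hours(respondents, minutes_admin)
broad = burden_hours(respondents, minutes_admin, minutes_preparation, minutes_recontact)

print(f"Narrow definition: {narrow:,.0f} burden hours")  # about 4,167
print(f"Broad definition:  {broad:,.0f} burden hours")   # about 6,667
```

Under the broader definition the same collection accounts for roughly 60 percent more burden hours, which is one reason the choice of components matters for any analysis of the burden-participation relationship.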

Bradburn (1978) suggested that the definition of respondent burden should include four elements: interview length, required respondent effort, respondent stress, and the frequency of being interviewed. The effort required of respondents could refer to the cognitive challenges of a task (e.g., remembering the number of doctor visits in a year) or the irritation of answering poorly written questions. The frequency of being interviewed could refer either to the multiple interviews required by longitudinal surveys or to the increased likelihood of being selected for a study as society's demands for information increase. In addition, some complex studies may involve requests for biomarkers, record linkages, multiple modes of response, and more. Some of these requests (e.g., for waist measurement) may be perceived as intrusive; if so, this may increase the sense of burden. Furthermore, multiple requests require many decisions using different criteria (e.g., the decision to allow waist measurement may use criteria different from those used to decide about providing a DNA sample), and these decisions may add to burden. Difficult or upsetting questions are, in this view, more burdensome than easy or enjoyable ones, and any measure of burden should reflect the cognitive and emotional costs associated with the questions as well as the time spent answering them.

Presser and McCulloch (2011) documented a sharp increase in the number of federal surveys. Although the probabilities of being interviewed for a survey are likely still relatively small, members of the general population are at somewhat greater risk of being interviewed because of that proliferation. Presser and McCulloch argued that the increased number of survey requests people are subjected to may be one reason for the decline in response rates.

It is clear that "burden" has many possible aspects. Progress in understanding burden and its impact on survey response must begin with an analysis of the concept, its dimensions, and how it is operationalized. Unfortunately, there is little research to show conclusively that there is a causal relationship between measured burden and propensity to respond. The research design needed to examine such a relationship would be influenced by the type and extent of burden that is imposed, and in many cases, sample members cannot know the extent of the burden of a specific request until they complete the survey—or at least until they are contacted. An increasing number of sample members are not contacted; consequently, measuring the number of survey requests that households or individuals receive and relating this to their overall response propensity is problematic. To fully understand the impact of burden on response, more testing of the so-called burden–participation hypothesis is needed.

The literature includes a few studies of the factors affecting perceptions of burden, which usually focus on survey instrument length and difficulty. In a 1983 paper, Sharp and Frankel examined the length of the survey instrument, the effort required to answer some of the questions, and the impact of a request for a second interview approximately one year after the first. Behavioral indicators and responses to a follow-up questionnaire were used to measure the perception of burden. The study found that the instrument length produced a statistically significant (although generally small) effect on perceived burden. The perception of burden was more strongly influenced by attitudinal factors than by the survey length. Respondents who saw surveys as useful rated the survey as less burdensome than those who did not. Similarly, those who saw the survey questions as an invasion of privacy rated the survey as more burdensome.

A literature review by Bogen (1996) found mixed results concerning the relationship between questionnaire length and response rate. Bogen reviewed both non-experimental and experimental studies that were available in the mid-1990s. She concluded that "the non-experimental literature paints a picture about the relationship between interview length and response rates that is not uniform" (p. 1021). Likewise, the experimental literature produced some studies that found that shorter interviews yielded higher response, others that found longer interviews to yield higher response, and still others that suggested that the length of the interview did not matter. She concluded that the experimental studies could have been affected by logistical and scheduling considerations and interviewer expectations. Clearly, reasons other than interview length are at play in the decision of an individual to respond or not respond.

Very little is known about where in the process of receiving and responding to the request to participate in a survey the sample member's perception of burden is formed or how well-formed or fluid this perception is. Attitudes toward burden may precede any request and may insulate the sample member from processing new information about a specific survey, or attitudes may be quickly formed based on an impression of a specific request. The survey topic and other information are often communicated in the advance letters used in many surveys, but whether the letters are received, read, and understood is not known.

Without a very basic understanding of the dimensions of burden and the factors that generate the perception of burden, it is difficult to take the next step and determine the relationship between perception of burden and the propensity to respond.

Recommendation 4-1: Research is needed on the overall level of burden from survey requests and on the role of that burden in the decision to participate in a specific survey.

The questions to be addressed in the recommended research program include: What are the dimensions of response burden? How should they be operationalized? What factors (e.g., time, cognitive difficulty, or invasiveness, such as with the collection of biomarkers) determine how potential respondents assess the burden involved in taking part in a survey? How much can interviewers, advance letters, or other explanatory or motivational material alter perceptions about the likely burden of a survey?

IMPROVING RESPONSE IN TELEPHONE AND MAIL SURVEYS

Telephone Surveys

This report has documented that some of the most troublesome declines in response rates in social science survey operations have taken place in telephone surveys. This is particularly vexing because of the extensive reliance on this mode for sample member recruitment and data collection. This reliance was summarized during a panel workshop by Paul Lavrakas, who chaired an American Association for Public Opinion Research (AAPOR) task force on including cell phones in telephone surveys (American Association for Public Opinion Research, 2010a).

Drawing on examples from six ongoing cell phone collections (see American Association for Public Opinion Research, 2010a), he described the current environment for telephone surveying as one in which only 67 percent to 72 percent of households have a landline and just 8 percent to 12 percent of households have only a landline. On the other hand, 86 percent to 91 percent of households have a cell phone, and 27 percent to 31 percent of households have only a cell phone. Only a very few (1 percent to 2 percent) of households have neither a landline nor a cell phone. The growth in cell phone usage poses a severe challenge to telephone surveys. Lavrakas noted that federal regulations that limit calling options for cell phones and the telephony environment in the United States create special challenges for researchers trying to conduct surveys that include cell phone numbers.

Despite these obstacles, many random digit dialing (RDD) surveys now include cell phones. The Centers for Disease Control and Prevention (CDC) has been in the forefront of testing and implementing cell phone data collection. In 2006, the CDC Behavioral Risk Factor Surveillance System (BRFSS) responded to the growing percentage of cell phone–only households by testing changes in BRFSS survey methods to accommodate cell phone data collection. The tests included pilot studies in 18 states in 2008, and in 2010 the test was expanded to 48 states. These pilot studies gathered data from test samples including landline and cell phone–only households. This extension to cell phone collection has increased the complexity of the survey operation and data processing, including the need for different weighting techniques by mode. In 2012, the proportion of all completed BRFSS interviews conducted by cellular telephone was approximately 20 percent (Centers for Disease Control and Prevention, 2012).

In terms of response rates, the AAPOR panel found that landline RDD surveys rarely had response rates higher than 40 percent; they are mostly in the 10 percent to 25 percent range, and often less than 10 percent. Response rates for cell phone RDD surveys are even lower: rarely above 30 percent, and mostly in the 10 percent to 15 percent range.

The AAPOR panel concluded that, as with other surveys, the main reasons for telephone survey nonresponse are noncontact, refusals, and language barriers. (Language barriers involve a failure to communicate, which often results in a nonresponse if an interpreter is not available to translate the questions and answers.)

Noncontacts are higher with shorter periods of field collection and are affected by the increased availability of caller ID, which allows households to screen incoming calls. Calling rules that are imposed by survey management may limit the number and timing of callbacks and thus may raise the noncontact rates. On the other hand, messages left on voice mail and answering machines may reduce noncontacts.

There are many reasons for refusals. Among the main reasons are the failure to contact sample members ahead of time; negative attitudes toward the sponsor and the survey organization; the survey topic; the timing of the request; confidentiality and privacy concerns; and a belief that responding will be burdensome. In some cases, the interviewers use poor introductory scripts, and they may not be able to offer incentives, or they may use incentives poorly (Lynn, 2008).

Mail Surveys

Low response rates have long been considered a major problem for mail surveys, so much so that much of the early research on improving response rates focused on mail surveys. In 1978, Heberlein and Baumgartner carried out a meta-analysis to test the effects of a large number of survey characteristics on mail response rates. Their final model predicted about two-thirds of the variation in the final response rate. Variables that had a positive effect on response rates were (a) more contacts with the sample household via advance letters, reminder postcards, sending replacement questionnaires, and telephone prompts; (b) a topic of interest to members of the target group; (c) government sponsorship of the survey; (d) target populations, such as students and military personnel, that were more likely to take part in surveys than the general population as a whole; (e) the use of special follow-up procedures, such as more expensive mailing procedures (e.g., certified mail) or personal contacts; and (f) incentives included with the first mailing. However, three factors had a negative effect on response rate: (1) the collection of marketing research information to benefit a specific firm; (2) a general population sample; and (3) long questionnaires.

Goyder (1982) replicated this study with similar results, except that the negative effect of market research sponsorship disappeared. Other studies have elaborated on these basic findings, paying particular attention to the effects of respondent incentives (Fox et al., 1988; Church, 1993).

The lessons of this early research were codified in the development of a comprehensive system designed to achieve higher response rates for mail surveys. The total design method (TDM), developed by Dillman (1978), was guided primarily by social exchange theory, which posits that questionnaire recipients are most likely to respond if they expect that the perceived benefits of responding will outweigh the costs of responding. TDM emphasizes how the elements fit together more than the effectiveness of any individual technique. Specific well-known TDM recommendations that have been shown to be likely to help improve responses include the following:

• Use graphics and various question-writing techniques to ease the task of reading and answering the questions.
• Put some interesting questions first.
• Make the questions user-friendly.
• Print the questionnaire in a booklet format with an interesting cover.
• Use bold letters.
• Reduce the size of the booklet or use photos to make the survey seem smaller and easier to complete.
• Conduct four carefully spaced mailings beginning with the questionnaire and a cover letter and ending with a replacement questionnaire and cover letter to nonrespondents seven weeks after the original mailing.
• Include an individually printed, addressed, and signed letter.
• Print the address on the envelopes rather than use address labels.
• Explain that an ID number is used and that the respondent's confidentiality is protected.
• Fold the materials in a way that differs from an advertisement.

To adapt the original TDM to different survey situations, such as those used in mixed-mode surveys, Dillman developed the tailored design method (Dillman et al., 2009), in which the basic elements of survey design and implementation are shaped further for particular populations, sponsorship, and content. Despite these advances in understanding the determinants of high response rates in mail surveys, which are grounded in research covering more than a quarter of a century, the challenges continue.

NEW FRAMES AND METHODS OF SAMPLING

It is appropriate to begin consideration of approaches to improving survey response or lowering survey costs with the design of the survey. This section discusses several options, ranging from adopting a whole new approach to survey design (ABS) to more traditional methods for improving sample design.

Address-Based Sampling

Survey researchers have recently begun to explore the use of lists of mailing addresses as sampling frames. There are several reasons for this development, including the potential for cost savings (for surveys relying on area probability samples) and the potential for better response rates (for surveys relying on RDD sampling). Iannacchione et al. (2003) were the first to publish results on the use of the U.S. Postal Service Delivery Sequence File (DSF) as a potential sampling frame, a method that has come to be known as ABS. Link et al. (2008) were the first to publish work on switching from telephone to mail data collection and from RDD to ABS sampling. They also coined the term "address-based sampling."

Link and his colleagues (2008) compared mail surveys based on ABS with telephone surveys based on RDD using the BRFSS questionnaire in six low-response rate states (California, Illinois, New Jersey, North Carolina, Texas, and Washington). The BRFSS covers all 50 states plus the District of Columbia. The pilot survey was conducted in parallel with the March, April, and May 2005 regular RDD data collection process. In five of the six states, the mail/ABS BRFSS achieved a higher response rate than the regular telephone/RDD BRFSS. However, after this testing, the ABS design was not implemented in the BRFSS for reasons that have not been documented.

The National Household Education Survey (NHES) program has also undertaken to transition from RDD to an ABS methodology. This new methodology was used recently in a very large NHES field test. The field test included several experiments to discover the best methods for a mail ABS approach. The experiments compared different questionnaires and survey materials, levels of incentives and mailing services, and the effects of including a pre-notice letter. Preliminary results from the field test indicate that ABS response rates were substantially higher than those attained in the last round of RDD surveys (Montaquila and Brick, 2012).

In addition to the testing and experimentation conducted with the BRFSS and NHES surveys, several other surveys have adopted an ABS design. The Health Information National Trends Survey (HINTS) of the National Cancer Institute (which used an ABS component in addition to an RDD component in 2007), the Nielsen TV Ratings Diary (which moved from a landline RDD frame to ABS), and Knowledge Networks (which switched from RDD to ABS recruitment for its online panel surveys) will yield additional information on the ability of this design to increase response over time.

In summary, research has so far indicated that ABS provides good coverage and is also cost-effective. In conjunction with mail data collection, it appears to produce higher response rates than telephone interviewing and RDD sampling produce. However, it has been pointed out that when eligibility rates fall below a certain point, it is no longer cost-effective (Amaya and Ward, 2011). There are still major issues to be researched concerning ABS, including within-household selection of a single respondent (Montaquila et al., 2009).

Recommendation 4-2: Research is needed on how to best make a switch from the telephone survey mode (and frame) to mail, including how to ensure that the right person completes a mail survey.

Respondent-Driven Sampling

Some populations are hard to include in surveys because they are very rare, difficult to identify, or elusive. When these groups are the target population for a survey, they have very high non-interview and nonresponse rates. According to a presentation to the panel by Heckathorn (2011), many hard-to-reach populations cannot be sampled using standard methods because they lack a sampling frame (list of population members), represent small proportions of the general population, have privacy concerns (e.g., stigmatized groups), or are part of networks that are hard for outsiders to penetrate (e.g., jazz musicians).

The traditional methods for sampling such hard-to-reach populations all have problems. One traditional method is to sample population members through location sampling (e.g., selecting a sample of homeless persons by selecting persons who sleep at a homeless shelter). However, such samples would exclude members who avoid those contacts; as a result, those with contacts at sample locations may differ systematically from those without them. Another approach is to draw a probability sample of population members who are accessible in public venues, but the coverage of those samples is limited because it excludes those who shun public settings.

Snowball samples (or chain-referral methods) may offer better coverage because respondents are reached through their social networks, but they produce convenience samples rather than probability samples. Hence, there is a dilemma. There is a trade-off between maximizing coverage of hard-to-reach populations and realizing the statistical advantages offered by probability sampling. Heckathorn (2011) argued that RDS resolves this dilemma by turning chain referral into a probability sampling method.

RDS starts with eligible "seeds" to gain entry into the network. Then the seeds recruit other members of the population. There are often incentives both for participation and for recruiting. Advocates claim that there is a lower cost per case than with traditional designs; that it reduces time and demands on interviewers; that it can reach populations that traditional methods cannot; and that it eliminates travel and personal safety issues. However, the method relies on a number of critical "assumptions that must be met to determine if it is an appropriate sampling method to be used with a particular group" (Lansky et al., 2012, p. 77). Included among the assumptions is that the recruited population must know one another as members of the group, and that the members are adequately linked so that the whole population is covered.
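
The estimation side of that claim is not spelled out in this chapter, so the sketch below illustrates one commonly cited approach, an inverse-degree estimate in the spirit of the Volz–Heckathorn (RDS-II) estimator: each recruit is weighted by the reciprocal of his or her reported network size, on the reasoning that people with larger networks have more chances to be recruited. The data are invented, and real RDS analyses involve additional assumptions and diagnostics.

```python
# Minimal sketch of an inverse-degree (RDS-II style) proportion estimate.
# Each tuple is (reported network size, indicator for the trait of interest);
# the values are invented for illustration only.
sample = [(12, 1), (4, 0), (30, 1), (8, 0), (15, 1), (6, 1), (50, 0)]

def rds_ii_proportion(recruits):
    """Estimate a population proportion, weighting each recruit by 1/degree."""
    numerator = sum(y / d for d, y in recruits)
    denominator = sum(1 / d for d, y in recruits)
    return numerator / denominator

unweighted = sum(y for _, y in sample) / len(sample)
print(f"Unweighted sample proportion:      {unweighted:.2f}")
print(f"Degree-weighted (RDS-II) estimate: {rds_ii_proportion(sample):.2f}")
```

Down-weighting high-degree recruits is what allows advocates to treat the chain-referral sample as approximately probability-based; it also shows why accurate self-reports of network size matter, a point that reappears in the research questions listed below.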

There are several approaches for measuring nonresponse in network samples. One approach is to compare the reported network composition with the yield of actual recruits. For example, in Bridgeport, Connecticut, a sample of drug injectors yielded only blacks, although respondents reported knowing many Hispanic injectors. In this case, recruitment excluded an important group. The interview site was in a black neighborhood, where Hispanics did not feel comfortable. The solution was to move the interview site to neutral ground in downtown Bridgeport. Subsequently, recruitment of both blacks and Hispanics was more successful, and the reported network converged with the composition of the recruited sample. Comparing self-reported network composition and peer recruitment patterns provided a qualitative measure of representativeness even though it could not be expressed in a traditional response rate.1

1 This study was conducted as part of a research grant from CDC to the Institute for Community Research; see http://www.incommunityresearch.org/research/nhbsidu.htm [March 2013].

Another approach is to ask those who are responsible for recruiting respondents about those who refused to be recruited. This technique was used in a CDC study of young drug injectors in Meriden, Connecticut. The most common reason for refusing recruitment was being "too busy" (see also Iguchi et al., 2009).

Experience to date suggests that the operational aspects of reducing nonresponse in RDS are challenging, to say the least, and the ability of the method to yield results much like probability samples is not yet proven. In her presentation to the panel's workshop, Sandra Berry (2011) suggested that it is important for the future of this survey technique that research be conducted on the following operational aspects of this still-in-development method:

• How well does RDS perform in community survey contexts? How do we judge this?
• How can we get better measures of network size from individuals?
• What features of RDS can be altered and at what cost to response rates, overall bias, or the variance of the estimates?
• In what situations (populations or modes of contact and data collection) does RDS work well?
• Which of RDS's assumptions are likely to be met in practice, and which are likely to be violated?
• How can RDS enhance and integrate with traditional data collection?

Matrix Sampling

While the goal of RDS is to identify and maximize responses from a hard-to-reach population at a reasonable cost, the goal of matrix sampling is to reduce any particular respondent's burden and thereby improve survey response rates. Matrix sampling is a procedure in which a questionnaire is split into sections of questions, and each section is then administered to subsamples of the main sample. Even though individual survey respondents answer only a part of the questionnaire, estimates can be obtained for all the variables derived from survey items (Shoemaker, 1973). Partitioning a long questionnaire into smaller, bite-sized pieces is a way to encourage people to respond more readily.
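
A hypothetical sketch of the splitting step is shown below: the instrument is divided into a core block that everyone receives plus rotating blocks, and each sampled respondent is randomly assigned one rotating block, so that no one answers the full questionnaire but every item is still asked of a subsample. The blocks and item names are invented, and the estimation stage, which must account for this design, is not shown.

```python
import random

# Invented questionnaire split into a core block plus rotating blocks.
core = ["age", "sex", "employment"]
rotating_blocks = {
    "A": ["health_q1", "health_q2"],
    "B": ["expenditure_q1", "expenditure_q2"],
    "C": ["time_use_q1", "time_use_q2"],
}

def assign_form(respondent_id, rng):
    """Give every respondent the core items plus one randomly chosen block."""
    block = rng.choice(sorted(rotating_blocks))
    return {"respondent": respondent_id,
            "block": block,
            "items": core + rotating_blocks[block]}

rng = random.Random(2013)
forms = [assign_form(i, rng) for i in range(9)]
for form in forms[:3]:
    print(form)

# Each respondent answers 5 of the 9 items, while every item is still
# administered to roughly one-third of the sample.
```

The point of the design is exactly the burden reduction described above: the per-respondent task shrinks while item-level estimates remain possible, at the cost of a more complicated estimation and imputation stage.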

There are several examples from the fields of educational assessment, federal statistics, and public health (Gonzalez and Eltinge, 2007) in which matrix sampling has been applied:

• The largest ongoing example of matrix sampling is the National Assessment of Educational Progress (NAEP), which surveys the educational accomplishments of students in the United States. Because NAEP assesses a large number of subject-matter areas, it uses a matrix sampling design to assess students in each subject. Blocks of items drawn from each content domain are administered to groups of students, thereby making it possible to administer a large number and range of items while keeping individual testing time to an hour. Because of its design, NAEP reports only group-level results.2
• One of the major U.S. surveys to have investigated matrix sampling as a way to reduce burden and improve response is the Consumer Expenditure Quarterly Interview Survey (CEQ). Gonzalez and Eltinge (2009) conducted a simulation study using CEQ data from April 2007 to March 2008

2 See http://nces.ed.gov/nationsreportcard [March 2013] for information about NAEP.

Effects in Interviewer-Mediated Surveys

A meta-analysis of 39 experiments by Singer et al. (1999) found results for surveys using interviewers that were similar to those in mail surveys, although the effects of incentives were generally smaller. The analysis of 114 RDD surveys by Holbrook et al. (2008) found that surveys offering incentives had significantly higher response rates than those offering no incentives; the effect came mainly from a reduction in refusals. The 2008 analysis by Cantor et al. of 23 RDD experiments found that:

• a prepayment of $1 to $5 increased response rates from 2 to 12 percentage points;
• larger incentives led to higher response rates;
• the effect of incentives has not declined over time, but baseline response rates have dropped substantially;
• incentives at refusal conversion had about the same effect as those sent at initial contact; and
• promised incentives of $5 and $25 did not increase response rates compared to no incentives, but promising larger incentives sometimes did.

These findings are generally consistent with other experiments involving interviewer-mediated surveys, including face-to-face surveys.

Effects in Cell Phone Surveys

Brick et al. (2007) conducted a dual-frame survey including both cell phones and landlines that included an incentive experiment. This study used two promised incentive conditions ($10, $5) and two message conditions (sample members notified of the survey and incentive by text messaging, or not notified). They found that the $10 group had a higher screener response rate than the $5 group as well as a higher cooperation rate. The message had no effect on either screener or survey response rates, and there were no interaction effects.

Incentives in Longitudinal Studies

Longitudinal surveys have special issues, because incentives are usually part of a larger motivational package designed to retain respondents. As in cross-sectional studies, initial survey round incentives have been found to increase response rates, usually by reducing refusals but sometimes by reducing non-contacts (e.g., McGrath, 2006). Some studies suggest that an initial payment may continue to motivate participation in subsequent waves (Singer and Kulka, 2001; McGrath, 2006; Creighton et al., 2007; Goldenberg et al., 2009). Singer concluded that incentives appear to increase response among those who have previously refused, but not among those who have previously cooperated (Zagorsky and Rhoton, 2008).

In a study by Jaeckle and Lynn (2008) of incentive payments in a U.K. longitudinal study, the researchers found that (1) attrition was significantly reduced by incentives in all waves, (2) the attrition was reduced proportionately among subgroups and so did not reduce attrition bias, (3) the effect of the incentive decreased across waves, (4) incentives increased item nonresponse, but (5) there was a net gain in information.

The NLSY97 has been a rich source of analysis of the effects of incentive payments on participation in a longitudinal survey because, from the beginning, NLSY management has had discretion over the level of incentives to be offered to participants. The amount of the incentive has also been adjusted on an experimental basis. In an early study conducted in NLSY97 Round 4 and extended into Round 5, Datta et al. (2001) found that sample members who were paid $20 had higher participation rates than those paid $10 or $15. However, there were no measurable effects on data quality from the higher level of incentives. Subsequent NLSY97 experiments found that higher incentives had a particular effect on bringing those who dropped from prior rounds back into participation in later rounds. Pierret et al. (2007) studied the results of the incentive experiments and concluded that incentives moderately increased response rates and had a greater impact on those respondents who did not participate in the previous round relative to those who did participate.

An incentive experiment was conducted as part of the 2000 wave of HRS, in which the incentive amount was increased from $20 to $30 or $50. Rodgers (2011) found an improvement in response rates as the incentive increased. A lowered incentive amount of $40 in subsequent rounds or waves did not result in lowered response rates. He also found a statistically significant decrease in item nonresponse among respondents receiving larger incentives.

Other Findings on Incentive Effects

Two experiments failed to find a role for interviewers in mediating incentive effects. Singer et al. (2000) kept interviewers blind to households' receipt of incentives in one condition but not in another and found that there were no differences in incentive effects between the two conditions. Lynn (2001) randomly offered promised incentives to half of each interviewer's assigned households and then asked interviewers how useful they thought the incentives had been. Interviewers' judgments were almost uniformly negative, but incentives had significant positive effects on completion of a household interview, completion of interviews with individual members, and completion of time use diaries. Nevertheless, as Singer (2011) pointed out to the panel, interviewer expectations may have an independent effect on respondent behavior—for example, Lynn's effects might have been larger had interviewers had a more positive view of incentives. Also, the possibility of contamination in Singer's experiment cannot be entirely ruled out since the same interviewers administered both conditions. It may also be that incentives vary in their effect at different points over the field period of a survey.

An important consideration is the effect of incentives on response quality. Singer and Kulka (2001) found no decline in quality of response to incentives in terms of differences in nonresponse or length of open-ended answers. Since then, the small number of studies (mail, RDD, and face to face) that have examined incentive effects on data quality have, with one exception, found no effects. The exception is Jaeckle and Lynn (2008), who found that incentives increased item nonresponse. Cantor et al. (2008) argued for additional tests that would control for such factors as survey topic, size and type of incentive (e.g., prepaid, promised, refusal conversion), and whether studies are cross-sectional or longitudinal.

Do incentives affect sample composition? Cantor et al. (2008), in their review of 23 RDD studies, concluded that incentives, whether prepaid or promised, have little effect on measures of sample composition. Nevertheless, a number of studies have demonstrated such effects on specific characteristics (see Singer, 2013, pp. 128–129). But specific attempts to use incentives to bring groups into the sample that are less disposed to respond because of lower topic interest have received only qualified support (Groves et al., 2004, 2006). Singer pointed out that very few studies have considered the sample composition effect of Web survey incentives, and she concluded that more research is clearly needed.

A key question concerns the effect of incentives on the responses that respondents provide. The research findings are mixed. James and Bolstein (1990), Brehm (1994), and Schwarz and Clore (1996) reported results consistent with the mood hypothesis—that incentives boost mood and therefore affect responses—and Curtin et al. (2007) found an interaction between race and receipt of incentives (nonwhites receiving an incentive gave more optimistic answers on the Index of Consumer Confidence). Groves et al. (2004, 2006) reduced nonresponse due to lack of topic interest by offering incentives, and the change in bias due to increased participation of those with less interest was not statistically significant. The possibility that incentives bias responses directly through an effect on attitudes has found no support in the experimental literature, although Dirmaier et al. (2007) specifically tested such a hypothesis. There is no evidence that randomly administered incentives increase bias, but not enough is known about the effect to use them effectively to reduce bias.

Whether incentives have an effect on Internet surveys is still unknown. Singer (2011) pointed out that research in this area is limited. Much of the published experimental research has been done by Göritz (2006), who finds that incentives increase the proportion of invitees starting a survey and the proportion completing it over a no-incentive group. Lotteries are the incentives most often used. Her literature review concluded that specific tests of lotteries against other types of incentives or against no incentives show that lotteries are no more effective in Web surveys than in other kinds of surveys. In most tests, lotteries did not significantly increase response rates over a no-incentive or alternative incentive group.

Incentives are sometimes differential: different amounts are offered primarily to convert refusals, often on the basis that differential incentives are more economical than prepaid incentives and that they are more effective in reducing bias. But there is also a question of fairness; many sample members are likely to consider differential incentives to be unfair. Nonetheless, Singer et al. (1999) found that respondents would be willing to participate in a new survey by the same organization even when told that differential incentives would be paid. When differential incentives are used to reduce bias, they are commonly paired with a small prepaid incentive to all possible participants, which serves to increase the sample size and helps to satisfy the fairness criterion.

Whether or not there are long-term incentive effects is not yet known. Singer (2011) said that there is no evidence of long-term effects thus far, but studies have been done only over short intervals.

Conclusions on Incentives

Singer (2011) drew the following conclusions regarding incentives:

• Incentives increase response rates to surveys in all modes and in cross-sectional as well as panel studies.
• Monetary incentives increase response rates more than gifts do, and prepaid incentives increase them more than promised incentives or lotteries do.
• There is no good evidence for how large an incentive should be. In general, although response rates increase as the size of the incentive increases, they do so at a declining rate. Also, there may be ceiling effects to the extent that people come to expect incentives in all surveys.
• Few studies have evaluated the effect of incentives on the quality of response; most of these have found no effects.
• Few studies have examined the effect of incentives on sample composition and response distributions; most of these have found no effects.
• Effects on sample composition and response distributions have been demonstrated in a small number of studies in which incentives have brought into the sample a larger-than-expected number of members of particular demographic categories or interest groups.
• Incentives have the potential for both increasing and reducing nonresponse bias. They may reduce bias if they can be targeted to groups that might otherwise fail to respond. They may increase bias if they bring into the sample more of those who are already overrepresented. And if they affect all subgroups equally, they will leave the nonresponse bias unchanged.

Recommendation 4-5: Research is needed on the differential effects of incentives offered to respondents (and interviewers) and the extent to which incentives affect nonresponse bias.

PARADATA AND AUXILIARY DATA

Paradata are data about the survey process itself and are collected as part of the survey operation (Couper, 1998). These data may include records of calls, reasons for refusals, responses to incentive offers, and characteristics of the interviewers (Couper and Lyberg, 2005; Bates et al., 2008; Laflamme et al., 2008; Lynn and Nicolaas, 2010; Stoop et al., 2010; Olson, 2013). As discussed in Chapter 3, paradata are used for many purposes: to monitor the status of field collection, to confirm that fieldwork has been carried out according to instructions, to compute response rates, to identify reasons for nonresponse, to implement responsive design strategies, and to adjust for nonresponse bias. They can be in the form of macro paradata, sometimes also termed metadata, which are aggregate statistics about the survey (e.g., response rates, coverage rates, editing rates). Studies have found such aggregate data to be useful for coming to an understanding of the survey information and in the weighting process (Dippo and Sundgren, 2000). Paradata can also be in micro form and include information carried on individual records, such as imputation flags, together with call records and the like.7

7 However, care should be taken in using call record data because the process that causes such records to be created is often non-neutral, in that their presence or absence reflects a decision to pursue or not pursue a case.

Sometimes paradata comprise auxiliary information external to the information collected on the survey questionnaire. Auxiliary data "can be thought of as encompassing all information not obtained directly from the interview of a respondent or case-specific informant" (Smith, 2011, p. 389). Auxiliary data can be information about the sample frame taken from external sources such as census data by block group, census tract, and other geographic areas (Smith and Kim, 2009). Auxiliary data can also be in the form of observational data about the interview environment. For example, the European Social Survey collects data about the type of dwellings and about neighborhood characteristics such as accumulated litter and signs of vandalism (Stoop et al., 2010). The researchers found that refusals and non-contacts were more likely in areas in which buildings were poorly maintained and where litter abounded. However, these data have not been found useful for nonresponse adjustment purposes (Stoop et al., 2010).

The power of employing paradata and auxiliary data for improving response rates is now becoming recognized. In her presentation to the panel, panel member Kristen Olson observed that, in response to declining response rates, survey practitioners can often introduce design features on the basis of paradata to recruit previously uncontacted or uncooperative sample members into the respondent pool (Olson, 2013). Paradata may also be helpful in tailoring a survey to increase its saliency to sample members. Paradata can be used as well to create nonresponse adjustments, reflecting each respondent's probability of being selected and observed in the respondent pool (see Chapter 2).
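
A minimal sketch of that last use follows, assuming a case-level file in which paradata (number of call attempts, an ever-refused flag) and a frame variable are available for respondents and nonrespondents alike; a logistic response propensity model is fit, and responding cases receive the inverse of their estimated propensity as an adjustment factor. The variables and values are invented, and production adjustments typically involve more careful model selection, weighting-class formation, and weight trimming.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented case-level file: one row per sampled case, respondents and
# nonrespondents alike.  Columns: call attempts, ever-refused flag,
# urban-frame flag.  The outcome is whether the interview was completed.
X = np.array([[1, 0, 1], [6, 1, 1], [2, 0, 0], [8, 1, 0],
              [3, 0, 1], [5, 0, 0], [7, 1, 1], [2, 0, 0]])
responded = np.array([1, 0, 1, 0, 1, 1, 0, 1])

# Fit a response propensity model on the paradata and frame variables.
model = LogisticRegression().fit(X, responded)
propensity = model.predict_proba(X)[:, 1]

# Inverse-propensity adjustment factors for the responding cases only.
adjustment = 1.0 / propensity[responded == 1]
print("Estimated propensities:", np.round(propensity, 2))
print("Adjustment factors:    ", np.round(adjustment, 2))
```

The same propensity scores can also feed the responsive design decisions discussed in the next section, since they identify the in-progress cases least likely to respond under the current protocol.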

There is an increasing recognition that auxiliary data can be used in sample design (i.e., to achieve large samples of rare or hard-to-find groups) or to improve approaches to the interview (Smith, 2011). In this regard, Smith (2011, p. 392) concludes that research is needed on methods for linking databases in order to augment "sample-frame data with information from other databases and sources." Auxiliary data may also be useful to improve imputation techniques (Smith and Kim, 2009).

Olson (2013) observed that few auxiliary variables are available on both respondents and nonrespondents and that there is a dearth of research on predicting survey variables. She concluded that paradata have traditionally been developed as measures of participation, not as survey variables. Paradata have the potential to assist in nonresponse adjustment and may be useful in developing responsive designs. However, the use of paradata requires upfront investment and additional research to demonstrate when the application of paradata is effective and when it is not. The use of paradata also requires the development of standards for its collection.

Recommendation 4-6: Research leading to the development of minimal standards for call records and similar data is needed in order to improve the management of data collection, increase response rates, and reduce nonresponse errors.

RESPONSIVE DESIGN

Paradata make it possible for survey managers to monitor the survey process in real time and to make decisions and alterations in order to improve response rates. This approach to survey design has been termed responsive design (Groves and Heeringa, 2006). As envisioned by Groves and Heeringa (2006, p. 1), responsive designs "pre-identify a set of design features potentially affecting costs and errors of survey statistics, identify a set of indicators of the cost and error properties of those features, monitor those indicators in initial phases of data collection, alter the active features of the survey in subsequent phases based on cost/error tradeoff decision rules, and combine data from separate design phases into a single estimator." Responsive design is a flexible menu of design approaches that can be employed in real time to ameliorate the damage caused by reduced response rates to surveys. The effectiveness of this approach depends critically on the ability to pre-identify variables that can provide basic data on costs and error sources so that survey managers can make rational decisions about the trade-offs between costs and errors.

One particular application of responsive design was described by Laflamme (2011) at the panel's workshop. This application, called responsive collection design (RCD), uses the information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. Statistics Canada has conducted two experimental surveys with RCD and control groups for two major CATI surveys: the 2009 Households and Environment Survey, using the 2009 Canadian Community Health Survey sampling frame, and the 2010 Survey of Labour and Income Dynamics (SLID). The SLID 2011 was designed with full RCD techniques in which there was an embedded experiment for the first call. The paradata consisted of the information on the Blaise transaction file (i.e., calls and contact information), interviewer payroll hours, budget and target figures, previous and current collection cycle information, and response propensity model results. These produced indicators that were used to identify when to start RCD.
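
Statistics Canada's actual decision rules are not reproduced here, so the sketch below is only a schematic of the general idea: paradata-based indicators (estimated response propensity and effort to date) are reviewed for the in-progress cases, and the next phase caps further effort on exhausted cases while prioritizing those still likely to respond. The thresholds and field names are invented for illustration.

```python
# Schematic responsive-collection-design rule with invented thresholds and fields:
# review in-progress cases and reprioritize once routine effort stops paying off.

cases = [
    {"id": 101, "calls": 2, "propensity": 0.65, "complete": False},
    {"id": 102, "calls": 9, "propensity": 0.05, "complete": False},
    {"id": 103, "calls": 4, "propensity": 0.30, "complete": True},
    {"id": 104, "calls": 7, "propensity": 0.12, "complete": False},
]

CALL_CAP = 8            # stop routine effort beyond this many attempts
PRIORITY_CUTOFF = 0.20  # propensity above which a case gets extra effort

def plan_next_phase(case_list):
    """Sort the remaining cases into priority, routine, and capped groups."""
    plan = {"priority": [], "routine": [], "capped": []}
    for case in case_list:
        if case["complete"]:
            continue
        if case["calls"] >= CALL_CAP:
            plan["capped"].append(case["id"])
        elif case["propensity"] >= PRIORITY_CUTOFF:
            plan["priority"].append(case["id"])
        else:
            plan["routine"].append(case["id"])
    return plan

print(plan_next_phase(cases))
# {'priority': [101], 'routine': [104], 'capped': [102]}
```

In an actual RCD survey, rules of this kind would be tied to cost and error indicators and re-evaluated throughout collection, which is precisely what makes the approach dependent on high-quality paradata.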

The results of the Statistics Canada tests indicated that there was a higher overall response rate when RCD was used compared to the previous survey cycle. The responsive design group achieved the same response rate with less effort. On the basis of the evidence, it was concluded that RCD is technically feasible. However, Laflamme (2011) asserted that RCD is not a "magic" solution.

Recommendation 4-7: Research is needed on the theory and practice of responsive design, including its effect on nonresponse bias, information requirements for its implementation, types of surveys for which it is most appropriate, and variance implications.

ADMINISTRATIVE RECORDS

Administrative records may be helpful in reducing potential bias due to nonresponse, and they may be helpful in correcting for bias. John Czajka's remarks (2009) addressed the use of administrative records to reduce nonresponse bias. He gave the example of the use of Internal Revenue Service (IRS) tax records by the Census Bureau. The IRS conducts an annual enumeration, and Social Security numbers (SSNs) provide a link to age, sex, race, and Spanish origin data stored in other files, such as the Social Security files. These administrative records yield population coverage estimated at 95 percent of U.S. residents.
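
The sketch below illustrates, with invented records, the kind of key-based linkage Czajka described: survey cases are joined to an administrative extract on a shared identifier so that administrative values can supplement or check survey responses. Real linkages of this sort are governed by strict confidentiality and consent requirements and typically rely on far more elaborate matching than an exact key.

```python
# Invented survey and administrative extracts keyed on a shared identifier.
survey_cases = [
    {"case_id": "S1", "link_id": "111-11-1111", "reported_income": 42_000},
    {"case_id": "S2", "link_id": "222-22-2222", "reported_income": None},  # item nonresponse
    {"case_id": "S3", "link_id": "333-33-3333", "reported_income": 58_500},
]
admin_file = {
    "111-11-1111": {"admin_income": 41_250, "birth_year": 1961},
    "222-22-2222": {"admin_income": 67_800, "birth_year": 1984},
    # S3's identifier is absent, as happens when administrative coverage is incomplete.
}

def link(cases, admin):
    """Attach administrative values to the survey cases that match on the key."""
    linked = []
    for case in cases:
        record = dict(case)
        record.update(admin.get(case["link_id"],
                                {"admin_income": None, "birth_year": None}))
        linked.append(record)
    return linked

for row in link(survey_cases, admin_file):
    print(row)
```

Even in this toy example the limitations noted below are visible: one case has no administrative match, and where both values exist the survey and administrative income concepts need not agree.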

The potential of these administrative data led to the development of the Census Bureau's Statistical Administrative Records System (StARS), which combines data from six large federal files: IRS 1040 and 1099, Selective Service, Medicare, Indian Health Service, and Housing and Urban Development's Tenant Rental Assistance Certification System (TRACS).

However, administrative records have limitations. For example, IRS records do not include undocumented residents, dependents of non-filers, and non-filers with no reported income. In addition, the unit of observation is not the household, and the reported address may not be residential. The records may also be incomplete. Race is often missing from Social Security files and, when present, may not reflect current definitions.

Although administrative records hold promise for helping to improve survey operations, in Czajka's judgment, it is unlikely that they can be substituted for survey reports. Reasons include that the set of survey items for which there is a high quality administrative records alternative is small and largely limited to federal records; the concepts underlying administrative records may differ from survey concepts (e.g., tax versus survey income); the records may be outdated; and there are severe limitations on the ability to use administrative records because of confidentiality concerns. Little work has been done on informed consent procedures to enable the use of the confidential administrative records.

Yet work has moved forward on ways to use administrative records to make data collection programs more cost-effective. Thus, as part of planning for the 2020 census, the Census Bureau has continued and expanded its acquisition of administrative records and its research on ways to use administrative records with census and survey data (see http://www.census.gov/2010census/pdf/2010_Census_Match_Study_Report.pdf [April 2013]). Other statistical agencies have explored the uses of administrative records to augment survey data, such as matching health survey data with health expenditure claims data (see, e.g., http://www.cdc.gov/nchs/data_access/data_linkage/cms_medicare.htm [April 2013]).

Recommendation 4-8: Research is needed on the availability, quality, and application of administrative records to augment (or replace) survey data collections.

OTHER MEANS OF COLLECTING SOCIAL SCIENCE DATA

The problems with obtaining cooperation with social science surveys do not mean that probability-based sampling should be abandoned. However, it would be useful for the survey community to continue to prepare for a time when current modes are no longer tenable, due to excessive costs and burden. Several emerging methods for gathering social science data are briefly discussed here because they warrant consideration in the development of a research agenda for dealing with the problem of declining response rates.

Non-probability Samples

Non-probability samples are a troubling alternative to traditional probability samples, but, largely owing to their cost and timeliness advantages, they are rapidly growing as a means of gathering data. Much of that growth is associated with the growth of online survey methods. In fact, non-probability samples now account for the largest share of online research, according to the AAPOR Report on Online Panels (American Association for Public Opinion Research, 2010b).

Nonresponse, according to the AAPOR report, is an issue for non-probability samples, just as it is for probability-based samples. Nonresponse in the respondent recruitment phase is likely to be considerable, but since the target population is not known as it is with probability-based samples, it is not easily measured. Relatively little is known about nonresponse in non-probability samples, but nonresponse is not likely to be random and is likely to include the effects of self-selection. For example, the AAPOR report notes that self-selected, non-probability-based online panels are more likely to include white, younger, more active Internet users and those with higher levels of educational attainment than the general population. The report concludes that these surveys offer no foundation on which to draw inferences to the general population.

Internet Scraping Technologies

Another possible approach to the growing problem of survey nonresponse is not to survey at all, but instead to gather information via data mining techniques, essentially mining the Internet for the data of interest. There are several examples of the use of this technique for the generation of economic and social statistics (a brief illustrative sketch follows the list):

• MIT Billion Prices Project. The Billion Prices Project (BPP) is an initiative by economists Roberto Rigobon and Alberto Cavallo to collect prices from hundreds of online retailers around the world on a daily basis (see http://bpp.mit.edu [April 2013]). The project monitors daily price fluctuations of about 5 million items sold by approximately 300 online retailers in more than 70 countries. For the United States, the project collects about 500,000 prices. It has been collecting prices since 2007. The BPP is said to have closely tracked the Consumer Price Index.
• Google Price Index. The Google Price Index is a project of the company's chief economist, Hal Varian. Varian uses Google's vast database of Web prices to construct the constantly updated measure of price changes and inflation. Google has not yet decided whether it will publish the price index, and it has not released its methodology, but it has reported that the preliminary index tracks the Consumer Price Index closely.
• Flu Epidemic Prediction. For several years, studies have been conducted to detect the onset of U.S. seasonal flu epidemics by extracting patterns of flu-related search terms from the billions of queries stored by Google and Yahoo! Inc. (Butler, 2008). These studies have been taken to provide real-time indicators to complement the CDC reports that are compiled using a combination of data about hospital admissions, laboratory test results, and clinical symptoms. These reports are often weeks old by the time hospitals get them, and so they do not allow frontline health-care workers enough time to prepare for a surge in flu cases. The studies have found that patterns of searches matched official flu surveillance data almost perfectly—and often weeks in advance of these official data.
• Predicting the Stock Market Using Twitter Feeds. Recent research has tested whether measurements of collective mood states derived from large-scale Twitter feeds are correlated with the value of the Dow Jones Industrial Average (DJIA) over time. Bollen et al. (2011) analyzed the text content of daily Twitter feeds with two mood-tracking tools and were able to predict the daily up and down changes in the closing values of the DJIA with an accuracy of 87.6 percent.
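
As a hedged illustration of the price-collection examples above (the BPP's and Google's actual methodologies are more elaborate and, in Google's case, unreleased), the sketch below computes a simple chained Jevons-type index, the geometric mean of day-over-day price relatives for items observed online on both days. The prices are invented, and a real scraped-price index must also handle item churn, quality change, and weighting.

```python
from math import exp, log

# Invented prices for the same items scraped from online retailers on two days.
prices_day0 = {"milk": 3.49, "laptop": 899.00, "shoes": 59.95, "coffee": 11.25}
prices_day1 = {"milk": 3.59, "laptop": 879.00, "shoes": 59.95, "coffee": 11.75}

def jevons_relative(p0, p1):
    """Geometric mean of price relatives for items present in both periods."""
    common = p0.keys() & p1.keys()
    return exp(sum(log(p1[item] / p0[item]) for item in common) / len(common))

index_day0 = 100.0
index_day1 = index_day0 * jevons_relative(prices_day0, prices_day1)
print(f"Day 1 index: {index_day1:.2f}")  # slightly above 100 for these invented prices
```

Chaining such daily relatives over time is what allows a scraped-price series to be compared against an official measure such as the Consumer Price Index, which is how the projects above report their results.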

Admittedly, these examples of the use of Internet data mining to produce socioeconomic indicators are currently developmental, but their apparent success in replicating data collected administratively or through costly surveys suggests that further development and testing could be warranted. However, caution is in order. Availability of many types of information on the Web is a matter of choice; thus, the data available on a given topic may not fully reflect the underlying range of information, raising the possibility of unknown biases. Often, there is no built-in constraint to make measures conceptually compatible. In addition, there is no guarantee that biases would necessarily be stationary in a meaningful sense over time, which would compromise the validity of trend analysis.

Recommendation 4-9: Research is needed to determine the capability of information gathered by mining the Internet to augment (or replace) official survey statistics.