In previous chapters, we have summarized evidence that survey nonresponse is a growing problem. In a paper that has been cited often in this report, Brick and Williams (2013) raised the disturbing possibility, based on their analyses, that the intrinsic rate of increase in nonresponse in U.S. household surveys might be 0.5 percentage points or so per year. We have provided evidence that survey nonresponse is more prevalent with some modes of data collection than others, that it can produce errors in survey estimates, and that sophisticated adjustment techniques are required to ameliorate the impact it has on estimates.
In Chapter 1, we laid out many potential reasons for the growth in nonresponse, concluding that the decision of a person to respond or not respond to a survey involves several key factors. The elaboration of these factors provides a convenient conceptual point of departure for the review in this chapter of approaches to improving response.
Ultimately, responding or not responding to a survey is a decision made by a sample member (or a proxy). These decisions are informed by social factors (e.g., social disorganization, crime); membership in a social category or group (e.g., age, gender, political party); the survey setting (e.g., interviewer-mediated or self-administered); the social climate (e.g., time pressures, general concerns about privacy); the proliferation of surveys; and so on. Approaches to improving survey response must take these factors into account.
One possibility suggested by researchers is that the decline in response rates reflects a corresponding increase in the overall level of burden that surveys place on sample populations. Thus, this chapter begins with a
discussion of respondent burden and the relationship of real and perceived burden with the willingness to take part in surveys. Several of the methods we discuss in detail, such as matrix sampling or greater reliance on administrative records, represent attempts to greatly reduce the burden on respondents.
We then discuss several approaches that are being taken or have been proposed to increase survey response rates. The first group of approaches involves sampling procedures—respondent-driven sampling (RDS), matrix sampling, and address-based sampling (ABS)—that may have implications for response rates. Other approaches are aimed at increasing our understanding of the conditions and motivations underlying nonresponse; changing the interaction of interviewer and respondent; making better use of information collected in the survey process to adjust the collection strategy in an attempt to achieve higher response rates, lower costs, or both; using other data sources (e.g., transaction data and administrative data) as strategies to reduce burden; and using mixed-mode methods of data collection.
It is widely accepted that nonresponse is, at least in part, related to the perceived burden of taking part in a survey. It is less clear how to define and measure burden. Two flawed but widely used indicators of burden are the number of questions in the survey and the average time taken by respondents to complete those questions. The notion that the time used in responding is directly related to burden seems to be the working principle behind the federal government’s Paperwork Reduction Act. This act requires the computation of burden hours for proposed federal data collections and has provisions that encourage limiting those burden hours. The use of a time-to-complete measure (in hours) for response burden is fairly widespread among the national statistical agencies (Hedlin et al., 2005, pp. 3–7).
The factors to be taken into account in the calculation of burden hours are important considerations. Burden could relate only to the actual time spent on completing the instrument, but it also could take into account the time respondents need to collect relevant information before the interviewer arrives (for example, in keeping diaries) and any time after the interview is completed. For example, the time incurred when respondents are re-contacted to validate data could also be taken into account. Without these additions, a measure that uses administration time or total respondent time per interview as a metric for burden is clearly problematic.
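As a concrete illustration of the time-based accounting required under the Paperwork Reduction Act, a burden-hours figure can fold in preparation and follow-up time as well as the interview itself. The sketch below is our own illustration, not an official formula; the function name and the three-way breakdown of respondent time are assumptions:

```python
def burden_hours(n_respondents, minutes_responding,
                 minutes_preparing=0.0, minutes_followup=0.0):
    """Estimated total burden hours for a proposed data collection.

    Folds in time spent before the interview (e.g., keeping diaries,
    gathering records) and after it (e.g., validation re-contacts),
    not just the time spent answering the questions.
    """
    per_respondent = minutes_responding + minutes_preparing + minutes_followup
    return n_respondents * per_respondent / 60.0
```

Counting only response time, 10,000 respondents at 30 minutes each yields 5,000 burden hours; adding 15 minutes of diary-keeping and 5 minutes of validation per respondent raises the figure to roughly 8,333 hours, illustrating how much the narrower metric can understate total burden.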
Bradburn (1978) suggested that the definition of respondent burden should include four elements: interview length, required respondent effort, respondent stress, and the frequency of being interviewed. The effort
required of respondents could refer to the cognitive challenges of a task (e.g., remembering the number of doctor visits in a year) or the irritation of answering poorly written questions. The frequency of being interviewed could refer either to the multiple interviews required by longitudinal surveys or to the increased likelihood of being selected for a study as society’s demands for information increase. In addition, some complex studies may involve requests for biomarkers, record linkages, multiple modes of response, and more. Some of these requests (e.g., for waist measurement) may be perceived as intrusive; if so, this may increase the sense of burden. Furthermore, multiple requests require many decisions using different criteria (e.g., the decision to allow waist measurement may use criteria different from those used to decide about providing a DNA sample), and these decisions may add to burden. Difficult or upsetting questions are, in this view, more burdensome than easy or enjoyable ones, and any measure of burden should reflect the cognitive and emotional costs associated with the questions as well as the time spent answering them.
Presser and McCulloch (2011) documented a sharp increase in the number of federal surveys. Although the probabilities of being interviewed for a survey are likely still relatively small, members of the general population are at somewhat greater risk of being interviewed because of that proliferation. Presser and McCulloch argued that the increased number of survey requests people are subjected to may be one reason for the decline in response rates.
It is clear that “burden” has many possible aspects. Progress in understanding burden and its impact on survey response must begin with an analysis of the concept, its dimensions, and how it is operationalized. Unfortunately, there is little research to show conclusively that there is a causal relationship between measured burden and propensity to respond. The research design needed to examine such a relationship would be influenced by the type and extent of burden that is imposed, and in many cases, sample members cannot know the extent of the burden of a specific request until they complete the survey—or at least until they are contacted. An increasing number of sample members are not contacted; consequently, measuring the number of survey requests that households or individuals receive and relating this to their overall response propensity is problematic. To fully understand the impact of burden on response, more testing of the so-called burden–participation hypothesis is needed.
The literature includes a few studies of the factors affecting perceptions of burden, which usually focus on survey instrument length and difficulty. In a 1983 paper, Sharp and Frankel examined the length of the survey instrument, the effort required to answer some of the questions, and the impact of a request for a second interview approximately one year after the first. Behavioral indicators and responses to a follow-up questionnaire were
used to measure the perception of burden. The study found that the instrument length produced a statistically significant (although generally small) effect on perceived burden. The perception of burden was more strongly influenced by attitudinal factors than by the survey length. Respondents who saw surveys as useful rated the survey as less burdensome than those who did not. Similarly, those who saw the survey questions as an invasion of privacy rated the survey as more burdensome.
A literature review by Bogen (1996) found mixed results concerning the relationship between questionnaire length and response rate. Bogen reviewed both non-experimental and experimental studies that were available in the mid-1990s. She concluded that “the non-experimental literature paints a picture about the relationship between interview length and response rates that is not uniform” (p. 1021). Likewise, the experimental literature produced some studies that found that shorter interviews yielded higher response, others that found longer interviews to yield higher response, and still others that suggested that the length of the interview did not matter. She concluded that the experimental studies could have been affected by logistical and scheduling considerations and interviewer expectations. Clearly, reasons other than interview length are at play in the decision of an individual to respond or not respond.
Very little is known about where in the process of receiving and responding to the request to participate in a survey the sample member’s perception of burden is formed or how well-formed or fluid this perception is. Attitudes toward burden may precede any request and may insulate the sample member from processing new information about a specific survey, or attitudes may be quickly formed based on an impression of a specific request. The survey topic and other information are often communicated in the advance letters used in many surveys, but whether the letters are received, read, and understood is not known.
Without a very basic understanding of the dimensions of burden and the factors that generate the perception of burden, it is difficult to take the next step and determine the relationship between perception of burden and the propensity to respond.
Recommendation 4-1: Research is needed on the overall level of burden from survey requests and on the role of that burden in the decision to participate in a specific survey.
The questions to be addressed in the recommended research program include: What are the dimensions of response burden? How should they be operationalized? What factors (e.g., time, cognitive difficulty, or invasiveness, such as with the collection of biomarkers) determine how potential respondents assess the burden involved in taking part in a survey? How
much can interviewers, advance letters, or other explanatory or motivational material alter perceptions about the likely burden of a survey?
This report has documented that some of the most troublesome declines in response rates in social science survey operations have taken place in telephone surveys. This is particularly vexing because of the extensive reliance on this mode for sample member recruitment and data collection. This reliance was summarized during a panel workshop by Paul Lavrakas, who chaired an American Association for Public Opinion Research (AAPOR) task force on including cell phones in telephone surveys (American Association for Public Opinion Research, 2010a).
Drawing on examples from six ongoing cell phone collections (see American Association for Public Opinion Research, 2010a), he described the current environment for telephone surveying as one in which only 67 percent to 72 percent of households have a landline and just 8 percent to 12 percent of households have only a landline. On the other hand, 86 percent to 91 percent of households have a cell phone, and 27 percent to 31 percent of households have only a cell phone. Very few households (1 percent to 2 percent) have neither a landline nor a cell phone.
The growth in cell phone usage poses a severe challenge to telephone surveys. Lavrakas noted that federal regulations that limit calling options for cell phones and the telephony environment in the United States create special challenges for researchers trying to conduct surveys that include cell phone numbers.
Despite these obstacles, many random digit dialing (RDD) surveys now include cell phones. The Centers for Disease Control and Prevention (CDC) has been in the forefront of testing and implementing cell phone data collection. In 2006, the CDC Behavioral Risk Factor Surveillance System (BRFSS) responded to the growing percentage of cell phone–only households by testing changes in BRFSS survey methods to accommodate cell phone data collection. The tests included pilot studies in 18 states in 2008, and in 2010 the test was expanded to 48 states. These pilot studies gathered data from test samples including landline and cell phone–only households. This extension to cell phone collection has increased the complexity of the survey operation and data processing, including the need for different weighting techniques by mode. In 2012, the proportion of all completed BRFSS interviews conducted by cellular telephone was approximately 20 percent (Centers for Disease Control and Prevention, 2012).
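Combining landline and cell phone samples requires weighting that accounts for dual users, who can be reached through either frame. A common simplification is a compositing factor that splits dual users' representation between the two frames; the sketch below is our own illustration of this general idea, not the BRFSS procedure, and the names and single factor `lam` are assumptions:

```python
def composite_weight(base_weight, frame, phone_status, lam=0.5):
    """Composite a dual-frame base weight.

    frame:        "landline" or "cell" (the frame the case was sampled from)
    phone_status: "landline_only", "cell_only", or "dual"
    lam:          share of dual users' representation given to the
                  landline frame (1 - lam goes to the cell frame)

    Landline-only and cell-only cases keep their base weight because
    they can be reached through only one frame.
    """
    if phone_status == "dual":
        factor = lam if frame == "landline" else 1.0 - lam
        return base_weight * factor
    return base_weight
```

With `lam=0.5`, a dual user sampled from either frame counts at half the base weight, so dual users are not double-counted across the two samples; in practice the compositing factor is often tuned to the relative precision of the two frames.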
In terms of response rates, the AAPOR panel found that landline RDD
surveys rarely had response rates higher than 40 percent; most were in the 10 percent to 25 percent range, and some were below 10 percent. Response rates for cell phone RDD surveys were even lower: rarely above 30 percent, and mostly in the 10 percent to 15 percent range.
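Response rates of this kind are conventionally computed under AAPOR's Standard Definitions, which classify every sampled case by its final disposition. As a sketch, Response Rate 3 (RR3) counts completed interviews against all known-eligible cases plus an estimated eligible share e of the unknown-eligibility cases; the variable names below are our own:

```python
def aapor_rr3(complete, partial, refusal, noncontact, other,
              unknown_household, unknown_other, e):
    """AAPOR Response Rate 3.

    e is the estimated proportion of unknown-eligibility cases
    (e.g., numbers that never answered) that are in fact eligible.
    """
    eligible = complete + partial + refusal + noncontact + other
    est_unknown_eligible = e * (unknown_household + unknown_other)
    return complete / (eligible + est_unknown_eligible)
```

For example, 200 completes against 300 refusals, 400 noncontacts, and 100 unknown-eligibility cases with e = 0.5 gives 200/950, or about 21 percent, squarely in the range reported for landline RDD surveys.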
The AAPOR panel concluded that, as with other surveys, the main reasons for telephone survey nonresponse are noncontact, refusals, and language barriers. (Language barriers involve a failure to communicate, which often results in a nonresponse if an interpreter is not available to translate the questions and answers.)
Noncontacts are higher with shorter periods of field collection and are affected by the increased availability of caller ID, which allows households to screen incoming calls. Calling rules that are imposed by survey management may limit the number and timing of callbacks and thus may raise the noncontact rates. On the other hand, messages left on voice mail and answering machines may reduce noncontacts.
There are many reasons for refusals. Among the main reasons are the failure to contact sample members ahead of time; negative attitudes toward the sponsor and the survey organization; the survey topic; the timing of the request; confidentiality and privacy concerns; and a belief that responding will be burdensome. In some cases, the interviewers use poor introductory scripts, and they may not be able to offer incentives, or they may use incentives poorly (Lynn, 2008).
Low response rates have long been considered a major problem for mail surveys, so much so that much of the early research on improving response rates focused on mail surveys. In 1978, Heberlein and Baumgartner carried out a meta-analysis to test the effects of a large number of survey characteristics on mail response rates. Their final model explained about two-thirds of the variation in the final response rate. Variables that had a positive effect on response rates were (a) more contacts with the sample household via advance letters, reminder postcards, sending replacement questionnaires, and telephone prompts; (b) a topic of interest to members of the target group; (c) government sponsorship of the survey; (d) target populations, such as students and military personnel, that were more likely to take part in surveys than the general population as a whole; (e) the use of special follow-up procedures, such as more expensive mailing procedures (e.g., certified mail) or personal contacts; and (f) incentives included with the first mailing. However, three factors had a negative effect on response rate: (1) the collection of marketing research information to benefit a specific firm; (2) a general population sample; and (3) long questionnaires.
Goyder (1982) replicated this study with similar results, except that the negative effect of market research sponsorship disappeared. Other studies have elaborated on these basic findings, paying particular attention to the effects of respondent incentives (Fox et al., 1988; Church, 1993).
The lessons of this early research were codified in the development of a comprehensive system designed to achieve higher response rates for mail surveys. The total design method (TDM), developed by Dillman (1978), was guided primarily by social exchange theory, which posits that questionnaire recipients are most likely to respond if they expect that the perceived benefits of responding will outweigh the costs of responding. TDM emphasizes how the elements fit together more than the effectiveness of any individual technique.
Specific well-known TDM recommendations that have been shown to be likely to help improve responses include the following:
• Use graphics and various question-writing techniques to ease the task of reading and answering the questions.
• Put some interesting questions first.
• Make the questions user-friendly.
• Print the questionnaire in a booklet format with an interesting cover.
• Use bold letters.
• Reduce the size of the booklet (e.g., through photo-reduction of the pages) to make the survey seem smaller and easier to complete.
• Conduct four carefully spaced mailings beginning with the questionnaire and a cover letter and ending with a replacement questionnaire and cover letter to nonrespondents seven weeks after the original mailing.
• Include an individually printed, addressed, and signed letter.
• Print the address on the envelopes rather than use address labels.
• Explain that an ID number is used and that the respondent’s confidentiality is protected.
• Fold the materials in a way that differs from an advertisement.
To adapt the original TDM to different survey situations, such as those used in mixed-mode surveys, Dillman developed the tailored design method (Dillman et al., 2009), in which the basic elements of survey design and implementation are shaped further for particular populations, sponsorship, and content. Despite these advances in understanding the determinants of high response rates in mail surveys, which are grounded in research covering more than a quarter of a century, the challenges continue.
It is appropriate to begin consideration of approaches to improving survey response or lowering survey costs with the design of the survey. This section discusses several options, ranging from adopting a whole new approach to survey design (ABS) to more traditional methods for improving sample design.
Survey researchers have recently begun to explore the use of lists of mailing addresses as sampling frames. There are several reasons for this development, including the potential for cost savings (for surveys relying on area probability samples) and the potential for better response rates (for surveys relying on RDD sampling). Iannacchione et al. (2003) were the first to publish results on the use of the U.S. Postal Service Delivery Sequence File (DSF) as a potential sampling frame, a method that has come to be known as ABS. Link et al. (2008) were the first to publish work on switching from telephone to mail data collection and from RDD to ABS sampling. They also coined the term “address-based sampling.”
Link and his colleagues (2008) compared mail surveys based on ABS with telephone surveys based on RDD using the BRFSS questionnaire in six low-response rate states (California, Illinois, New Jersey, North Carolina, Texas, and Washington). The BRFSS covers all 50 states plus the District of Columbia. The pilot survey was conducted in parallel with the March, April, and May 2005 regular RDD data collection process. In five of the six states, the mail/ABS BRFSS achieved a higher response rate than the regular telephone/RDD BRFSS. However, after this testing, the ABS design was not implemented in the BRFSS for reasons that have not been documented.
The National Household Education Survey (NHES) program has also undertaken to transition from RDD to an ABS methodology. This new methodology was used recently in a very large NHES field test. The field test included several experiments to discover the best methods for a mail ABS approach. The experiments compared different questionnaires and survey materials, levels of incentives and mailing services, and the effects of including a pre-notice letter. Preliminary results from the field test indicate that ABS response rates were substantially higher than those attained in the last round of RDD surveys (Montaquila and Brick, 2012).
In addition to the testing and experimentation conducted with the BRFSS and NHES surveys, several other surveys have adopted an ABS design. The Health Information National Trends Survey (HINTS) of the National Cancer Institute (which used an ABS component in addition to an RDD component in 2007), the Nielsen TV Ratings Diary (which moved
from a landline RDD frame to ABS), and Knowledge Networks (which switched from RDD to ABS recruitment for its online panel surveys) will yield additional information on the ability of this design to increase response over time.
In summary, research has so far indicated that ABS provides good coverage and is also cost-effective. In conjunction with mail data collection, it appears to produce higher response rates than telephone interviewing and RDD sampling produce. However, it has been pointed out that when eligibility rates fall below a certain point, it is no longer cost-effective (Amaya and Ward, 2011). There are still major issues to be researched concerning ABS, including within-household selection of a single respondent (Montaquila et al., 2009).
Recommendation 4-2: Research is needed on how to best make a switch from the telephone survey mode (and frame) to mail, including how to ensure that the right person completes a mail survey.
Some populations are hard to include in surveys because they are very rare, difficult to identify, or elusive. When these groups are the target population for a survey, they have very high non-interview and nonresponse rates. According to a presentation to the panel by Heckathorn (2011), many hard-to-reach populations cannot be sampled using standard methods because they lack a sampling frame (list of population members), represent small proportions of the general population, have privacy concerns (e.g., stigmatized groups), or are part of networks that are hard for outsiders to penetrate (e.g., jazz musicians).
The traditional methods for sampling such hard-to-reach populations all have problems. One traditional method is to sample population members through location sampling (e.g., selecting a sample of homeless persons by selecting persons who sleep at a homeless shelter). However, such samples exclude members who avoid those locations; as a result, those with contacts at sample locations may differ systematically from those without them. Another approach is to draw a probability sample of population members who are accessible in public venues, but the coverage of those samples is limited because it excludes those who shun public settings.
Snowball samples (or chain-referral methods) may offer better coverage because respondents are reached through their social networks, but they produce convenience samples rather than probability samples. Hence, there is a dilemma. There is a trade-off between maximizing coverage of hard-to-reach populations and realizing the statistical advantages offered
by probability sampling. Heckathorn (2011) argued that RDS resolves this dilemma by turning chain referral into a probability sampling method.
RDS starts with eligible “seeds” to gain entry into the network. Then the seeds recruit other members of the population. There are often incentives both for participation and for recruiting. Advocates claim that there is a lower cost per case than with traditional designs; that it reduces time and demands on interviewers; that it can reach populations that traditional methods cannot; and that it eliminates travel and personal safety issues. However, the method relies on a number of critical “assumptions that must be met to determine if it is an appropriate sampling method to be used with a particular group” (Lansky et al., 2012, p. 77). Included among the assumptions is that the recruited population must know one another as members of the group, and that the members are adequately linked so that the whole population is covered.
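The claim that RDS turns chain referral into a probability sampling method rests on weighting each recruit inversely to his or her reported network size (degree), since better-connected members are more likely to be reached through referral chains. A minimal sketch in the style of the Volz-Heckathorn (RDS-II) estimator follows, with names and data of our own invention:

```python
def rds_estimate(values, degrees):
    """Inverse-degree-weighted mean (Volz-Heckathorn / RDS-II style).

    values:  measured outcome for each recruit (e.g., 1/0 for a trait)
    degrees: each recruit's reported network size; recruits with larger
             networks are down-weighted because chain referral reaches
             them more easily.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

An unweighted mean of the same data would overstate any trait concentrated among high-degree members: here, two recruits with the trait who each report knowing ten members together count for less than one recruit without the trait who reports knowing two.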
There are several approaches for measuring nonresponse in network samples. One approach is to compare the reported network composition with the yield of actual recruits. For example, in Bridgeport, Connecticut, a sample of drug injectors yielded only blacks, although respondents reported knowing many Hispanic injectors. In this case, recruitment excluded an important group. The interview site was in a black neighborhood, where Hispanics did not feel comfortable. The solution was to move the interview site to neutral ground in downtown Bridgeport. Subsequently, recruitment of both blacks and Hispanics was more successful, and the reported network converged with the composition of the recruited sample. Comparing self-reported network composition and peer recruitment patterns provided a qualitative measure of representativeness even though it could not be expressed in a traditional response rate.1
Another approach is to ask those who are responsible for recruiting respondents about those who refused to be recruited. This technique was used in a CDC study of young drug injectors in Meriden, Connecticut. The most common reason for refusing recruitment was being “too busy” (see also Iguchi et al., 2009).
Experience to date suggests that the operational aspects of reducing nonresponse in RDS are challenging, to say the least, and the ability of the method to yield results much like probability samples is not yet proven. In her presentation to the panel’s workshop, Sandra Berry (2011) suggested that it is important for the future of this survey technique that research be conducted on the following operational aspects of this still-in-development method:
1This study was conducted as part of a research grant from CDC to the Institute for Community Research; see http://www.incommunityresearch.org/research/nhbsidu.htm [March 2013].
• How well does RDS perform in community survey contexts? How do we judge this?
• How can we get better measures of network size from individuals?
• What features of RDS can be altered and at what cost to response rates, overall bias, or the variance of the estimates?
• In what situations (populations or modes of contact and data collection) does RDS work well?
• Which of RDS’s assumptions are likely to be met in practice, and which are likely to be violated?
• How can RDS enhance and integrate with traditional data collection?
While the goal of RDS is to identify and maximize responses from a hard-to-reach population at a reasonable cost, the goal of matrix sampling is to reduce any particular respondent’s burden and thereby improve survey response rates. Matrix sampling is a procedure in which a questionnaire is split into sections of questions, and each section is then administered to subsamples of the main sample. Even though individual survey respondents answer only a part of the questionnaire, estimates can be obtained for all the variables derived from survey items (Shoemaker, 1973). Partitioning a long questionnaire into smaller, bite-sized pieces is a way to encourage people to respond more readily.
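The partitioning step can be sketched directly: deal the item list into blocks, keep any core items on every form, and randomly assign each sample member one block. This is our own illustration of the general procedure, not any specific survey's design; the function and parameter names are assumptions:

```python
import random

def matrix_assign(sample_ids, items, n_blocks, core_items=(), seed=0):
    """Assign each sample member a core set plus one block of items.

    Items are dealt into n_blocks interleaved blocks, so every item
    appears on exactly one short form; estimates for all items are
    still possible because each block goes to a random subsample.
    """
    rng = random.Random(seed)
    blocks = [items[i::n_blocks] for i in range(n_blocks)]
    return {sid: list(core_items) + blocks[rng.randrange(n_blocks)]
            for sid in sample_ids}
```

Each respondent then answers roughly 1/n_blocks of the non-core items rather than the full instrument, at the cost of a larger overall sample to achieve the same precision for any given item.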
There are several examples from the fields of educational assessment, federal statistics, and public health (Gonzalez and Eltinge, 2007) in which matrix sampling has been applied:
• The largest ongoing example of matrix sampling is the National Assessment of Educational Progress (NAEP), which surveys the educational accomplishments of students in the United States. Because NAEP assesses a large number of subject-matter areas, it uses a matrix sampling design to assess students in each subject. Blocks of items drawn from each content domain are administered to groups of students, thereby making it possible to administer a large number and range of items while keeping individual testing time to an hour. Because of its design, NAEP reports only group-level results.2
• One of the major U.S. surveys to have investigated matrix sampling as a way to reduce burden and improve response is the Consumer Expenditure Quarterly Interview Survey (CEQ). Gonzalez and Eltinge (2009) conducted a simulation study using CEQ data from April 2007 to March 2008
for the full questionnaire. They then split the dataset into six subsamples, each containing a subset of items, and explored different ways of imputing a full dataset for each subsample.
• Munger and Loyd (1988) looked at the viability of matrix sampling in a survey of 307 randomly selected school principals in the state of Virginia. The principals were randomly assigned to four separate groups. The first group, which consisted of 100 principals, was assigned the full questionnaire containing 61 items. The remaining three groups, each consisting of 69 principals, were each assigned a shortened questionnaire containing 27 items. The study found that the survey sample members were more likely to respond to a shortened questionnaire than to the lengthy version, even though a larger percentage of those assigned the long questionnaire said they always responded to surveys, while a larger percentage of those assigned to one of the short questionnaires said they seldom responded to surveys. The matrix sampling design required a larger overall sample to achieve the same reliability.
• Thompson et al. (2009) used matrix sampling for a survey on library services assessment to explore burden reduction and response rates. The long form of the questionnaire consisted of 22 survey items. Randomly selected participants were asked to complete a short version that contained 8 of the 22 items. The completion rates were higher for short-form survey participants relative to long-form survey participants. Moreover, the long form elicited participation from respondents who were more positive about library services, thereby exaggerating the positive assessment of library services.
Available research indicates that for lengthy surveys, matrix sampling methodology may improve cooperation rates and reduce break-offs, straight-line responding, and strategic negative answers to “filter” questions given in order to avoid the subsequent, more specific questions they trigger. The matrix sampling procedure is also said to have the advantage of reducing costs because a short questionnaire requires less interviewing time (Gonzalez and Eltinge, 2007). To the extent that this advantage holds, survey administrators should be able to use matrix sampling to achieve higher response rates with lower costs.
The method poses challenges to those who analyze the data. Joint analysis of data that are not included in all versions of the questionnaire requires strong assumptions about the distribution of the unobserved correlations; for some types of data, this is a severe limitation. In addition, the sampling variance of estimates may be increased without an increase in the overall sample size.
Another promising avenue for investigation is tailoring the mode of data collection to the target population. This tailoring of modes has long been a key consideration in the survey design stage. Today, with technological advances and new communications options, survey managers have new options for employing targeted modes to maximize response and minimize cost in real time through the intelligent use of paradata. In this section, we discuss cell phone options, the use of the Internet, and self-administered modes.
Cell Phone Surveys
The explosive growth in cell phone usage has created challenges for survey managers even as it has opened new possibilities for survey operations. A recent AAPOR task force report on cell phone survey techniques (American Association for Public Opinion Research, 2010a) suggested several strategies for improving response rates for this mode.
Among the report’s suggestions are using longer field periods, making advance contact where feasible (generally not possible with cell RDD numbers), tailoring the caller ID display, leaving voice messages to encourage cooperation, and preparing well-written introductory scripts that allow for easy tailoring to individual respondents. The introductory contact is especially important in calling cell telephones, for which an advance letter is not usually possible. Offering remuneration for cell phone costs and contingent incentives to try to stimulate cooperation among sample members who might otherwise refuse are two strategies that are often effective, provided that interviewers are well trained on when and how to offer the incentives. In addition, offering a short version of the questionnaire, thus lowering respondent burden, may help, as may offering multiple modes to respond.
In discussing the AAPOR report, its chair, Paul Lavrakas (2011), said that there is a need for research on countering nonresponse. Traditionally all sample members have been approached with a “one-size-fits-all” recruitment method. Although this approach makes practical and operational sense, it fails to take advantage of the computer-assisted environments that support surveys today.
One line of inquiry would be to test matching interviewer and respondent characteristics, including language and dialect, and to examine the impact of those characteristics on participation. Even on the telephone, an interviewer’s voice may convey information to a respondent about that interviewer’s characteristics. In theory, a respondent will have a greater affinity for a stranger (the interviewer) who is thought to be similar to the respondent. A recent review found no experimental studies on matching
interviewers and respondents on social characteristics (Schaeffer et al., 2010). One social characteristic for which the interaction between the sample member’s characteristic and that of the interviewer has been examined is race; the available non-experimental studies found no significant effect of race of interviewer on participation (Singer et al., 1983; Merkle and Edelman, 2002).
As researchers pursue means of increasing response, it should be recognized that there are limits to what such efforts can accomplish. For example, Brick and Williams (2009) speculated that an increased number of callbacks in telephone surveys may actually increase households’ inclination to refuse.
Internet Panel Surveys
Many survey researchers see increased use of the Web as the key to controlling escalating data collection costs in surveys. In the committee’s workshop, Reg Baker, chair of the AAPOR panel on online surveys, summarized the results of the AAPOR panel’s study of these surveys (American Association for Public Opinion Research, 2010b). The AAPOR task force concluded that probability-based online panels can provide good coverage of the general population (since they provide Internet access to those lacking it), but overall response rates tend to be very low (5 to 15 percent). Nonprobability designs, which rely on pools of pre-recruited respondents (“panels”), generally ignore coverage error, and they report participation rates for specific survey requests ranging from less than 1 percent to 20 percent.
Thus, the reduced costs from the use of online panels come at a price. Most panels use non-probability samples, provide poor coverage, and obtain low rates of participation. These issues with Internet panels have led to the development and publication of an international quality standard for access panels, which are becoming a key tool of market, opinion, and social research (ISO 26362). The standard lays out criteria for assessing the quality of access panels and applies to all types of access panels, whether Internet-based or not. It aims to provide international criteria to help compare the results of access panels worldwide (International Organization for Standardization, 2009).
Online surveys are one type of self-administered survey, but there are other types as well. Couper (2011a, 2011b) categorizes self-administered modes as fully self-administered or as involving interviewers. Those that are fully self-administered include surveys conducted by mail, Web, and inbound or automated outbound interactive voice response (IVR). Those
that are self-administered with interviewer involvement include computer-assisted self-interviewing (CASI), audio computer-assisted self-interviewing (ACASI), recruit-and-switch IVR or telephone audio computer-assisted self-interviewing (T-ACASI), and paper-and-pencil self-administered questionnaire (SAQ).
Couper (2011a) makes the argument that self-administered modes have some measurement advantages and are generally cheaper but do not solve the inferential issues facing surveys (especially coverage and nonresponse). In Couper’s view, self-administered modes will increasingly supplement rather than replace interviewer administration. He outlined the opportunities and challenges facing each of the modes.
Fully self-administered modes are less expensive than interviewer-administered modes, and they reduce social desirability effects (that is, respondents providing answers that they believe are more socially acceptable). With mail surveys, the respondent can take time to consider answers, look up records, and consult other household members. Mail surveys have the potential to allow a respondent to reread complex questions, thus reducing the load on working memory. The Web has all the advantages of mail, plus those of computerization.
The mode that is selected for the self-administered questionnaire makes a difference in eliciting survey responses. Kim et al. (2010) examined the nonresponse correlates for self-administered questionnaires conducted in paper-and-pencil personal interview (PAPI), computer-assisted personal interview (CAPI), and CASI formats. The authors found that CASI not only was associated with lower response rates compared with the other modes but also affected response dynamics. Those ages 45 to 64, blacks, and members of other ethnic groups were more likely to be nonrespondents with CASI.
Fully self-administered modes have disadvantages that can affect the quality of the responses. There is no interviewer available to motivate the sample member or to provide clarifications. IVR surveys likely experience more nonresponse break-offs than other modes (see, for example, Kreuter et al., 2008).
Researchers wanting to use the Web as a principal data collection mode face sampling and coverage issues. There is no general population frame of Internet users, nor is there an RDD-like mechanism to generate one. That means, for probability samples, that the frame must come from elsewhere (e.g., RDD, ABS, or traditional area-probability samples). Although Internet penetration is more than 70 percent, considerable disparities exist between those who have Internet access and those who do not, and these disparities may bias estimates for the general population.
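The magnitude of this coverage problem can be expressed with the standard coverage-bias decomposition: the bias of an unadjusted estimate from the covered population equals the noncovered share of the population times the difference between the covered and noncovered means. A minimal sketch, in which all quantities except the roughly 70 percent penetration figure are invented for illustration:

```python
# Coverage bias of an Internet-only estimate, using the standard
# decomposition: bias = W_nc * (Y_covered - Y_noncovered), where W_nc
# is the noncovered share of the population. Numbers are illustrative.

p_covered = 0.70        # Internet penetration (roughly the figure cited)
mean_covered = 62.0     # hypothetical mean outcome among those online
mean_noncovered = 48.0  # hypothetical mean among those offline

# True population mean is the coverage-weighted mixture.
mean_population = p_covered * mean_covered + (1 - p_covered) * mean_noncovered

# An Internet-only survey (with no adjustment) estimates mean_covered.
bias = mean_covered - mean_population
# Equivalent form: noncovered share times the covered/noncovered gap.
bias_decomposed = (1 - p_covered) * (mean_covered - mean_noncovered)

print(f"population mean: {mean_population:.1f}")
print(f"coverage bias of Internet-only estimate: {bias:.1f}")
```

The sketch makes the practical point explicit: coverage bias vanishes only if the noncovered share is negligible or the two groups do not differ on the survey variable, and neither condition can be assumed for the general population.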
Some approaches to Internet surveys restrict inference to the population with access to the Internet (which may be a poor substitute for a general
population) or dispense with probability sampling altogether. Non-probability online samples may be based on such techniques as “river sampling” (in which participants are recruited using banner ads, pop-up ads, or similar methods, screened for their demographic characteristics, and assigned to an appropriate survey) and RDS as described above. While these techniques may yield a willing population, they do not result in a representative population and thus cannot yield generalizable inferences. In some panels, the survey researchers have provided equipment for those without Internet access (e.g., Knowledge Networks’ KnowledgePanel, the Face-to-Face Recruited Internet Survey Platform, the Measurement and Experimentation in the Social Sciences and Longitudinal Internet Studies for the Social Sciences panels in the Netherlands, and the RAND American Life Panel).
Some surveys address the coverage problem by using a mixed-mode design with mail for non-Internet cases. The Gallup panel is one example of this approach. A research experiment in 2007 tested the effects of various approaches and incentives for improving response in this multimode panel with Internet and mail components (Rao et al., 2010).
There has been increased interest over the last two decades in mixed-mode alternatives. The thinking is that if surveys that rely on a single mode have unacceptably low response rates, then combining modes may take advantage of different modes to increase response rates and potentially reduce nonresponse bias. In a presentation to the panel, Mick Couper (2011a) suggested that the research evidence to date is quite mixed and that success may depend on how the modes are mixed and on the evaluation criteria used (e.g., cost, coverage, nonresponse, or measurement error).
Some mixed-mode methods have proven more productive than others, while some may actually increase nonresponse. Research has found that mail-plus-phone designs produce higher response rates than Web-plus-phone designs and that giving respondents a choice of mode is less effective than offering each mode in sequence (Cantor et al., 2009). Couper observed that while mixed modes may reduce errors of nonobservation by improving coverage relative to Internet- or telephone-only modes, or may reduce nonresponse bias related to literacy relative to mail-only methods, mixing modes may add complications in terms of measurement error (Couper, 2011a). Nonetheless, the mixed-mode approach has gained in popularity over time, particularly for large government-sponsored social science surveys.
Growth of Multiple Mode Surveys
The use of mixed modes for conducting surveys has grown substantially over a long period.3 With this continued growth, the methodology has advanced from buzzword (see Dillman and Tarnai, 1988) to widespread usage.
The research interest in various modes has changed over time. Telephone and personal interview modes have played a dominant role in the mix for some time. Mail has increasingly become a part of the mix. The resurgence in the use of mail as a mode has probably been due to the large drop in response rates in telephone studies and the development of near-comprehensive ABS frames, such as the U.S. Postal Service’s DSF. More recently, research has focused on the use of mail to induce respondents to use the Internet, which offers significant cost savings over interviewer-administered modes (and perhaps over mail self-administered questionnaires) and the additional benefit that the Web makes more complex instruments possible.
In research focusing on one statewide general public household survey, the 2008 and 2009 Washington Community Survey, Messer and Dillman (2010, 2011) sampled from the DSF and asked respondents in nine and six treatment groups, respectively, to respond by Internet or mail or both. The treatment groups varied the procedures and incentives for the Web–mail implementations. The mail-only groups responded at higher rates than the Web groups, but both achieved higher response rates than would be expected from an RDD telephone survey. Yet despite this and other research on Web–mail surveys, Messer and Dillman conclude that it “remains unclear as to what procedures are most effective in using the DSF with mail and the Internet survey modes to obtain acceptable levels of non-response” (2010, p. v). (See also Messer and Dillman, 2011.)
Shih and Fan (2008) conducted a meta-analysis of experiments comparing Web and mail response rates in some 39 studies. They observed “a preference of mail survey mode over the Web survey mode, with the mail survey mode response rate being 14 percent higher than the Web-survey mode response rate” (p. 269). However, when sample members were offered both mode options at the same time, there was no significant difference in response rates. This suggested to the researchers that it would be advantageous to offer nonrespondents in one mode a different mode in the follow-up.
Their meta-analysis considered several study features that might have affected response rate differences between modes, including “(1) whether potential respondents in a comparative study were randomly assigned to receive Web or mail surveys; (2) what type of population was involved; (3) what incentive was provided; (4) whether there was a follow-up reminder for initial nonrespondents; and (5) the year a study was published” (p. 255). They found that two of the study features (population types and follow-up reminders) contributed to the response rate differences between Web and paper surveys. College sample members appeared to be more responsive to Web surveys, while some other sample member types (e.g., medical doctors, school teachers, and general consumers) appeared to prefer traditional mail surveys. Follow-up reminders appeared to be less effective for Web surveys than for mail surveys.
3The literature uses the terms “mixed mode,” “multimode,” and “multiple modes” interchangeably. In this report, we simply refer to mixed mode.
American Community Survey: A Sequential Mixed-Mode Case Study4
The most ambitious use of a mixed-mode approach to improve survey response rates is the approach in the American Community Survey (ACS). The ACS is an ongoing survey designed to provide information about small areas. It was developed to replace the long-form survey that was part of the decennial census for many decades. The ACS is conducted on a continuous basis. The data from a given year are released in the fall of the following year. Each month, the ACS questionnaire—similar in content to the census long form—is mailed to 250,000 housing units that have been sampled from the Census Bureau’s Master Address File.5
The ACS adopted a mixed-mode approach based on extensive research. Three sequential modes were selected for monthly data collection: mail, telephone, and personal visit. For the mail option, residential housing units with usable mailing addresses—about 95 percent of each month’s sample—are sent a pre-notification letter, followed four days later by a questionnaire booklet. A reminder postcard is sent three days after the questionnaire mailing. Whenever a questionnaire is not returned by mail within three weeks, a second questionnaire is mailed to the address. If there is still no response and if the Census Bureau is able to obtain a telephone number for the address, trained interviewers conduct telephone follow-up surveys using computer-assisted telephone interviewing (CATI) equipment. Interviewers also follow up on a sample of the following: households at addresses for which no mail or CATI responses are received after two months, households for which the postal service returned the questionnaire because it could not be delivered as addressed, and households for which a questionnaire could not be sent because the address was not in the proper street name and number format. The interviewers visit housing units in
person and collect the ACS data through CAPI (or, in 20 percent of the cases, the follow-up is conducted by telephone).
4This discussion is based on a presentation by Deborah Griffin (2011).
5The monthly sample size was increased in June 2011 to almost 300,000 housing units.
The pattern of response rates across three modes shows that self-selection does take place. For example, sample members from households in less economically advantaged areas and ethnic enclaves are less likely to respond to the mail surveys than are other households. ACS 2006 data also showed that individuals not in the labor force were more likely than those who were employed to respond to the mail mode, while those with no high school education had low response to the initial mail questionnaire and were more likely to participate by telephone or personal interview.
ACS data show how the sequential mode design improves not only participation across different social groups but also overall response rates. The weighted mail response rate has stayed between 55 and 57 percent in the first five survey rounds. For the same period, by contrast, the weighted telephone response rate dropped from 60.4 to 50 percent, while the weighted personal visit response rate increased from 94.3 to 95.6 percent. The weighted combined-mode response rate was around 98 percent from 2005 to 2009.
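The way a sequential design compounds response across modes can be sketched as follows. The stage-level rates below are illustrative round numbers, not the ACS's published figures, and the sketch simplifies by ignoring weighting and the subsampling of cases for personal visits. The point is that each mode's rate is conditional on the cases reaching that mode, which is how a roughly 50 percent telephone rate and a 95 percent personal-visit rate can still combine with mail to an overall rate near 98 percent.

```python
# Sequential mixed-mode response: each later mode works only the cases
# the earlier modes did not resolve, so stage rates compound.
# Stage rates here are illustrative, not the ACS's published figures.

stages = [
    ("mail", 0.56),   # share of all cases resolved by mail
    ("CATI", 0.50),   # share of the remaining cases resolved by phone
    ("CAPI", 0.95),   # share of the rest resolved by personal visit
]

remaining = 1.0
overall = 0.0
for mode, rate in stages:
    resolved = remaining * rate
    overall += resolved
    remaining -= resolved
    print(f"{mode}: resolves {resolved:.1%} of sample "
          f"(cumulative {overall:.1%})")
```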
Recently, the Census Bureau conducted research on using the Internet as a response mode for the ACS with the goal of reducing costs.6 Based on favorable results in response rates and data quality, an Internet response option was implemented in mid-December 2012. Most households are sent a letter urging them to respond via the Internet and providing secure sign-on information. Only if they do not respond within two weeks are they sent a paper questionnaire.
Mixed Modes in Panel Studies
Panel studies provide a rich source of data for understanding mode effects because interview modes may change between waves, and the effects of changes in modes can be examined for individual sample members as well as in the aggregate. Longitudinal studies, such as the National Longitudinal Survey of Youth (NLSY), the Panel Study of Income Dynamics (PSID), and the Health and Retirement Study (HRS), commonly show variations in aggregate response rates from wave to wave, but these changes may reflect not only changes in mode but also other changes in field procedures, the aging of the sample or increasing fatigue with participation, and secular changes affecting sample members’ propensity to respond.
6See http://www.census.gov/acs/www/Downloads/library/2012/2012_Matthews_01.pdf and http://www.census.gov/newsroom/releases/archives/american_community_survey_acs/cb12-247.html [January 2013].
Longitudinal studies have increased their use of telephone interviewing in order to contain costs. They have also replaced PAPI interviews with CAPI interviewing. These mode changes do not seem to have affected response rates, likely because respondents in longitudinal studies have already made a commitment to the survey and have already had some experience with the interview process. (Refer back to Tables 1-6 through 1-9 in Chapter 1 for response rate information for the NLSY79, NLSY97, PSID, and HRS.)
One in-depth study of mode effects in a longitudinal study used data from the Round 11 CAPI experiment in the NLSY79 to compare CAPI and PAPI interviews (Baker et al., 1995). The introduction of CAPI reduced branching and skipping errors made by interviewers because the computer program acted as a checking and editing mechanism. The average difference in interview length between the two modes was only 0.9 minutes. A few measures were affected by the switchover. For example, the proportion of respondents who reported that they were paid by the hour was higher in CAPI. On a separate questionnaire, many CAPI respondents reported that they were more willing to be forthright in their responses to sensitive questions than they had been in their previous NLSY interviews using paper-and-pencil questionnaires. This result, which has been widely replicated, occurred presumably because there was a greater perception of anonymity when the interviewer entered the answers on the computer screen instead of on a form that carried the respondent’s identifiable information.
A question is whether longitudinal surveys can maintain high response rates in the Internet age. One study that evaluated the impact of shifting to the Internet for the HRS concluded that changes in the wording of questions (which often accompany a mode shift) had more of an effect than the change in the interview mode (Van Soest and Kapteyn, 2009a).
The same study investigated the issue of selection effects. Van Soest and Kapteyn (2009a) used a random sample of HRS 2002 and HRS 2004 respondents to investigate the mode effect of Internet surveys on the measurement of household assets: checking accounts, savings accounts, stocks, and stock mutual funds. The 2002 and 2004 questionnaires contained questions on Internet access and willingness to participate in an Internet survey between the biennial surveys. Those who were willing were administered the Internet questionnaires in 2003 and 2006. The authors analyzed these responses along with overlapping items from the 2002 and 2004 core survey questionnaires and found large selection effects. Respondents who used the Internet mode were likely to own more stocks and stock mutual funds than other respondents. HRS Internet 2003 survey respondents not only owned larger amounts of stocks and stock mutual funds; they also had more money in checking and savings accounts relative to respondents in HRS Internet 2006, HRS 2002, and HRS 2004.
More such experiments are required before panel surveys can move to the Internet mode with confidence, as each new mode shift brings its own set of challenges. Such experiments should also look into how quickly respondents of different types are able to learn to respond on the Web, particularly as technological innovations (and the type of data requested) make Web instruments more complex and demanding.
As more researchers turn to mixed-mode designs in an effort to maintain response rates, it is increasingly important to conduct research on mode effects, not only on response rates but also on measurement errors. Furthermore, mode research needs to go beyond simple comparisons that document differences between modes to use of stronger measurement criteria, such as the impact of mode on reliability and validity.
Recommendation 4-3: Research is needed on understanding mode effects, including ways in which mixed-mode designs affect both nonresponse and measurement error and the impact of modes on reliability and validity.
In the personal interview setting, whether face to face or over the telephone, the role of the interviewer in securing participation needs to be considered. The model of survey cooperation proposed by Groves and Couper (1998) is useful in this regard. Their model distinguishes among various factors that could influence survey participation on the basis of whether the factors are under the researcher’s control. For example, the social environment and the characteristics of household members are not under the researcher’s control. In addition, the sample member brings his or her underlying propensity to participate to the encounter with the interviewer (or to avoiding an encounter with the interviewer).
The researcher chooses the design of the survey and selects and trains the interviewers. Ultimately, the interaction between the interviewer and the household member or members often determines the sample member’s decision to participate in the survey. Because researchers have some control over the behavior of interviewers through training and monitoring, and because personal encounters are believed to have some inherent persuasive capacity, substantial responsibility for generating high response rates rests with the interviewer. However, interviewer training, previous experience at interviewing, the characteristics of the assignment area, features of the survey design, and socioeconomic characteristics of respondents that affect
interviewer expectations all influence the interviewer’s behavior during the recruitment process.
To understand how strongly such factors influence survey response and cooperation, researchers have looked into assignment characteristics, observable and unobservable personal characteristics of the interviewer, and the behavior of the interviewer. The following section summarizes the findings and conclusions from the research done in these three areas. (These and other factors are discussed in the review by Schaeffer et al., 2010.)
Differences in interviewers’ success may be due to differences in their assignments. Assignments may not be random in centralized phone facilities, particularly after an initial contact in which some information about the household is obtained, as better interviewers may be assigned to more difficult cases, such as refusal conversions, which have a lower probability of success. In studies that have used random assignment, there is evidence that variability in survey participation rates is influenced by the characteristics both of the assigned cases and of the interviewer.
In the few face-to-face studies with interpenetrated designs that allow the influence of the interviewer on participation to be separated from the influence of the assignment area, it appears that the interviewer contributes at least as much to the variance in response rates as the area does, and often more (O’Muircheartaigh and Campanelli, 1998; Schnell and Kreuter, 2005). In addition, some of what appears to be interviewer variance in survey responses may be due to the effects of the interviewer on participation rather than on measurement (West and Olson, 2010).
Observable Personal Characteristics
The impact on participation of an interviewer’s personal characteristics, such as gender, race, age, education, and voice, has been analyzed by various researchers. In research on gender, Fowler and Mangione (1990) found that respondents described female interviewers as “friendly,” while Morton-Williams (1993) found that respondents perceived female interviewers to be “approachable.” Do friendliness and approachability result in higher response rates for female interviewers? Two studies suggest that they do. Campanelli and O’Muircheartaigh (1999) used data from an experiment implemented during the second wave of the British Household Panel Survey and found that female interviewers had higher response rates than male interviewers.
Hox and De Leeuw (2002), in a comparison of 32 surveys from nine countries, found female interviewers to have response rates that were, on average, 0.8 percentage point higher than those of male interviewers. Similarly, Groves et al. (2008) found that “less masculine” voices tended to generate higher response rates, although Durrant et al. (2010) found the effect to be restricted to female respondents.
Very few studies have investigated the association between an interviewer’s race and participation. In an early study, Singer et al. (1983) looked at the effects of an interviewer’s personal characteristics and expectations on response rates in a telephone and personal interview survey. The race of the interviewer did not significantly explain variation in response rates. Merkle and Edelman (2002) found a similar result when examining nonresponse rates in exit polls.
Does the age of the interviewer make a difference? The four studies mentioned above (Singer et al., 1983; Campanelli and O’Muircheartaigh, 1999; Hox and De Leeuw, 2002; Merkle and Edelman, 2002) also included age of the interviewer as one of the controls or independent variables in their regression models. These four studies found that older interviewers were able to elicit higher response rates.
The educational background of the interviewer may play a role in participation rates. Durrant et al. (2010) investigated the effects of characteristics of both the interviewer and household members using a study in which U.K. census data were matched to survey data. They found that when the education levels of interviewers closely matched those of sample persons, higher cooperation rates were observed.
Vocal characteristics of the interviewer may also play a role. Lower refusal rates were found among interviewers rated as speaking more quickly, loudly, distinctly, and in a higher pitch (Oksenberg et al., 1986). Participation may be better predicted by how the voice of the interviewer is perceived than by actual acoustical vocal qualities (Van der Vaart et al., 2006). In telephone interviews, a moderate level of speech disfluency, such as false starts and non-lexical utterances in the flow of otherwise fluent speech, may actually result in higher response rates (Conrad et al., 2010).
Although it is clear from this discussion that interviewer and sample person characteristics do play a role in survey response, it is not apparent that matching those characteristics necessarily results in improved response rates. Davis et al. (2010) observed that there is surprisingly little evidence to indicate whether sociodemographic interviewer–respondent matching improves survey response rates. A recent study that tested whether local or outside interviewers had better response rates suggested that outside interviewers had a better chance of obtaining sensitive information (Sana et al., 2012).
Unobservable Personal Characteristics
Some characteristics of interviewers that are not directly observed by the respondent, such as the interviewer’s attitudes and expectations, may play a role in securing a response. To determine whether they do, researchers have used such measures as an interviewer’s confidence, attitudes about persuasion, beliefs about confidentiality and the importance of refusal conversions, and expressed willingness to proceed in the face of obstacles.
In the work cited above, Durrant et al. (2010) found that interviewer confidence and attitudes toward persuading reluctant respondents play an important role in reducing refusal rates. Groves and Couper (1998), however, found that a measure of tailoring derived from contact forms (i.e., a measure of how well the interviewer adapted to household characteristics) was not a significant explanatory variable. Sinibaldi et al. (2009) looked at the interviewer’s personality traits and interpersonal skills. They found that extroverted and more conscientious interviewers were more likely to achieve cooperation from respondents but that interpersonal skills were not predictive of cooperation rates. Therefore, the available literature does not offer a clear picture of the mechanism connecting an interviewer’s unobservable characteristics and the survey participation that he or she achieves.
Experience, however, is important. Work done by Campanelli et al. (1997), Groves and Couper (1998), and Sinibaldi et al. (2009) found that experienced interviewers were more successful, as measured by the likelihood of obtaining a completed survey. In a range of designs and across modes, experience was found to relate positively to cooperation, as interviewers with five or more years of experience were better able to overcome negative responses (Durbin and Stuart, 1951; Groves and Fultz, 1985; Couper and Groves, 1992; Groves and Couper, 1998; Pickery and Looseveldt, 1998; Hox and De Leeuw, 2002; Sinibaldi et al., 2009; Durrant et al., 2010). Experience is also related to lower non-contact rates in face-to-face and telephone interviews; two studies suggest that interviewers who succeed in one mode succeed in the other (O’Muircheartaigh and Campanelli, 1999; Pickery and Looseveldt, 2002).
These results, however, may reflect self-selection of interviewers—experienced interviewers are more likely to have been successful from the outset and therefore more likely to stay in the survey business compared with less experienced interviewers. Finally, it matters how experience is defined. Studies that measured experience as “number of organizations” and “number of surveys” found no relationship or a negative relationship with cooperation rates.
Interactions Between Sample Person and Interviewer
Studies on the factors influencing participation have begun looking more closely at the interaction between the interviewer and the sample person, which has three main phases. The first phase is the interaction during the survey introduction, which typically lasts less than a minute on the phone (Oksenberg et al., 1986) and up to five minutes in a face-to-face interview (Groves and Couper, 1994). The second phase is the persuasion attempt by the interviewer if he or she faces reluctance from the householder to participate in the survey. If the householder agrees to participate, the third phase of interaction takes place, in which the interviewer elicits responses to survey questions. Research investigating the interaction of interviewers and householders has looked at all three phases; only the first two are relevant for survey nonresponse decisions.
The theory proposed by Groves and Couper (1998) provided a description of two techniques that should be employed by interviewers during the three phases: tailoring and maintaining interaction. Tailoring is the technique employed by expert interviewers who customize their interactions with sample persons based on a variety of cues. Maintaining interaction is a technique in which interviewers continue engaging respondents in conversation to obtain more information for tailoring and to reduce the likelihood that sample members will refuse to participate in a given turn of talk. The authors stressed the fact that interaction must be maintained for tailoring to occur.
Groves and McGonagle (2001) developed refusal aversion training based on these two concepts. They broke the task of the interviewer into four steps: (1) identifying the concern, (2) classifying it, (3) providing an appropriate response, and (4) performing those tasks as quickly as possible. The training improved the response rates of interviewers, especially for those who had lower response rates before the training. Relatedly, Dijkstra and Smit (2002) recorded and analyzed spontaneously occurring persuasion techniques and found that such techniques increased participation.
Survey introductions can vary in content (sponsor’s name, confidentiality concerns), amount of information (level of detail about topic), and scriptedness. O’Neil et al. (1979) experimentally varied what the interviewer said after a short introduction and found marginal differences in response rates between groups who were administered different sets of introductions. In a telephone survey, Singer et al. (1983) varied the information provided to sampled households on survey content and on the purpose
of the interview. This variation did not affect the overall response rate. In another study, scripted introductions were found to generate lower response rates (Morton-Williams, 1993).
Houtkoop-Steenstra and van den Bergh (2000) hypothesized that if interviewers varied their survey introduction style, without altering the content, they could achieve greater cooperation. They looked at response rates in a telephone survey in the Netherlands. Four types of introductions were given. The first was an agenda-based introduction, in which the interviewers formulated their own introductions on the basis of a limited number of catchwords. The other three were standardized introductions of varying length—short, medium, and long. The short version included a greeting and request for participation which were not part of the agenda-based introduction. The medium version included the elements of the short version and the reason for calling. The long version included elements that in theory and sometimes in research increase response rates, such as (a) the information about the length of interview (“The interview will not take long.”); (b) the nature of questions (“The questions are simple.”); (c) an authority statement mentioning the name of the company (“You may know about us from television.”); (d) a statement about the importance of the information (“Your opinion is important.”); and (e) a confidentiality statement. The authors did not find any differences among the respondent groups assigned to the standardized introductions, but the agenda-based introduction induced higher response rates.
A recent study by Maynard et al. (2010) discusses the leverage–saliency framework outlined in Chapter 1. Interviewers may increase the probability of obtaining a response by emphasizing features of the study or participation with “positive leverage and neutralizing the salience of those with negative leverage” (p. 792). The authors point out that the theory accords with actual practice—interviewers tend to emphasize positive aspects of participating or downplay negative aspects. For example, an interviewer might acknowledge that an interview takes a long time but note that it can be broken into parts. By emphasizing that the leverage a survey attribute has differs across sample persons, leverage–saliency theory calls attention to the importance that interviewers tailor requests to individual sample persons. Interviewers can encourage participation by “observ[ing] idiosyncratic concerns of the householder and customiz[ing] their remarks to those concerns” (Groves et al., 2000, p. 299; see also Couper and Groves, 1992; Groves et al., 1992; Maynard and Schaeffer, 2002).
In a presentation to the panel, Schaeffer pointed out that approaches such as leverage–saliency theory draw attention to the predispositions of
the sample member (Schaeffer, 2011). However, the response propensity that the sample member brings to the contact with the interviewer might be modified over the course of the encounter and may affect the leverage that a feature of the survey design has with a respondent. These propensities, and their fluctuations, are difficult to incorporate into practical study designs. However, using conversation analytic techniques, Schaeffer et al. (2013) found that the interactional environment provided by the sample member (encouraging, discouraging, or ambiguous) is a very strong predictor of subsequent participation.
Questions by sample members may provide evidence of their predispositions. Previous studies have identified questions by sample members as predictive of whether the sample member is likely to accept the request to participate. Drawing on interviewers’ descriptions of their interaction with sample members, Groves and Couper (1996), for example, concluded that questions indicate cognitive engagement by the sample member and are associated with an increased likelihood of participation in future contacts. A more recent study that selected pairs of sample members matched on propensity to participate refined this finding. Schaeffer et al. (2013) found that wh-type questions (i.e., questions beginning with “wh,” such as what, why, when) before the request to participate were associated with decreased odds of participating. On the other hand, questions about the length of the interview or wh-type questions after the request to participate were associated with increased odds of participating. The predictive value of sample members’ questions is of practical significance: interviewers could be trained to read such questions, in light of their type and timing, as signals of either engagement or reluctance.
In a new survey, all interviewers begin equal, with no knowledge about the survey; even in existing surveys, experienced interviewers may not clearly recall all the relevant facts. Training provides an opportunity for the survey designers to make the factual information sufficiently salient to interviewers that it can shape their engagement with respondents. If interviewers do not understand a given study, it seems likely that they will be less effective in motivating respondents to participate.
Besides making information mentally accessible, training can help in developing interviewers’ strategies for interactions with respondents. Demonstrations of various approaches can provide models of effective recruitment behavior. Role playing, particularly when accompanied by coaching, can assist in developing confident introductions and delivery of particular arguments. Role playing can be effective both in helping interviewers cope with the stresses of rejection and in learning how to back off gracefully
without completely forestalling future attempts. When a variety of survey materials are available for providing to respondents, training can be effective in determining when different pieces of information might be most appropriate.
Training can also be seen as ongoing throughout the field period. As interviewers learn more about the respondent population, they can interact with more senior staff for coaching. To the extent that information is available about actual performance—through recordings, direct observation, or notes recorded by interviewers as part of their record keeping—such feedback can be more focused.
Recommendation 4-4: Research is needed on the structure and content of interviewer training as well as on the value of continued coaching of interviewers. Where possible, experiments should be done to identify the most effective techniques.
Concluding Remarks on the Role of Interviewers
In summary, interviewers play a valuable role in obtaining survey responses. The survey participation literature summarized above has scrutinized various aspects of that role. But it is important to acknowledge that an interviewer’s actions depend heavily on the sampling frame, survey design, survey mode, and interviewer training. Future research on the interviewer’s role in survey participation should provide insights into how to integrate interviewers’ efforts with design features. Interviewers can be provided with materials containing information on respondents and records of prior contacts. Interviewer training can explain the importance of participation, how to assure respondents of confidentiality, how to approach previous refusals, how to diagnose reluctance and respond appropriately, how to make a graceful exit, and strategies for handling high-priority but low-propensity cases. Respondents, for their part, can be encouraged through advance letters, survey materials (explaining the reasons for conducting the survey or addressing respondents’ fears or reservations directly), and (monetary) incentives. Further empirical research into survey participation requires the collection of more information on interviewers and on the behavior of respondents.
Singer (2011) spoke to the panel on the use of monetary incentives to counter the trend toward increasing nonresponse in national household surveys. She noted that monetary incentives, especially prepaid incentives, are being employed more often (Singer and Ye, 2013). Her talk summarized
research on incentives focusing on findings largely drawn from randomized experiments. She examined the effects of incentives on response quality, sample composition, and response distributions. She noted that incentives have been found to reduce nonresponse rates, primarily by reducing refusals, but that little is known about their effect on nonresponse bias.
There are a number of answers to the question of why people respond to surveys. All theories of survey response emphasize the role of incentives in motivating behavior, though these need not be monetary incentives. Singer noted that results from responses to open-ended questions suggest that there are three main reasons for responding to surveys: altruistic reasons (e.g., wanting to be helpful); egoistic reasons (e.g., monetary incentives); and reasons associated with aspects of the survey (e.g., topic interest, trust in the sponsor). Both theory and observation confirm the importance of incentives for participation in surveys.
Effects on Cooperation
Incentives improve cooperation (Church, 1993; Singer and Ye, 2013). For example, Mann et al. (2008) reported that in a longitudinal study of young adults, parents receiving incentives of $1 or $2 were more likely than those receiving no incentives to provide addresses for their adult children, and children of parents receiving incentives responded more quickly to the survey.
In another study, Holbrook et al. (2008) analyzed 114 RDD surveys between 1996 and 2005 and found, after controlling for other variables, that incentives were significantly associated with higher response rates, with the effect due mainly to a reduction in refusals (with no change in contact rates).
Beydoun et al. (2006) compared the results of unconditional (prepaid) and conditional (promised) incentives on tracing and contact rates in a sample of Iowa postpartum women. The unconditional incentive rates were slightly higher than the conditional rates, and the highest rates were attained when the two incentives were combined.
Effects in Mail Surveys
One meta-analysis by Church (1993) found that prepaid incentives yielded significantly higher response rates to mail surveys than promised or no incentives, that they yielded higher response rates than gifts, and that response rates increased with increasing amounts of money. In another meta-analysis, Edwards et al. (2002) reported similar results.
Effects in Interviewer-Mediated Surveys
A meta-analysis of 39 experiments by Singer et al. (1999) found results for interviewer-mediated surveys similar to those in mail surveys, although the effects of incentives were generally smaller. The analysis of 114 RDD surveys by Holbrook et al. (2008), noted above, found that surveys offering incentives had significantly higher response rates than those offering none; the effect came mainly from a reduction in refusals. An analysis of 23 RDD experiments by Cantor et al. (2008) found that:
• a prepayment of $1 to $5 increased response rates from 2 to 12 percentage points;
• larger incentives led to higher response rates;
• the effect of incentives has not declined over time, but baseline response rates have dropped substantially;
• incentives at refusal conversion had about the same effect as those sent at initial contact; and
• promised incentives of $5 and $25 did not increase response rates compared to no incentives, but promising larger incentives sometimes did.
These findings are generally consistent with other experiments involving interviewer-mediated surveys, including face-to-face surveys.
Effects in Cell Phone Surveys
Brick et al. (2007) conducted a dual-frame survey, covering both cell phones and landlines, that included an incentive experiment. This study used two promised incentive conditions ($10, $5) and two message conditions (sample members notified of the survey and incentive by text messaging, or not notified). They found that the $10 group had a higher screener response rate than the $5 group as well as a higher cooperation rate. The message had no effect on either screener or survey response rates, and there were no interaction effects.
Incentives in Longitudinal Studies
Longitudinal surveys have special issues, because incentives are usually part of a larger motivational package designed to retain respondents. As in cross-sectional studies, initial survey round incentives have been found to increase response rates, usually by reducing refusals but sometimes by reducing non-contacts (e.g., McGrath, 2006). Some studies suggest that an initial payment may continue to motivate participation in subsequent waves (Singer and Kulka, 2001; McGrath, 2006; Creighton et al., 2007;
Goldenberg et al., 2009). Singer concluded that incentives appear to increase response among those who have previously refused, but not among those who have previously cooperated (Zagorsky and Rhoton, 2008).
In a study by Jaeckle and Lynn (2008) of incentive payments in a U.K. longitudinal study, the researchers found that (1) attrition was significantly reduced by incentives in all waves, (2) the attrition was reduced proportionately among subgroups and so did not reduce attrition bias, (3) the effect of the incentive decreased across waves, (4) incentives increased item nonresponse, but (5) there was a net gain in information.
The NLSY97 has been a rich source of analysis of the effects of incentive payments on participation in a longitudinal survey because, from the beginning, NLSY management has had discretion over the level of incentives to be offered to participants. The amount of the incentive has also been adjusted on an experimental basis. In an early study conducted in NLSY97 Round 4 and extended into Round 5, Datta et al. (2001) found that sample members who were paid $20 had higher participation rates than those paid $10 or $15. However, there were no measurable effects on data quality from the higher level of incentives.
Subsequent NLSY97 experiments found that higher incentives had a particular effect on bringing those who dropped from prior rounds back into participation in later rounds. Pierret et al. (2007) studied the results of the incentive experiments and concluded that incentives moderately increased response rates and had a greater impact on those respondents who did not participate in the previous round relative to those who did participate.
An incentive experiment was conducted as part of the 2000 wave of HRS, in which the incentive amount was increased from $20 to $30 or $50. Rodgers (2011) found an improvement in response rates as the incentive increased. A lowered incentive amount of $40 in subsequent rounds or waves did not result in lowered response rates. He also found a statistically significant decrease in item nonresponse among respondents receiving larger incentives.
Other Findings on Incentive Effects
Two experiments failed to find a role for interviewers in mediating incentive effects. Singer et al. (2000) kept interviewers blind to households’ receipt of incentives in one condition but not in another and found that there were no differences in incentive effects between the two conditions. Lynn (2001) randomly offered promised incentives to half of each interviewer’s assigned households and then asked interviewers how useful they thought the incentives had been. Interviewers’ judgments were almost uniformly negative, but incentives had significant positive effects
on completion of a household interview, completion of interviews with individual members, and completion of time use diaries. Nevertheless, as Singer (2011) pointed out to the panel, interviewer expectations may have an independent effect on respondent behavior—for example, Lynn’s effects might have been larger had interviewers had a more positive view of incentives. Also, the possibility of contamination in Singer’s experiment cannot be entirely ruled out since the same interviewers administered both conditions. It may also be that incentives vary in their effect at different points over the field period of a survey.
An important consideration is the effect of incentives on response quality. Singer and Kulka (2001) found no decline in response quality attributable to incentives, as measured by item nonresponse and the length of open-ended answers. Since then, the small number of studies (mail, RDD, and face to face) that have examined incentive effects on data quality have, with one exception, found no effects. The exception is Jaeckle and Lynn (2008), who found that incentives increased item nonresponse. Cantor et al. (2008) argued for additional tests that would control for such factors as survey topic, size and type of incentive (e.g., prepaid, promised, refusal conversion), and whether studies are cross-sectional or longitudinal.
Do incentives affect sample composition? Cantor et al. (2008), in their review of 23 RDD studies, concluded that incentives, whether prepaid or promised, have little effect on measures of sample composition. Nevertheless, a number of studies have demonstrated such effects on specific characteristics (see Singer, 2013, pp. 128–129). But specific attempts to use incentives to bring groups into the sample that are less disposed to respond because of lower topic interest have received only qualified support (Groves et al., 2004, 2006). Singer points out that very few studies have considered the sample composition effect of Web survey incentives, and she concluded that more research is clearly needed.
A key question concerns the effect of incentives on the responses that respondents provide. The research findings are mixed. James and Bolstein (1990), Brehm (1994), and Schwarz and Clore (1996) reported results consistent with the mood hypothesis—that incentives boost mood and therefore affect responses—and Curtin et al. (2007) found an interaction between race and receipt of incentives (nonwhites receiving an incentive gave more optimistic answers on the Index of Consumer Confidence). Groves et al. (2004, 2006) reduced nonresponse due to lack of topic interest by offering incentives, and the change in bias due to increased participation of those with less interest was not statistically significant. The possibility that incentives bias responses directly through an effect on attitudes has found no support in the experimental literature, although Dirmaier et al. (2007) specifically tested such a hypothesis. There is no evidence that
randomly administered incentives increase bias, but not enough is known about the effect to use them effectively to reduce bias.
Whether incentives have an effect on Internet surveys is still unknown. Singer (2011) pointed out that research in this area is limited. Much of the published experimental research has been done by Göritz (2006), who found that incentives increase both the proportion of invitees starting a survey and the proportion completing it, relative to a no-incentive group. Lotteries are the incentives most often used. Her literature review concluded that specific tests of lotteries against other types of incentives, or against no incentives, show that lotteries are no more effective in Web surveys than in other kinds of surveys; in most tests, lotteries did not significantly increase response rates over a no-incentive or alternative-incentive group.
Incentives are sometimes differential: different amounts are offered primarily to convert refusals, often on the basis that differential incentives are more economical than prepaid incentives and that they are more effective in reducing bias. But there is also a question of fairness; many sample members are likely to consider differential incentives to be unfair. Nonetheless, Singer et al. (1999) found that respondents would be willing to participate in a new survey by the same organization even when told that differential incentives would be paid. When differential incentives are used to reduce bias, they are commonly paired with a small prepaid incentive to all possible participants, which serves to increase the sample size and helps to satisfy the fairness criterion.
Whether or not there are long-term incentive effects is not yet known. Singer (2011) said that there is no evidence of long-term effects thus far, but studies have been done only over short intervals.
Conclusions on Incentives
Singer (2011) drew the following conclusions regarding incentives:
• Incentives increase response rates to surveys in all modes and in cross-sectional as well as panel studies.
• Monetary incentives increase response rates more than gifts do, and prepaid incentives increase them more than promised incentives or lotteries do.
• There is no good evidence for how large an incentive should be. In general, although response rates increase as the size of the incentive increases, they do so at a declining rate. Also, there may be ceiling effects to the extent that people come to expect incentives in all surveys.
• Few studies have evaluated the effect of incentives on the quality of response; most of these have found no effects.
• Few studies have examined the effect of incentives on sample composition and response distributions; most of these have found no effects.
• Effects on sample composition and response distributions have been demonstrated in a small number of studies in which incentives have brought into the sample a larger-than-expected number of members of particular demographic categories or interest groups.
• Incentives have the potential for both increasing and reducing nonresponse bias. They may reduce bias if they can be targeted to groups that might otherwise fail to respond. They may increase bias if they bring into the sample more of those who are already overrepresented. And if they affect all subgroups equally, they will leave the nonresponse bias unchanged.
Recommendation 4-5: Research is needed on the differential effects of incentives offered to respondents (and interviewers) and the extent to which incentives affect nonresponse bias.
Paradata are data about the survey process itself and are collected as part of the survey operation (Couper, 1998). These data may include records of calls, reasons for refusals, responses to incentive offers, and characteristics of the interviewers (Couper and Lyberg, 2005; Bates et al., 2008; Laflamme et al., 2008; Lynn and Nicolaas, 2010; Stoop et al., 2010; Olson, 2013). As discussed in Chapter 3, paradata are used for many purposes: to monitor the status of field collection, to confirm that fieldwork has been carried out according to instructions, to compute response rates, to identify reasons for nonresponse, to implement responsive design strategies, and to adjust for nonresponse bias. They can be in the form of macro paradata, sometimes also termed metadata, which are aggregate statistics about the survey (e.g., response rates, coverage rates, editing rates). Studies have found such aggregate data to be useful for coming to an understanding of the survey information and in the weighting process (Dippo and Sundgren, 2000). Paradata can also be in micro form and include information carried on individual records, such as imputation flags, together with call records and the like.7
Sometimes paradata comprise auxiliary information external to the information collected on the survey questionnaire. Auxiliary data “can be thought of as encompassing all information not obtained directly from the interview of a respondent or case-specific informant” (Smith, 2011, p. 389).
7However, care should be taken in using call record data because the process that causes such records to be created is often non-neutral, in that their presence or absence reflects a decision to pursue or not pursue a case.
Auxiliary data can be information about the sample frame taken from external sources such as census data by block group, census tract, and other geographic areas (Smith and Kim, 2009). Auxiliary data can also be in the form of observational data about the interview environment. For example, the European Social Survey collects data about the type of dwellings and about neighborhood characteristics such as accumulated litter and signs of vandalism (Stoop et al., 2010). The researchers found that refusals and non-contacts were more likely in areas in which buildings were poorly maintained and where litter abounded. However, these data have not been found useful for nonresponse adjustment purposes (Stoop et al., 2010).
The power of employing paradata and auxiliary data for improving response rates is now becoming recognized. In her presentation to the panel, panel member Kristen Olson observed that, in response to declining response rates, survey practitioners can often introduce design features on the basis of paradata to recruit previously uncontacted or uncooperative sample members into the respondent pool (Olson, 2013). Paradata may also be helpful in tailoring a survey to increase its saliency to sample members. Paradata can be used as well to create nonresponse adjustments, reflecting each respondent’s probability of being selected and observed in the respondent pool (see Chapter 2).
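To make the adjustment idea concrete, the following is a minimal sketch (an illustration under stated assumptions, not a procedure described in this report) of a weighting-class nonresponse adjustment: sample members are grouped into cells using paradata or auxiliary variables observed for respondents and nonrespondents alike, and each respondent's base weight is inflated by the inverse of the weighted response rate in its cell. The `urban` grouping variable below is hypothetical.

```python
from collections import defaultdict

def nonresponse_adjusted_weights(cases, cell_key):
    """Weighting-class adjustment: inflate each respondent's base weight by
    the inverse of the weighted response rate in its cell.

    cases: list of dicts, each with a base 'weight' and a 'responded' flag.
    cell_key: function mapping a case to its weighting cell.
    """
    base = defaultdict(float)   # sum of base weights per cell (all cases)
    resp = defaultdict(float)   # sum of base weights per cell (respondents)
    for c in cases:
        k = cell_key(c)
        base[k] += c["weight"]
        if c["responded"]:
            resp[k] += c["weight"]
    adjusted = {}
    for i, c in enumerate(cases):
        if c["responded"]:
            k = cell_key(c)
            adjusted[i] = c["weight"] * base[k] / resp[k]
    return adjusted

# Hypothetical cells: an "urban" flag that might come from frame data.
cases = [
    {"weight": 1.0, "responded": True,  "urban": True},
    {"weight": 1.0, "responded": False, "urban": True},
    {"weight": 1.0, "responded": True,  "urban": False},
]
weights = nonresponse_adjusted_weights(cases, lambda c: c["urban"])
# The urban respondent's weight doubles (its cell responded at 50 percent);
# the rural respondent's weight is unchanged.
```

In practice the cells would be formed from richer paradata (call records, interviewer observations) or fitted response propensities rather than a single flag, but the logic is the same.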
There is an increasing recognition that auxiliary data can be used in sample design (i.e., to achieve large samples of rare or hard-to-find groups) or to improve approaches to the interview (Smith, 2011). In this regard, Smith (2011, p. 392) concludes that research is needed on methods for linking databases in order to augment “sample-frame data with information from other databases and sources.” Auxiliary data may also be useful to improve imputation techniques (Smith and Kim, 2009).
Olson (2013) observed that few auxiliary variables are available on both respondents and nonrespondents and that there is a dearth of research on predicting survey variables. She concluded that paradata have traditionally been developed as measures of participation, not as survey variables. Paradata have the potential to assist in nonresponse adjustment and may be useful in developing responsive designs. However, the use of paradata requires upfront investment and additional research to demonstrate when the application of paradata is effective and when it is not. The use of paradata also requires the development of standards for its collection.
Recommendation 4-6: Research leading to the development of minimal standards for call records and similar data is needed in order to improve the management of data collection, increase response rates, and reduce nonresponse errors.
Paradata make it possible for survey managers to monitor the survey process in real time and to make decisions and alterations in order to improve response rates. This approach to survey design has been termed responsive design (Groves and Heeringa, 2006). As envisioned by Groves and Heeringa (2006, p. 1), responsive designs “pre-identify a set of design features potentially affecting costs and errors of survey statistics, identify a set of indicators of the cost and error properties of those features, monitor those indicators in initial phases of data collection, alter the active features of the survey in subsequent phases based on cost/error tradeoff decision rules, and combine data from separate design phases into a single estimator.” Responsive design is a flexible menu of design approaches that can be employed in real time to ameliorate the damage caused by reduced response rates to surveys. The effectiveness of this approach depends critically on the ability to pre-identify variables that can provide basic data on costs and error sources so that survey managers can make rational decisions about the trade-offs between costs and errors.
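The "decision rules" in this definition can be quite simple. As an illustrative sketch (the indicator, window, and threshold below are assumptions for exposition, not rules from Groves and Heeringa), a survey might move to its next design phase when the current phase reaches capacity, that is, when the marginal yield of additional effort, measured here as completed interviews per call attempt over a recent monitoring window, falls below a cutoff:

```python
def phase_complete(daily_completes, daily_attempts, window=3, min_yield=0.05):
    """True when the completes-per-attempt ratio over the last `window`
    days falls below `min_yield` (both parameters are illustrative)."""
    recent_completes = sum(daily_completes[-window:])
    recent_attempts = sum(daily_attempts[-window:])
    if recent_attempts == 0:
        return True  # no effort being expended; nothing left in this phase
    return recent_completes / recent_attempts < min_yield

# Early days yield many completes per attempt; later days very few.
completes = [40, 25, 12, 5, 3, 2]
attempts = [200, 200, 200, 200, 200, 200]
print(phase_complete(completes, attempts))  # prints True: 10/600 < 0.05
```

A real implementation would monitor several cost and error indicators at once (e.g., response propensities by subgroup, cost per complete) and would weigh error reduction against cost before switching phases.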
One particular application of responsive design was described by Laflamme (2011) at the panel’s workshop. This application, called responsive collection design (RCD), uses the information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. Statistics Canada has conducted two experimental surveys with RCD and control groups for two major CATI surveys: the 2009 Households and Environment Survey, using the 2009 Canadian Community Health Survey sampling frame, and the 2010 Survey of Labour and Income Dynamics (SLID). The 2011 SLID was designed with full RCD techniques, including an embedded experiment for the first call. The paradata consisted of the information on the Blaise transaction file (i.e., calls and contact information), interviewer payroll hours, budget and target figures, previous and current collection cycle information, and response propensity model results. These produced indicators that were used to identify when to start RCD.
The results of the Statistics Canada tests indicated that there was a higher overall response rate when RCD was used compared to the previous survey cycle. The responsive design group achieved the same response rate with less effort. On the basis of the evidence, it was concluded that RCD is technically feasible. However, Laflamme (2011) asserted that RCD is not a “magic” solution.
Recommendation 4-7: Research is needed on the theory and practice of responsive design, including its effect on nonresponse bias, information requirements for its implementation, types of surveys for which it is most appropriate, and variance implications.
Administrative records may be helpful in reducing potential bias due to nonresponse, and they may be helpful in correcting for bias. John Czajka’s remarks (2009) addressed the use of administrative records to reduce nonresponse bias. He gave the example of the use of Internal Revenue Service (IRS) tax records by the Census Bureau. The IRS conducts an annual enumeration, and Social Security numbers (SSNs) provide a link to age, sex, race, and Spanish origin data stored in other files, such as the Social Security files. These administrative records yield population coverage estimated at 95 percent of U.S. residents.
The potential of these administrative data led to the development of the Census Bureau’s Statistical Administrative Records System (StARS), which combines data from six large federal files: IRS 1040 and 1099, Selective Service, Medicare, Indian Health Service, and Housing and Urban Development’s Tenant Rental Assistance Certification System (TRACS). However, administrative records have limitations. For example, IRS records do not include undocumented residents, dependents of non-filers, and non-filers with no reported income. In addition, the unit of observation is not the household, and the reported address may not be residential. The records may also be incomplete. Race is often missing from Social Security files and, when present, may not reflect current definitions.
Although administrative records hold promise for helping to improve survey operations, in Czajka’s judgment, it is unlikely that they can be substituted for survey reports. Reasons include that the set of survey items for which there is a high quality administrative records alternative is small and largely limited to federal records; the concepts underlying administrative records may differ from survey concepts (e.g., tax versus survey income); the records may be outdated; and there are severe limitations on the ability to use administrative records because of confidentiality concerns. Little work has been done on informed consent procedures to enable the use of the confidential administrative records.
Yet work has moved forward on ways to use administrative records to make data collection programs more cost-effective. Thus, as part of planning for the 2020 census, the Census Bureau has continued and expanded its acquisition of administrative records and its research on ways to use administrative records with census and survey data (see http://www.census.gov/2010census/pdf/2010_Census_Match_Study_Report.pdf [April 2013]). Other statistical agencies have explored the uses of administrative records
to augment survey data, such as matching health survey data with health expenditure claims data (see, e.g., http://www.cdc.gov/nchs/data_access/data_linkage/cms_medicare.htm [April 2013]).
Recommendation 4-8: Research is needed on the availability, quality, and application of administrative records to augment (or replace) survey data collections.
The problems of obtaining cooperation in social science surveys do not mean that probability-based sampling should be abandoned. However, it would be useful for the survey community to continue to prepare for a time when current modes are no longer tenable because of excessive costs and burden. Several emerging methods for gathering social science data are briefly discussed here because they warrant consideration in the development of a research agenda for dealing with the problem of declining response rates.
Non-probability Samples
Non-probability samples are a troubling alternative to traditional probability samples, but, largely owing to their cost and timeliness advantages, they are rapidly growing as a means of gathering data. Much of that growth is associated with the growth of online survey methods. In fact, non-probability samples now account for the largest share of online research, according to the AAPOR Report on Online Panels (American Association for Public Opinion Research, 2010b).
Nonresponse, according to the AAPOR report, is an issue for non-probability samples just as it is for probability-based samples. Nonresponse in the respondent recruitment phase is likely to be considerable, but because the pool of people invited is not known, as it is with probability-based samples, it is not easily measured. Relatively little is known about nonresponse in non-probability samples, but it is not likely to be random and is likely to include the effects of self-selection. For example, the AAPOR report notes that self-selected, non-probability-based online panels are more likely than the general population to include white, younger, more active Internet users and those with higher levels of educational attainment. The report concludes that these surveys offer no foundation on which to draw inferences to the general population.
Internet Scraping Technologies
Another possible approach to the growing problem of survey nonresponse is not to survey at all, but instead to gather information via data mining techniques, essentially harvesting data already available on the Internet. There are several examples of the use of this approach to generate economic and social statistics.
• MIT Billion Prices Project. The Billion Prices Project (BPP) is an initiative by economists Roberto Rigobon and Alberto Cavallo to collect prices from hundreds of online retailers around the world on a daily basis (see http://bpp.mit.edu [April 2013]). The project monitors daily price fluctuations of about 5 million items sold by approximately 300 online retailers in more than 70 countries. For the United States, the project collects about 500,000 prices. It has been collecting prices since 2007. The BPP is said to have closely tracked the Consumer Price Index.
• Google Price Index. The Google Price Index is a project of the company’s chief economist, Hal Varian. Varian uses Google’s vast database of Web prices to construct a constantly updated measure of price changes and inflation. Google has not yet decided whether it will publish the price index, and it has not released its methodology, but it has reported that the preliminary index tracks the Consumer Price Index closely.
• Flu Epidemic Prediction. For several years, studies have been conducted to detect the onset of U.S. seasonal flu epidemics by extracting patterns of flu-related search terms from the billions of queries stored by Google and Yahoo! Inc. (Butler, 2008). These studies are seen as providing real-time indicators to complement the reports of the Centers for Disease Control and Prevention (CDC), which are compiled from a combination of data about hospital admissions, laboratory test results, and clinical symptoms. These reports are often weeks old by the time hospitals get them, and so they do not allow frontline health-care workers enough time to prepare for a surge in flu cases. The studies have found that patterns of searches matched official flu surveillance data almost perfectly—and often weeks in advance of these official data.
• Predicting the Stock Market Using Twitter Feeds. Recent research has tested whether measurements of collective mood states derived from large-scale Twitter feeds are correlated with the value of the Dow Jones Industrial Average (DJIA) over time. Bollen et al. (2011) analyzed the text content of daily Twitter feeds with two mood-tracking tools and were able to predict the daily up and down changes in the closing values of the DJIA with an accuracy of 87.6 percent.
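To illustrate the price-index idea behind a project like the BPP, the sketch below computes a Jevons-type index (the geometric mean of price relatives) over a toy basket of scraped prices. This is a standard elementary-index formula, not the BPP’s or Google’s actual methodology, and all prices are invented.

```python
import math

# Hypothetical daily prices for a small basket of items scraped from
# online retailers. Day 1 serves as the base period.
prices = {
    "day1": {"milk": 3.00, "bread": 2.00, "eggs": 4.00},
    "day2": {"milk": 3.06, "bread": 2.02, "eggs": 3.96},
}

def jevons_index(base, current):
    """Jevons index: geometric mean of price relatives, base = 100."""
    rels = [current[item] / base[item] for item in base]
    return 100 * math.exp(sum(math.log(r) for r in rels) / len(rels))

idx = jevons_index(prices["day1"], prices["day2"])
print(round(idx, 2))  # slightly above 100: milk and bread rose, eggs fell
```

Scaling this up is mostly an engineering problem — millions of items, hundreds of retailers, daily scrapes — but the statistical questions raised below (which items end up online, and whether that selection is stable) remain.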
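The logic of the flu-prediction studies — checking how far search volume leads official counts — can likewise be sketched as a lagged correlation. The weekly series below are invented for illustration; the actual studies rely on far more elaborate query selection and modeling.

```python
# Toy sketch: correlate a flu-related search-volume series against
# official case counts at several leads, to see how many weeks ahead
# the search series moves. All numbers are made up.

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

searches = [10, 14, 22, 35, 50, 48, 30, 18]   # weekly search volume
cases    = [ 5,  9, 12, 20, 33, 49, 47, 31]   # official counts, lagging

# Correlate searches in week t with cases in week t + lead.
for lead in (0, 1, 2):
    r = pearson(searches[:len(searches) - lead], cases[lead:])
    print(f"lead {lead} weeks: r = {r:.2f}")
```

In this contrived example the correlation peaks at a one-week lead, mirroring the finding that search patterns anticipate official surveillance data.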
Admittedly, these examples of the use of Internet data mining to produce socioeconomic indicators are still developmental, but their apparent success in replicating data collected administratively or through costly surveys suggests that further development and testing could be warranted. However, caution is in order. The availability of many types of information on the Web is a matter of choice; thus, the data available on a given topic may not fully reflect the underlying range of information, raising the possibility of unknown biases. Often there is no built-in constraint to make measures conceptually compatible. In addition, there is no guarantee that biases would remain stationary in a meaningful sense over time, which would compromise the validity of trend analysis.
Recommendation 4-9: Research is needed to determine the capability of information gathered by mining the Internet to augment (or replace) official survey statistics.