RESEARCH RECOMMENDATIONS FROM NATIONAL RESEARCH COUNCIL (1983)
As part of its three-volume report, the National Research Council Panel on Incomplete Data in Sample Surveys prepared separate sets of recommendations for improving survey operations and for structuring future research on nonresponse and other issues. The following text excerpts the 11 recommendations offered on future research (National Research Council, 1983, pp. 11–14).
The recommendations on research have three objectives: to provide a capital investment in computer programs and data sets that will make nonresponse methodology cheaper to implement and evaluate; to encourage research on and evaluation of theoretical response mechanisms; and to urge that long-term programs be undertaken by individual or groups of survey organizations and sponsors to provide for and accomplish cumulative survey research, including research on nonresponse.
Recommendation 1. General-purpose computer programs or modules should be developed for dealing with nonresponse. These programs and modules should include editing, imputing (single and multiple), and the calculation of estimators, variances, and mean square errors that, at least, reflect contributions due to nonresponse.
Recommendation 2. Current methods of improving estimates that take account of nonresponse, such as poststratification, weighting methods, and hot-deck imputation, especially hot-deck methods of multiple imputation, require further study and evaluation.
Recommendation 3. Theoretical and applied research on response mechanisms should be undertaken so that the properties and applicability of the models become known for estimates of both level and change.
Recommendation 4. A systematic summarization of information from various surveys should be undertaken on the proportions of respondents for specified parts of populations and for particular questions in stated contexts.
Recommendation 5. Research is needed to distinguish the characteristics of nonrespondents as opposed to respondents and to assess the impact of questionnaire design and data collection procedures on the level of nonresponse.
Recommendation 6. Data sets that permit good estimates of bias and variance to be made when various statistical methods of dealing with nonresponse are adopted should be made publicly available. Such data sets could be used for testing various methods of bias reduction and for assessing effects of the methods on variances. They could also be used for the evaluation of more general methods depending on models.
Recommendation 7. Theoretical and empirical research should be undertaken on methods of dealing with nonresponse in longitudinal and panel surveys.
Recommendation 8. Theoretical and empirical research on the effects of nonresponse on more complex methods of analysis of sample survey data, e.g., multivariate analysis, should be undertaken.
Recommendation 9. A consistent terminology should be adopted for descriptive parameters of nonresponse problems and for methods used to handle nonresponse in order to aid communication on nonresponse problems.
Recommendation 10. Research on response mechanisms that depend on reasons for nonresponse should be undertaken.
Recommendation 11. Data on costs should be obtained and analyzed in relation to nonresponse procedures so that objective cost-effective decisions may become increasingly possible.
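Two of the methods named in Recommendation 2, weighting adjustments and hot-deck imputation, can be illustrated compactly. The following is a minimal sketch and is not drawn from the report: the function names, the record layout (keys such as `cell`, `weight`, and `responded`), and the toy data are all assumptions made for illustration, and production implementations would also need variance estimation of the kind called for in Recommendation 1.

```python
import random
from collections import defaultdict


def weighting_class_adjust(records):
    """Inflate respondents' base weights so each cell keeps its full weighted total.

    Each record is a dict with hypothetical keys 'cell', 'weight', 'responded'.
    Returns one adjusted weight per record (0.0 for nonrespondents).
    """
    total = defaultdict(float)  # weighted count of all sampled cases per cell
    resp = defaultdict(float)   # weighted count of respondents per cell
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["responded"]:
            resp[r["cell"]] += r["weight"]
    adjusted = []
    for r in records:
        if r["responded"]:
            # Respondents stand in for the nonrespondents in their own cell.
            adjusted.append(r["weight"] * total[r["cell"]] / resp[r["cell"]])
        else:
            adjusted.append(0.0)
    return adjusted


def hot_deck_impute(records, key, rng):
    """Fill missing values of `key` with a randomly drawn donor value from the same cell."""
    donors = defaultdict(list)
    for r in records:
        if r[key] is not None:
            donors[r["cell"]].append(r[key])
    imputed = []
    for r in records:
        if r[key] is None and donors[r["cell"]]:
            imputed.append(rng.choice(donors[r["cell"]]))
        else:
            # Observed values pass through; cells without donors stay missing.
            imputed.append(r[key])
    return imputed


# Toy sample (illustrative values only).
sample = [
    {"cell": "urban", "weight": 10.0, "responded": True, "income": 40},
    {"cell": "urban", "weight": 10.0, "responded": False, "income": None},
    {"cell": "rural", "weight": 5.0, "responded": True, "income": 30},
    {"cell": "rural", "weight": 5.0, "responded": True, "income": 34},
]
weights = weighting_class_adjust(sample)
filled = hot_deck_impute(sample, "income", random.Random(0))
```

In this sketch the single urban respondent carries the weight of both urban cases, and the missing urban income is filled from the only urban donor. Both methods rest on the assumption that respondents and nonrespondents within a cell are alike, which is the kind of assumption Recommendations 2 and 3 ask researchers to evaluate.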
OTHER SELECTED RESEARCH TOPICS COMPILED BY THE PANEL
|Research Area / Quotation||Source|
Theoretical Approaches to Nonresponse
|We conjecture that there may be a direct link between the increase in efforts to contact households and refusals. Many households contacted because of the additional efforts may be more inclined to refuse precisely because of the increased contact efforts. This effect might be especially pronounced in telephone surveys, where the members of households with caller ID can see that numerous attempts have been made to contact them. If so, it is possible that the multiple attempts will predispose the household to refuse when they are finally reached.… This conjecture is consistent with our earlier suggestion that technological barriers may suppress the opportunity actually to hear the survey request. In this case, the barrier would promote refusals by increasing the rate of noncontact over time. Frustration with multiple contact attempts might also partially explain why so many RDD surveys with high nonresponse rates have low nonresponse bias. In terms of a mechanism for nonresponse, frustration with multiple contact attempts is generally not very selective and unlikely to target a particular group or subgroup.||Brick and Williams (2013:55–56)|
|It is interesting to note that the two most prominent and useful models for thinking about survey nonresponse— social exchange theory and leverage–saliency theory—are actually models of survey participation. They do not explicitly address the relationship between contact efforts and participation efforts. Extending nonresponse models to include the effects of contact and testing these theories might yield valuable practical advice for survey researchers.||Brick and Williams (2013:56)|
|Perhaps most important in the present study is the finding that the relationship between the type of respondent (cooperative, reluctant) and the attitudinal and background variables was not all in the same direction in all countries. This needs further research and discussion because it creates a serious challenge to any scholar who believes there is a theory of nonresponse that applies cross-nationally.||Billiet et al. (2007:159)|
|There may be additional hidden costs to the effort to maintain nonresponse rates in the face of mounting resistance. Many survey researchers suspect that reluctant respondents may provide less accurate information than those who are more easily persuaded to take part.… Although the general conditions that produce nonresponse bias in survey means or proportions are known (the bias is a function of both the nonresponse rate and the relation between the response “propensity”—the probability that a given case will become a respondent—and the survey variables), it is not clear what circumstances are likely to yield large nonresponse biases and what circumstances are likely to yield small or negligible ones.||Tourangeau (2003:11)|
|Most of the survey literature on nonresponse has focused on its impact on means, proportions, and totals. The impact of attrition may be reduced for more complex, multivariate statistics (such as regression coefficients), but clearly more work is needed to document this difference.||Tourangeau (2003:11)|
|Another kind of study is likely to assume increasing importance in the coming years; these studies will focus on the issue of when nonresponse produces large biases and when it can be safely ignored. Like investigations of measurement error, these studies may involve disruptions of ongoing efforts to maintain response rates (perhaps even lowering response rates by design) in order to assess their impact on nonresponse bias. In addition, it will be important to demonstrate that falling response rates actually matter (at least some of the time) and to understand the impact of nonresponse on complex statistics derived from survey data.||Tourangeau (2003:12)|
|More research across a range of surveys is needed to answer the question as to whether higher response rates decrease nonresponse bias. Indeed, in the light of our mixed results, we are not able to decide which of the two models, the “continuum of resistance model” or the “classes of nonparticipants model[,]” finds most support in our data. Further research on the differences and similarities in reasons for refusing cooperation between the two kinds of reluctant respondents (easy- and hard-to-convert refusals) and the refusals who were reapproached and who still refused to participate in a survey is needed.||Billiet et al. (2007:160)|
|First, our results do not go far in explaining the mechanisms through which interviewer experience is related to cooperation. Since experience has a strong effect, further exploration of the mechanisms by which it occurs is of interest. Second, we have not addressed the question of whether experience has a positive effect due to learning or selective drop-out of less successful interviewers. Third, we believe that the lack of effect of inter-personal skills is related to problems in measuring these, rather than to the fact that they are not relevant. The question then is how such skills may be measured more successfully.||Sinibaldi et al. (2009:5968)|
|What is needed next are studies which address some of the other aspects of the doorstep interaction such as the intonation of the interviewer’s voice and non-verbal behaviour and the other various intangible things which help to determine the outcome of a request for participation. It would also be useful to try to separate out the subtleties that make a professional interviewer a professional interviewer.||Campanelli et al. (1997:5-4)|
|The extent to which variation in interviewer practices, sample persons’ interactional moves, and the interrelation between these practices and moves have measurable effects on response rates awaits further, quantitative investigation. Nonetheless, this study highlights two challenges for such research. First, if practices are effective because of their deployment in particular contexts, then their effectiveness can be assessed only by experimental designs in which that context is considered. One cannot simply assign some interviewers to do presumptive requests and others to do cautious ones; instead, properly varying the presumptiveness and cautiousness of requests depending on the circumstances may be optimal. Interviewers would need to be trained to recognize these situations—and to do so very quickly. Second, observational studies of practices need to be careful not to confuse the influence of an interviewer’s practices on a sample person with the influence of a sample person’s behavior on an interviewer.||Maynard et al. (2010:810)|
|The influences of interviewer behavior, as well as interviewer personality traits, are not yet well understood. It seems advisable to measure interviewer behavior at the interaction level rather than the interviewer level. To better understand the process of establishing cooperation, interviewer call records need to be investigated, which only more recently have become available. It also seems advisable to control for previous interviewer performance, which requires survey agencies to record and use these data. A largely unexplored area is interviewer effects in longitudinal surveys.||Durrant et al. (2010:25–26)|
|Given the apparent importance of the perception and interpretation of voice characteristics, an alternative method is to focus on the perceived interviewer approaches. Since there are probably many combinations of voice characteristics that can convey a similar interviewer approach (e.g., there are multiple ways to express authority), this method might be more fruitful. In that case, more research is needed into how interviewer approaches—as likeability, authority, and reliability—might be expressed and perceived during the introductory part of a telephone interview, and in which conditions they are effective in enhancing cooperation rates.||van der Vaart et al. (2006:497)|
|In general, more work is needed to assess whether certain types of survey items are more or less susceptible to nonresponse error variance or measurement error variance among interviewers.||West and Olson (2010:1022)|
|Interviewer incentives are ill-understood and have received little attention in the research literature, relative to respondent incentives. The mechanisms through which they may act on interviewer response rates and nonresponse bias are possibly different from those that act on respondents, as interviewers and respondents have very different roles in the social interaction at the doorstep. Further research is needed to explore how, and under what circumstances, interviewer incentives could help achieve survey goals.||Peytchev et al. (2010:26)|
|There is also evidence that interviewer motivation is a major contributing factor in maintaining respondents’ interest in a survey and preventing break-offs. So studies of interview length should also explore the burden placed on interviewers in different modes and how this impacts on data quality.||Roberts (2005:4)|
|Another question for future research is the relative power of following the attempts to obtain Web and IVR responses with a mail survey in Phase 2, rather than telephone. In many ways the telephone attempts during Phase 2 were similar to the initial contacts, i.e., both involved interaction by phone. It is reasonable to expect that switching to mail at this stage would have had a much greater impact on improving response to these treatment groups, but remains to be tested experimentally.… Using an alternative mode that depends upon a different channel of communication, i.e., aural vs. visual, to increase response may also introduce measurement differences issues that cannot be ignored. Understanding the basis of these differences should be a high priority for future research.||Dillman et al. (2008:17)|
|Mixed or multiple mode systems are not new, but new modes emerge and with them new mixes. This means that we have to update our knowledge about the influence of modes on data quality. We need comparative studies on new modes and mode effects, and preferably an integration of findings through meta-analysis.||De Leeuw (2005:249)|
|Multiple mode contact strategies are employed to combat survey nonresponse. Still we need more research on the optimal mixes, preferably including other indicators besides response rate, such as bias reduction and costs.||De Leeuw (2005:249)|
|Adjustment or calibration strategies for mode mixes are still in an early phase, and more research is needed.||De Leeuw (2005:250)|
|Not much is currently known about people’s preferences for different data collection modes. What modes would respondents prefer to use when participating in a survey? Meta-analyses of mode preference data have found that people tend to “over-prefer” the mode in which they were interviewed, but when mode of interview is controlled for, there is an overall preference for mail [surveys]. It is likely that these findings are now out of date, yet the apparent popularity of the Internet as a mode of data collection may well reflect an overall preference among respondents for self-completion. More research into public attitudes to data collection modes would shed light on this issue and might help guide survey designers in making mode choices.||Roberts (2005:3)|
|Offering different survey agencies/countries or respondents a choice from a range of data collection modes will be a realistic option only once it is known that a questionnaire can practicably be administered in each of the modes on offer.… Not enough is known, however, about the extent to which modes are differentially sensitive to questionnaire length (and people’s tolerance of long interviews), so any survey considering the feasibility of mixing modes will need to examine this problem. [Some] survey organisations impose a limit on the permissible length of phone interviews (e.g., Gallup’s “18 minute” rule). But research has shown that people’s willingness to respond to long surveys depends on their motivation and ability to participate which, to a large extent, will vary by survey topic. There may also be cultural variation in tolerance of interview length (e.g., norms regarding the duration of phone calls), and these should be investigated.||Roberts (2005:4)|
|We need to understand better the non-response mechanisms associated with each mode. For example, non-response in self-completion surveys is often linked to variables of interest. A weakness of face-to-face interviewing is that we get greater non-response in urban populations than in rural ones. Each mode has weaknesses, and we need to be aware of what those weaknesses are.||Roberts (2005:7)|
|In terms of nonresponse, cell phone response rates trend somewhat lower than comparable landline response rates, but the size of the gap between the rates for the two frames is closing. This is thought to be due to landline response rates continuing to drop faster than cell phone response rates. Research needs to be conducted to more fully understand the size and nature of differential nonresponse in dual frame telephone surveys and the possible bias this may be adding to survey estimates. Future research needs also to seek a better understanding of how dual service users (those with both a cell phone and a landline) can best be contacted and successfully interviewed via telephone.||American Association for Public Opinion Research (2010a:109)|
|While we were quite successful in predicting response outcome prior to the study, surveys vary in the amount of information that is available on sample cases. Exploring external sources of information is needed, particularly for cross-sectional survey designs that do not benefit from prior wave data and may also lack rich frame data. Similarly, more research will be needed on how to apply these data prior to any contact with sample cases. Two alternatives are to apply model coefficients from similar surveys, or to estimate predictive models during data collection as proposed under responsive survey design (Groves and Heeringa, 2006).||Peytchev et al. (2010:26)|
|New and effective interventions for cases with low response propensities are needed in order to succeed in the second step of our proposed approach to reducing nonresponse bias. Such interventions are certainly not limited to incentives as their effectiveness varies across target populations, modes of data collection, and other major study design features.||Peytchev et al. (2010:26)|
|Further research is needed into the whole sequence of the survey process and how the protocols at each stage (e.g., screening) interact with those applied on other stages (e.g., refusal conversion or interviewing) of the process. The dynamic treatment regimes approach offers a roadmap for [how] this research might be conducted. The results developed here suggest that such a research program could be successful.||Wagner (2008:76)|
|Relatively few studies have examined the effect of incentives on sample composition and response distributions, and most studies that have done so have found no significant effects. However, such effects have been demonstrated in a number of studies in which the use of incentives has brought into the sample larger (or smaller) than expected demographic categories or interest groups.||Singer and Ye (2013:134)|
|Clearly, there is still much about incentives that is unknown. In particular, we have not examined the interaction of respondent characteristics such as socioeconomic status with incentives to see whether they are particularly effective with certain demographic groups. Geocoding telephone numbers in the initial sample might permit analysis of such interaction effects (cf. King, who applied a similar method to face-to-face interviews in Great Britain). And we need better information on the conditions under which incentives might affect sample composition or bias responses. Such analyses should receive high priority in future work.||Singer et al. (2000:187)|
|The number of incentive experiments that could be designed is legion; unless they are guided by theory, they will not contribute to generalizable knowledge.… One question often asked is how large an incentive should be for a given survey. The issue here is the optimum size of an incentive, given other factors affecting survey response. If experiments varying the size of the incentive are designed in the context of a theory of survey participation that allows for changes in motivation over time, some generally useful answers to this question may emerge. In the absence of such theoretically based answers, pretesting is the only safe interim solution.||Singer (2000:241)|
|Research is also needed on how paying respondents for survey participation affects both respondent and interviewer expectations for such payments in the long run.||Singer (2000:25)|
|Research is needed on the conditions under which incentives not only increase response rates but produce a meaningful reduction in nonresponse bias. Because they complement other motives for participating in surveys—such as interest in the survey topic, deference to the sponsor, or altruism—it is reasonable to hypothesize that incentives would serve to reduce the bias attributable to nonresponse. Whether the use of incentives for this purpose is cost-effective is less easily answered, however, and research is needed on this topic, as well.||Singer (2000:25)|
Weighting and Nonresponse Adjustment
|Including many auxiliary variables and using the fullest cross-classification of these variables possible in the weighting will quickly result in small numbers of respondents in at least some of the weighting cells. Guidance on appropriate cell sizes for calibration weighting is very limited. The appropriate cell size is a trade-off between the potential reduction in nonresponse bias associated with increasing the information in calibration weighting and the potential increase in the variance and ratio biases of the estimates. More research is needed in this area.||Brick and Jones (2008:72)|
|Another area that requires more research is the effect of nonresponse on multivariate methods such as measures of association and linear and logistic regression parameters when the survey weights are used to compute these measures. The analytic results for odds ratios imply that the bias in this type of statistic could be sensitive to varying response propensities. Simulation studies on these multivariate statistics could prove very enlightening.||Brick and Jones (2008:72)|
|The challenge of weighting adjustment, for survey researchers and practitioners, lies in the search for an appropriate set of auxiliary variables that are predictive of both response probabilities and survey variables of interest. We encourage survey researchers to engage actively in identifying an appropriate set of auxiliary variables in developing non-response adjustment weights. This should include identifying measures at the design stage that can be obtained on both respondents and non-respondents and that are good proxy variables for one or multiple survey variables. In the past, attention was often focused on finding variables that are associated with response although small R²-statistics are very common in response propensity models…. The results of this paper show that a renewed focus on correlates of the key survey outcome variables is warranted. An avenue that is worth exploring is statistics derived from call record data or other types of paradata that were not discussed here.||Kreuter et al. (2010:405–406)|
|Regarding further research, we make several suggestions. First, we suggest looking to new technologies to further assess paradata validity and quality. If possible, the use of computer-assisted recorded interviewing (CARI) might be implemented. Ideally, we could record the pre-interview door-step interactions so we could have the “truth” against which to compare [contact history instrument (CHI)] entries. However, given the legal and policy requirements to obtain informed consent prior to using CARI, this may prove impossible. An alternative is to have trained observers shadow interviewers, record their own versions of CHI, and then compare their records and the interviewer’s. Second, we recommend bringing interviewer characteristics into the equation when assessing paradata quality (e.g., years of experience, gender, education). Since recording interviewer–respondent interactions is a rather subjective undertaking, interviewers are undoubtedly a source of systematic variance. To date, there is very little research regarding interviewer impact on the collection of paradata.||Bates et al. (2010:103)|
|We encourage future work in this area that might include indicators for time and part of the day or other features that would be correlates of respondent attributes related to contactability and cooperation.||Kreuter and Kohler (2009:224)|
|This paper did not consider the measurement error properties of the interviewer observations and record variables. We made a simplistic assumption that there is no measurement error in those variables. Of course, this assumption is debatable in the real world. Future research is needed to examine the effect of the potential measurement error in auxiliary variables on survey estimates and on the bias–variance trade-off. Although it will be difficult to do so, research is also needed on the presence and effect of selective measurement error, e.g., if measurement error in the auxiliary variables is correlated with response.||Kreuter et al. (2010:405)|
|Administrative records are another avenue agencies are pursuing for use as sampling frames, as survey benchmarks, as sources of auxiliary data for model-based estimates, and for direct analysis. This is a promising area for future research, Abraham said, but she added a word of caution about treating administrative records as the “gold standard” of data, because little is known of their error properties.||National Research Council (2011:7), summarizing workshop presentation by Katharine Abraham (University of Maryland, College Park)|
|For many years, members of the statistical community have said that administrative records can and should be used more fully in the federal statistical system and in federal programs. The use of administrative records in the Netherlands and other countries gives a good flavor of the kinds of things the statistical system can envision doing in the United States to varying degrees. There are also areas, however, in which substantial work has already been done in the U.S. context. Most notably, administrative records have been used in economic statistical programs since the 1940s. There are also good examples of administrative data use with vital statistics, population estimates, and other programs across several federal statistical agencies.||National Research Council (2011:41–42), summarizing workshop presentation by Rochelle Martinez (U.S. Office of Management and Budget)|
|[Another] barrier is administrative data quality. Although they are not perfect, with survey data, agencies have the capability to describe and to understand the quality of what they have. In other words, there are a lot of measurement tools for survey data that do not yet exist for administrative records. Some have assumed that administrative data are a gold standard of data, that they are the truth. However, others in the statistical community think quite the opposite: that survey data are more likely to be of better quality. Without a common vocabulary and a common set of measurements between the two types of data, the conversation about data quality becomes subjective.||National Research Council (2011:44), summarizing workshop presentation by Rochelle Martinez (U.S. Office of Management and Budget)|
|Another significant data quality issue for statistical agencies is the bias that comes with the refusal or the inability to successfully link records. In addition to the quality of the administrative data as an input, the quality of the data as they come out of a linkage must be considered as well.||National Research Council (2011:44), summarizing workshop presentation by Rochelle Martinez (U.S. Office of Management and Budget)|
|For the future, Trépanier said, using administrative data to build sampling frames is of particular interest. There is the risk of coverage error in using an administrative database in constructing a frame, but if it is done in the context of using multiple other frames and calibration to correct coverage error, this is probably less of an issue. The ideal goal is a single frame, which is the approach used in building Statistics Canada’s Address Register, but this does not preclude the inclusion of auxiliary information. A single frame would allow for better coordination of samples and survey feedback, she said.||National Research Council (2011:49–50), summarizing workshop presentation by Julie Trépanier (Statistics Canada)|
|For data collection, one of the goals related to administrative data is to enable tracing. Statistics Canada wants to centralize the tracing process leading to the linking of all administrative data sources to make available the best contact information possible. This will require substantial effort, including a process to weigh the quality of the different sources and determine what contact information is most likely to be accurate. Another goal for administrative data could be to better understand the determinants of survey response and improve data collection procedures based on this information. For example, administrative data can provide guidance on preferred mode of data collection if one can assess whether persons who file their taxes electronically are also more likely to respond to an electronic questionnaire.||National Research Council (2011:50), summarizing workshop presentation by Julie Trépanier (Statistics Canada)|
|Statistics Canada has been successful in using substitution of income data from tax records, and this is likely to be continued. It is yet unclear, however, whether other information is available that could replace survey data. Investigating these options is done with caution because of the risk discussed. There is also the problem of ensuring consistency between survey and administrative data across variables.||National Research Council (2011:50), summarizing workshop presentation by Julie Trépanier (Statistics Canada)|
|Administrative data can also assist researchers in better understanding nonresponse bias and the impact of lower response rates. Finally, they can help both reduce the volume of data collected in surveys and improve estimation. Now that Statistics Canada has the omnibus record linkage authority in place, exploring all of these options has become a much easier process.||National Research Council (2011:50), summarizing workshop presentation by Julie Trépanier (Statistics Canada)|
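The Tourangeau (2003:11) excerpt in the table above describes nonresponse bias as a function of both the nonresponse rate and the relation between response propensity and the survey variables. A minimal sketch of that relationship follows, using a standard approximation that is not stated in the excerpts themselves: the bias of the unadjusted respondent mean is roughly the covariance between propensity and the survey variable, divided by the mean propensity. The function names and the toy values are assumptions chosen only for illustration.

```python
def expected_respondent_mean(propensities, values):
    """Propensity-weighted mean: what the respondent mean tends toward
    when unit i responds with probability propensities[i]."""
    weighted_sum = sum(p * y for p, y in zip(propensities, values))
    return weighted_sum / sum(propensities)


def approximate_bias(propensities, values):
    """Covariance of propensity and survey variable, divided by mean propensity."""
    n = len(values)
    p_bar = sum(propensities) / n
    y_bar = sum(values) / n
    cov = sum((p - p_bar) * (y - y_bar) for p, y in zip(propensities, values)) / n
    return cov / p_bar


# Toy population of four units (illustrative values only).
y = [10.0, 20.0, 30.0, 40.0]

# Case 1: propensity rises with y, so respondents over-represent high values.
related = [0.2, 0.4, 0.6, 0.8]

# Case 2: a low but constant propensity, unrelated to y.
unrelated = [0.3, 0.3, 0.3, 0.3]

bias_related = approximate_bias(related, y)
bias_unrelated = approximate_bias(unrelated, y)
```

In this toy example the first case has a 50 percent average response rate and a bias of about 5 units, while the second case has only a 30 percent response rate and essentially no bias. This is one way to read the observation, quoted from Brick and Williams (2013:55–56), that surveys with high nonresponse rates can still show low nonresponse bias when the nonresponse mechanism is not selective.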