National Academy of Sciences | 150 Year Anniversary

Questions? Call 800-624-6242

| Items in cart [0]

The National Academies Press

PAPERBACK
price:$43.00
add to cart

Rights & Permissions

topleft topright

Nonresponse in Social Science Surveys: A Research Agenda (2013)
Committee on National Statistics (CNSTAT)

Citation Manager

. "3 Mitigating the Consequences of Nonresponse." Nonresponse in Social Science Surveys: A Research Agenda. Washington, DC: The National Academies Press, 2013.

Please select a format:

BibTeX EndNote RefMan


Page
41
bottomleft bottomright
Page
41

Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 41
Prepublication Copy — Uncorrected Proofs 3 Mitigating the Consequences of Nonresponse It is well established that survey nonresponse has consequences, not the least of which is the potential for nonresponse bias. The techniques and procedures for dealing with nonresponse bias depend on how one approaches the problem. Singer (2006) points out that statisticians have been concerned mainly with imputation and weighting as ways of adjusting for the bias introduced by nonresponse, while social scientists and survey methodologists have tended to focus on measuring, understanding, and reducing the nonresponse rates themselves. Because of the negative effect that nonresponse has on survey quality, in recent years survey researchers and managers have been responding very aggressively to the problem of growing nonresponse in surveys. However, as Couper observed in the panel’s workshop, the approaches have become more selective and are rejecting the older approach that aimed simply to maximize the overall response rate (Couper, 2011a). The newer approaches target interventions at subgroups, at domains of interest, and at maximizing the response from specific cases based on their perceived special contribution to the quality of key statistics. Survey researchers are focusing less attention on increasing overall rates and are increasingly focusing on understanding the causes and correlates of nonresponse and making adjustments based on that understanding. In this chapter we explore some of the ways in which survey methodologists and managers are responding to the growing problem of survey nonresponse. We outline the results of some very good work that has gone into the development of weighting adjustments and adjustment models, and we document the increased use of paradata in nonresponse adjustment. Some of this work is in early stages, and other work is more advanced. We make several recommendations for research to solidify and further advance these lines of development. NONRESPONSE WEIGHTING ADJUSTMENT METHODS 1 The need for nonresponse adjustment arises because probability samples, in which all units have a known, positive probability of selection, require complete responses. Without other nonsampling errors, estimators for probability samples are approximately design-unbiased, consistent, and measurable. Base weights, or inverse probability selection weights, can be used to implement standard estimators. One possible simple estimator is the ratio mean. The ratio estimator is approximately ˆ unbiased and consistent for the population mean Y = Y N , with d equal to the inverse of the ˆ ˆ π π i probability of selection, Yπ = ∑n d i y i , and N = ∑s d i . ˆ ˆ If some of the sample units do not respond (unit nonresponse), and the form of the estimator is unchanged, then the estimator may be biased: 1 The discussion of nonresponse weighting adjustment methods is abstracted from the presentation by Michael Brick at the panel’s workshop (Brick, 2011). 3-1

OCR for page 42
Prepublication Copy — Uncorrected Proofs ˆ ∑yd i i Y0 = r . ∑d r i The bias can be expressed in two ways: (1) A deterministic framework assumes the population contains a stratum of respondents and a stratum of nonrespondents. Let the population means in the two strata be Yr and Ynr , respectively. The respondent stratum is R percent of N, and the bias of the unadjusted estimator ( ) ˆ  (1 − ( ) is bias Y = R) Y − Y . In the deterministic view the bias arises when the means of the 0 r nr respondents and of the nonrespondents differ. (2) A stochastic framework assumes every unit in the population has some nonzero probability of responding. The bias of the unadjusted estimator is ˆ ( ) 1 N ∑ (Yi − Y ) (ϕi − ϕ ) where � is the mean of the response propensities. Thus, in the stochastic view the bias arises 𝜑 bias Y0 ≈ Nϕ i when the characteristic and response propensity co-vary. A natural adjusted estimator is then ∑ di yiϕi−1 ˆ′ = r ˆ Y , ∑ diϕi−1 where ⏞ I is an estimate of the response propensity. 𝜑 ˆ r The selection of the weighting framework—deterministic or stochastic—depends then on the theoretical model of the response mechanism. In other words, the underlying model is the rationale for the selection of the adjustment scheme. Most models now in use assume that the missing data are missing completely at random (MCAR) or missing at random (MAR). The MCAR assumption holds if all the units in the population have the same probability of responding, that is, if the respondents are a smaller random sample. MCAR means that the distribution of the missingness (an indicator for whether the unit responds or not) is independent of the y-variable and all auxiliary (or x) variables. Missing at random is a more realistic assumption than MCAR. MAR implies that the probability of response does not depend on the y-variable once we control for a vector of known auxiliary variables, x. Weighting class adjustment schemes that define groups (sometimes called “response homogeneity groups”) – (h = 1,…, H ) using the auxiliary data such that the sample units within the groups have the same response propensity are consistent with the MAR assumption. These methods adjust the weights for respondents in the group with ϕhi = ϕh ∀ i ∈ h . ˆ ˆ This type of estimator is a weighting-class estimator or post-stratified estimator, depending on the type of data available for computing the adjustment. If the data are at the sample level (known for sampled units but not for the entire population), it is a weighting-class estimator; if the data are at the population level, then it is post-stratified estimator. The 3-2

OCR for page 43
Prepublication Copy — Uncorrected Proofs adjustment requires that sample members can be divided into cells using the vector of observable characteristics. In the weighting-class approach, the adjusted weight is calculated in four stages: 1. Calculate a base weight that is the reciprocal of the probability of selection of the case under the sample design; 2. when there is nonresponse and the eligibility of the nonrespondents cannot be determined, distribute the base weights of the nonrespondents into the eligible nonresponse category based on the proportion of the weights that are eligible in the respondent set; 3. adjust these weights to compensate for eligible nonrespondents; and 4. compute a final weight for the eligible respondent cases as the product of the base weight, the eligibility adjustment factor, and the nonresponse adjustment factor (Yang and Wang, 2008). In choosing weighting classes for the adjustment in stage (c), bias is limited when the variables and classes are such that either: (1) ϕ= ϕh ∀ i or ˆhi (2) Yhi Yh ∀ i . = As noted in the stochastic model of nonresponse, nonresponse bias only exists when the response propensities and the outcomes are correlated. However, since most surveys have a multitude of survey outcomes, the idea of using classes that are related only to the response propensities is commonly adopted. Models constructed to meet (1) are called response propensity stratification, and those designed to meet (2) are referred to as predicted mean stratification. The classes themselves are sometimes formed by subject matter experts, based on information on the key survey outcomes. An empirical method that is used often with categorical data is to form weighting classes by using classification software such as CART, CHAID, or SEARCH. Often the dependent variable is the response (respondent or not), and sometimes the survey outcomes are used as dependent variables, depending on the criteria being used. In either case, this approach may result in a very large number of weighting classes. Eltinge and Yansaneh (1997) suggested methods to test whether appropriate classes are formed. There are many alternative methods of making these adjustments that are sometimes used. We describe several of these alternatives below. Propensity Model Approach The propensity model approach uses multiple regression analysis to examine the nonresponse mechanism and calculate a nonresponse adjustment. In this method a response indicator is regressed on a set of independent variables such as those used to define weighting class cells. A predicted value derived from the regression equation is called the propensity score, which is simply an estimated response probability (Rosenbaum and Rubin, 1983). Survey population members with the same observable characteristics are assigned the same propensity score. The response propensity can be used to adjust directly by using the inverse of the 3-3

OCR for page 44
Prepublication Copy — Uncorrected Proofs estimated response propensity to adjust the weights for the respondents. This is the response propensity stratification, although in many cases the propensity scores are used to divide the sample into propensity classes based on quintiles of the distribution, and the average propensity within the class is the adjustment. Advantages of propensity-weighting methods over the traditional weighting-class methods are that continuous variables can be used to define cells, the models can accommodate a large number of variables, and the technique is simple to apply (Hazelwood et al., 2007). If large adjustments are avoided by using classes rather than the inverse of the estimated response propensities, then the methods can be as stable as other weighting-class methods. Like the response probability adjustments, this approach also implicitly assumes that one weighting adjustment is sufficient to address nonresponse bias in all estimates. Selection Models Heckman (1979) first proposed the sample selection model for regressions. The model is based on the observation that respondents self-select to participate in a survey, either explicitly by refusing to participate or implicitly through inability to answer or be contacted. Selection models are the conventional method among empirical economists for modeling samples with nonresponse (or other types of selectivity). Sampling statisticians have viewed this approach with skepticism, mainly because most selection models make strong assumptions about the nonresponse mechanism that may not hold in practice. Selection models are also typically univariate solutions, in the sense that the model is constructed for one particular estimate and cannot be used for a wide variety of statistics. While this feature can be of benefit because selection models may improve the quality of individual estimates, survey users are typically interested in producing many statistics. The goal is most often a consistency among estimates (e.g., the sum of the estimates for males and females should equal the total) that the selection models do not possess. The most popular form of selection model requires an explicit distributional assumption. In principle, different selectivity corrections could be made from a given set of data, depending on the model to be estimated. Raking Ratio Adjustment Approach Raking ratio adjustments are used to benchmark sampling weights to known control totals and can be considered a form of multi-dimensional post-stratification. The approach reduces sampling error through the use of auxiliary variables correlated to survey response and has been used to reduce nonresponse bias (Brick et al., 2008). The advantage of raking is that more variables that are correlated with response propensities and outcome variables can be included in the weighting process without creating large weight adjustments. Like post- stratification and weighting-class methods, careful review of the weights is required to make sure large weight adjustments are not introduced by the raking process. 3-4

OCR for page 45
Prepublication Copy — Uncorrected Proofs Calibration Post-stratification and raking are two specific methods of calibration, as described by Särndal (2007). Calibration is a method of computing weights in a manner that equates the sum of the calibrated weights to totals that are defined by auxiliary information and that satisfies calibration equations. Calibrated weights can then be used to produce estimates of totals and other finite population parameters that are consistent internally, as discussed above in raking. Calibration is used to correct for survey nonresponse (as well as for coverage error resulting from frame undercoverage or unit duplication). Kott and Chang (2010) showed that calibration weighting treats response as an additional phase of random sampling. This method is particularly valuable when there are many important auxiliary variables that are related to either response propensity or to the key survey outcomes. As a result, it has been heavily studied for use in countries with population registers or when the sampling frame is rich in auxiliary data, such as in establishment surveys. Mixture Models Selection models can be thought of as expressing the joint distribution of the outcome and “missingness” as the product of the distribution of the missing data mechanism conditional upon the outcome variable and on the marginal distribution of the outcome variable. An alternative approach is to write the joint distribution as the product of the distribution of the outcome conditional on the missing mechanism and on the marginal distribution of the missingness mechanism. The two approaches do not result in the same estimates in some situations. Little (1993) described the difference in the two approaches and discussed when pattern mixture models might be preferred. All of these weighting adjustment schemes depend very heavily upon the availability of auxiliary data that are highly correlated with either the response propensities or the key outcomes. Without these types of data, the adjustments are ineffective in reducing nonresponse bias. As response rates decline, these weighting adjustments may become even more important tools for producing high quality survey estimates. Recommendation 3-1: More research is needed on the use of auxiliary data for weighting adjustments, including whether weighting can make an estimate worse (i.e., increase bias) and whether traditional weighting approaches overly inflate the variance of the estimates. In his summary Brick makes the case for the development and refinement of survey theory, suggesting that empirical adjustment methods may work in many cases, but pointing out that they are unsatisfying in several ways (Brick, 2011). Some possible paths to solution would be to develop a more comprehensive survey theory relating response mechanism to nonresponse bias, to develop a more comprehensive statistical theory to adjustment to deal with different statistics, or to create a more multivariate approach to both of these theories. Recommendation 3-2: Research is needed to assist in understanding the impact of adjustment procedures on estimates other than means, proportions, and totals. 3-5

OCR for page 46
Prepublication Copy — Uncorrected Proofs USE OF PARADATA IN ADJUSTMENT There is a growing interest in paradata; that is, data about the process by which the survey data were collected that are collected in the process of conducting the survey. Paradata includes such topics about a survey include the interview (times of day interviews were conducted and how long the interviews took), information about the contacts (how many times there were contacts with each interviewee or attempts to contact the interviewee, the reluctance of the interviewee) as well as survey mode (such as phone, Web, email, or in person). These administrative data have many uses. They help in managing the survey operation (scheduling and evaluating interviewers) and assessing its costs. They are also important for understanding the findings of a survey and making inferences about non-respondents. Indeed, there is a long history in the research literature of collecting additional data (what has become known as paradata) for nonresponse. These administrative data have many uses. They help in managing the survey operation (scheduling and evaluating interviewers) and assessing its costs. They are also important for understanding the findings of a survey and making inferences about nonrespondents. Indeed, there is a long history in the research literature of collecting additional data for nonresponse adjustment. Hansen and Hurwitz (1946) suggested two-phase sampling, with the second phase of sampling looking at nonrespondents using an intensive follow-up of units selected for the second phase. If data can be collected from all the sampled second-phase nonrespondents, then standard two-phase sampling weights can be developed to eliminate nonresponse bias. Even with an incomplete response at the second phase, the potential for bias can be reduced by using information from the additional second-phase sample. A related technique is called response probability adjustments. Politz and Simmons (1949) used the number of call attempts to measure the probability of being at home and then proposed a weight adjustment using these data to reduce nonresponse bias. If the respondents included in such later phases are more like nonrespondents than like the earlier respondents, such cases can be weighted in a way to reduce bias, at a cost of increasing the estimated variance of estimates. Implicitly, such approaches assume that there is one correction that is sufficient for addressing bias in all possible estimates. The response probability adjustment method addresses only the inability to contact the sampled units and has some other issues, but it illustrates the idea of collecting additional data during the data-collection process to reduce nonresponse bias. In many respects, the Politz and Simmons suggestion was a precursor to the modern idea of collecting paradata. Significant advances have been made in the state of the science for using paradata for reducing nonresponse bias Olson (2013). Today there are two main options for reducing nonresponse bias. One is to use paradata to introduce new design features to recruit uncontacted or uncooperative sample members—and, hopefully, respondents with different characteristics— into the respondent pool. The new design features rely on the use of paradata in responsive design. The second approach is to use paradata as the auxiliary data that are then used to adjust the base weights of the respondents. A third use of paradata is to use the data to better understand the survey participation phenomenon so that future surveys may reduce nonresponse, but this use does not result in reducing nonresponse bias. The initial focus of research on paradata was to explore nonresponse rates. The types of paradata that were considered as predictors of response were respondent-voiced concerns, the 3-6

OCR for page 47
Prepublication Copy — Uncorrected Proofs presence of a locked entrance or other safety measures, a multi-unit building, and an urban setting (Campanelli et al., 1997; Groves and Couper, 1998). More recently the work on paradata has taken a new direction and has focused more on the reduction of nonresponse bias by using paradata in responsive designs or in weighting adjustments. Adjustments that are effective in reducing nonresponse bias must be based on data that are predictive of the likelihood of participating in a survey or on the key survey outcomes (Little, 1986; Kalton and Flores-Cervantes, 2003; Little and Vartivarian, 2005; Groves, 2006; Kreuter et al., 2010). The current challenges for paradata research are to enhance the underlying theory (e.g., what paradata are correlated with both response propensities and outcome measures); to better understand measurement error in the paradata and what effect these errors have on the utility of the paradata for reducing nonresponse bias; to operationally assign new tasks for interviewers that are feasible and that do not detract from their ability to conduct the interviews; and to better understand the environment for the interview—doorstep interactions, reasons for non- participation, information about contact persons, available times, interviewer observations of the neighborhoods and the housing unit, household or respondent characteristics, the contact history including the level of effort (number of calls), and the outcomes of calls and their sequence—so that better paradata measures can be developed. Recommendation 3-3: Research is needed on the impact that reduction of survey nonresponse would have on other error sources, such as measurement error. Generally, the quality of paradata is relatively good if the data are automatically generated. When interviewers are asked to collect additional data that are not a byproduct of the data-collection process, there is often a drop in quality. Additional data-collection requirements often lead to substantial missing data rates. Likewise, when interviewer judgment is required, the data are of varying quality (see Casas-Cordero, 2010; Kreuter and Casas-Cordero, 2010; McCulloch et al., 2010; and West and Olson, 2010). Kreuter holds that paradata carry a compelling theoretical potential for nonresponse adjustment. With paradata, the development of proxy variables is possible, and it is also possible to identify large variations in correlations across outcome variables. However, research has shown that although interviewers are good at making observations that are relevant for the primary act of data collection, they can have difficulty in collecting the additional proxy Y’s. According to Kreuter, a case can be made for further collaboration with subject-matter experts, statisticians, psychologists, and field work staff in the refinement of paradata, so these are examples of areas in which further investigation may be fruitful. It would be useful, for instance, to collaborate with substantive researchers to develop interviewer observation measures for labor force surveys (at-home pattern), health surveys (too ill to participate), housing surveys (condition of the dwelling), crime surveys (bars on windows), and educational surveys (literacy). Collaboration with statisticians could help improve statistical models, providing answers to such questions as how to balance multiple predictors of response and Ys, how to handle large and messy data, how to model and cluster sequences of unequal length, and how to address issues of discrete times and mixed processes. Psychologists, in collaboration with survey methodologists, could aid in understanding the factors that drive errors in interviewer observation, how training could improve ratings, how much error can be tolerated, and what the quality is relative to other sources. Finally, collaboration with field work staff could help identify the costs associated with 3-7

OCR for page 48
Prepublication Copy — Uncorrected Proofs paradata collection as well as cheaper alternatives, the risks in interviewer multi-tasking, the appropriate level of observation, and ethical and legal matters issues that need to be resolved. 3-8