In the redesign of the Survey of Income and Program Participation (SIPP) for 2014, the U.S. Census Bureau took steps toward a number of methodological enhancements. These include a redesigned imputation system and greater use of administrative data, particularly within that imputation system. The SIPP system now includes a subsystem that can record interviews, and the Census Bureau has continued to conduct incentive experiments to improve response and decrease perceived burden. These enhancements are discussed in this chapter.
In Reengineering the Survey of Income and Program Participation, the National Research Council panel recommended, “the Census Bureau should move to replace hot-deck imputation routines for missing data . . . with modern model-based imputations” (National Research Council, 2009, p. 4; also listed as Recommendation 3-4 in Box 2-3 of this report). Subsequently, the Census Bureau included the development of new model-based imputation methods as part of the reengineered 2014 SIPP. This section describes the new imputation methodology, drawing much of its descriptive text from Census Bureau documentation provided in a paper by Benedetto and colleagues (2015). It is important to note that the imputation system developed for the 2014 SIPP uses both model-based imputation and hot-deck imputation: as described below, selected topic flags are imputed using model-based methods, while all other variables are still imputed via hot-deck procedures. Note also that although the SIPP documentation refers to SRMI (Sequential Regression Multiple Imputation), only a single imputation is provided.
The report begins by describing the hot-deck methods.
Earlier SIPP designs used hot-deck imputation methods1 to handle missing data. In this process, records with complete responses are grouped into “donor cells,” defined so as to create homogeneous groupings based on certain prescribed characteristics. When imputation is needed, a “donor record” is sampled from the cell to which the record with missing data would have belonged had its data been reported. The donor record’s answers are then used to fill in the missing values. The imputation can be done either on a variable-by-variable basis (for item-level gaps in a record) or with a full-record donation.
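The donor-cell mechanism described above can be sketched in a few lines of Python. This is a toy illustration with hypothetical variables (age group, sex, income), not the Census Bureau's production routine:

```python
import random
from collections import defaultdict

# Toy records: None marks a missing item to be imputed.
records = [
    {"age_group": "25-44", "sex": "F", "income": 41000},
    {"age_group": "25-44", "sex": "F", "income": 38500},
    {"age_group": "25-44", "sex": "F", "income": None},
    {"age_group": "45-64", "sex": "M", "income": 52000},
]

def hot_deck_impute(records, cell_keys=("age_group", "sex"), target="income"):
    # Group complete records into donor cells defined by the cell keys.
    cells = defaultdict(list)
    for r in records:
        if r[target] is not None:
            cells[tuple(r[k] for k in cell_keys)].append(r[target])
    # For each record with a missing item, sample a donor from its cell
    # and copy the donor's answer into the gap.
    # (Production systems collapse sparse cells to guarantee donors; omitted here.)
    for r in records:
        if r[target] is None:
            donors = cells[tuple(r[k] for k in cell_keys)]
            r[target] = random.choice(donors)
    return records

random.seed(0)
completed = hot_deck_impute(records)
```

The same machinery extends to full-record donation by copying every field of the sampled donor rather than a single variable.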
Benedetto and colleagues (2015, p. 2) point out two major disadvantages of using a hot-deck method on SIPP:
- Due to the need to make cells sufficiently large (for sampling donors), only a limited number of characteristics can be used in creating donor cells. The limited characteristics may not allow the cells to be as homogeneous as one would like.
- When full-record donation is needed (as opposed to the donation of individual items), any information that was provided by the respondent is overwritten. This results in lost information and creates the potential that the resulting data record may be inconsistent with data reported by other household members.
Creating Topic Flags
The imputation process developed for the 2014 SIPP combines both model-based and hot-deck methods. A model-based approach is used to impute topic flags. These flags are based on higher-order questions used to determine relevancy of a topic to the specific respondent/household
1 “Hot-deck imputation is a popular and widely used imputation method to handle missing data. The method involves filling in missing data on variables of interest for nonrespondents (or recipients) using observed values from respondents (i.e. donors) within the same survey data set. Hot-deck imputation can be applied to missing data caused by either failure to participate in a survey (i.e., unit nonresponse) or failure to respond to certain survey questions (i.e., item nonresponse)” (Lavrakas, 2008, p. 315). For additional background on hot-deck imputation procedures, see Andridge and Little (2010) and Brick and Kalton (1996).
before asking (or branching around) an additional group of questions related to that topic. For example, a screener question might ask if a person received Old-Age, Survivors, and Disability Insurance (OASDI) benefits during the reference period. If the respondent answers “yes,” she is asked subsequent questions about OASDI benefits; if she answers “no,” the questionnaire instrument will bypass the additional questions and move on to the next topic area. Based on the screener question, a topic flag is created for this topic, indicating whether subsequent questions were asked and therefore whether the variables associated with those subsequent questions are applicable to this respondent’s data record. Other topic flag screeners ask about such things as disability and labor force status; energy assistance; educational enrollment; health insurance status; veterans, unemployment, and workers’ compensation benefits; dependent care; and participation in a range of social programs.
The system also creates an accompanying allocation flag for each topic flag; the allocation flag indicates whether a valid response was captured, the respondent was not in universe, or the flag was imputed. For respondents who were “in universe” for a particular topic (e.g., for the labor force topic, whether the respondent is of labor force age) but for whom the answer to the screener question is missing, the system imputes the topic flag (yes or no) using information from available household, parent, and spouse characteristics, as well as from administrative records. In most cases this is a model-based imputation, but on occasion a logic-based imputation is possible.2
Topic flags are imputed in the first round. Once imputation of topic flags is complete, respondents imputed to “yes” on a particular topic flag have their “downstream” responses to the topic-related questions imputed using a hot-deck method. The donor cells for this hot-deck procedure are constructed using a mix of basic demographics and program-specific cell boundaries. Drawing families, rather than individuals, from the donor cells helps to maintain appropriate within-family correlations.
Details of Topic Flag Imputations
Topic flags are imputed simultaneously based on SRMI methods. Statistical details of this imputation are provided in the appendix of this report. The sources of administrative data are highlighted below.
A major strength of SIPP imputation models is the use of administrative data from the Social Security Administration (SSA) in predicting the value of certain topic flags. These details are described in the SIPP 2014 Panel
2 “Whenever a missing item can be logically inferred from other data that have been provided . . . that information is used to replace the missing information” rather than using the imputation system (U.S. Census Bureau, n.d.-b).
Users’ Guide (U.S. Census Bureau, 2016). Five SSA data files provided important input into the imputation models:
- The Detailed Earnings Record (uncapped income-taxable earnings for each employer that filed a W-2 record from 1978 to 2012) is used to create a measure of total earnings in a given year, count the number of jobs an individual held, provide a measure of self-employed earnings, and provide an indicator of any deferred earnings.
- The Summary Earnings Record (SER), which contains total earnings capped at the maximum earnings taxable for Social Security and Medicare3 from 1951 to 2012, is used to create a count of how many years an individual has worked over his lifetime.
- The Master Beneficiary Record and Payment History Update System are used to create indicators for whether an individual was eligible for and received OASDI payments, including the year an individual started receiving benefits and whether she ever stopped.
- The Supplemental Security Record provides the same information about Supplemental Security Income (SSI) benefits.
- The Numident (a register of all Social Security numbers ever issued in the United States), along with the Master Beneficiary Record and Supplemental Security Record, provided an administrative source of birth date information.
The study panel did not undertake an evaluation that specifically addressed the effect of the model-based imputations. In this section, we report some preliminary analyses of those effects conducted by the Census Bureau (Benedetto et al., 2015), using data from the household respondents in wave 1 of the 2014 SIPP. The paper was presented in December 2015, so its analyses are based on an early version of the wave 1 data.
Turning to administrative data, Table 5-1 also compares respondents whose data were imputed, by topic, with those who responded, based on whether the Census Bureau could link the individual to Social Security data. Approximately 3 percent to 7 percent of in-universe respondents were missing and flagged for imputation (with higher imputation rates for
3 This amount is commonly referred to as the FICA maximum, in reference to the Federal Insurance Contributions Act.
| Topic | Total In-Universe | In-Universe Respondents Imputed (%) | In-Universe, Imputed Respondents Who Linked to SSA (%) | In-Universe, Reported-Data Respondents Who Linked to SSA (%) |
|---|---|---|---|---|
| Lump Sum Payments | 58,030 | 6.1 | 68.1 | 91.7 |
| Private Health Insurance | 73,215 | 6.4 | 65.7 | 91.0 |
| Military Health Insurance | 73,215 | 7.5 | 67.1 | 91.1 |
| Medicare Health Insurance | 73,215 | 7.7 | 67.4 | 91.1 |
| Medicaid Health Insurance | 73,215 | 10.2 | 73.6 | 91.1 |
NOTES: OASDI = Old-Age, Survivors, and Disability Insurance; SNAP = Supplemental Nutrition Assistance Program; SSA = U.S. Social Security Administration; SSI = Supplemental Security Income; TANF = Temporary Assistance for Needy Families; WIC = Supplemental Nutrition Program for Women, Infants, and Children.
SOURCE: Benedetto et al. (2015, p. 8).
disability payments, retirement payments, and Medicaid health insurance). Of those flagged for imputation, approximately 65 to 70 percent were matched to administrative records. The right-hand column of Table 5-1 shows that about 90 percent of in-universe reporting respondents for each topic matched to administrative records—a considerably higher percentage than for the imputed records.
Benedetto and colleagues (2015) also examined the imputation of participation in the Supplemental Nutrition Assistance Program (SNAP). With the old hot-deck procedures, analysts often saw an inconsistency between imputed SNAP participation and reported or imputed earnings, with SNAP being imputed to households whose earnings would have made them ineligible for the program. This may reflect an inability to adequately account for SNAP eligibility when forming donor cells. Table 5-2 categorizes SNAP participation into three levels of household earnings (based on 2012 household administrative data). The first two columns compare the distribution by household earnings among people who reported “yes” and those who were imputed “yes,” respectively, to receiving SNAP benefits in 2013. The table shows that households with earnings of ≥$25,000 account for a greater proportion of households imputed as having received SNAP benefits (38%) than of households that reported receiving SNAP benefits (22%). In contrast, households with no earnings make up a relatively greater share of reported than of imputed SNAP recipients. At first glance this appears to be a problem, but Benedetto and colleagues (2015) found that these results were due to differences in the earnings distribution between responders and nonresponders (see the middle two columns of Table 5-2). The table shows that a larger percentage of nonresponders than responders were in the ≥$25,000 income category (68% and 55%,
| Household Income from Administrative Records | “Yes” Received SNAP in 2013, SIPP Reported (%) | “Yes” Received SNAP in 2013, SIPP Imputed (%) | In-Universe for SNAP, SIPP Reported (%) | In-Universe for SNAP, SIPP Imputed (%) | Conditional “Yes” In-Universe for SNAP, SIPP Reported (%) | Conditional “Yes” In-Universe for SNAP, SIPP Imputed (%) |
|---|---|---|---|---|---|---|
| >$0 but <$25,000 | 43.0 | 45.9 | 24.9 | 25.0 | 23.5 | 19.8 |
SOURCE: Benedetto et al. (2015, p. 11).
respectively), and a smaller percentage of nonresponders were in the “no earnings” category (7% and 21%, respectively). Benedetto and colleagues then conditioned the “yes” to SNAP participation on being in-universe for SNAP. The final two columns of Table 5-2 show that 5.3 percent of those in the ≥$25,000 income grouping reported “yes” to receiving SNAP benefits, whereas 6.7 percent were imputed “yes.” In the “no earnings” group, 24 percent reported “yes” while 22 percent were imputed “yes.” These numbers are much closer than in the first two columns. This analysis suggests that the differences in the distribution of household earnings between imputed SNAP participants and reported SNAP participants (first two columns) appropriately reflect observable differences in the distribution of household earnings between nonresponders and responders.
Table 5-3 illustrates the value that administrative data bring to the imputation process. We again reproduce an analysis and table generated by Benedetto and colleagues, which examine the presence of a “job” for individuals designated as in universe for the labor market topic. Individuals were asked by the SIPP interviewer whether they worked for pay in 2013. Of those responding to the question, 59.6 percent said “yes” and 40.4 percent said “no.” For individuals who did not respond to the question, the system imputed a “yes” for 62.9 percent and a “no” for 37.1 percent. The imputation process thus produced a higher percentage of “yes” answers than the reported responses. However, the administrative data reinforce the imputed numbers. Column 2 of Table 5-3 presents earnings data from administrative records, also categorized by SIPP survey respondents and nonrespondents. (Again, the administrative data are for 2012, the year prior to the survey reference year, as 2013 data were not
| | Worked for pay in 2013? (% “yes”) | W-2/Schedule C positive earnings in 2012? (% positive) |
|---|---|---|
| SIPP respondent answered the first question about jobs held (94.5% of in-universe respondents) | 59.6 | 58.0 |
| SIPP respondent DID NOT answer the first question about jobs held and topic flags were imputed (5.5% of in-universe respondents) | 62.9 | 61.7 |
SOURCE: Benedetto et al. (2015, p. 12).
available.) For respondents, Column 2 shows that 58.0 percent had positive earnings in 2012, whereas Column 1 shows that 59.6 percent reported holding a job in 2013. For nonrespondents, 61.7 percent had positive earnings in 2012 (from administrative data), compared with 62.9 percent who were imputed (by the SIPP process) as having a job in 2013. The numbers in these comparisons are very close, which illustrates the importance of the administrative data in the imputation process. Absent the administrative data, the imputation system might have imputed a lower percentage based on the SIPP respondents. It would be useful for the Census Bureau to revisit these analyses when administrative data for 2013 become available.
Benedetto and colleagues also drew on an approach for assessing the quality of imputation suggested by Bondarenko and Raghunathan (2016). That approach uses the full, completed dataset to estimate a logistic regression predicting whether a surveyed individual responds to an item. If the imputations are good under the missing-at-random assumption, then the distributions of imputed and reported values should be similar for individuals with similar response propensities. Following this approach, Benedetto and colleagues (2015) considered the rate of “yes” responses for five topic flags (Job Line 1, SNAP, WIC, Disability, and Education Enrollment), broken down by imputed versus reported status within each quintile of the predicted propensity of response for that topic. For most cells, the imputed and reported means were not statistically different.4 The one major problem cell is the second quintile of the propensity to respond for SNAP, where 11 percent of respondents reported receiving SNAP versus 20 percent of nonrespondents imputed as receiving SNAP. This result indicates that the SNAP imputation model may need further refinement in future iterations of modeling.
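The quintile-wise comparison behind this check can be sketched as follows. All data here are synthetic, and the response propensities are taken as given rather than estimated by logistic regression; the point is only the within-quintile comparison of reported versus imputed means:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Synthetic stand-ins: an estimated propensity to respond to the item,
# the realized response indicator, and a binary topic-flag value.
propensity = rng.uniform(0.1, 0.9, n)
reported = rng.random(n) < propensity        # True if the item was answered
flag = rng.random(n) < 0.15                  # topic flag ("yes" = True)

# Bin cases into quintiles of the response propensity, then compare the
# mean "yes" rate of reported vs. imputed cases within each quintile.
edges = np.quantile(propensity, [0.2, 0.4, 0.6, 0.8])
quintile = np.digitize(propensity, edges)

rep_means, imp_means = [], []
for q in range(5):
    in_q = quintile == q
    rep_means.append(flag[in_q & reported].mean())
    imp_means.append(flag[in_q & ~reported].mean())
# Under missing-at-random with good imputations, the paired means should
# be close in every quintile; a large gap (as in the SNAP example above)
# signals a model that needs refinement.
```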
Technical Comments and Suggestions
On the whole, SIPP’s use of model-based imputation, in combination with hot-deck imputation, appears to be a significant improvement over hot-deck imputation alone. In particular, the model-based approach allows the incorporation of data from SSA administrative records for a large majority of missing cases. However, the study panel believes there is potential for further analysis and improvement, and provides some technical concerns and suggestions:
- The best comparison between imputation methods would be to create a comparison dataset using the previous hot-deck-only
4 We note that statistically significant differences may not be observed both because the number of program participants is small and because the discrepancies themselves are small.
imputation method and to compare the resulting estimates with those obtained through model-based imputation.
- While the topic flag imputations demonstrate a potential major step forward, the topic with the highest incidence of missing data—assets—continues to be imputed using hot-deck methods alone. The development of model-based methods for assets—both flags for any holding and values for those flagged as having nonzero holdings—would be a very useful addition in the near term.
- Use of single imputation is statistically improper: when the filled-in data are treated as real, the uncertainty in the model estimation is not accounted for. Many of the flags currently imputed have relatively little missing data (refer to Table 5-1), so for these flags ignoring the uncertainty in model estimation should have little impact. However, for flags such as disability payments—or missingness in asset holdings, if eventually undertaken—the fraction of missing data is sufficiently large that analyses using single imputation, and therefore treating imputed values as fully observed data, will be anticonservative, yielding confidence intervals that are too narrow and p values that are too small. While the complexities involved in analyzing multiple imputed datasets are nontrivial, statistical software packages increasingly offer options for handling multiple imputed data. The Survey of Consumer Finances uses and advocates multiple imputation and provides software to others. Consideration of multiple imputation in future releases of SIPP would be an important enhancement.
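The variance understatement from single imputation is precisely what multiple imputation, combined via Rubin's rules, corrects. A schematic sketch with invented estimates (not SIPP data):

```python
import numpy as np

# Suppose an analysis is repeated on m multiply imputed datasets, each
# yielding a point estimate and its sampling variance.
estimates = np.array([0.212, 0.205, 0.219, 0.208, 0.215])   # hypothetical
variances = np.array([0.0004, 0.0005, 0.0004, 0.0005, 0.0004])

m = len(estimates)
q_bar = estimates.mean()                # combined point estimate
w_bar = variances.mean()                # within-imputation variance
b = estimates.var(ddof=1)               # between-imputation variance
total_var = w_bar + (1 + 1 / m) * b     # Rubin's total variance

# Single imputation in effect reports only w_bar, omitting the
# (1 + 1/m) * b term, which is why its confidence intervals are too
# narrow whenever the imputations vary across plausible completions.
```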
The remaining three comments relate to obtaining multiple imputed datasets.
- The new imputation procedures use five iterations to obtain convergence of the Markov chain Monte Carlo (MCMC) sampler used in imputation modeling. The study panel believes that five iterations are likely far too few. There may be confusion in following the literature, because Schafer (1999) indicates that five multiple imputed datasets are sufficient for inference; that statement concerns the number of imputations needed for proper inference (see the single-imputation comment above), not MCMC convergence. That literature assumes (a) that the MCMC chain has converged and (b) that the imputations are drawn sufficiently far apart in the chain to be effectively independent. The latter assumption is not at issue in this case, because only one imputation is being used. Draws sufficient to reach convergence, however, typically number in the hundreds if not thousands; the multiple imputed datasets are then taken at widely spaced intervals to ensure independence between the imputations. We understand that the Survey of Consumer Finances and the Survey of Small Business Finances typically use 5 to 20 iterations. A number of methods are available to assess MCMC convergence (Geweke, 1991); perhaps the most common is the use of multiple chains (Gelman and Rubin, 1992). One of these methods could be used to assess convergence for SIPP imputations.
- In Appendix A, the study panel presents the statistical form of the joint distribution of the topic flags. The issue is that there is no guarantee that the draws of the missing elements will converge to a single stationary distribution. Whereas some empirical results indicated that the practical impact of this limitation might be modest (White et al., 2011), more recent work suggests that the impact of incompatible distributions on inference can be severe (Zhu and Raghunathan, 2015). A possible alternative would be to use a latent class model (Vermunt et al., 2008) for the joint distribution of the topic flags. These models assume independence of the topic flag indicators conditional on membership in a latent (unobserved) class, thus allowing clustering of topic flag indicators marginally while reducing the dimension of the joint distribution. Flexible latent class models for multiple imputation that do not require specifying the number of classes in advance have recently been developed (Si and Reiter, 2013).
- Bayesian methods that account for the sample design may also be appropriate. The current method assumes a simple random sampling design when generating the model parameters and imputing the missing data elements. Such an assumption can bias both point estimates (when unequal sampling probabilities are ignored) and variance estimates (when clustering and stratification are ignored) (Reiter et al., 2006). While direct incorporation of sample design elements in the model is possible, the use of finite Bayesian bootstrap methods (Zhou et al., 2016) has shown great promise in providing a simple nonparametric approach to incorporating sample design elements in estimation.
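The multiple-chain diagnostic of Gelman and Rubin (1992) mentioned in the comments above can be computed directly. A minimal sketch on toy chains (the chain setup is an illustrative assumption):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for m parallel MCMC chains
    of equal length, started from dispersed initial values."""
    chains = np.asarray(chains)                  # shape (m_chains, n_draws)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    b = n * chain_means.var(ddof=1)              # between-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled variance estimate
    return np.sqrt(var_hat / w)                  # ~1 when chains have mixed

rng = np.random.default_rng(0)
# Well-mixed chains drawn from a common distribution: R-hat near 1.
mixed = rng.normal(0.0, 1.0, size=(4, 2000))
# Chains stuck at different locations (poor mixing): R-hat well above 1.
stuck = mixed + np.array([[0.0], [1.5], [3.0], [4.5]])

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Running several imputation chains from dispersed starting points and requiring R-hat near 1 for the model parameters would give an empirical check on whether five iterations suffice.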
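The Bayesian bootstrap idea can likewise be sketched in miniature. This toy example (synthetic data; a simplification, not the Zhou et al. procedure in full detail) propagates unequal base sampling weights through Dirichlet reweighting to obtain posterior draws of a weighted mean:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
y = rng.normal(50, 10, n)                 # survey outcome
base_w = rng.uniform(0.5, 2.0, n)         # unequal design weights

draws = []
for _ in range(500):
    # One Bayesian bootstrap replicate: Dirichlet(1,...,1) weights,
    # multiplied by the design weights so unequal selection probabilities
    # are respected in each posterior draw.
    g = rng.dirichlet(np.ones(n)) * base_w
    draws.append(np.sum(g * y) / np.sum(g))

posterior_mean = np.mean(draws)
posterior_sd = np.std(draws, ddof=1)      # design-aware uncertainty
```

The spread of the draws provides a nonparametric variance estimate without writing the sample design into the imputation model itself.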
RECOMMENDATION 5-1: The U.S. Census Bureau should continue to research, evaluate, and implement modifications and further enhancements to the new imputation system. These potential enhancements include extending the model-based approach to additional variables, such as assets. Further evaluations should also include consideration of the study panel’s suggestions for improving model-based imputation. The effects of various enhancements/modifications should be quantified and provided to data users.
The study panel that authored the 2009 National Research Council report on SIPP was specifically charged to provide recommendations on effective use of administrative records in the reengineered SIPP. After devoting a major portion of its work to this topic, that panel made the following recommendations to the Census Bureau (National Research Council, 2009, paraphrased from pp. 5-6):
- Conduct regular, frequent assessments of SIPP data quality by comparing aggregate counts of recipients and income and benefit amounts from appropriate administrative records, and, when feasible, evaluate reporting errors for income sources by exact-match studies that link SIPP records with the corresponding administrative records.
- Move to replace hot-deck imputation routines for missing data in SIPP with modern model-based imputations, implemented multiple times to permit estimating the variability due to imputation; imputation models for program participation and benefits should make use of program eligibility criteria and characteristics of beneficiaries from administrative records so that the imputed values reflect as closely as possible what is known about the beneficiary population.5
- Request the Statistical and Science Policy Office in the U.S. Office of Management and Budget to establish an interagency working group on uses of administrative records in SIPP.
- Give priority in the near term to indirect uses of administrative records in a reengineered SIPP, but, at the same time, working closely with data users and agencies with custody of relevant administrative records, identify feasible direct uses of administrative records in SIPP to be implemented in the medium and longer terms.
This last recommendation made explicit mention of Social Security and SSI benefit records, “which are available to the Census Bureau on a timely basis,” as “prime candidates for research and development on ways to use the administrative values directly—either to adjust survey responses for categories of beneficiaries or to replace survey questions” (National Research Council, 2009, p. 6).
Echoing these recommendations, administrative records are being used in the 2014 SIPP panel in three principal ways: (1) as predictors in the
5 These considerations for model-based implementation are described in the previous section of this chapter.
model-based imputation of topic flags, as described in the preceding section, (2) to evaluate reported program participation and benefit amounts, and (3) as input variables to identify needed corrections of reported program participation and benefits.
Under an arrangement with the SSA, the Census Bureau has linked Social Security and SSI administrative records to SIPP since the beginning of the survey and continues to link a new year of these program administrative records to each SIPP panel annually. With the development of the Person Identification Validation System, which assigns unique identifiers to survey records and administrative records collected by the Census Bureau, the Bureau has been able to eliminate requests for respondents to provide their Social Security numbers while increasing the rate (i.e., the ratio of successful links to link attempts) at which data users are able to link survey records to administrative records. Linked data have supported multiple studies of Social Security and SSI reporting, but the 2014 panel is the first to use this information to correct SIPP data for misreporting of program participation.
Researchers from the SSA performed a number of studies of Social Security and SSI reporting using the linked data. In this context, “Social Security” refers specifically to OASDI, three distinct programs that are widely known collectively as “Social Security.” These studies demonstrated, among other things, that some respondents whose administrative records indicated that they received OASDI but not SSI reported that they received SSI benefits—sometimes with and sometimes without also reporting receipt of OASDI benefits. Other respondents whose administrative records indicated that they received SSI but not OASDI benefits reported that they received only OASDI benefits or both OASDI and SSI (Huynh et al., 2002). Some level of respondent confusion between “Social Security” (OASDI) and SSI is not surprising, given the similarity of the program names and the facts that both are administered by the SSA and both serve some of the same populations.
Because OASDI serves far more people than SSI, confusion between the two among survey respondents (and field representatives) tends to benefit (increase) SSI reporting more than OASDI reporting. A small fraction of SSI recipients misreporting their program participation as Social Security adds very little to the number of Social Security beneficiaries, but the same fraction of OASDI recipients misreporting their benefits as SSI could have a substantial impact on the reported receipt of SSI.
With the decision to reengineer SIPP, the Census Bureau began a series of field tests to explore design options and develop the final design. Evaluation of the field tests included comparisons of field-test results with the 2008 SIPP panel. Linked SIPP and SSA administrative records showed that 23 percent of the reports of SSI receipt in 2010 and 2011 were false (Giefer et al., 2015). In the first two field tests, covering the same 2 years, the false
positive rates were even higher, at 33 and 31 percent, respectively. A notable change between the 2008 panel and the field tests was a reversal of the order in which questions on the two programs were asked. In the 2008 and earlier panels, Social Security came first. In the field tests, questions on SSI receipt were placed in the event history calendar (EHC) along with questions on other means-tested programs. Questions on Social Security were included in the general income section, which followed the EHC section.
Results from the field tests prompted the Census Bureau to revise the wording preceding the SSI questions in order to offset the effects of asking about SSI ahead of Social Security. The revisions included instructions to the field representatives as well, but these may not have been followed consistently. When Census Bureau researchers linked SSA administrative records to the first wave of the 2014 panel, they found that nearly half of those respondents who reported receiving SSI had no administrative records confirming their receipt of SSI benefits.
The misreporting of SSI receipt was of two general types: double reporting and program swapping. Double reporting involved reporting receipt of both Social Security and SSI benefits when benefits from only one program were being received. Program swapping involved reporting participation in one program when benefits were being received from only the other program. These types of errors occurred in both directions, but misreporting of SSI receipt was more common than misreporting of Social Security receipt (Giefer et al., 2015).
The extent of the misreporting was discovered long before the wave 1 data were ready for release, and the Census Bureau decided to implement corrections in conjunction with the topic flag imputations discussed in the previous section. Predictors used in the topic flag imputations for Social Security and SSI include indicators of program participation obtained from program administrative records. These indicators were imputed for the in-universe respondents who could not be matched to administrative records. Corrections were applied to reported responses as follows. If a respondent reported or was imputed receipt of federal or federally administered state SSI benefits but the administrative data indicated OASDI receipt rather than SSI receipt, the SSI topic flag for that respondent was reset to “no,” and the remaining responses to the SSI section were changed to be consistent with a “no” value for the topic flag. However, if the administrative data indicated no receipt of SSI or OASDI, no correction was made. In all, close to 70 percent of the incorrect “yes” values on the SSI topic flag were corrected (Giefer et al., 2015).
No corrections were made for false-negative SSI reporting. In support of this strategy the Census Bureau notes that only 1 percent of the respondents who reported or were imputed no receipt of SSI benefits had administrative records indicating that they did receive SSI benefits. However, it
should be noted that the number of false negatives was one-third as large as the number of true positives. In other words, correcting the false negatives would have increased the estimated fraction of the population with SSI benefits by about one-third. The false positives that were not corrected were about five-sixths (about 83%) of the number of false negatives. In other words, the two types of SSI reporting errors that remained after the corrections were largely offsetting.
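To make the offsetting arithmetic concrete, consider hypothetical counts chosen only to match the stated ratios (the actual SIPP counts are not reproduced here):

```python
# Hypothetical counts consistent with the ratios described above.
true_positives = 3000                           # confirmed SSI reports
false_negatives = true_positives // 3           # one-third as many: 1000
uncorrected_fp = round(false_negatives * 5 / 6) # five-sixths: 833

# Correcting the false negatives would raise the estimated SSI caseload
# by about one-third, but the false positives left uncorrected offset
# most of that shortfall, leaving a small net error.
net_error = uncorrected_fp - false_negatives
```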
No corrections were made to the Social Security topic flag for either false positive or false negative reporting—not even when the SSI topic flag was changed from “yes” to “no.” The Census Bureau found that two-thirds of the instances of false positive SSI reporting were of the double reporting rather than program swapping variety. That is, two-thirds of the respondents who falsely reported SSI when they actually received Social Security benefits did report—or were imputed—receipt of Social Security benefits. A correction was not necessary for these cases. Program swapping that involved the reporting of only OASDI when only SSI benefits were being received would have offset, to at least some degree, the program swapping in the opposite direction, but the frequency with which respondents with only SSI benefits reported OASDI instead was not presented by the authors.
Efforts to reduce the level of program confusion and its impact on reporting of SSI and Social Security receipt in future waves include a revised introduction to the SSI section that field representatives will read to their respondents, explaining what the SSI program is and how it differs from Social Security. To what extent this may have addressed the problem is unknown to the panel, as results from wave 2 were not yet available when this report was prepared.
The panel commends the Census Bureau and SIPP staff on their work to effectively utilize available administrative data, and we encourage continued efforts. These efforts should include, among others, exploiting the Internal Revenue Service Form 1099-R data on pension income and IRA withdrawals, Internal Revenue Service Form 1099-INT and 1099-DIV data on interest and dividend income, and the SSA Detailed Earnings Record data on tax-deferred pension contributions.
RECOMMENDATION 5-2: The Census Bureau should continue to investigate effective use of administrative records, such as Internal Revenue Service data mentioned in this report, to enhance the quality of data collected in the Survey of Income and Program Participation. This research should include identification and correction of both false positives and false negatives. Efforts should continue to obtain administrative data for important state-run programs such as the Supplemental Nutrition Assistance Program.
RECOMMENDATION 5-3: Because of apparent respondent confusion between Social Security (Old-Age, Survivors and Disability Insurance) and Supplemental Security Income payments, questions about participation in these programs should be co-located within the calendar part of the questionnaire so there can be cross-checking of participation in these two programs. Field representatives should be further trained on the differences between these two programs and on effective ways to help eliminate confusion on the part of respondents.
Computer audio-recorded interviewing (CARI) is a technique that pairs audio recordings of interviews with a computer-assisted personal interview (CAPI) instrument. Throughout the interview process, the CAPI instrument collects and controls recordings of the interaction between the interviewer and respondent for questions of interest to analysts. CARI is primarily a tool for monitoring and managing field staff, but it also can have a direct impact on data quality (Thissen et al., 2008). CARI recordings can help survey managers identify the authenticity of the data capture (falsification or curbstoning), the conformance to the protocol by interviewers, and the validity of the data with respect to the intent of the survey (Thissen et al., 2008). CARI recordings are becoming more common in production surveys as the computer technology needed to capture these recordings has become easier to implement and data storage has become relatively inexpensive. Typically, CARI recordings are subsequently behavior-coded by expert coders to indicate where any issues may have occurred in the data collection process.
SIPP’s redesign uses CARI recordings primarily for quality control checks: to determine whether interviewers were conforming to protocols. According to Census Bureau protocols, a respondent must consent to having the interview recorded in order for CARI to be used on a case. The consent question was recorded for all respondents, but subsequent questions were not recorded if consent was not obtained. In wave 1, 65.62 percent of respondents consented to having their interview recorded; in wave 2, 63.74 percent consented. In some cases, interviewers who did not want to be recorded may not have asked respondents for consent. For SIPP, the Blaise CAPI instrument controlled the CARI recordings, with instrument coders able to select which parts of the interview would be recorded.
The study panel was particularly interested in listening to selected interviews to help understand outliers and identified data problems. In particular, the panel wanted to examine factors that could lead to seam effects in SIPP results. Unfortunately, the panel identified several major limitations
to the current implementation of CARI on SIPP that made the recordings unsuitable for our examinations:
- Files containing digital recordings are fairly large, so storage limitations of the laptops used for CAPI interviewing meant that only limited parts of each interview could be recorded. Because quality control was the primary purpose, the 2014 wave 1 recordings were concentrated on items at the beginning and end of the CAPI instrument, to verify adherence to protocol and to ensure that falsification did not occur after a break-off. The study panel was more interested in the middle parts of the interview.
- The Blaise instrument only allows the recording of a single question at a time. The recording is automatically started when the interviewer navigates to that screen and automatically ended when the interviewer navigates off the screen. This can often lead to “clipped” recordings as the interviewer proceeds (clicks and moves forward in the instrument) while still conversing with the respondent about the previous question and answer.
- CARI recordings are limited to those individuals on roster line numbers 1 and 2. Additional household members were not recorded.
- Most critical for the study panel’s work, the software plugin that ran the EHC module was not set up to allow CARI recordings in any of the EHC sections.6 This was most unfortunate because this plugin encompasses most of the program participation and job sections for which the panel wanted to examine seam effects.
The study panel initially had a goal to behavior-code items related to program participation, looking for evidence of behaviors that may contribute to seam effects. The limited number of program participation questions recorded made pursuit of this goal impossible. However, the panel listened to a number of interviews, identified through unedited wave 1 and wave 2 data, that had inconsistent responses to the Social Security participation questions. Although this listening was focused on a relatively small number of respondents, there was evidence of confusion and of misleading program definitions provided by interviewers to respondents. Many interviewers’ evident lack of clarity about the differences between SSI and “Social Security” (OASDI) likely contributed to the overall program misreporting discussed both by Giefer and colleagues (2015) and in the previous section of this chapter. An early review of these recordings by the Census Bureau
could have (1) allowed correction of the issue and retraining of interviewers while data collection was ongoing and/or (2) identified the issue earlier in the data editing process.
The panel commends the Census Bureau and SIPP staff for implementing an initial CARI methodology in SIPP, but we find the current implementation in need of revision and expansion. In particular, a formalized process of analyses using audit trails along with audio streams at the individual level would be a valuable way to examine concordance between reports and administrative records.
RECOMMENDATION 5-4: The Census Bureau should extend the collection of audio recordings so as to capture as much of the interview as possible for all respondents, including questions within the event history calendar. These recordings should be routinely reviewed for evidence of confusing questionnaire items and factors contributing to seam bias and item nonresponse, in addition to quality control issues such as interviewer falsification, deviations from questionnaire wording, and deviations from standardized interviewing. The use of audit trails along with audio streams would be important for these types of evaluations.
A fourth area of methodological enhancements employed in the 2014 SIPP is the continued experimentation with incentives7 to strengthen survey response. We begin by providing some background on incentive experiments generally and then discuss the specific experiments conducted by the Census Bureau using the reengineered SIPP.
Singer and Ye (2013) reviewed recent studies (since 2003) on the effects of incentives. Incentives can either encourage individuals to participate or help in following and recontacting longitudinal respondents. For panel studies, incentives are common and are often combined with other response-rate-improvement efforts such as letters to encourage response or just to alert people that the survey is coming. The clear take-away is
7 An incentive is money, a gift, a gift card, or another item of value given to individuals sampled for a survey to encourage response, to help maintain contact across waves on longitudinal surveys, and/or to merely show appreciation for response. Some incentives are given only after a respondent has completed the proposed tasks (such as a survey questionnaire); these incentives are called “conditional.” Other incentives are provided up front and are “unconditional.”
that incentives—both conditional and unconditional—can positively affect response and deter attrition. However, there is little knowledge about the optimal size of incentives, although there is evidence that the incremental response rate gains for increased payments decline with the size of the payment.
Singer and Ye (2013) also raised the issue that incentives are not delivered in a vacuum, and interviewers often know who received them. This is obviously the case when Census Bureau field representatives/interviewers have discretion over who gets them. This knowledge can affect interviewer expectations and effort. Incentives also can affect data quality, including item nonresponse (which is easily measured). One could imagine these effects going either way, and in fact the evidence reported by Singer and Ye (2013) is mixed. The key concern is that if likely refusers differ from nonrefusers with respect to salient characteristics, this systematic difference could change overall data quality if incentives encourage the likely refusers to participate. There is little consistent evidence about whether the use of incentives increases or decreases nonresponse bias. Singer and Ye (2013) also discuss the relative advantages and disadvantages of unconditional incentives—that is, those provided up front whether or not the sampled individual completes the interview. Finally, there is some discussion of whether incentives have long-term consequences, either because they condition respondents to expect them or because they may affect interviewers’ expectations and behavior.
Lessons Learned from Other Federal Surveys
The Census Bureau has a rich history of using incentives to improve response rates and reduce attrition, including experiments in SIPP and the Survey of Program Dynamics. This point also holds for a host of other federal surveys including the Medical Expenditure Panel Survey (MEPS), the National Health and Nutrition Examination Survey (NHANES),8 the National Household Food Acquisition and Purchase Survey (FoodAPS),9 the National Survey on Drug Use and Health (NSDUH),10 the National Survey of Family Growth (NSFG),11 the Survey of Consumer Finances (SCF),12 and the Consumer Expenditure Survey (CE).13 We focus attention
8 MEPS is sponsored by the Agency for Healthcare Research and Quality; NHANES is conducted by the National Center for Health Statistics.
9 FoodAPS is sponsored by the Food and Nutrition Service and the Economic Research Service of the U.S. Department of Agriculture.
10 NSDUH is sponsored by the Substance Abuse and Mental Health Services Administration of the U.S. Department of Health and Human Services.
11 NSFG is conducted by the National Center for Health Statistics.
12 SCF is sponsored by the Board of Governors of the Federal Reserve System.
13 The CE is conducted by the Census Bureau for the Bureau of Labor Statistics.
on previous studies using SIPP and on the other panel-based surveys, with MEPS and the CE the most relevant.
MEPS has used incentives since its modern incarnation in 1996. The entire 2008 MEPS was included in an experiment comparing three incentive payment amounts: $30, $50, and $70. Payment was provided at the end of the interview in each of the five rounds. Response rates were statistically higher for the $50 and $70 groups relative to the $30 group, and the $70 group had a statistically significant 4.4 percentage point higher response rate than the $50 group. When costs were considered, the extra $20 per wave for the $50 group resulted in almost no increase in cost per completed case, while the $70 incentive (an extra $40) did increase cost per case. Consequently, the 2011 MEPS panel used a $50 rather than a $30 incentive.
From 2005 to 2007, the CE explored incentives in both the diary and the quarterly survey. In the diary experiment, $20 and $40 unconditional debit cards were provided with an advance letter. This had a small impact, with response rates only 1 percentage point higher, although completed interviews in the incentive groups reported more expenditures and higher-quality information. In the quarterly survey experiment, unconditional incentives of $20 and $40 were compared to a priority-mail advance letter only and to a control group receiving first-class mail (McGrath, 2006). The larger incentive led to higher response rates (significant at the 5% level in the first wave) and better data (respondents were more likely to use records, answered more expenditure questions, and required fewer imputations) compared to the control group (Goldenberg et al., 2009; McGrath, 2006). The smaller $20 incentive had no significant impact on response rates. The use of debit cards yielded some additional interesting information: about 30 percent of debit card recipients attempted to cash the card prior to the interview. Others gave reasons such as not having time, being unclear how to use the debit card, not being committed to doing the survey, or having lost or thrown away the card (McGrath, 2006).
Incentive Experiments in SIPP
Earlier Experiments and Results
The SIPP program conducted incentive experiments in 1996 and 2001 to address rising nonresponse rates and incorporated incentives for all cases in 2004. These efforts are reviewed by Weinberg (2002), discussed at length by To (2015) and Creighton and colleagues (2007), and touched upon by Westra and colleagues (2015). Another experiment was added in 2008.
In 1996, the Census Bureau explored unconditional incentives in the first interaction with respondents, “booster” incentives to induce continued
cooperation, and incentives to attriters. Because of perceived risk to field representatives, the 1996 incentives were delivered as vouchers that families had to mail back in order to receive checks. These experiments had the following results. Unconditional incentives of $20 reduced household nonresponse relative to the control group (p < .10). Adding a $20 booster incentive in wave 7 for households with low income in wave 1 led to an interactive effect in cutting Type A nonresponse (refusals, not at home, temporary absence, or language barrier). The $20 incentive also led to statistically significantly lower item nonresponse rates relative to no incentive.
Booster payments of $20 or $40 to nonresponding households in later waves helped to improve conversion rates for wave 7 Type A nonrespondents and/or reduce attrition. The experiment consisted of sending a debit card (unconditional) to the experimental group along with the special conversion letter sent to all Type A noninterviews. The incentives increased conversion by 5.3 percentage points for the $20 payment group and 8.2 percentage points for the $40 group. Effects were larger among refusers than among the other Type A categories. How these experiments affected costs is of obvious interest, but a full cost accounting was not available; the incentive payments totaled $415,000 and yielded another 880 units with completed interviews at the end of the panel.
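From the payment totals above, an implied incentive cost per additional completed interview can be computed directly. This is only a partial figure, since field and processing costs were not available:

```python
# Implied incentive cost per additional completed interview from the
# 1996 booster experiment, using only the reported payment totals
# (a full cost accounting, e.g., field follow-up costs, was unavailable).
total_incentive_payments = 415_000   # dollars in incentive payments
additional_completed_units = 880     # extra completed interviews obtained

cost_per_unit = total_incentive_payments / additional_completed_units
print(round(cost_per_unit, 2))  # about $471.59 per additional completed interview
```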
In the 2001 SIPP, the Census Bureau explored an incentive schema that incorporated the discretion of the field representative as well as some conditional and some unconditional incentives. Weinberg noted that the Census Field Division argued that giving field representatives a say in when to use incentives would be effective. Here the payments were $40 debit cards. Creighton and colleagues (2007) suggested that the conditional incentives in waves 1-3 had a positive and significant effect on completion, although Westra and colleagues (2015) reported no results on the effectiveness of these incentives. The later incentives in waves 4-9 also improved conversion rates. According to Creighton and colleagues (2007), the results of the 2001 incentives led to incentives being made a standard part of the process in the 2004 panel (earlier incentives had been used on subsets of the respondents as an experiment). In the 2008 panel, the Census Bureau again tested conditional and unconditional incentives experimentally, with the conditional incentive being $40 (for every wave) and the unconditional incentive being $20 (sent with the advance letter). Here, the unconditional incentive induced more completed interviews (an increase of 1.0-1.8%). The discretionary incentive had a significant effect in waves 4-7.
Incentive Experiments in the 2014 SIPP
For the 2014 SIPP, Census Bureau staff designed an incentive experiment that carried across the initial waves. Overall, the previous experiences with incentives in SIPP suggested that unconditional incentives as well as conditional ones can be effective in increasing response rates modestly, in converting nonrespondents, and in deterring attrition. Incentives also have some unexpected consequences for interviewer behavior.
Drawing on these previous experiments but cognizant that the 2014 SIPP was very different from its predecessors, the 2014 SIPP incorporated a number of experiments at the production level.14 In wave 1, the entire sample was randomized into four groups. Groups 1 and 2 received no incentive payment in wave 1. (Group 1 was the control group for all of the experiments; Group 2 received an incentive in wave 2.) Group 3 received a $20 incentive, and Group 4 received a $40 incentive in wave 1. The experimental conditions and results are summarized in Table 5-4. In wave 1, the experiment showed a 1.2 percentage point increase in response for the $20 incentive and a 3.5 percentage point increase for the $40 incentive (difference significant at p < .10). The increase for the $40 payment was also statistically different from that for the $20 payment. Because of evidence that incentives were differentially implemented and effective across regions, changes in response rates were also tested by regional office. They did indeed vary, with no differences at all in the New York office and no difference for the $20 group in any region except Atlanta. Census staff also explored differences across subgroups, finding some variation across demographic characteristics.
Wave 2 saw a second set of payments. Group 1 again received no incentive, but in this wave Group 2 received a $40 payment. This group saw an increase in overall response of about 3.7 percentage points, well in line with the effect of the $40 incentive in wave 1 (Group 4). The $20 group (Group 3) from wave 1 obtained no incentive in wave 2 and, disappointingly but perhaps not so surprisingly, had a response rate very close to the Group 1 control (about 0.4 percentage points higher). Group 4, which received a $40 incentive in wave 1, was split for wave 2 into two subgroups, with subgroup 4A receiving no incentive in wave 2 and subgroup 4B receiving a second $40 incentive in wave 2. Both subgroups showed higher response rates compared to the Group 1 control (no incentive) in wave 2,
14 The study panel’s description here of the Census Bureau experiments on incentives is based on (1) two presentations made by Jason Fields to the panel, on October 1, 2015, and October 12, 2016, and (2) the discussion of the experiments by Westra and colleagues (2015), who reported that the Office of Management and Budget required evidence from incentive experiments in the new design before incentives could be implemented as part of the production sample.
| Group | Wave 1 Incentive | Wave 2 Incentive | Wave 1 Response Rate (%) | Wave 2 Response Rate (%)a | Wave 1 Difference from Control (percentage points) | Wave 2 Difference from Control (percentage points) |
|---|---|---|---|---|---|---|
| Group 1 (control) | None | None | 69.0 | 72.7 | — | — |
| Group 4Ab | $40 | None | 72.5 | 73.6 | +3.5 | +0.9 |
| Group 4Bb | $40 | $40 | 72.5 | 75.7 | +3.5 | +3.0 |
aWave 2 rate calculated based on wave 1 responders.
bGroup 4 was randomized as a single group during Wave 1. For wave 2, Group 4 was split into two subgroups, A and B.
SOURCE: Presentation by Jason Fields, U.S. Census Bureau, to the study panel at its October 2015 public meeting.
with the $40 in wave 1 and $0 in wave 2 group (Group 4A) showing an improvement of 0.9 percentage points and the $40 in both wave 1 and 2 group (Group 4B) having about a 3.0 percentage point higher response rate.
The study panel was given Census Bureau plans for wave 3 incentives but had seen no results as of the time of this report. The plans were to target incentives to households predicted, based on waves 1 and 2, to have low response rates and the largest expected gain in response from an incentive. Nonresponse in waves 1 and 2 was predicted using a logistic regression model with the following household variables: metropolitan residence, age, sex, household size, housing tenure, and poverty strata. Based on the model results, incentives were targeted to households of size 1 and 4 that were nonwhite and low income, lived on the outskirts of a metropolitan statistical area,15 were renters, and had female and younger reference persons. Incentives for each of the four groups (defined for wave 1) in wave 3 were as follows:
15 Metropolitan statistical areas are defined by the Office of Management and Budget as one or more adjacent counties or county equivalents that have at least one urban core area of at least 50,000 population, plus adjacent territory that has a high degree of social and economic integration with the core.
- Group 1 was given $0 in all waves. This is the control group.
- Group 2 received $0 in wave 1 and $40 in wave 2. Model-targeted households in this group received a $40 incentive in wave 3.
- Group 3 received a $20 incentive in wave 1 and no incentive in wave 2. Model-targeted households in this group received a $40 incentive in wave 3.
- Group 4A received a $40 incentive in wave 1 and no incentive in wave 2. Model-targeted households in this group received a $40 incentive in wave 3.
- Group 4B received a $40 incentive in both waves 1 and 2. All households in this group again received a $40 incentive in wave 3.
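The model-based targeting described above can be sketched as a two-step procedure: score each household's predicted nonresponse probability with a fitted logistic model, then flag households above a threshold for incentives. The coefficients, threshold, and variable codings below are hypothetical placeholders; the Census Bureau's fitted model was not published in the material available to the panel.

```python
import math

# Hypothetical coefficients for a logistic model of household nonresponse.
# The predictors mirror those listed above (metropolitan residence, tenure,
# household size, reference-person characteristics, income), but the
# numeric values are illustrative only.
COEFS = {
    "intercept": -1.0,
    "metro_outskirts": 0.4,
    "young_reference_person": 0.5,
    "female_reference_person": 0.2,
    "renter": 0.6,
    "low_income": 0.3,
    "household_size_1_or_4": 0.3,
}

def p_nonresponse(household: dict) -> float:
    """Predicted probability of nonresponse for one household."""
    z = COEFS["intercept"] + sum(
        COEFS[k] for k, v in household.items() if v and k in COEFS
    )
    return 1.0 / (1.0 + math.exp(-z))

def target_for_incentive(households, threshold=0.5):
    """Flag households whose predicted nonresponse exceeds the threshold."""
    return [h for h in households if p_nonresponse(h) > threshold]

# Example: a low-income renter household on the outskirts of an MSA with a
# young female reference person scores above the (hypothetical) threshold.
h = {"metro_outskirts": True, "young_reference_person": True,
     "female_reference_person": True, "renter": True,
     "low_income": True, "household_size_1_or_4": True}
print(round(p_nonresponse(h), 3))
```

In production, the scored list would determine which model-targeted households in Groups 2, 3, and 4A receive the $40 wave 3 incentive.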
The panel commends the Census Bureau and SIPP staff for their research on incentives and encourages continued work to devise an effective long-term plan.
RECOMMENDATION 5-5: The Census Bureau should continue to research and utilize incentive programs in the Survey of Income and Program Participation to promote response and reduce nonresponse bias with the goal of devising an effective long-term plan. The plan should specifically address the effect of incentives on nonresponse bias.