Appendix A
SIPP Data Quality

John L. Czajka


This appendix provides brief summaries of what is known about the quality of data from the Survey of Income and Program Participation (SIPP) in areas that are central to the survey’s principal purposes or major uses. Topics include the following:

  • Income

  • Program Participation

  • Income Receipt from Multiple Sources

  • Wealth

  • Health Insurance Coverage Transitions

  • Attrition

  • Representation of the Population Over Time

  • Seam Bias

  • Imputation

  • Wave 1 Bias

These topics are discussed in the order that they are listed.

INCOME

In comparison with the Annual Social and Economic Supplement (ASEC) to the Current Population Survey (CPS), the official source of income and poverty statistics for the United States, SIPP captures nearly as much transfer income and substantially more self-employment income but less wage and salary income and substantially less property income. These last two sources dominate earned and unearned income, respectively; as a result, SIPP underestimates total CPS income by 11 percent according to a recent comparison based on calendar year 2002 (Czajka and Denmead, 2008). This underestimation reflects a deterioration in the relative quality of SIPP income data since the survey’s inception.

Early SIPP

Comparisons of income estimates from the first SIPP panel with the CPS and independent benchmarks were quite favorable to SIPP. In its estimate of aggregate income for calendar year 1984, SIPP captured 99.9 percent as much regular money income—that is, excluding lump sums—as the CPS (Vaughan, 1993). SIPP captured nearly 12 percent more transfer income—a major focus of the survey—and 3 percent more property income. Relative to independent estimates from program administrative data, SIPP captured 101 percent of aggregate Social Security income, 98 percent of Supplemental Security Income (SSI), 82 percent of Aid to Families with Dependent Children benefits, 96 percent of general assistance benefits, 77 percent of veterans’ compensation or pension income, and 87 percent of unemployment compensation. SIPP estimates of aggregate pension dollars by type were between 95 and 103 percent of independent estimates. However, SIPP’s estimate of total earnings, the largest component of total income by far, was 1.8 percentage points below the CPS. Furthermore, SIPP’s shortfall on earned income was the net result of differential performance for wage and salary employment and self-employment. SIPP’s estimate of self-employment income exceeded the CPS estimate by 45 percent, but for wage and salary income SIPP captured 5.3 percent fewer total dollars than the CPS. Relative to an independent estimate from the national income and product accounts (NIPAs), the CPS captured 98 percent of total wage and salary income and SIPP captured 92.6 percent.

SIPP’s success with self-employment income was the result of a nonconventional measurement approach that rejected the traditional definition of such income as revenue less expenses (or profit/loss). The SIPP approach grew out of efforts to translate the conventional approach to a subannual reference period, during which revenues and expenses might fluctuate widely—if they were known at all. SIPP staff sought a better measure of the income that business owners obtained from their businesses on a month-to-month basis. Rather than asking about profit and loss, SIPP asks respondents how much they withdrew from each business during each month of the reference period. One consequence of this approach is that
self-employment income cannot be negative in SIPP.1 In the CPS in the mid-1980s, roughly a fifth of the self-employed reported net losses from their businesses.

1 In the 2004 panel, SIPP started to ask separately for the amount of profit or loss over the 4-month reference period and to include this amount in monthly income totals. Net negative income from self-employment—not previously provided in the SIPP public-use files—will now be provided.

With respect to wage and salary income, SIPP’s shortfall occurred despite the survey’s finding 1.3 percent more workers than the CPS. The composition of the workers identified by SIPP may have contributed to the difference in aggregate dollars. Compared with the CPS, SIPP found 13 percent more workers who were employed less than full-time, full-year, but 7 percent fewer full-time, full-year workers. SIPP’s success in finding part-time and part-year workers seemed to be a direct result of the survey’s more frequent interviews and shorter reference periods relative to the annual interviews and annual reference period of the CPS. The smaller number of full-time, full-year workers in SIPP could also have reflected a more accurate reporting of hours and weeks worked. If that were the case, however, the lower aggregate income obtained in SIPP would have been due entirely to workers reporting lower income from their employment than workers responding to the CPS.

SIPP Income Over Time

Between 1984 and 1990, the SIPP estimate of total income slipped below 98 percent of the CPS aggregate according to analyses reported by Coder and Scoon-Rogers (1996) and Roemer (2000).2 This reduction was distributed across a large number of income sources, with no single source or small number of sources being primarily responsible for the change.

2 To estimate aggregate annual income with SIPP, one must sum the monthly amounts reported by respondents who may not have been present—in the sample or even in the population—for the entire calendar year. There are different ways to do this, and they vary with respect to which months are counted for which persons and what weights are applied to them. Coder and Scoon-Rogers (1996) describe three methods and provide SIPP estimates for all three. None of the three methods is inherently more valid than the others; they just represent different ways of looking at the income data collected by SIPP, although two of the methods are more consistent with the way that SIPP collects income data. The third method, which is designed to resemble the CPS, requires an adjustment for missing months. The first method, which sums the monthly aggregates for all respondents present each month, makes the fullest use of the income data reported for a calendar year, but it yields slightly lower annual income estimates than the other two methods for 1990. Coder and Scoon-Rogers used the third method, and Vaughan (1993) and Roemer (2000) used the first method.
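
For readers who work with the public-use files, a minimal sketch of the first aggregation method described in footnote 2 (summing weighted monthly aggregates over all respondents present in each month) might look as follows. The data frame and its column names are hypothetical stand-ins, not actual SIPP variable names, and the real weighting is more involved.

```python
import pandas as pd

# Hypothetical person-month records for one calendar year of a SIPP panel:
# one row per person per calendar month in which the person was present,
# with that month's reported income and cross-sectional weight.
person_months = pd.DataFrame({
    "person_id":      [1, 1, 1, 2, 2, 3],
    "calendar_month": [1, 2, 3, 1, 2, 12],
    "total_income":   [2500.0, 2500.0, 2600.0, 1800.0, 0.0, 4200.0],
    "month_weight":   [3100.0, 3100.0, 3050.0, 2900.0, 2900.0, 3300.0],
})

# "Method 1": sum the weighted monthly aggregates over all respondents present
# in each month, then sum the twelve monthly aggregates to an annual total.
monthly_aggregates = (
    person_months.assign(weighted=lambda d: d["total_income"] * d["month_weight"])
                 .groupby("calendar_month")["weighted"]
                 .sum()
)
annual_aggregate = monthly_aggregates.sum()
print(f"Estimated aggregate annual income: {annual_aggregate:,.0f}")
```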

Between 1990 and 1996, a period that saw the introduction of computer-assisted interviewing to both surveys and a major redesign of SIPP, Roemer’s (2000) detailed analysis shows that SIPP fell further behind the CPS. Relative to independent NIPA benchmarks, SIPP estimates of total income dropped only slightly, from 87.1 percent in 1990 to 85.7 percent in 1996 (see Table A-1). More substantial reductions were recorded for property income (9 percentage points) and transfers (6 percentage points). Estimates of pension income increased by a percentage point, as did wages and salaries, but income from self-employment fell from 85 percent of the benchmark in 1990 to 69 percent by 1996. Given that the SIPP concept of self-employment income differs from the conventional concept, the decline should probably be attributed to a growing gap between the two concepts rather than anything in the survey. Finally, it may reflect favorably on some aspects of the SIPP redesign that the estimate of SIPP total income relative to the benchmark rose by a percentage point between 1995 and 1996 after having declined from 87.1 percent in 1990 to 84.8 percent in 1994 and 1995.

Over the same period, however, CPS total income increased by 3 percentage points relative to the benchmark (see Table A-2). CPS estimates of wages and salaries increased from 95.9 to 101.9 percent of the NIPA estimate; property income rose from 62.8 to 70.0 percent; and transfer income increased from 87.6 to 88.3 percent. Pensions declined, however, from 88.9 to 76.6 percent, and self-employment income dropped from 68.5 to 52.6 percent. The biggest increase occurred between the 1992 and 1993 reference years, which coincided with the introduction of computer-assisted interviewing in the CPS. One element of the switch from a paper and pencil instrument was clearly related to the increased amount of income collected: the maximum amount of wage and salary income that could be reported was increased from $499,997 to $2,099,999. Roemer determined that this change alone added 2 percentage points to the CPS income total relative to the NIPA total.3

3 There does not appear to have been a similar issue with respect to the collection of wage and salary income in SIPP, given that annual earnings are constructed from monthly earnings.

The combined impact of the SIPP and CPS changes over this period was to reduce the ratio of SIPP to CPS total income to 92.5 percent (see Table A-3). Wages and salaries in SIPP dropped from 94.0 to 89.3 percent of the CPS estimate, although self-employment income increased by 7 percentage points relative to the CPS. SIPP property income fell substantially, going from 104.0 to 80.9 percent of the CPS. Even transfer income dropped from 105.0 to 97.7 percent of the CPS estimate, but this could be attributed primarily to Social Security income, which fell from 105.6 to 98.4 percent of the CPS estimate between 1993 and 1994. The shift between those two years was owing to an increase in the amount reported in the CPS rather than a decline in what was reported in the SIPP.4

4 The increased reporting of Social Security benefits in the CPS lagged by a year the introduction of computer-assisted interviewing; nevertheless, the sudden stepped-up reporting suggests an instrument change.

TABLE A-1 Survey Income as a Percentage of Independent (NIPA) Benchmarks: SIPP, 1990 to 1996

                                              Survey Reference Year
Income Source                                 1990    1991    1992    1993    1994    1995    1996
Total Income                                  87.1    87.9    84.9    86.9    84.8    84.8    85.7
Earnings                                      89.6    90.9    86.9    87.4    86.4    86.7    88.4
  Wages and salaries                          90.1    90.5    88.1    89.0    88.5    88.3    91.0
  Self-employment                             85.1    94.6    77.7    76.2    70.5    75.0    69.1
Property Income                               65.3    60.2    60.5    77.0    60.1    58.9    56.6
  Interest                                    56.7    56.6    56.5    62.1    51.3    51.3    50.2
  Dividends                                   65.8    53.3    50.5    95.9    62.5    65.8    51.0
  Rent and royalties                         113.1    90.7    90.8    91.2    81.0    69.2    82.0
Transfers                                     92.0    90.5    89.0    89.4    87.8    87.0    86.3
  Social Security and Railroad Retirement     97.1    95.0    93.6    92.7    90.8    90.9    87.9
  Supplemental Security Income                83.1    88.6    84.9    82.9    86.0    86.2   101.4
  Family assistance                           75.6    76.4    69.9    89.1    87.3    85.8    76.3
  Other cash welfare                          81.9   100.9    81.3    96.6    79.2    95.9   114.0
  Unemployment compensation                   77.5    83.5    82.4    86.3    84.3    75.7    69.4
  Workers’ compensation                       67.8    61.5    68.6    59.2    57.8    51.2    71.7
  Veterans’ payments                          83.1    78.8    79.5    77.5    75.6    72.7    72.9
Pensions                                      84.6    87.9    84.9    86.9    84.8    84.8    85.7
  Private pensions                            91.8    85.7    86.7    96.9   103.8    99.5    98.1
  Federal employee pensions                   75.9    89.8    84.6    86.3    89.0    88.5    75.6
  Military retirement                         87.4    92.0    83.4    87.3    87.1    85.4   101.6
  State and local employee pensions           76.8    84.2    80.1    76.6    77.0    74.3    67.8

NOTE: Survey estimates are based on the Census Bureau’s internal data, without top-coding; however, there are limits on the amount of income that can be reported, which vary by source.
SOURCE: Roemer (2000:Table 3b); data from the 1990, 1991, 1993, and 1996 SIPP panels.

TABLE A-2 Survey Income as a Percentage of Independent (NIPA) Benchmarks: March CPS, 1990 to 1996

                                              Survey Reference Year
Income Source                                 1990    1991    1992    1993    1994    1995    1996
Total Income                                  89.3    89.4    88.0    91.7    92.9    92.2    92.6
Earnings                                      93.0    93.0    91.3    94.8    96.4    95.1    96.1
  Wages and salaries                          95.9    96.4    95.6    99.7   101.9   101.4   101.9
  Self-employment                             68.5    65.3    58.6    58.9    54.8    48.5    52.6
Property Income                               62.8    63.6    63.2    69.8    65.7    72.9    70.0
  Interest                                    67.1    68.3    67.6    79.7    72.3    83.9    83.8
  Dividends                                   40.9    45.7    49.2    54.3    54.6    62.6    59.4
  Rent and royalties                          85.0    74.1    69.8    65.2    64.8    58.7    58.6
Transfers                                     87.6    86.8    83.6    85.6    89.5    89.2    88.3
  Social Security and Railroad Retirement     90.6    88.6    87.1    87.8    92.3    92.0    91.7
  Supplemental Security Income                78.9    84.6    75.5    84.2    78.0    77.1    84.2
  Family assistance                           74.4    74.4    72.2    76.4    73.1    70.5    67.7
  Other cash welfare                          85.6    77.5    81.6   101.3   105.2    95.8    80.5
  Unemployment compensation                   79.9    82.5    72.8    77.6    90.0    91.3    81.6
  Workers’ compensation                       89.5    89.1    82.5    77.0    77.7    69.3    62.7
  Veterans’ payments                          73.9    82.9    77.7    85.5    84.7    94.9    89.6
Pensions                                      88.9    85.5    83.1    83.6    83.1    78.2    76.6
  Private pensions                            98.3    96.3    96.4    98.8   102.7    93.9    93.1
  Federal employee pensions                   82.7    82.6    84.5    82.7    80.9    77.9    80.8
  Military retirement                         85.6    84.6    74.3    71.7    76.4    70.6    58.2
  State and local employee pensions           78.7    68.5    64.2    66.7    59.6    59.0    57.3

SOURCE: Roemer (2000:Table 2b); data from the 1991 through 1997 ASEC supplements to the CPS.

TABLE A-3 SIPP Aggregate Income as a Percentage of March CPS Aggregate Income, 1990 to 1996

                                              Survey Reference Year
Income Source                                 1990    1991    1992    1993    1994    1995    1996
Total Income                                  97.5    98.3    96.5    94.8    91.3    92.0    92.5
Earnings                                      96.3    97.7    95.2    92.2    89.6    91.2    92.0
  Wages and salaries                          94.0    93.9    92.2    89.3    86.8    87.1    89.3
  Self-employment                            124.2   144.9   132.6   129.4   128.6   154.6   131.4
Property Income                              104.0    94.7    95.7   110.3    91.5    80.8    80.9
  Interest                                    84.5    82.9    83.6    77.9    71.0    61.1    59.9
  Dividends                                  160.9   116.6   102.6   176.6   114.5   105.1    85.9
  Rent and royalties                         133.1   122.4   130.1   139.9   125.0   117.9   139.9
Transfers                                    105.0   104.3   106.5   104.4    98.1    97.5    97.7
  Social Security and Railroad Retirement    107.2   107.2   107.5   105.6    98.4    98.8    95.9
  Supplemental Security Income               105.3   104.7   112.5    98.5   110.3   111.8   120.4
  Family assistance                          101.6   102.7    96.8   116.6   119.4   121.7   112.7
  Other cash welfare                          95.7   130.2    99.6    95.4    75.3   100.1   141.6
  Unemployment compensation                   97.0   101.2   113.2   111.2    93.7    82.9    85.0
  Workers’ compensation                       75.8    69.0    83.2    76.9    74.4    73.9   114.4
  Veterans’ payments                         112.4    95.1   102.3    90.6    89.3    76.6    81.4
Pensions                                      95.2   102.8   102.2   103.9   102.0   108.4   111.9
  Private pensions                            93.4    89.0    89.9    98.1   101.1   106.0   105.4
  Federal employee pensions                   91.8   108.7   100.1   104.4   110.0   113.6    93.6
  Military retirement                        102.1   108.7   112.2   121.8   114.0   121.0   174.6
  State and local employee pensions           97.6   122.9   124.8   114.8   129.2   125.9   118.3

SOURCE: Tables A-1 and A-2.

However, later analyses of SIPP data matched to Social Security administrative records uncovered a tendency for respondents to report their Social Security payments net of their Medicare Part B premiums, which are deducted from their monthly benefit checks or automated payments (Huynh, Rupp, and Sears, 2001). In an apparent concession to respondents, the SIPP instrument was changed after the first wave of the 1993 panel to explicitly request that Social Security benefits be reported net of the Medicare premiums. The SIPP instrument was revised again for the 2004 panel to collect the amount of the Medicare premium as a separate quantity, which the Census Bureau could then add to the reported net payment to obtain the gross amount. Finally, SIPP pension income increased from 95.2 to 111.9 percent of the CPS estimate due to the decline in pension dollars collected in the CPS.

Quality of Wage and Salary Data

To gain a better understanding of the biggest source of the discrepancy between SIPP and CPS total income, Roemer (2002) compared both SIPP and CPS annual wages and salaries to the wages and salaries reported in the Social Security Administration’s Detailed Earnings Records (DER) for 1990, 1993, and 1996. Unlike other Social Security wage records, the DER is not capped at the income subject to the Social Security tax, and unlike tax records it includes deferred compensation. Roemer’s comparisons used survey records that had been matched to the DER based on the Social Security numbers reported by SIPP and CPS respondents, allowing an assessment of discrepancies between the survey and administrative records at the micro level. Key findings from Roemer’s analysis include

  • Distributions of DER wages for the two surveys were very similar, implying that differential sample selection bias was not a factor in SIPP’s lower wage and salary income.
  • Compared with the distribution of wages in the DER, SIPP had too many individuals with amounts below $30,000 and too few with amounts above $35,000; above $175,000, SIPP had only one-third to one-half as many earners as the DER.
  • For 1996, the CPS had too few individuals with wages below $10,000, too many between $15,000 and $100,000, slightly too few between $100,000 and $200,000, and too many above $300,000.
  • For sample members with both survey and DER wages, 57 percent of SIPP respondents and 49 percent of CPS respondents reported wages below their DER amounts; 3 percent of SIPP and 8 percent of CPS respondents reported wages equal to their DER amounts; and 40 percent of SIPP and 43 percent of CPS respondents reported wages above their DER amounts.
  • The CPS appears to be superior to SIPP in capturing wages from the underground economy; in 1996, 3.6 percent of CPS wages and 1.8 percent of SIPP wages were reported by persons with no DER wages and no indication of self-employment; for the CPS this fraction grew from 2.5 percent in 1993.
  • The CPS also appears to pick up more self-employment income misclassified as wages; in 1996, 3.0 percent of CPS wages and 1.5 percent of SIPP wages were reported by persons with no DER wages but with DER self-employment income; for the CPS this fraction grew from 2.2 percent in 1993.
  • Both types of non-DER wages (underground wages and misclassified self-employment) occur at all income levels in both surveys, but the CPS has far more persons than SIPP with non-DER wages at upper income levels.

Thus, most of the difference between the SIPP and CPS wage and salary aggregates appears to be due to underreporting of legitimate wage income in SIPP, with misclassified self-employment income and the CPS’s greater reporting of underground income accounting for less than a third of the gap between the two surveys.

Speculation about possible reasons for SIPP’s underreporting of wage and salary income has focused on the possibility that the short reference period may lead SIPP respondents to report their take-home rather than gross pay despite the specificity of the questions. The short reference period, which is clearly helpful in capturing earnings from people with irregular employment, may also contribute to omissions of earned income. Roemer (2002) notes that when SIPP asked annual income questions at the end of each year, Coder (1988) found that the 12 months of reported wages for respondents with a single employer totaled nearly 7 percent less than what the same respondents reported in the annual round-up.

Income by Quintile

For most SIPP users, the quality of the income data in the lower end of the income distribution is far more important than its quality across the entire distribution. Furthermore, estimates of aggregate income for many sources are affected disproportionately by the amount of income captured
in the upper tail of the distribution, in which the income holdings for those sources are concentrated. SIPP’s superior capture of transfer income could reflect the survey’s more complete capture of income in the lower end of the distribution generally.

To show how SIPP and CPS income estimates compare in different parts of the income distribution, Table A-4 presents estimates of aggregate income, by source, for quintiles of the population based on total family income, prepared for the panel.5 Estimates are presented for 3 calendar years: 1993, 1997, and 2002. The SIPP estimates are from the 1992, 1996, and 2001 panels and, for consistency, are derived from the second year of data in each panel.6 The CPS estimates are from the 1994, 1998, and 2003 supplements. The CPS data for all 3 years were collected with a computer-assisted instrument, whereas the SIPP data for 1993 were collected with a paper and pencil instrument. SIPP data for 2002 were the latest full calendar year available at the time the estimates in Table A-4 were prepared. By including comparative estimates for 2002, one can determine if the CPS gains during the first half of the 1990s persisted or whether the second new panel following the SIPP redesign was able to reverse the earlier trend.

5 The bottom or first quintile contains the 20 percent of persons with the lowest family incomes. The top or fifth quintile contains the 20 percent of persons with the highest family incomes.

6 The 1996 panel started 2 months late and did not collect data for all 12 months of 1996 for two of the four rotation groups.

Unlike Roemer’s estimates in Table A-1, the estimates in Table A-4 are based on public-use microdata files rather than the Census Bureau’s internal files, and the 1993 SIPP estimates are from the second year of the 1992 panel rather than the first year of the 1993 panel. Also, the SIPP estimates in Table A-4 were calculated with the same method of aggregation used by Coder and Scoon-Rogers (1996), which differs from the method used by Roemer (2000) and Vaughan (1993). Differences between the percentages in the total column for 1993 and those reported in Table A-3 for comparable sources are due to any or all of these factors. Nevertheless, while there are differences by source, our estimate of SIPP aggregate income as a percentage of CPS aggregate income, at 94.5 percent, compares closely to Roemer’s estimate of 94.8 percent.

The question of what happened to the ratio of SIPP to CPS income between 1997 and 2002 is answered by the estimates in the total column. While the ratio of SIPP to CPS total income declined from 94.5 to 89.0 percent between 1993 and 1997, the ratio rose slightly, to 89.4 percent, between 1997 and 2002. SIPP wages and salaries declined from 84.6 to 82.4 percent of the CPS aggregate, but this was offset by small improvements in every other source. On the whole, then, the relationships between income aggregates in the two surveys appear to have stabilized following the movement that occurred with the introduction of computer-assisted interviewing in the CPS and the redesign of SIPP.

If one excludes the top quintile in order to eliminate the impact of differential topcoding as well as the CPS’s seemingly more effective capture of very high incomes, one finds that the ratio of SIPP to CPS aggregate income increases by 4 to 6 percentage points in every year. SIPP wages and salaries and property income remain well below their CPS counterparts, but their shares of CPS income increase in all years. SIPP self-employment income remains well above the corresponding CPS amount, but the margin declines. For all other sources, the differences in their shares change little or in an inconsistent way when the top income quintile is excluded.

Turning to the results by income quintile, one finds, first, that the ratio of SIPP to CPS total income declines progressively from the bottom to the top quintile and does so in every year. Second, in the bottom quintile but no other quintile, the SIPP estimate of aggregate income exceeds the CPS aggregate in every year. Third, also in the bottom quintile alone, the ratio of SIPP to CPS income declines by as much between 1997 and 2002 as it did between 1993 and 1997, dropping from 119.5 to 112.2 percent and then to 105.7 percent of the CPS aggregate. In other words, over a period of only 9 years, SIPP went from capturing 20 percent more income than the CPS in the bottom quintile to capturing only 6 percent more income than the CPS. The 20 percent more income in 1993 included 25 percent more wages and salaries, 157 percent more self-employment income, 22 percent more property income, 7 percent more Social Security and Railroad Retirement income, 12 percent more Supplemental Security Income (SSI), an equal amount of welfare income, 24 percent more income from other transfers, and 44 percent more pension income. By 2002, SIPP was capturing only 9 percent more wages and salaries, 129 percent more self-employment income, 5 percent more property income, 12 percent less Social Security and Railroad Retirement income, 27 percent more SSI (an increase), 20 percent more welfare income (also an increase), 31 percent less income from other transfers, and 98 percent more pension income.

In the second income quintile, the SIPP captured 1.5 percent more aggregate income than the CPS in 1993, but this dropped to a 4 percent deficit by 1997. Unlike the first quintile, however, the SIPP held ground after that, gaining back a percentage point by 2002. The SIPP estimate of wages and salaries dropped from 100 percent of the CPS amount to 92 percent in 1997 but rose to 94 percent in 2002. Property income fell from 112 to 90 percent of the CPS amount, while Social Security and Railroad Retirement fell from 97 to 90 percent. Other transfers dropped from 90 to 59 percent of the CPS amount. Sizable improvements relative to the CPS were recorded for self-employment, SSI, welfare, and pensions, however.
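
As an illustration of the kind of tabulation that underlies comparisons like Table A-4, the sketch below assigns persons to weighted quintiles of total family income and computes a weighted aggregate for one income source. The column names (family_income, weight, and the income source) are hypothetical, and the estimates actually prepared for the panel may have been constructed differently.

```python
import pandas as pd

def aggregate_by_family_income_quintile(persons: pd.DataFrame, source: str) -> pd.Series:
    """Weighted aggregate of one income source within quintiles of persons
    ranked by total family income. Column names are hypothetical."""
    df = persons.sort_values("family_income").copy()
    # Weighted quintiles: cut the cumulative share of person weights at 20-percent points.
    share = df["weight"].cumsum() / df["weight"].sum()
    df["quintile"] = pd.cut(share, bins=[0, 0.2, 0.4, 0.6, 0.8, 1.0],
                            labels=[1, 2, 3, 4, 5], include_lowest=True)
    weighted = df[source] * df["weight"]
    return weighted.groupby(df["quintile"], observed=True).sum()

# Ratio of SIPP to CPS aggregates by quintile, in the spirit of Table A-4:
# ratio = aggregate_by_family_income_quintile(sipp, "wages") / \
#         aggregate_by_family_income_quintile(cps, "wages")
```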

Rising nonresponse did not begin with the 2001 panel, however. The incremental sample loss rate for every wave after the first rose between the 1993 and 1996 panels. At the end of Wave 9, the cumulative sample loss rate for the 1996 panel stood at 32.8 percent versus 26.9 percent in the 1993 panel. The 1996 panel ran three additional waves, but the cumulative sample loss grew by less than 3 percentage points—to 35.5 percent—over those three waves.

For comparison purposes, Table A-7 reports nonresponse rates to the CPS ASEC supplement and the labor force survey conducted in the same month.11 Some households that complete the monthly labor force survey do not respond to the supplement. Historically, nonresponse to the monthly labor force survey has been very low. Noninterview rates deviated little from 4 to 5 percent of eligible households between 1960 and 1994 but then began a gradual rise that coincided with the introduction of a redesigned survey instrument using computer-assisted interviewing (U.S. Census Bureau, 2002). By March 1997, the first data point in Table A-7, the noninterview rate had reached 7 percent, but it rose by just another percentage point over the next 7 years. Over this same period, nonresponse to the ASEC supplement among respondents to the labor force survey ranged between 8 and 9 percent, with no distinct trend, yielding a combined sample loss that varied between 14 and 16 percent of the eligible households. In other words, the initial nonresponse to the 2001 SIPP panel is still 2 to 3 percentage points lower than the nonresponse to the ASEC supplement. But as a measure of how much the SIPP response rates have declined, it took two waves of cumulative sample loss in the 1996 panel to match the nonresponse to the ASEC supplement.

11 Until 2001 the CPS supplement that collects annual income was conducted solely in March of each year, but as part of a significant sample expansion, the Census Bureau began to administer the supplement to CPS sample households in February and April that were not interviewed in March.

A SIPP practice dating back to the start of the survey bears some responsibility for the amount of sample loss after Wave 3 in panels prior to 2001. Households that missed two or three consecutive interviews (depending on the circumstances) were dropped from further attempts. The principal purpose, initially, was to ensure that all missing waves would be bounded by complete waves, so that the missing waves could be imputed from the information collected in the surrounding waves. Missing wave imputations were performed for the first time in the early 1990s but were discontinued with the 1996 redesign. With rising attrition and the removal of the principal rationale for dropping respondents after two missing waves, the Census Bureau revised this practice during the 2001 panel. Respondents are no longer dropped after missing two consecutive interviews.

TABLE A-7 Nonresponse to the CPS Labor Force Survey and ASEC Supplement, 1997 to 2004

                 Percentage of Eligible        Percentage of Labor          Percentage of All
                 Households Not Responding     Force Respondents Not        Eligible Households Not
Sample Year      to the Labor Force            Responding to the            Responding to the
                 Questionnaire                 Supplement                   Supplement
1997             7.2                           9.2                          15.7
1998             7.8                           7.2                          14.4
1999             7.9                           8.9                          16.1
2000             7.0                           8.0                          14.4
2001             8.0                           8.5                          15.9
2002             8.3                           8.6                          16.2
2003             7.7                           8.0                          15.0
2004             8.5                           8.2                          16.0

NOTE: March 1997 is the first supplement for which the CPS technical documentation reports rates of nonresponse. The nonresponse rate in column 3 is the sum of the nonresponse rate in column 1 and the product of the nonresponse rate in column 2 (divided by 100) and 100 minus the nonresponse rate in column 1.
SOURCE: Current Population Survey Technical Documentation, various years.

The impact of the new policy is evident in the incremental sample loss rate between Waves 3 and 4, which dropped to 1.2 percent from a level of 3.1 percent in the 1996 panel. By Wave 7 the cumulative sample loss had fallen below that of the 1996 panel, which meant that the survey had retained enough additional sample members to offset both the 5 percentage point higher Wave 1 nonresponse rate and higher attrition between Waves 1 and 2. The 2001 panel maintained a lower cumulative sample loss through the remaining two waves. Interestingly, the incremental sample loss rates between Waves 8 and 9 were essentially identical across the four panels at about 1.5 percent.
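
Written out explicitly, the combination rule in the NOTE to Table A-7 is the following, with r1, r2, and r3 denoting columns 1 through 3; the 1997 row serves as a check.

```latex
\[
  r_{3} \;=\; r_{1} + \frac{r_{2}}{100}\,(100 - r_{1}),
  \qquad
  \text{1997: } 7.2 + \frac{9.2}{100}(100 - 7.2) = 7.2 + 8.5 \approx 15.7 .
\]
```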

Attrition Bias

Numerous studies with SIPP and other panel surveys have documented that attriters differ from continuers in a number of ways (see, for example, Fitzgerald, Gottschalk, and Moffitt, 1998; Zabel, 1998). Most studies of attrition bias have been limited to comparing attriters and continuers with respect to characteristics measured at the beginning of the survey, before any attrition has occurred. Such studies cannot say how much the attriters and continuers differ on characteristics subsequent to attrition, which is critical to knowing how longitudinal analyses may be affected by attrition. Another limitation of such studies that is rarely noted is that they assume that the quality of the data provided by those who will later leave is comparable to that provided by those who remain in the panel through its conclusion. For many characteristics, this assumption is probably valid. But for sensitive characteristics or those that respondents might view as onerous to provide, the validity of the assumption is questionable. Yet another limitation of many attrition studies is that they fail to separate nonrespondents who left the survey universe from those who remained eligible to be interviewed. Persons who leave the survey universe—by dying, joining the military, becoming institutionalized (including incarceration), or moving outside the country—have distinctly different characteristics than those who remain in the universe.

Administrative records linked to survey data can overcome these limitations. Administrative records can provide data on postattrition and even presurvey characteristics, and the values of the characteristics are recorded with very little error, generally. Moreover, any measurement error in the characteristics obtained from administrative records will be independent of attrition status. Finally, most nonrespondents who left the survey universe are identified in SIPP and can be removed from the sample of attriters. Some who cannot be identified in the survey data may drop out of analyses automatically because their administrative records terminate at some point after they have left the survey universe.

Vaughan and Scheuren (2002) used Social Security Administration Summary Earnings Records matched to SIPP panel data to compare attriters and continuers with respect to earnings and program benefits over time.12 Even after removing those who left the survey universe, they found that attriters and nonattriters differed markedly with respect to earnings and receipt of program benefits at the beginning of a panel—that is, before any attrition had occurred. Over time, however, these differences attenuated. With enough passing years (longer than a typical SIPP panel, however), the characteristics of those who left and those who continued to respond to the survey converged. This trend suggests that compensating for the impact of attrition on cross-sectional estimates becomes both easier and less important over time. But the fact that the differences are large to begin with and then diminish over time also implies that attriters experience greater change than nonattriters. Vaughan and Scheuren (2002) concluded that compensating for the attrition bias in estimates of gross change is both important and much more difficult than compensating for differences in net change.

12 Vaughan and Scheuren (2002) examined attrition in the Survey of Program Dynamics, which was selected from the 1992 and 1993 SIPP panels, and continued to interview respondents through 2002.
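
A minimal sketch of the kind of attriter-continuer comparison that linked administrative records permit is shown below. The data frame and all column names (left_universe, attriter, weight, and the administrative outcome) are hypothetical, and the published analyses were considerably more detailed than this.

```python
import pandas as pd

def compare_attriters_and_continuers(matched: pd.DataFrame, outcome: str) -> pd.Series:
    """Weighted mean of an administrative outcome (e.g., earnings in a given
    post-attrition year) for attriters versus continuers, after dropping
    persons known to have left the survey universe. Names are hypothetical."""
    in_universe = matched[~matched["left_universe"]]
    weighted_sum = (in_universe[outcome] * in_universe["weight"]).groupby(
        in_universe["attriter"]).sum()
    weight_sum = in_universe["weight"].groupby(in_universe["attriter"]).sum()
    return (weighted_sum / weight_sum).rename(f"weighted_mean_{outcome}")
```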

To evaluate the effectiveness of the Census Bureau’s nonresponse adjustments, Czajka, Mabli, and Cody (2008) used administrative data from the same sources as Vaughan and Scheuren but compared the full sample—using a Wave 1 cross-sectional weight—with the subsample of continuers weighted by the full panel weight, which incorporates adjustments for differential nonresponse.13 They found little evidence of bias in estimates of earnings, Social Security beneficiary status and benefit amounts, or SSI beneficiary status and benefit amounts at different points in time. Nor did they find significant bias in selected estimates of change in these characteristics. The implication is that attrition bias in these characteristics is being addressed in the longitudinal weights. It is not possible to evaluate the Census Bureau’s adjustments to the cross-sectional weights in the same manner as the longitudinal weights, as there is no attrition-free cross-sectional sample after the first wave. Furthermore, other, lesser known biases due to attrition are not addressed by the weights. For example, Czajka and Sykes (2006) documented attrition bias among new mothers, which contributes to a severe underestimate of the number of infants if the weights of mothers are assigned to their newborn children.14 Attrition by new mothers has been documented in the National Longitudinal Survey of Youth 1997 as well, although, in that survey, becoming a parent was found to be very highly related to returning to the survey after missing an interview (Aughinbaugh and Gardecki, 2007).

13 Persons who leave the SIPP universe are assigned panel weights if they missed no prior interviews. Such persons will have contributed to both the full sample and the panel estimates.

14 SIPP longitudinal weights are not assigned to persons entering the sample after the calendar month to which the weights are calibrated. It is common among users to assign infants the weights of their mothers.

REPRESENTATION OF THE POPULATION OVER TIME

Although SIPP is fundamentally a panel survey, cross-sectional applications (including analysis of repeated cross-sections) abound and may in fact be more common than true longitudinal uses of the data. For this reason, it is important that users understand the limits to the survey’s representation of the population over time.

While the U.S. population is currently growing at a rate of less than 1 percent a year, this net growth is the difference between substantially larger inflows and outflows. SIPP panel members who leave the sample by dying, entering institutions, moving abroad, or moving into military barracks represent the outflows from the population. A priori, there is no reason to think that SIPP underrepresents, overrepresents, or otherwise misrepresents the gross outflows from the population, although one could certainly speculate that respondents who know that they are moving abroad or entering institutions may leave before being identified as leaving the survey universe.

To maintain full cross-sectional representativeness over time, however, a panel survey must also obtain—periodically if not continuously—a representative sample of new entrants to the population. New entrants include births, immigrants, and persons returning from abroad. Because SIPP excludes residents of specific types of group quarters (prisons, nursing homes, and military barracks, primarily), new entrants also include persons moving from such quarters into households. SIPP captures births to panel members and, through this mechanism, represents most births to the population over the length of a panel, but its capture of other new entrants is limited to persons moving into households with original sample members. That is, SIPP represents those additional new entrants who join households containing persons who were in the SIPP universe at the start of a panel. SIPP does not represent people who enter or reenter the U.S. civilian noninstitutionalized population if they form new households or join households populated by people who have also joined the population since the start of the panel. What fraction of new entrants other than births is represented in SIPP is unknown and not readily discernible. New entrants are not identified explicitly in the SIPP public-use data files, and, even if they were, none of the SIPP weights is designed to properly reflect their contribution to the population. An estimate of the total new entrant population, exclusive of births, near the end of the 1996 SIPP panel placed it at about 10 million, or more than 3 percent of the total population (Czajka and Sykes, 2006). This estimate represents how many persons, other than those born to panel members, were in the civilian noninstitutionalized population at the end of the 1996 panel but had not been in the population at the start of the panel.

To facilitate cross-sectional uses of SIPP data, the Census Bureau provides monthly cross-sectional weights. These weights include an adjustment for differential attrition and a separate “mover adjustment,” which offsets the weights assigned to persons who join SIPP households. In addition, the cross-sectional weights are poststratified to monthly estimates of the civilian noninstitutionalized population by age, gender, race, and Hispanic origin. This poststratification to demographic controls is a limited attempt to make the SIPP sample consistent with changes in the size and composition of the civilian noninstitutionalized population over time. Poststratification ensures that the monthly SIPP cross-sectional weights will sum to the Census Bureau’s estimates of monthly population totals by age, gender, race, and Spanish origin. It does not ensure that the broader characteristics of the weighted sample will remain consistent with the population over time if the net effect of the gross inflows and outflows is to change the characteristics of the population.
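
A bare-bones sketch of the ratio adjustment involved in poststratification is shown below. The cell definition, the column names, and the controls table (one row per cell with a population_control column) are hypothetical simplifications of the Census Bureau's actual weighting procedure.

```python
import pandas as pd

def poststratify(sample: pd.DataFrame, controls: pd.DataFrame) -> pd.Series:
    """Ratio-adjust person weights so that, within each demographic cell,
    they sum to an independent population control for one reference month.
    Column names and the cell definition are hypothetical simplifications."""
    cells = ["age_group", "gender", "race", "hispanic_origin"]
    totals = (sample.groupby(cells, as_index=False)["weight"].sum()
                    .rename(columns={"weight": "cell_weight"}))
    merged = sample.merge(totals, on=cells).merge(controls, on=cells)
    # Scale each weight by (population control / weighted sample total) for its cell,
    # so cell totals match the monthly population estimates.
    return merged["weight"] * merged["population_control"] / merged["cell_weight"]
```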

The implications of these population flows for the representativeness of the SIPP cross-sectional sample over time are unknown, and the issue has attracted very little interest. But analysis of the characteristics of SIPP sample members who move out of the population over time indicates that these people differ dramatically from nonmovers with similar demographic characteristics (particularly those of Hispanic origin). This implies a potential for persons moving into the population to differ dramatically as well (Czajka, 2007). Within-panel trends that have been attributed to attrition could very well be owing to the panel’s increasingly less complete representation of the national population over time as the new entrants omitted from the SIPP grow from zero to as much as 3 percent of the total population. If so, then a new strategy for weighting SIPP that takes account of the new entrants who are not represented by the survey could improve the quality of inferences supported by the data.15

15 If the survey with its current cross-sectional weights underestimates poverty in the full population, for example, because it underrepresents 10 million people with a very high poverty rate, then one strategy would be to exclude the 10 million from the weighted population total so that the poverty rate estimated from the survey provides a better reflection of the population to which the weights sum. An alternative strategy, if the characteristics of the 10 million can be known sufficiently well to be replicated within the existing survey sample, is to revise the cross-sectional weighting of the sample to better reflect the characteristics of the total population.

SEAM BIAS

Seam bias describes a tendency for transitions to be reported at the seam between survey waves—that is, between month 4 of one wave and month 1 of the next wave—rather than within waves. Evidence of seam bias was first identified in analyses of the Income Survey Development Program research panels that preceded the SIPP (Callegaro, 2008). Multiple causes have been suggested, and more than one appears to be at work. The extent of seam bias varies markedly across items, which may reflect different mixes of causes.

SIPP users have adapted their analytical strategies. It is common for those examining behavior over time to take only one data point per wave—either the one calendar month that is common to all four rotation groups or the fourth reference month, which is widely viewed as the most reliable because of its proximity to the interview month. The inference is that there is not enough independent information in the other three months to make them analytically useful or that analysts do not know how to use the limited additional information that they provide.
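
A minimal sketch of the common one-observation-per-wave strategy (keeping the fourth reference month) is shown below; the column names are hypothetical rather than actual SIPP variable names.

```python
import pandas as pd

def one_point_per_wave(person_months: pd.DataFrame) -> pd.DataFrame:
    """Keep a single observation per person per wave: the fourth reference
    month, widely treated as the most reliable because it is closest to the
    interview. Column names (person_id, wave, reference_month) are hypothetical."""
    return person_months.loc[person_months["reference_month"] == 4].copy()

# The alternative strategy mentioned in the text, keeping the one calendar month
# common to all four rotation groups, would filter on calendar month instead.
```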

The Census Bureau has tried two alternative approaches to dealing with seam bias: (1) collecting selected data for the interview month as a fifth reference month, which will overlap the first reference month of the next wave, and (2) dependent interviewing. It remains unclear what the Census Bureau has learned from collecting the additional month of data. These data are not released on the public-use file, and it is not apparent that the Census Bureau has made use of this information in editing responses, which might have moved the seam by one month but not reduced it. However, the Census Bureau appears to have had some success with dependent interviewing, in which respondents who reported participation in a program at the end of the previous wave are informed of their prior wave response and asked if they were still participating 4 months earlier. Specifically, dependent interviewing has helped to lower the frequency of transitions at the seam by reducing the number of reported transitions rather than shifting their location (Moore et al., 2009). However, dependent interviewing has given rise to other problems during its application to the 2004 panel, and the Census Bureau has suspended its use in SIPP.

IMPUTATION

Item nonresponse is higher on income questions than on most other types of questions.16 Since the start of SIPP, item nonresponse to income questions in surveys has increased dramatically. This is reflected in the proportion of total income that is imputed.

16 Item nonresponse on asset questions is even higher.

Growth of Imputation Over Time

In 1984, just 11.4 percent of total money income in SIPP was imputed (Vaughan, 1993). Even then, however, imputation rates varied widely across income sources. Income imputation was lowest for public assistance (7.5 percent) and highest for property income (23.9 percent). The single highest imputation rate occurred for dividends (46.8 percent), a component of property income. The imputation rate for wage and salary income was among the lowest at 8.8 percent. Imputation rates in the CPS were higher—in large part because the Census Bureau imputes the entire ASEC supplement for respondents who complete only the brief monthly labor force survey that precedes the supplement. In March 1985, 20.1 percent of total CPS ASEC income for 1984 was imputed—including 17.9 percent of wage and salary income.

Between 1984 and 1993, imputation rates for SIPP income increased substantially, growing to 20.8 percent for total income and 17.7 percent for wages and salaries, or double the rate in 1984 (see Table A-8). The imputation rate for property income, 42.4 percent, approached the very high level recorded by dividends in 1984. The low imputation rate for public assistance as a whole grew to more than 13 percent for SSI and welfare.

TABLE A-8 Proportion of Income Imputed, by Source: SIPP and CPS, Selected Years

                                              Survey Reference Year
Income Source                                 1993    1997    2002
SIPP
  Total Income                                20.8    24.0    28.6
  Wages and salaries                          17.7    20.5    24.9
  Self-employment                             29.3    32.7    36.4
  Property income                             42.4    42.9    49.7
  Social Security and Railroad Retirement     22.6    22.7    28.8
  Supplemental Security Income                13.2    16.4    22.6
  Welfare income                              13.8    31.2    32.8
  Other transfers                             20.8    33.0    33.6
  Pensions                                    23.7    37.3    47.3
CPS
  Total Income                                23.8    27.8    34.2
  Wages and salaries                          21.5    24.8    32.0
  Self-employment                             34.6    39.5    44.7
  Property income                             42.4    52.8    62.6
  Social Security and Railroad Retirement     24.1    27.9    35.5
  Supplemental Security Income                22.9    19.7    28.0
  Welfare income                              19.8    18.1    29.2
  Other transfers                             23.3    23.9    31.4
  Pensions                                    24.2    27.0    35.4

SOURCE: The 1992, 1996, and 2001 SIPP panels and the 1994, 1998, and 2003 CPS ASEC supplements.

Between 1993 and 2002, the proportion of total income that was imputed increased by 8 percentage points. The increase in imputation rates by income source was very uneven. The income imputation rates for welfare, other transfers, and pensions surged between 1993 and 1997. For welfare, the imputation rate more than doubled, rising from 14 to 31 percent. For other transfers and pensions, the imputation rates increased by more than half, reaching 33 percent for other transfers and 37 percent for pensions. Yet there was no increase in the already high imputation rate for property income, and the imputation rates for wages and salaries and self-employment income increased by only 3 percentage points. Between 1997 and 2002, the imputation rate for pension income grew another 10 percentage points, taking it very near the imputation rate for property income, which grew by 7 percentage points to nearly 50 percent. Imputation rates for both wages and salaries and self-employment income grew by an additional 4 percentage points.

Income imputation rates in the CPS grew more modestly than those in SIPP between 1984 and 1993 but then increased by 11 percentage points between 1993 and 2002. Imputation rates for all but two sources increased by about the same amount. The exceptions were property income, for which the imputation rate increased by 22 percentage points to 62.6 percent, and SSI, for which the increase was only 5 percentage points.

Quality of Imputation

The growing share of income that is imputed in these surveys makes it increasingly important that the imputations be done well. Both SIPP and the CPS have relied heavily on flexible hot-deck imputation procedures to impute missing items. Hot-deck imputation procedures replace missing values with values selected from other records—called donors—that are matched on a prespecified set of characteristics that form a large table. Flexible hot-deck procedures can combine the cells of a table, as necessary, to find donors when many of the cells are empty. Nevertheless, when item nonresponse is high—as it is for income and assets—the amount of collapsing that may be required to achieve matches reduces the quality of the imputations.
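
To make the hot-deck mechanics concrete, the sketch below fills a missing item with a value drawn at random from a donor in the same cell, collapsing the cell when no donor is available. It is a toy illustration under assumed column names, not the Census Bureau's production procedure.

```python
import pandas as pd

def flexible_hot_deck(df: pd.DataFrame, item: str, classifiers: list) -> pd.Series:
    """Fill missing values of `item` with a randomly drawn donor value from the
    same cell; when a cell has no donors, collapse it by dropping the last
    classifier and trying again. A bare-bones illustration only."""
    imputed = df[item].copy()
    missing = imputed.isna()
    for idx in df.index[missing]:
        for depth in range(len(classifiers), 0, -1):
            keys = classifiers[:depth]
            same_cell = (df[keys] == df.loc[idx, keys]).all(axis=1)
            donors = df.loc[same_cell & ~missing, item]
            if not donors.empty:
                imputed.loc[idx] = donors.sample(1).iloc[0]
                break
    return imputed

# Example (hypothetical names): impute welfare income within cells defined by
# program receipt, family income category, age group, and education. Classifier
# order matters, since cells are collapsed from the right when donors are scarce.
# df["welfare_imp"] = flexible_hot_deck(df, "welfare", ["fsp", "inc_cat", "age_grp", "educ"])
```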

While the hot-deck algorithms that the Census Bureau employs can incorporate a large number of potentially relevant variables, the variables used to match donors to the records being imputed are not tailored, generally, to the items being imputed. For example, Doyle and Dalrymple (1987) demonstrated that by not taking into account reported Food Stamp Program benefits when imputing major components of income or by not taking account of income eligibility limits when imputing FSP benefits, the Census Bureau was imputing FSP benefits to households with incomes well beyond the eligibility limits or imputing high incomes to households that reported the receipt of FSP benefits. In response, the Census Bureau made improvements to address this particular problem as well as other related problems. With the 1996 redesign and the need to rewrite numerous programs to run on the expanded, reformatted file, some of these enhancements appear to have been lost. In January 2003, for example, SIPP estimated that more than 400,000 adult FSP participants were in families with incomes four times the poverty level. FSP receipt was imputed to 62 percent of these persons compared with less than 7 percent of the estimated 6.3 million FSP participants with family incomes below poverty (Beebout and Czajka, 2005). This suggests that the Census Bureau is not taking sufficient account of income when imputing FSP receipt. Similarly, $1.1 billion in welfare income was imputed in SIPP to families in the top income quintile in 2002 (Czajka, Mabli, and Cody, 2008). More than a third of all imputed welfare dollars went to families in the top income quintile in that year. This is comparable to only $10 million in welfare income imputed to the top income quintile in the CPS in the same year, or less than 1 percent of total imputed welfare dollars. In the years immediately preceding the 1996 redesign, the amounts of welfare income imputed to families in the top quintile were similar between SIPP and the CPS.

WAVE 1 BIAS

Since the redesign, each new SIPP panel (1996, 2001, and 2004) has started with a monthly poverty rate that was at least 2 percentage points higher than the poverty rate in the final wave of the preceding panel (Czajka, Mabli, and Cody, 2008). Undoubtedly, a number of factors contribute to this result, but one that has emerged with the most recent panels involves a possible understatement of income in Wave 1. Both the 1996 and 2001 panels showed a percentage point decline in the poverty rate between the first and second waves. In the 1996 panel, poverty continued to decline in the presence of an expanding economy, but in the 2001 panel there was no further decline in the poverty rate after the second wave. In the 2004 panel the Wave 1 to Wave 2 reduction was nearly 2 percentage points. Seasonal swings in income provide an obvious explanation, but the 1996 panel started 2 months later in the year than the 2001 and 2004 panels.

Panel surveys may be subject to a “time-in-sample” bias. Through repeated interviews, respondents may become better respondents as they learn what is expected of them. They may also become bored or learn how to avoid lengthy segments of the interview. Prior to the 1996 redesign, the Census Bureau compared data from overlapping waves in successive panels in a search for evidence of a time-in-sample bias in the reporting of income and benefit receipt in the SIPP. The research yielded no evidence of time-in-sample bias in SIPP (Lepkowski et al., 1992). With the elimination of overlapping panels, it is not possible to replicate this research on more recent SIPP data.

While there may be no evidence of a time-in-sample bias in earlier SIPP panels, there is a strong suggestion of some type of change in the reporting or perhaps processing of income data between the first two waves of more recent panels. Czajka, Mabli, and Cody (2008) compared poverty status between the first two waves of the 2004 panel in an effort to determine what role attrition and other sample loss might have played in the 1.8 percentage point decline in poverty. They found that changes in recorded poverty among persons present in both waves accounted for 87 percent of the net reduction in the number of poor between the two waves. Between Waves 2 and 3, a much smaller reduction in the number of poor (0.3 percentage points) could be attributed in large part to fewer gross exits from poverty—that is, fewer sample families reporting increased incomes.
