Currently Skimming:

5 Databases for Microsimulation
Pages 123-152

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 123...
... The federal statistical system currently provides a wide range of microdata on which models can draw. Static models of income support programs, such as TRIM2, MATH, and HITSM, have traditionally used the March income supplement to the CPS as their primary database, with information from other surveys and administrative records systems to fill gaps and improve data quality.
From page 124...
... The SOI provides information (for tax filers) about income reported to the IRS, deductions claimed, and taxes paid; the March CPS provides needed information about family and socioeconomic characteristics of tax filers and the nonfiling population.
From page 125...
... In this chapter, we consider the data quality problems that confront current microsimulation models and the kinds of strategies that have been employed to deal with problems of missing, erroneous, and inappropriately specified data, and we present our recommendations for improving data quality in the future. Because of the prominence of the March CPS for microsimulation modeling, our discussion focuses on data quality problems with this survey, particularly in its application for modeling income support programs.
From page 126...
... Net undercoverage rates in the CPS, which amount to about 7 percent of the total population, vary widely: from only 1 percent of elderly white women to 27 percent of young black and Hispanic men. The Census Bureau adjusts for undercoverage in the CPS and other surveys by increasing the household weights to match population control totals by age, race, and sex.
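The weight adjustment described above can be illustrated with a minimal sketch of ratio adjustment to control totals. It is not the Census Bureau's actual procedure: the `cell` labels stand in for age, race, and sex groups, and the weights and control totals are made-up numbers chosen so that one cell is undercovered by 1 percent and the other by 27 percent.

```python
from collections import defaultdict


def poststratify(records, control_totals):
    """Scale weights so weighted counts in each cell match the control totals.

    records: list of dicts with hypothetical keys 'cell' and 'weight'.
    control_totals: dict mapping each cell to its independent population total.
    """
    # Sum the current survey weights within each adjustment cell.
    weighted = defaultdict(float)
    for rec in records:
        weighted[rec["cell"]] += rec["weight"]

    adjusted = []
    for rec in records:
        # A factor above 1 means the survey undercovers this cell.
        factor = control_totals[rec["cell"]] / weighted[rec["cell"]]
        adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted


# Illustrative (made-up) data: cell A is 1% undercovered, cell B is 27%.
sample = [
    {"cell": "A", "weight": 495.0},
    {"cell": "A", "weight": 495.0},
    {"cell": "B", "weight": 365.0},
    {"cell": "B", "weight": 365.0},
]
controls = {"A": 1000.0, "B": 1000.0}
adjusted = poststratify(sample, controls)
```

The adjustment restores the cell totals, but it assumes that the people missed within a cell resemble those who were covered, which is the kind of assumption the chapter asks to have evaluated.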
From page 127...
... For some income sources, imputation rates are even higher: as much as one-third of nonfarm self-employment income, interest, and dividend payments are imputed (see Table 5-1). The Census Bureau supplies values for missing income and other items through use of sophisticated techniques that find the closest match for each nonreporter in the file or use values from a similar neighboring record. Even after imputation, however, estimates of recipients and amounts for many income sources in the March CPS fall short of control totals from administrative records.
From page 128...
... N.A.
Columns: (1) CPS amount; (2) percent not imputed; (3) percent imputed; (4) independent estimate; (5) CPS amount as a percentage of the independent estimate.
Income source                                     (1)     (2)     (3)      (4)     (5)
Total income, independent estimates           2,164.9    80.0    20.0  2,402.5    90.1
Sources with independent estimates
  Wages or salaries                           1,616.3    82.1    17.9  1,632.2    99.0
  Nonfarm self-employment                       119.8    67.1    32.9    104.1   115.1
  Farm self-employment                           10.3    78.6    21.4      8.5   121.3
  Social security/railroad retirement           142.3    79.5    20.5    155.2    91.7
  Supplemental Security Income                    7.6    82.4    17.6      9.0    84.9
  Aid to Families with Dependent Children        10.5    87.2    12.8     13.8    76.0
  Interest                                       99.4    66.0    34.0    220.9    45.0
  Dividends                                      27.3    66.4    33.6     60.2    45.4
  Net rent and royalties                         16.5    77.9    22.1     34.3    48.1
  Veterans' payments                              8.8    82.6    17.3     14.0    63.3
  Unemployment compensation                      19.7    80.9    19.1     26.1    75.5
  Workers' compensation                           6.6    75.0    25.0     14.1    47.0
  Private pensions and annuities                 34.6    76.1    23.9     54.7    63.3
  Federal government and military retirement     31.8    75.7    24.3     34.9    91.2
  State and local government retirement          13.3    80.3    19.7     20.5    64.7
Sources without independent estimates
  Estates and trusts                              6.7    71.8    28.2     N.A.    N.A.
  Alimony and child support                       8.3    84.7    15.3     N.A.
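As a worked check on the table rows above, the short sketch below recomputes the last figure in a few rows. It assumes that the first figure in each row is the CPS aggregate and the fourth is the independent estimate, so that the fifth is the CPS amount as a percentage of the independent estimate. The column reading is an inference from the arithmetic, not something the excerpt states, and the function name is hypothetical.

```python
# (CPS amount, independent estimate, published percentage), copied from the rows above.
rows = {
    "Wages or salaries": (1616.3, 1632.2, 99.0),
    "Nonfarm self-employment": (119.8, 104.1, 115.1),
    "Interest": (99.4, 220.9, 45.0),
    "Aid to Families with Dependent Children": (10.5, 13.8, 76.0),
}


def percent_of_benchmark(cps_amount, independent_amount):
    """CPS aggregate expressed as a percentage of the independent estimate."""
    return 100.0 * cps_amount / independent_amount


for source, (cps, independent, published) in rows.items():
    computed = percent_of_benchmark(cps, independent)
    print(f"{source}: computed {computed:.1f}, published {published:.1f}")
```

Small differences between the computed and published figures are to be expected, since the published percentages were presumably calculated from unrounded amounts.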
From page 129...
... The Census Bureau regularly publishes estimates of sampling error and methods for users to determine sampling error for particular estimates. The modelers currently do not produce estimates of variability in their databases.
From page 130...
... In 1980 the number of income sources identified in the questionnaire was expanded considerably; however, the Census Bureau did not implement a revised processing system to record the income detail on the public-use files until 1988. For files prior to that year, the models had to allocate combined amounts to specific sources in order to obtain the information needed for simulating AFDC, food stamps, and other programs.
From page 131...
... Microsimulation modelers have long been aware of these various data quality problems and the possible implications for estimates of the low-income and welfare-eligible populations from the March CPS; they have generally only been able to speculate about the level of error in the estimates and the contribution to error from each source. But it is clear that the simulated eligible populations for the AFDC and food stamp programs developed from the March CPS differ from the caseload as portrayed in administrative data from the Integrated Quality Control System (IQCS)
From page 132...
... Obviously, simple allocation formulas are easy to execute, but they rest on dubious assumptions about lack of variability. Data from the Income Survey Development Program 1979 Research Panel and the 1984 SIPP suggest that ...
8 We have not described one other important data modification procedure that the Census Bureau carries out, namely, weighting the records to agree with population totals.
From page 133...
... are like the typical reporter and, moreover, that the variable being imputed does not exhibit correlations with other variables that may be important in subsequent analytical use. The Census Bureau currently applies very complex procedures, which it refers to as statistical matches, to impute values in the March CPS for whole groups of variables such as income and employment-related items. The records are classified by a number of characteristics, and the record that is the best match is selected as the "donor" to supply the missing values to the record requiring imputation (the "host").
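The donor-and-host idea can be sketched with a simple class-based hot deck. This is a deliberately simplified stand-in, not the Census Bureau's statistical match: it picks a random donor from reporters in the same class rather than the best match, and the field names (`age_group`, `employed`, `wages`) and records are hypothetical.

```python
import random


def hot_deck_impute(records, class_keys, target, rng=None):
    """Fill missing `target` values from a donor in the same imputation class.

    records: list of dicts; a missing value is represented by None.
    class_keys: names of the fields that define the imputation classes.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable

    # Group reporters (potential donors) by imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[k] for k in class_keys)
            donors.setdefault(key, []).append(rec)

    completed = []
    for rec in records:
        if rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in class_keys))
            if pool:  # leave the value missing if the class has no donor
                donor = rng.choice(pool)
                rec = {**rec, target: donor[target], "imputed": True}
        completed.append(rec)
    return completed


# Hypothetical records: two hosts have a donor in their class, one does not.
people = [
    {"id": 1, "age_group": "25-44", "employed": True, "wages": 30000},
    {"id": 2, "age_group": "25-44", "employed": True, "wages": None},
    {"id": 3, "age_group": "65+", "employed": False, "wages": 0},
    {"id": 4, "age_group": "65+", "employed": False, "wages": None},
    {"id": 5, "age_group": "16-24", "employed": True, "wages": None},
]
completed = hot_deck_impute(people, ["age_group", "employed"], "wages")
```

The choice of `class_keys` carries the assumption discussed above: any variable left out of the classification is implicitly treated as unrelated to the imputed item, which can distort later analyses that depend on that relationship.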
From page 134...
... Exact Matches
In some cases, matching procedures have been used to obtain values for missing items, generally when large numbers of variables are involved. Exact matches ...
11 Studies of the hot-deck imputations in the 1984 SIPP panel have revealed anomalous results for participants in the food stamp program because the imputation matrices for program-related variables such as benefit amount and assets did not include measures of low income or participant status.
From page 135...
... 13 The Census Bureau has performed exact matches for internal use, but not for release to outside researchers except under special circumstances.
From page 136...
... Weights can also be modified by applying a simple or complex set of scaling factors, for example, to match population control totals and thereby adjust for undercoverage relative to the decennial census. Another kind of calibration that is used routinely in microsimulation models for income support programs is to adjust the function that selects participants from the pool of simulated eligible units on a baseline file so as to generate simulated numbers of AFDC, SSI, or food stamp cases that closely match the administrative counts.
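The participation calibration described above can be sketched as a search for a single scaling factor. This is only one simple way to do it, under stated assumptions: each simulated eligible unit carries a hypothetical `propensity` between 0 and 1, the factor `k` multiplies it (capped at 1), and bisection finds the `k` at which the expected weighted caseload equals the administrative count. The models named in the chapter use their own participation functions.

```python
def expected_caseload(units, k):
    """Expected weighted number of participants when propensities are scaled by k."""
    return sum(u["weight"] * min(1.0, k * u["propensity"]) for u in units)


def calibrate_participation(units, target_caseload, tol=1e-6):
    """Find k so that the expected simulated caseload matches the target count."""
    max_possible = sum(u["weight"] for u in units if u["propensity"] > 0)
    if target_caseload > max_possible:
        raise ValueError("target exceeds the simulated eligible pool")

    # Expand the upper bound until it brackets the target.
    lo, hi = 0.0, 1.0
    while expected_caseload(units, hi) < target_caseload:
        hi *= 2.0

    # Bisection works because expected_caseload never decreases as k grows.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_caseload(units, mid) < target_caseload:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical eligible units and a made-up administrative caseload of 1,200.
eligible = [
    {"weight": 1000.0, "propensity": 0.8},
    {"weight": 1000.0, "propensity": 0.5},
    {"weight": 1000.0, "propensity": 0.2},
]
k = calibrate_participation(eligible, target_caseload=1200.0)
```

Calibrating this way forces the baseline totals to agree with the administrative counts, but it does not by itself show whether the simulated participants have the right characteristics.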
From page 137...
... Policy analysts and researchers in the Social Security Administration and ASPE provided impetus for a new income survey that would furnish improved data for decision making. Experience with the new microsimulation models that were being developed at that time to evaluate alternative designs for government tax and transfer programs underscored the problems with the March CPS data on income and program participation and gave a further push to the need for a new survey to serve as a database for modeling and policy analysis generally.
From page 138...
... Looking to the longer term, the Census Bureau is planning to redesign the SIPP sample to overrepresent poverty households on the basis of results of the 1990 census. The new design will be implemented in 1995, at the same time as revised sample designs for the CPS and other household surveys based on the census are introduced.
From page 139...
... We argue the merits of a strategy that builds on the strengths of both the March CPS and the SIPP and also brings to bear relevant information from administrative records and other data sources.
RECOMMENDATIONS FOR IMPROVING DATA QUALITY
We have analyzed in detail the problems of data quality in the March CPS income supplement that confront the major microsimulation models of income support programs.
From page 140...
... Outside researchers as well as Census Bureau staff have conducted a number of analyses of SIPP data quality. The results of the evaluations to date have been summarized in the SIPP Quality Profile, which, now in its second edition (Jabine, King, and Petroni, 1990)
From page 141...
... Data from program administrative records are commonly used as control totals for calibrating baseline simulations, and they also provide the databases for program benefit calculators. For example, data about the characteristics of welfare recipients from the IQCS play a crucial role in calibrating the survey databases used in models of income support programs and also serve as databases for benefit-calculator models of the AFDC and food stamp programs.
From page 142...
... However, the different ways in which they currently extract and use the IQCS data for their models (for example, ASPE works with a year's worth of data from the IQCS, while FNS works with data for selected months) may be an impediment ...
Recommendation 5-2. We recommend that the responsible agencies sponsor in-depth evaluations of the quality of administrative data that are used as primary or supplemental inputs to social welfare policy microsimulation models.
From page 143...
... Given inevitable resource constraints and burden limitations, achieving this goal will almost always require looking beyond the confines of a particular survey and seeking to relate data from multiple sources, including administrative records. Hence, the challenge in allocating budget resources for enhanced data quality is to achieve the best possible balance among spending for additional data obtained through surveys; spending for additional data obtained from administrative records and other sources; spending for better measurements from surveys and administrative systems, through improvements to questionnaires and procedures; and spending for better databases, through improved techniques for combining data from multiple sources.
From page 144...
... In place of introducing new SIPP panels on an annual basis, the Census Bureau, together with the policy analysis agencies, should consider several possible steps, such as:
· adding a low-income sample to the March CPS, which could be done readily in the same manner as the current Hispanic supplement by using cases from earlier months, thereby directly enhancing the utility of the data for microsimulation modeling and other analyses of income support programs; ...
16 Cost estimates for SIPP were provided to the panel by Daniel Kasprzyk of the Census Bureau (July 1990); cost estimates for the CPS are from Levitan and Gallo (1989:8)
From page 145...
... ;
· exploring sophisticated imputations using SIPP data to improve the March CPS information on intrayear income, employment status, and other variables; and
· exploring matches of administrative data with SIPP and the March CPS in a form that can be made publicly available.
Adding a low-income sample and a few questions to the March CPS are steps that could be taken relatively quickly, as could limited experimentation with changes to interviewer instructions or other procedures to try to improve response.
From page 146...
... Alternatives that should be investigated include:
· proceeding with the current plan to obtain added resources to restore the SIPP sample size and overlapping panels, beginning with the 1991 panel; and
· keeping the SIPP budget at its current level with the 1990 design of fewer, larger panels, while reallocating the added budget to some combination of initiatives, including adding a low-income sample to the March CPS; adding a limited set of questions to the March CPS to ascertain family composition during the income reference year; exploiting the longitudinal information available in the CPS; exploring sophisticated imputations that use SIPP data to improve CPS information on intrayear income, employment status, and other variables; and exploring matches of SIPP and CPS data with administrative records in a form that can be made publicly available.
Long-Term Approaches
Looking to the longer term, we note that there are plans that could materially affect the role of the March CPS and the SIPP as databases for microsimulation modeling and other policy analyses of social welfare programs.
From page 147...
... Certainly, the longitudinal information on intrayear income dynamics afforded by SIPP is invaluable for research purposes and could, in the long run, support new kinds of microsimulation models that simulate the dynamics of intrayear participation in social welfare programs. However, increased panel length usually entails decreased data quality due to attrition.
From page 148...
... We recommend that these studies focus on improving the databases for modeling and analysis of income support and related social welfare programs. We recommend that the studies review all aspects of the SIPP design (such as the sample size and length of each panel and the extent to which overlapping of panels is desirable)
From page 149...
... Although it has made a number of changes in the two surveys to respond to the data needs of policy analysis agencies, the Census Bureau has not seen its job as preparing analytical databases, distinct from survey files. In other words, the Census Bureau has concentrated on such tasks as weighting and imputation for nonresponse, leaving to the users the tasks of further processing the survey data, such as correcting income reporting errors, imputing needed variables such as asset holdings or expenditures, and adjusting the data to match administrative control totals.
From page 150...
... statistics from a particular survey such as the March CPS or SIPP and, instead, looks to the goal of publishing the best set of income statistics from all available sources. Briefly, the Census Bureau proposes to use administrative records and other sources to assess the extent and nature of income reporting errors in the March CPS and SIPP.
From page 151...
... , found that AFDC participation rates dropped by about 4 percent in each year because the adjusted database generated more eligible units. In Chapter 3 we recommend that a high-priority task be to assess the implications of coverage errors in censuses and surveys for policy analysis ...
21 As an example of problems in obtaining administrative records, the Census Bureau currently is able to use only a limited set of IRS tax return data.
From page 152...
... Should important effects be determined, we urge the Census Bureau to develop ways to implement coverage error adjustments in the March CPS, SIPP, and other surveys that the models use and to improve the procedures that are currently implemented to adjust for the higher undercoverage in household surveys relative to censuses.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.