DATABASES FOR MICROSIMULATION: A COMPARISON OF THE MARCH CPS AND SIPP

THE DIVISION OF LABOR FOR PRODUCTION OF DATABASES FROM CPS AND SIPP

This section turns from a review of the data quality problems in the CPS and SIPP to a consideration of how these problems are addressed in microsimulation model databases. The Census Bureau carries out many operations on the data before they are released, including coding, editing, imputing, and weighting. Many of these operations improve the quality and utility of the data. Nonetheless, before public-use files from the Census Bureau can be used by models in simulating income support programs for federal policy analysis agencies, a large number of additional steps must be carried out to generate suitable databases. (As an example, Figure 1 lists the steps followed by the TRIM2 model in creating a database each year from the March CPS.)

The necessary preprocessing of the data currently accounts for a significant amount of model code and staff and computing resources. Thus, the job of creating each year's baseline CPS file for the TRIM2 model routinely occupies about 3 months of calendar time (and may require more months if new program features must be modeled or a new version of the CPS converted) and uses up one-sixth of the staff resources for which the Urban Institute receives funding from the U.S. Department of Health and Human Services in a typical year. Yet even after all of the processing, both by the Census Bureau and in the models, some data quality problems remain.

Some of the preprocessing operations performed by the models would be required in any case, for example, converting a file to the internal representation (e.g., packed binary) used by the particular model.
Other operations, however, could possibly be performed by the Census Bureau at less overall cost to the federal government (because the operation would not be duplicated by the various models) and with higher quality (because, for example, the Census Bureau would have access to detailed administrative data for use in quality enhancement).

This section briefly summarizes the current roles of the Census Bureau and the contractors for the major microsimulation models of income support programs (MATH, TRIM2, and HITSM) in enhancing data quality in the CPS. Much of the discussion also applies to SIPP, although using SIPP as a model database would eliminate some steps, such as allocating yearly to monthly values, that are necessary in creating CPS databases for use in models. The topics and order of presentation generally follow those of the discussion above on quality problems in the CPS and SIPP.

• Population coverage: The Census Bureau partially adjusts for undercoverage in the CPS by ratio-adjusting the weights to match age, race, and sex population control totals derived from the decennial census and vital records. The Census Bureau makes no adjustments for other correlates of the undercount,
such as low income, or for undercoverage in the census itself. The models do not make any further adjustments.

> Use the first-generation TRIM software to process the March CPS public-use file as follows:
  • Convert the CPS household-family-person record structure to a family-person structure. As part of this process, drop all noninterviewed households and households that are made up solely of secondary individuals under age 15, merge children-only families with other primary families, and split very large families.
  • Convert the CPS character format to a binary format.
  • Perform various imputations and recodes, including allocating aggregate income amounts to individual sources (not necessary beginning with the March 1988 CPS), creating status definer variables needed for program simulation (such as disability status), and creating tax and transfer program filing unit variables.
  • If the data are to be aged forward to a future year, carry out that set of operations at this point.
> Convert the TRIM file to a TRIM2 file and continue processing as follows:
  • Run the MONTHS module to allocate yearly employment status and income variables from the CPS across months of the year.
  • Run the module to adjust interest and dividend income amounts for underreporting (AINC), using as control totals the administrative estimates produced for the national accounts.
  • After updating all of the program parameters, run the basic master routines, including FEDTAX (federal income tax), SSI, AFDC, FICAT (social security payroll tax), and FSTAMP (food stamps), to produce a baseline file with simulated values for the major tax and transfer programs under current law.
  • As part of running several of the master routines, use imputation procedures to assign values for missing data items in the CPS; for example, impute child care expenses for use by the FEDTAX, AFDC, and FSTAMP modules based on equations estimated from the Consumer Expenditure Survey data for 1981-1985.
  • As the last step, calibrate the simulations of program participants to accord with known control totals from administrative data.

FIGURE 1 Steps in generating a TRIM2 database from the March CPS. SOURCE: Developed from TRIM2 documentation (see references in Introduction).

• Household and individual nonresponse: The Census Bureau carries out weighting and imputation procedures to adjust for nonresponse of sampled households. Judging from analyses of SIPP, these procedures do not appear to compensate fully for differential attrition of low-income respondents.

• Item nonresponse: The Census Bureau performs various kinds of imputations to assign values for items that are missing because of nonresponse. Again judging from analyses of SIPP, these procedures do not appear to preserve important relationships among income and other variables for the low-income population.

• Reporting errors: The Census Bureau conducts research on reporting
errors, such as underreporting and misreporting (although relatively little such research has been conducted in recent years with the March CPS, particularly for transfer income), but it does not correct the data for such errors. The models contain routines for adjusting nontransfer income amounts for underreporting (by using data from the national income accounts as controls), but the routines are not always used and some models adjust more income sources than others. For transfer income amounts, all of the models make a complete adjustment in that they simulate benefits from AFDC, SSI, etc., and, in some instances, ignore reported benefits in simulating program participation. In creating their baseline files, the models calibrate the simulated number of participants to accord with selected administrative control totals. (See discussion above; also see discussion in Citro and Ross [in this volume] of the calibration process as a critical part of the simulation of program participation in the MATH, TRIM2, and HITSM models.)

• Sampling error: The Census Bureau makes various adjustments to the survey weights to help reduce variance, and it regularly publishes sampling error estimates and information for users on how to determine sampling error for particular estimates. The modeling contractors currently do not produce estimates of variability in their databases.

• Income accounting period: The Census Bureau provides CPS employment and income data as reported, that is, for the previous calendar year; all of the models allocate these variables across months of the year.

• Income detail: For March CPS files beginning in 1988, the Census Bureau records on the public-use file all of the detailed income amounts by source that have been ascertained in the questionnaire since 1980. Prior to March 1988, the models had to allocate combined income amounts to individual sources.
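
The underreporting adjustment noted under "Reporting errors" above, in which nontransfer income amounts are raised toward control totals from the national income accounts, can be illustrated with a minimal sketch. It assumes a single uniform scaling factor applied to every reported amount, which is a simplification chosen for illustration and not the documented procedure of MATH, TRIM2, or HITSM. The record layout, the function name `adjust_to_control_total`, and all dollar figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    weight: float           # survey weight: number of persons represented
    interest_income: float  # reported annual amount, in dollars


def adjust_to_control_total(records, control_total):
    """Scale reported amounts so their weighted sum equals control_total.

    Assumption for illustration only: one uniform factor for all records.
    Records reporting zero stay at zero, so this sketch cannot correct
    for recipients who failed to report the income source at all.
    """
    weighted_total = sum(r.weight * r.interest_income for r in records)
    if weighted_total <= 0:
        raise ValueError("no reported income to scale")
    factor = control_total / weighted_total
    adjusted = [
        PersonRecord(r.weight, r.interest_income * factor) for r in records
    ]
    return adjusted, factor


# Hypothetical three-record file and a hypothetical control total.
sample = [
    PersonRecord(weight=1500.0, interest_income=200.0),
    PersonRecord(weight=2000.0, interest_income=0.0),
    PersonRecord(weight=1000.0, interest_income=900.0),
]
adjusted, factor = adjust_to_control_total(sample, control_total=1_800_000.0)
```

In this sketch the survey-weighted total before adjustment is 1,200,000 dollars, so each reported amount is multiplied by 1.5. A uniform factor preserves the relative distribution of reported amounts, which is one reason an actual model might prefer a more differentiated rule.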
• Household composition reference period: Neither the Census Bureau nor the models deal with the problem that the household composition data in the March CPS pertain to the time of the interview, while the employment and income reference period is the preceding calendar year.

• Households and families versus program filing units: The models all contain extensive code to construct tax and transfer program filing units from the information on household and family relationships obtained in the CPS.

• Missing asset data: The models all include procedures to construct asset holdings, for example, by imputing a rate of return to asset income.

• Missing expenditure data: The models all include various procedures for supplying needed data on expenditures that are allowable program deductions, for example, imputing values from regression equations that are estimated with other data sources.

• Other missing data: As noted above, the models endeavor to supply many kinds of missing data in order to simulate specific features of income