1
Databases for Microsimulation: A Comparison of the March CPS and SIPP

Constance F. Citro

Constance F. Citro is a staff officer at the National Research Council; she served as study director of the Panel to Evaluate Microsimulation Models for Social Welfare Programs. The author is indebted to several reviewers, particularly Daniel Kasprzyk and Thomas Espenshade, for helpful comments on an earlier draft of this chapter.

INTRODUCTION

Microsimulation models that are used to simulate costs and caseloads for social welfare programs in the United States have historically placed heavy reliance on the March income supplement of the Current Population Survey (CPS) for data input. This survey (augmented with data from other sources) is currently, and has been for many years, the primary database for the Transfer Income Model 2 (TRIM2), the Micro Analysis of Transfers to Households (MATH) model, and the Household Income and Tax Simulation Model (HITSM), among others.1 These models are used to simulate the nation's income support programs, including Aid to Families with Dependent Children (AFDC), supplemental security income (SSI), and food stamps.

1 See the Introduction for descriptions of and references for all models referred to in this chapter.

In the past, these models have also been run on other databases, including the 1967–1968 Survey of Economic Opportunity (SEO), which was used by an early version of TRIM; the public-use microdata samples from the decennial census (both MATH and TRIM created databases from the 1970 census public-use sample); and the 1976 Survey of Income and Education (SIE), which was reformatted for use by both MATH and TRIM2. However, the SEO and SIE were never repeated and quickly fell into disuse as microsimulation modeling databases. The census public-use samples contain fewer needed data items compared with the March CPS, are more expensive to use due to their size, and are updated only once a decade. These disadvantages outweighed the additional reliability the larger public-use samples provide for state-by-state estimates.

The March CPS is also used for microsimulation modeling for other social welfare policy issues. Specially created exact-match files, which combine CPS records with earnings histories for the same individuals from Social Security Administration records, are the primary database for models such as the Dynamic Simulation of Income Model 2 (DYNASIM2) and the Pension and Retirement Income Simulation Model (PRISM), which simulate retirement income programs, including social security and private pensions. (There are only two publicly available exact-match files: the 1973 file, which is used by DYNASIM2, and the 1978 file, which is used by PRISM.)

Although Statistics of Income (SOI) samples from federal income tax records are the primary data source for the tax policy models maintained by the Treasury Department and Congress's Joint Committee on Taxation, these agencies regularly match the March CPS data statistically with the SOI. (See Cohen, Chapter 2 in this volume, for a description of statistical matching techniques.) The use of the CPS in the tax models provides needed information that is not contained in the SOI on family composition for tax filers and on families who do not file tax returns.
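To make the idea of statistically matching a survey file with a tax file concrete, the following is a minimal sketch of one generic approach: an unconstrained nearest-neighbor match on variables common to both files. It is not the procedure used by the Treasury Department, the Joint Committee on Taxation, or described by Cohen; the record layout (`wages`, `married`, `deductions`), the distance weights, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    wages: float                        # common variable, present on both files
    married: bool                       # common variable, present on both files
    deductions: Optional[float] = None  # hypothetical item found only on the tax file


def distance(a: Record, b: Record) -> float:
    """Dissimilarity on the common variables (weights are assumptions)."""
    wage_gap = abs(a.wages - b.wages) / 1000.0          # $1,000 difference counts as 1
    status_penalty = 0.0 if a.married == b.married else 100.0
    return wage_gap + status_penalty


def statistical_match(recipients: List[Record], donors: List[Record]) -> List[Record]:
    """Copy the donor-only item from the closest donor onto each recipient."""
    matched = []
    for r in recipients:
        best = min(donors, key=lambda d: distance(r, d))
        matched.append(replace(r, deductions=best.deductions))
    return matched


# Made-up records standing in for a survey file and a tax-return file.
survey = [Record(21000.0, True), Record(48000.0, False)]
tax = [
    Record(20000.0, True, 1500.0),
    Record(50000.0, False, 6200.0),
    Record(47500.0, True, 5400.0),
]
matched = statistical_match(survey, tax)
```

In this sketch a donor record may be used more than once, and the matched records are not the same people; the match only transfers a plausible value from a similar record, which is what distinguishes a statistical match from an exact match.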
Other agencies, including the Congressional Budget Office and the Census Bureau, which lack access to the complete SOI microdata records, employ the March CPS as the basis for their tax models, imputing or statistically matching tax return data from the more limited public-use version of the SOI onto the CPS. (The Census Bureau's tax model is not used to simulate alternative tax policies but to estimate after-tax income from the CPS.) The TRIM2, MATH, and HITSM models also simulate taxes from the March CPS, supplemented with SOI information, since they are designed to use a common database for modeling taxes together with income support programs and lack access to the full SOI records.

Finally, the March CPS has seen limited use in modeling health programs. The Congressional Budget Office has used the recently expanded questions on health insurance coverage, together with other data on the March CPS, to model programs for extending health insurance coverage to noncovered workers and others. The Robert Wood Johnson Foundation sponsored a project with the TRIM2 model, using the March CPS as the primary database augmented with information from Medicaid administrative records, to simulate expansion of the Medicaid program to cover more of the low-income population (Holahan and Zedlewski, 1989).

To date, the CPS has reigned unchallenged as the premier microsimulation modeling database for social welfare policies, despite many acknowledged
deficiencies. Over the past decade, efforts have been made to correct some of the deficiencies in the CPS, for example, expanding the income questions that are asked. The problems with the CPS, as a vehicle both for microsimulation modeling and for analysis of income support policies and other social welfare initiatives generally, were also the most important impetus for the major development effort that led in 1983 to the first interviews of the continuing Survey of Income and Program Participation (SIPP).

At present, work is under way to use the SIPP both to enhance the CPS for microsimulation purposes and as a microsimulation model database in its own right. The Urban Institute used SIPP data to estimate participation probabilities for the TRIM2 food stamp program module and is considering the use of SIPP data to estimate a probit equation to improve further the food stamp participation function in TRIM2. Under a contract with the Congressional Budget Office, the Urban Institute is also investigating the use of SIPP to improve asset imputations in the SSI, AFDC, food stamp, and Medicaid modules; to improve the simulation of intrayear income dynamics in the module that allocates annual CPS income and employment data to months (see Long, 1990); and to improve the AFDC participation function if the SIPP data demonstrate a relationship of participation to multiple benefit receipt and longer-term recipiency.

Mathematica Policy Research has developed a model of the food stamp program called FOSTERS (Food Stamp Eligibility Routines) that operates directly on SIPP data. (Researchers at the Urban Institute and the Congressional Budget Office have also built SIPP-based models for food stamps.)
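As a rough illustration of what a probit participation function of the kind mentioned above might look like inside a simulation, the sketch below computes a participation probability as the standard normal CDF of a linear index and then assigns participation to eligible units by random draw. The predictors (`benefit_hundreds`, `has_earnings`) and every coefficient are hypothetical placeholders, not estimates from SIPP and not the specification used in TRIM2.

```python
import math
import random
from typing import List, Tuple

# Hypothetical coefficients for illustration only (NOT estimated from SIPP).
COEFFS = {"intercept": -0.2, "benefit_hundreds": 0.5, "has_earnings": -0.6}


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def participation_probability(benefit_hundreds: float, has_earnings: bool) -> float:
    """Probit probability: the normal CDF applied to a linear index."""
    index = (
        COEFFS["intercept"]
        + COEFFS["benefit_hundreds"] * benefit_hundreds
        + COEFFS["has_earnings"] * (1.0 if has_earnings else 0.0)
    )
    return normal_cdf(index)


def simulate_participation(
    units: List[Tuple[float, bool]], rng: random.Random
) -> List[bool]:
    """Mark an eligible unit as participating when a uniform draw falls
    below its predicted probability."""
    return [rng.random() < participation_probability(b, e) for b, e in units]


# Made-up eligible units: (monthly benefit in hundreds of dollars, has earnings).
eligible_units = [(0.4, False), (2.0, False), (2.0, True)]
flags = simulate_participation(eligible_units, random.Random(0))
```

Drawing participation at random from predicted probabilities, rather than assigning it to every unit above a cutoff, keeps the simulated caseload consistent in expectation with the estimated participation rates.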
The MATH model food stamp program simulations use data from SIPP to impute child care expenses and financial and vehicular assets, and they also use data from the 1979 Research Panel of the Income Survey Development Program (ISDP), the predecessor to SIPP, to allocate the annual income and employment information in the CPS to months. The Social Security Administration is actively pursuing development of a SIPP-based model of the SSI program.

This chapter compares the March CPS and SIPP on a number of dimensions of data quality in an effort to assess their relative strengths and weaknesses for use as microsimulation modeling databases, principally for income support programs. The chapter first specifies the data requirements of programs such as AFDC and food stamps and briefly describes the design of the two surveys. The chapter then reviews data quality problems in the surveys from the perspective of modeling income support programs and describes the ways in which these problems are currently addressed, either by the Census Bureau or by the modelers themselves. The problems fall under three headings: (1) problems resulting from survey design and data collection; (2) problems resulting from a mismatch of the variable specifications with modeling needs; and (3) problems of needed variables that are missing entirely in one or both surveys. In addition to considering data quality problems by source, the chapter reviews the limited