Microsimulation models typically operate on large databases containing detailed information on individual decision units, which supply the input needed to simulate current programs and proposed alternatives. The quality and scope of these databases are critical to the quality and detail of the estimates the models can provide.
In Chapter 1 Citro compares and contrasts the March Current Population Survey (CPS) with the new Survey of Income and Program Participation (SIPP), assessing how useful each is for simulating changes to income support programs such as food stamps and Aid to Families with Dependent Children (AFDC). The March CPS is currently widely used by microsimulation models, but it exhibits a range of data quality problems. SIPP was designed to provide improved data for microsimulation and, more generally, for policy research on economic well-being, but it has been plagued by start-up problems. SIPP will clearly play a role in producing higher quality data for microsimulation, but the precise nature of that role is uncertain at this time.
The data sets provided by specific surveys, such as the March CPS or SIPP, or by specific administrative records systems, such as federal income tax returns, are rarely adequate for microsimulation modeling without enhancement, either to add needed variables or to improve quality. Imputation techniques of various types (for example, evaluating a regression equation estimated from another data set) are often used to supply values for one or more variables missing from the primary database. Exact matches of two or more data sets, based on common identifiers such as Social Security number, have been used to generate more comprehensive databases for models. However, such matched files have not generally been available for policy analysis or research because of concerns about protecting the confidentiality of individual records.
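The two enhancement techniques just described can be illustrated with a minimal sketch, assuming tiny hypothetical files: a donor file in which both years of schooling and a wage are observed, a primary survey file that lacks the wage, and an administrative file that shares a record identifier with the survey. All field names and values are invented for illustration. The imputation here is deterministic (it uses the fitted value with no residual added); production systems often add a random residual so the imputed values are not artificially uniform.

```python
# Hypothetical donor file: each pair is (years_of_schooling, weekly_wage).
# Both variables are observed here, but the primary file lacks the wage.
donor = [(10, 300.0), (12, 400.0), (14, 500.0), (16, 600.0)]


def fit_ols(pairs):
    """Fit y = a + b * x by ordinary least squares on (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def impute_wage(primary, a, b):
    """Add a deterministic regression-imputed 'wage' to each primary record."""
    return [{**rec, "wage": a + b * rec["schooling"]} for rec in primary]


def exact_match(left, right, key="id"):
    """Inner-join two record lists on a shared identifier (e.g. an SSN)."""
    index = {rec[key]: rec for rec in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]


# Hypothetical primary survey file and administrative file sharing an "id".
primary = [{"id": 1, "schooling": 11}, {"id": 2, "schooling": 15}]
tax = [{"id": 2, "agi": 41000}, {"id": 3, "agi": 25000}]

a, b = fit_ols(donor)                 # a = -200.0, b = 50.0
enhanced = impute_wage(primary, a, b)
matched = exact_match(enhanced, tax)  # only id 2 appears in both files
print(matched)
```

The exact match keeps only records whose identifier appears in both files, which is why it depends on a reliable common identifier and raises the confidentiality concerns noted above.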
In Chapter 2 Cohen discusses another technique, statistical matching, which has been used to link two or more data sets when an exact match is not possible or feasible. Statistical matching involves strong assumptions about the relationships between the variables common to the input files and the variables unique to each file. Nonetheless, the technique merits serious consideration because the alternatives, such as performing exact matches or expanding the content of specific surveys, involve considerable difficulties.
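One common variant of statistical matching, unconstrained nearest-neighbor matching, can be sketched as follows. This is an illustration under stated assumptions, not necessarily the procedure Cohen examines: two hypothetical files share the variables age and household size; file A also records food stamp participation and file B records adjusted gross income. Each file A record receives the income of the file B record closest to it on the common variables, and file B records may be reused.

```python
def statistical_match(recipients, donors, common):
    """Unconstrained nearest-neighbour statistical match.

    For each recipient record, pick the donor whose common variables are
    closest in squared Euclidean distance and copy over the donor's variables
    that the recipient lacks. Donors may be used more than once.
    """
    def sq_distance(r, d):
        return sum((r[v] - d[v]) ** 2 for v in common)

    fused = []
    for r in recipients:
        best = min(donors, key=lambda d: sq_distance(r, d))
        # Only variables unique to the donor file are carried across.
        extra = {k: v for k, v in best.items() if k not in common and k not in r}
        fused.append({**r, **extra})
    return fused


# Hypothetical file A: common variables plus program participation.
file_a = [
    {"age": 30, "hh_size": 3, "food_stamps": 1},
    {"age": 62, "hh_size": 1, "food_stamps": 0},
]
# Hypothetical file B: the same common variables plus income.
file_b = [
    {"age": 28, "hh_size": 4, "agi": 18000},
    {"age": 65, "hh_size": 2, "agi": 30000},
    {"age": 45, "hh_size": 1, "agi": 52000},
]

fused = statistical_match(file_a, file_b, common=("age", "hh_size"))
print(fused)
```

The fused file reproduces the true joint distribution of participation and income only if those two variables are independent once age and household size are held fixed; this is the kind of strong assumption the technique requires. In practice the common variables are usually standardized or weighted so that one of them (here, age) does not dominate the distance.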
Two major types of microsimulation models, static and dynamic, are currently in wide use for social welfare policy modeling. Static models operate on cross-sectional databases that provide a snapshot of the population at a single point in time.