National Academies Press: OpenBook
Suggested Citation:"CONCLUDING NOTE." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

STATISTICAL MATCHING AND MICROSIMULATION MODELS

matching, although it is possible, but difficult, to apply this technique to the case of constrained statistical matching. Rather than selecting the closest match in file B to each record in file A, identify the closest k records. It is unclear what k should be; it would depend on the size of the classes within which matching is permitted, choosing larger k's for larger classes. It is likely that setting k to values close to 5 would work most of the time.

Three statistically matched files can then be created: (1) the usual unconstrained statistical match, using the closest match in file B to every record in file A and assuming conditional independence; (2) a negative conditional correlation statistical match, for which one chooses to match a particular one of the k nearest records in file B to a record in file A, where the record is chosen so that “high” values of Y are paired with “low” values of Z, and vice versa; and (3) a positive conditional correlation statistical match, similar to (2).

If there is a particular variable contained in Y and another variable contained in Z that one has primary interest in, “high” and “low” can simply mean above and below that variable's mean. However, if there are several variables contained in Y and Z that are important and if the conditional independence assumption is a concern, then either one could repeat this process for each pair of interest, or one could use a multivariate notion of “high” and “low.”

After forming these three statistically merged data files, one would repeat the analysis on each file. If the results were similar, the assumption of conditional independence probably is not crucial; otherwise, the results are open to question.
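The three-file procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method as implemented in any particular model: there is a single matching variable X, distance is the absolute difference in X, there are no matching classes, records in file B may be reused, and the "low" or "high" Z is taken to be the smallest or largest Z among the k candidates (the text only requires that high Y be paired with low Z and vice versa). The function names `k_nearest`, `build_sensitivity_files`, and `yz_covariance` are hypothetical.

```python
from statistics import mean


def k_nearest(x_a, xs_b, k):
    """Indices of the k file-B records closest to x_a, nearest first.

    Assumption: one matching variable and absolute-difference distance.
    """
    order = sorted(range(len(xs_b)), key=lambda j: abs(xs_b[j] - x_a))
    return order[:k]


def build_sensitivity_files(file_a, file_b, k=5):
    """Build the three matched files used in the rough sensitivity analysis.

    file_a: list of (x, y) records; file_b: list of (x, z) records.
    Returns (nearest, negative, positive), each a list of (x, y, z).
    """
    y_mean = mean(y for _, y in file_a)  # "high" Y means above the mean
    xs_b = [x for x, _ in file_b]
    nearest, negative, positive = [], [], []
    for x, y in file_a:
        candidates = k_nearest(x, xs_b, k)
        z_values = [file_b[j][1] for j in candidates]
        z_closest = z_values[0]  # usual unconstrained match
        z_low, z_high = min(z_values), max(z_values)
        y_is_high = y > y_mean
        nearest.append((x, y, z_closest))
        # Negative conditional correlation: high Y gets low Z, and vice versa.
        negative.append((x, y, z_low if y_is_high else z_high))
        # Positive conditional correlation: high Y gets high Z, and vice versa.
        positive.append((x, y, z_high if y_is_high else z_low))
    return nearest, negative, positive


def yz_covariance(matched):
    """Covariance of Y and Z in a matched file (one possible summary)."""
    y_bar = mean(y for _, y, _ in matched)
    z_bar = mean(z for _, _, z in matched)
    return sum((y - y_bar) * (z - z_bar) for _, y, z in matched) / len(matched)
```

One would then run the analysis of interest on each of the three returned files, for example by comparing `yz_covariance` across them. A wide spread between the negative and positive files suggests that the result depends on the conditional independence assumption.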
CONCLUDING NOTE

The specific application of statistical matching as input into microsimulation models (possibly the most extensive use of the methodology, but certainly not the only one) makes certain demands on the data set that must be recognized when producing statistically matched files for this purpose. Microsimulation models often operate on data sets that are fairly large. If the model is of national scope and is based on individuals or households, files on the order of 50,000 records or more are typical. The use of data sets of this size or larger makes constrained statistical matching computationally intensive, especially considering the cost of repeating the matching process when estimating the variance of such a process with a sample reuse technique.

In addition, the complexity of the policy issues (for example, eligibility for various welfare programs, income taxes, health expenditures) requires that the data sets cover a wide range of variables. If there are a large number of matching variables, say, more than five or six, matching error increases. If there are a large number of Y or Z variables, there are likely to be several uncorrelated pairs, which complicates the choice of a distance function in the match.

Furthermore, the extensive use of controlling to accepted totals on the statistically matched files needs to be considered. Rubin's point about the relative efficacy of constrained versus unconstrained statistical matching depends strongly on whether various control totals are going to be used after the statistical match. Also, Klevmarken's points about the limits of statistical operations that one can safely apply to a statistically matched data set have only been considered in the regression context. His points should also be considered for other models such as logistic regression (found in some participation models of microsimulation models) and iterative proportional fitting.

Finally, it is not at all clear what impact processes such as aging the data, statically or dynamically, or the use of various behavioral models have on a statistically matched data set. There is the possibility that the sensitivity of the results to the conditional independence assumption is heightened through the use of such data-intensive procedures.

The use of what one might call “classical” statistical matching in microsimulation models, that is, assuming conditional independence without evidence, is very likely to misinform. At the very least, some of the sensitivity analysis described above should be performed to assess the likely effect of a failure of the assumption. If the results are not sensitive to the conditional independence assumption, and the bias introduced through the matching process is also tested and considered small, then the results are likely to be useful. In the event that the results are sensitive to either the conditional independence assumption or the matching bias or both, a “classical” statistical match should not be used. These conclusions are true (almost) regardless of the application of the statistical match.
They are even more crucial for statistical matching as input into microsimulation models, since these files are further manipulated by aging routines, monthly allocation routines, behavioral models, various sorts of controlling to independent totals, etc.

Rodgers (1984:101) summarized:

    On the basis of these simulations, which confirm the caution arising from the absence of any mathematical justification for statistical matching, it seems clear that statistical matching may not in general be an acceptable procedure for estimating relationships between Y and Z variables, or for any type of multivariate analysis involving both Y and Z variables.

Paass (1985:9.3–15) summarized:

    At the current state of knowledge SM [statistical matching] is more an art than an exact and reliable technique. Therefore SM methods should be employed only if the CIA [conditional independence assumption] can be verified or replaced by additional information and the demands on the data are not very high.

It seems as if microsimulation models place very high demands on data, and those words of caution should be heeded. However, it is important to remember the important function statistical


This volume, second in the series, provides essential background material for policy analysts, researchers, statisticians, and others interested in the application of microsimulation techniques to develop estimates of the costs and population impacts of proposed changes in government policies ranging from welfare to retirement income to health care to taxes.

The material spans data inputs to models, design and computer implementation of models, validation of model outputs, and model documentation.
