National Academies Press: OpenBook

Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers (1991)

Chapter: Reweighting of File B Data Resulting From Statistical Matching

Suggested Citation: "Reweighting of File B Data Resulting From Statistical Matching." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

applied without consideration given to the matching that was used to create the merged file. For example, Klevmarken (1982) has shown that the parameters of a regression model of the form where X1 indicates a subset of the matching variables and Y1 indicates a subset of the variables in the first file, using a statistically matched file, are not estimable unless the number of variables in Y1 is fewer than the number of matching variables excluded from X1.

Error Resulting From the Distance Between X(A) and X(B)

Another problem with statistical matching is the failure of the two matched records to have identical values for the matching variables, that is, the failure of X(A) to equal X(B). It is obvious that these two vectors will not necessarily agree. This disagreement adds an additional assumption that an analyst must rely on: that the relationship between Z and X is smooth. The discrepancy between X(A) and X(B) is, of course, largest where matches are hardest to find, namely in the sparse regions of X-space. Records in these regions will generally find matches closer to the center of the data set, adding a bias to the statistical match. One way to remove or reduce this bias is to use a form of parametric statistical matching, for example, through the use of regression. Sims (1978:175) warns: “In sparse regions we are almost bound to distort the joint distribution in synthetic file formation, unless we go beyond ‘matching’ to more elaborate methods of generating synthetic observations.”

To check the effect of imperfect matching, Sims (1978) suggests the following procedure. Perform the regression Z1 = bX(B) for some variable Z1 contained in Z. Then compare the output generated from the file [(X(A), Y, Z)] and the file {X(A), Y, Z+b[X(A)−X(B)]}. If the inference is similar, it is likely that matching bias has not affected the data set appreciably.
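The check attributed to Sims (1978) can be sketched in a few lines. This is only an illustrative sketch, not the procedure as implemented by any of the cited authors: the function name `sims_adjusted_z`, the array layout (one row per matched record), and the inclusion of an intercept in the regression are all assumptions made here for the example.

```python
import numpy as np

def sims_adjusted_z(x_a, x_b, z1):
    """Return (b, z1_adj) for the imperfect-matching check.

    x_a : matching variables X(A) of the File A records, shape (n, k)
    x_b : matching variables X(B) of the File B records matched to them, shape (n, k)
    z1  : values of one Z variable carried over from File B, shape (n,)

    Hypothetical helper: the intercept term is an assumption of this sketch.
    """
    z1 = np.asarray(z1, dtype=float)
    x_a = np.asarray(x_a, dtype=float).reshape(len(z1), -1)
    x_b = np.asarray(x_b, dtype=float).reshape(len(z1), -1)

    # Regress Z1 on X(B) by ordinary least squares.
    design = np.column_stack([np.ones(len(z1)), x_b])
    coef, *_ = np.linalg.lstsq(design, z1, rcond=None)
    b = coef[1:]  # slope coefficients only; the intercept cancels in the adjustment

    # Adjusted value Z1 + b[X(A) - X(B)]: shifts each matched Z1 by the
    # amount the regression attributes to the matching discrepancy.
    z1_adj = z1 + (x_a - x_b) @ b
    return b, z1_adj
```

An analyst would then run the same analysis once with the matched values `z1` and once with `z1_adj` and compare the inferences. Where every match is exact, X(A) equals X(B) and the two files coincide.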
However, if the two data sets produce substantially different results, some accounting for the effects of “far” matches is needed. In a related idea, Sims (1974) suggests matching only in areas where the data are dense. Elsewhere, regression models could be used, but adjusted by the difference between the regression model and the matched value for the nearest “matchable” points. Paass (1985) suggests that one choose a small number of X(A) variables to reduce the size of this bias, since matches are then easier to find. However, this approach will reduce the correlations between the matching variables and the singly occurring variables.

Reweighting of File B Data Resulting From Statistical Matching

A related problem concerns an additional impact of a statistical match on the
