**Suggested Citation:** "Reweighting of File B Data Resulting From Statistical Matching." National Research Council. 1991.

*Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers*. Washington, DC: The National Academies Press. doi: 10.17226/1853.


STATISTICAL MATCHING AND MICROSIMULATION MODELS

applied without consideration given to the matching that was used to create the merged file. For example, Klevmarken (1982) has shown that the parameters of a regression model of the form [...], where X1 indicates a subset of the matching variables and Y1 indicates a subset of the variables in the first file, are not estimable from a statistically matched file unless the number of variables in Y1 is smaller than the number of matching variables excluded from X1.

Error Resulting From the Distance Between X(A) and X(B)

Another problem with statistical matching is that the two matched records generally do not have identical values for the matching variables; that is, X(A) need not equal X(B). This disagreement adds a further assumption on which an analyst must rely: that the relationship between Z and X is smooth. The discrepancy between X(A) and X(B) is largest where matches are hardest to find, namely in the sparse regions of X-space. Records in these regions will generally find matches closer to the center of the data set, which adds a bias to the statistical match. One way to remove or reduce this bias is to use a form of parametric statistical matching, for example, through the use of regression. Sims (1978:175) warns: "In sparse regions we are almost bound to distort the joint distribution in synthetic file formation, unless we go beyond 'matching' to more elaborate methods of generating synthetic observations."

To check the effect of imperfect matching, Sims (1978) suggests the following procedure. Perform the regression Z1 = b X(B) for some variable Z1 contained in Z. Then compare the output generated from the file {X(A), Y, Z} with that from the file {X(A), Y, Z + b[X(A) - X(B)]}. If the inference is similar, it is likely that matching bias has not affected the data set appreciably.
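The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as Sims implemented it: the data are simulated, the match is an unconstrained nearest-neighbor match on Euclidean distance, the regression includes an intercept (only the slopes enter the adjustment), and the function names `nearest_match` and `sims_adjusted_z` and the summary statistics are hypothetical choices made for this example.

```python
import numpy as np


def nearest_match(x_a, x_b):
    """For each File A record, return the index of the closest File B record.

    Assumption: unconstrained matching on squared Euclidean distance.
    """
    d2 = ((x_a[:, None, :] - x_b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def sims_adjusted_z(x_a, x_b_matched, z_matched, x_b, z_b):
    """Return Z + b[X(A) - X(B)], with b estimated by regressing Z1 on X(B).

    Assumption: the regression is fitted on all of File B with an intercept;
    the intercept cancels in the adjustment, so only the slopes b are used.
    """
    design = np.column_stack([np.ones(len(x_b)), x_b])
    coef, *_ = np.linalg.lstsq(design, z_b, rcond=None)
    b = coef[1:]
    return z_matched + (x_a - x_b_matched) @ b


def summary(y, z):
    """Illustrative outputs to compare across the two files."""
    return {
        "mean_z": float(z.mean()),
        "var_z": float(z.var()),
        "corr_yz": float(np.corrcoef(y, z)[0, 1]),
    }


# Simulated files (hypothetical data): File A holds (X, Y), File B holds (X, Z1).
rng = np.random.default_rng(0)
n_a, n_b = 500, 500
x_a = rng.normal(size=(n_a, 2))
x_b = rng.normal(size=(n_b, 2))
y = x_a @ np.array([1.0, 0.5]) + rng.normal(scale=0.5, size=n_a)
z_b = x_b @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n_b)

idx = nearest_match(x_a, x_b)      # statistical match on X
z_raw = z_b[idx]                   # file {X(A), Y, Z}
z_adj = sims_adjusted_z(x_a, x_b[idx], z_raw, x_b, z_b)  # adjusted file

print("matched file: ", summary(y, z_raw))
print("adjusted file:", summary(y, z_adj))
```

The summary statistics here are arbitrary; in practice one would compare whatever output the analysis of interest produces from the two files.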
If the two files instead produce substantially different results, some accounting for the effects of "far" matches is needed. In a related idea, Sims (1974) suggests matching only in areas where the data are dense. Elsewhere, regression models could be used, adjusted by the difference between the regression model and the matched value for the nearest "matchable" points. Paass (1985) suggests choosing a small number of X(A) variables to reduce the size of this bias, since matches are then easier to find. However, this approach will reduce the correlations between the matching variables and the singly occurring variables.

Reweighting of File B Data Resulting From Statistical Matching

A related problem concerns an additional impact of a statistical match on the