STATISTICAL MATCHING AND MICROSIMULATION MODELS

for the common variables leads to reduced distortion in the joint distribution of (X, Z) on files created by matching. A related idea, proposed in Singh (1988), develops categorical variables X*, Y*, and Z*, related to X, Y, and Z, for which the conditional independence assumption is assumed to hold, and which are used to define equivalence classes for matching; for details, see Singh (1988).

EXAMPLES OF STATISTICAL MATCHES IN MICROSIMULATION MODELS

In this section I describe some applications of statistical matching, including the reasons for the match and the particular matching techniques used.

The EM-AF Statistical Match

It is well known that estimates of the distribution of family money income from household surveys contain serious bias. This bias can be reduced through the use of information from federal individual income tax returns. Radner (1983) describes a statistical match that begins with the March 1973 CPS-Internal Revenue Service-Social Security Administration exact match file (EM). This file was considered to have three limitations: (1) serious response errors in the CPS, (2) few high-income observations, and (3) not enough detail by income type. To address these limitations in the EM, it was statistically matched to the augmentation file (AF), a subsample of the 1972 Statistics of Income (SOI) sample of federal individual income tax returns that had been exact matched with Social Security Administration records containing earnings and demographic data.

The EM-AF statistical match can be separated into three fairly distinct steps. First, there was an initial match, using 22 matching variables that included adjusted gross income, interest, dividends, and social security taxable earnings, sex, race, age, number of exemptions, and the use of various schedules.
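The initial match, detailed in the following paragraph, amounts to a constrained nearest-neighbor search: candidate AF records are restricted to a cell, a weighted sum of discrepancies is computed, and the closest record is accepted unless its distance exceeds a maximum. A minimal Python sketch of that logic follows. The variable names, weights, age range, and threshold are hypothetical and are not taken from Radner (1983); absolute differences stand in for the discrepancy measure, only three of the 22 variables are shown, and the fallback collapses all cells at once, which simplifies the partial collapsing described in the source.

```python
from dataclasses import dataclass


@dataclass
class Record:
    rec_id: str
    sex: str      # used here as the (hypothetical) cell-defining variable
    age: int
    values: dict  # numeric matching variables, e.g. {"agi": 12000.0}


# Hypothetical weights; the actual weights are not given in the text.
WEIGHTS = {"agi": 1.0, "interest": 0.5, "dividends": 0.5}


def distance(em, af, weights=WEIGHTS):
    """Weighted sum of absolute discrepancies over the matching variables."""
    return sum(w * abs(em.values[k] - af.values[k]) for k, w in weights.items())


def nearest(em, candidates):
    """Return (closest candidate, distance), or (None, inf) if there are none."""
    if not candidates:
        return None, float("inf")
    best = min(candidates, key=lambda af: distance(em, af))
    return best, distance(em, best)


def match(em, af_file, age_range=5, max_distance=1000.0):
    """Match one EM record to an AF record.

    Pass 1 searches only within the EM record's cell (same sex, age within
    age_range). If the best distance exceeds max_distance, the cell
    restriction and the age range are dropped and the search is repeated.
    """
    in_cell = [
        af for af in af_file
        if af.sex == em.sex and abs(af.age - em.age) <= age_range
    ]
    best, d = nearest(em, in_cell)
    if d <= max_distance:
        return best, d, "cell"
    best, d = nearest(em, af_file)
    return best, d, "collapsed"


em = Record("em1", "F", 40, {"agi": 10000, "interest": 100, "dividends": 0})
af_file = [
    Record("af1", "F", 42, {"agi": 10200, "interest": 100, "dividends": 0}),
    Record("af2", "M", 40, {"agi": 10000, "interest": 100, "dividends": 0}),
    Record("af3", "F", 70, {"agi": 10050, "interest": 100, "dividends": 0}),
]

within_cell = match(em, af_file)                     # af1 is the only in-cell candidate
relaxed = match(em, af_file, max_distance=100.0)     # af1 is too far, so cells collapse
```

With the default threshold, the in-cell record is accepted even though records outside the cell are closer; tightening the threshold triggers the fallback, which then searches the whole AF file.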
Certain of the characteristics were used to define cells within which distances between records were computed and outside of which no matches were permitted. These cells included an acceptable age range. The distance measure consisted of a sum of weighted discrepancies between the values of the 22 variables in the two files. The AF record that was closest to the EM record was chosen for the statistical match unless the minimal distance was greater than a specified maximum, in which case some cells were collapsed and the age range was eliminated.

Next, Radner (1983:137) describes:

About 6,900 EM records that were considered to have an inconsistent initial match were rematched with the AF because we were not fully satisfied