STATISTICAL MATCHING AND MICROSIMULATION MODELS

was repeated seven times, ending with clerical matching of the most difficult cases. The restrictions had to be relaxed for 731 (3%) of the 28,643 tax units.

This merge file was found to have several problems, especially substantial differences between the amounts of income reported by high-income units in the SEO and in the IRS data. This problem was resolved by splitting the merge file into a low-income part and a high-income part, the latter consisting only of the tax file data. This second part of the file therefore had none of the SEO demographic data. Another problem was that the total income was less than the adjusted family income for 1966. A ratio adjustment was used to resolve this difficulty. The merge file was used primarily to study tax distribution.

Statistics Canada SCF-FEX Match

The statistical match of the 1970 Canadian Survey of Consumer Finances (SCF) and the 1970 Family Expenditure Survey (FEX), described in Alter (1974), was performed in connection with a research project to better understand income distributions internationally. The methods borrowed heavily from Okner (1972), but one difference was that in this case the two surveys were designed with the view that the two files would subsequently be statistically matched. The questions were not asked on a single questionnaire because it was believed that the response burden would have had a deleterious impact on the response rate. The SCF is an annual income survey, carried out nationally, which had a sample size of 10,000 in the early 1970s. The FEX was a one-time survey of 14,000 families in 1970. Since these two surveys were designed to be statistically matched, only minimal file treatment was necessary before the merging was done.
The original group of variables considered for matching was determined using a priori knowledge, and the actual matching variables were further selected from the initial group using regression analysis (Alter, 1974:376):

If such variables exhibited some explanatory power vis-à-vis consumption in the Family Expenditure Survey and also vis-à-vis asset holdings and debt patterns in the Survey of Consumer Finances, only then would they be considered as matching variables. These variables were then ranked in order of importance, and their relative impact was quantified. This ranking and quantifying was done with the help of regression analysis.

The conceptual compatibility of the X variables was assured at the design stage of the FEX. In addition, the sampling frames were forced to be identical at the design stage. Finally, the time spread between the two surveys was the smallest possible.

The data set was partitioned so that different sets of matching variables could be used for different types of families: four different types identified by the two dichotomies, homeowners versus nonhomeowners and families of two
or more versus unattached individuals. For example, for homeowners, matching made use of mortgage status and home equity. In this statistical match, Y was essentially net worth and Z was essentially total expenditure.

Matching specifications distinguished between mandatory matches and desirable matches. Mandatory matches imposed a necessary condition on a match, in effect an additional partitioning of the data set. However, this condition was often relaxed at the later stages of the statistical match. Desirable matches were often associated with continuous variables for which exact matching was unrealistic. For these variables, a tolerance was used; in addition, the matching score used to rank potential matches took the degree of agreement into account.

This score or distance function for the data set of families living in their own home could be decomposed into five components or decision modules: (1) amounts of major sources of income, (2) total family income, (3) age of family head and place of residence, (4) aspects of home ownership, and (5) attributes associated with families of two or more. Other decompositions pertained to the other three data sets defined by the two dichotomies listed above. Within each decision module, points were awarded for the closeness of the information on the two files. Overall closeness was measured as a percentage of the highest possible score.

All potential matches with scores above 95 percent of the maximum were matched. This did not generate enough matches, and as needed, the percentage was lowered to 65 percent. At this point, stage two of the match relaxed some of the mandatory matching rules, and again the percentage needed began at 95 percent and was lowered as needed to 65 percent. Finally, stage three of the match involved a further relaxing of some of the required agreement.
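The staged procedure described above can be illustrated with a small sketch. It is not Alter's implementation: the field names (`region`, `tenure`, `income`, `age`), the point values, the linear closeness rule, the intermediate cutoffs of 85 and 75 percent, matching at or above (rather than strictly above) a cutoff, and matching without replacement are all assumptions made for illustration. Only the overall shape comes from the text: mandatory agreement on some variables, a score expressed as a percentage of the maximum, a cutoff lowered from 95 to 65 percent, and later stages that relax the mandatory rules.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]
# A decision module returns (points earned, maximum points) for a candidate pair.
Module = Callable[[Record, Record], Tuple[float, float]]


def closeness(field: str, tolerance: float, max_points: float) -> Module:
    """Hypothetical module: points fall linearly from max_points to 0 at `tolerance`."""
    def module(a: Record, b: Record) -> Tuple[float, float]:
        gap = abs(a[field] - b[field])
        return max(0.0, 1.0 - gap / tolerance) * max_points, max_points
    return module


def score_pct(a: Record, b: Record, modules: Sequence[Module]) -> float:
    """Overall closeness as a percentage of the highest possible score."""
    earned = possible = 0.0
    for module in modules:
        points, maximum = module(a, b)
        earned += points
        possible += maximum
    return 100.0 * earned / possible


def staged_match(file_a: List[Record], file_b: List[Record],
                 modules: Sequence[Module],
                 stages: Sequence[Sequence[str]],
                 thresholds: Sequence[float] = (95, 85, 75, 65)):
    """Return {index in A: (index in B, score, stage number, cutoff used)}.

    Each stage lists the fields that must agree exactly (mandatory matches).
    Within a stage the cutoff is lowered step by step; a later stage uses a
    shorter mandatory list and restarts at the highest cutoff.
    """
    matches = {}
    used_b = set()  # assumption: each B record is used at most once
    for stage_no, mandatory in enumerate(stages, start=1):
        for cutoff in thresholds:
            for i, a in enumerate(file_a):
                if i in matches:
                    continue
                best = None
                for j, b in enumerate(file_b):
                    if j in used_b or any(a[k] != b[k] for k in mandatory):
                        continue
                    s = score_pct(a, b, modules)
                    if s >= cutoff and (best is None or s > best[1]):
                        best = (j, s)
                if best is not None:
                    matches[i] = (best[0], best[1], stage_no, cutoff)
                    used_b.add(best[0])
    return matches


# Made-up records standing in for the two survey files.
scf = [
    {"region": 1, "tenure": 1, "income": 30000, "age": 40},
    {"region": 2, "tenure": 1, "income": 52000, "age": 55},
    {"region": 1, "tenure": 1, "income": 18000, "age": 30},
]
fex = [
    {"region": 1, "tenure": 1, "income": 30200, "age": 41},
    {"region": 2, "tenure": 1, "income": 50000, "age": 50},
    {"region": 2, "tenure": 1, "income": 18400, "age": 31},
]
modules = [closeness("income", 10000, 60), closeness("age", 20, 40)]
stages = [("region", "tenure"), ("tenure",)]  # stage two drops the region rule

matches = staged_match(scf, fex, modules, stages)
for i, (j, s, stage_no, cutoff) in sorted(matches.items()):
    print(f"A[{i}] -> B[{j}]  score={s:.1f}%  stage={stage_no}  cutoff={cutoff}")
```

In this toy data the first record pairs immediately, the second pairs only after the cutoff is lowered, and the third pairs only once the second stage drops the region requirement, which mirrors the sequence of relaxations described in the text.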
This procedure indicates that even when a statistical match is anticipated and planned for, a great deal of effort is often still involved.

Alter (1974) concludes with an extensive evaluation of the SCF-FEX statistical match, primarily through summary statistics on the discrepancies of matching information on matched records and the discrepancies of information common to both files but not used in the match, which offer an interesting check on the quality of the match. Alter makes the important point that much of the discrepancy observed could be due to response error. However, the relative lack of information on response error prevented him from analyzing this possibility further. Overall, Alter is relatively pleased with the results from the evaluation. However, he is not completely comfortable with statistical matching (Alter, 1974:393):

But even the most useful tool has to be used with discretion. Knowing the origin of such a research tool, the process which forged it, and the constraints which had to be imposed upon it, will help to develop this sort of discretion. Whatever use will be made of the combined SCF-FEX file, hopefully will also be made with this word of caution in mind.
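The kind of evaluation described above, summary statistics on discrepancies between matched records, might be sketched as follows. The statistics chosen, the field names, and the data are assumptions for illustration and are not taken from Alter (1974); `income` stands in for a matching variable and `family_size` for a variable common to both files but not used in the match.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

Record = Dict[str, float]


def discrepancy_summary(pairs: List[Tuple[Record, Record]], field: str) -> dict:
    """Summarize absolute differences in `field` across matched record pairs."""
    diffs = [abs(a[field] - b[field]) for a, b in pairs]
    return {
        "n": len(diffs),
        "mean_abs": mean(diffs),
        "median_abs": median(diffs),
        "max_abs": max(diffs),
        "share_exact": sum(d == 0 for d in diffs) / len(diffs),
    }


# Made-up matched pairs: (record from one file, its partner from the other file).
pairs = [
    ({"income": 30000, "family_size": 3}, {"income": 30200, "family_size": 3}),
    ({"income": 52000, "family_size": 2}, {"income": 50000, "family_size": 4}),
    ({"income": 18000, "family_size": 1}, {"income": 18400, "family_size": 1}),
]

for field in ("income", "family_size"):
    print(field, discrepancy_summary(pairs, field))
```

A summary like this cannot by itself separate genuine mismatch from response error, which is the limitation Alter notes.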