National Academies Press: OpenBook
Suggested Citation: "1966 Merge File for Household Income Data." National Research Council. 1991. Improving Information for Social Policy Decisions -- The Uses of Microsimulation Modeling: Volume II, Technical Papers. Washington, DC: The National Academies Press. doi: 10.17226/1853.


STATISTICAL MATCHING AND MICROSIMULATION MODELS

had to be matched or linked with 33,714 such returns in the SOI. At the other end of the income distribution, 4,277 low-income SOI records had to be matched with 17,647 CPS records. The statistical match is accomplished using a transportation algorithm. The distance measure uses 10 variables, including family size, wage income, property income, and home ownership. Weights are frequently split, so the resulting file has more than 200,000 records. After the merge is completed, the CPS nonfilers are appended. Then families are reconstructed. The resulting file can be used to simulate the behavior of individual taxpayers and their households in a microsimulation model. The merge file is reconstituted on a biennial basis.

1966 Merge File for Household Income Data

Okner (1972, 1974) describes a statistical match between the 1966 IRS tax file and the 1967 Survey of Economic Opportunity (SEO), called the 1966 Merge File, in order to develop a “consistent and comprehensive set of household income data.”

The SEO population was chosen as file A. It gave a stratified representation of the total U.S. population on a family basis. The income information included data on both taxable and nontaxable sources of income. The SEO also contained rich demographic data, but it was inadequate by itself because the income data were understated for the wealthier families. To remedy these problems a statistical match was performed.

First, some pretreatment was necessary. SEO households and individuals who would not have filed an income tax return were excluded from the match. A number of other pretreatments did not interact with the statistical match: these included algorithms to allocate rent, interest, and dividends to the members of a household and allocation of pension income.

The IRS and SEO data were then statistically matched. First, tax units were grouped into “equivalence classes” defined by marital status, whether over age 65, number of dependent exemptions, and the reported pattern of income. Unmodified, these groupings would have resulted in over 1,000 different equivalence classes. Instead, the number of equivalence classes was reduced, usually through combination of classes using marital status and an indicator variable for over age 65. The final number of equivalence classes was 74, containing 28,643 tax units.

Then, for two records in the same equivalence class, a consistency score (distance function) was computed using factors such as home mortgage interest deduction, interest or dividend income, and farm income. Certain restrictions were used to limit the inconsistency of two potentially matched records. Within the acceptable consistency range, records were matched randomly but proportionally to the sampling weight of the return in the tax file. On a few occasions, no records satisfied the consistency restrictions, and the restrictions were then slightly broadened. This procedure
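
The matching steps described above (grouping into equivalence classes, scoring consistency, drawing a match at random in proportion to the tax-file sampling weight, and broadening the restriction when nothing qualifies) can be sketched in Python. This is a minimal illustration under stated assumptions, not Okner's actual procedure: the field names (`married`, `over_65`, `weight`, and the three indicator flags), the mismatch-count score, and the step-by-step broadening rule are all hypothetical stand-ins for details the text does not give.

```python
import random
from collections import defaultdict

# Hypothetical presence indicators compared by the consistency score.
FLAGS = ("mortgage_interest", "interest_or_dividends", "farm_income")


def equivalence_class(unit):
    """Coarse class key (assumed fields): marital status and over-65 indicator."""
    return (unit["married"], unit["over_65"])


def consistency_score(seo_unit, tax_return):
    """Illustrative distance: number of indicators on which the records disagree."""
    return sum(seo_unit[f] != tax_return[f] for f in FLAGS)


def match_units(seo_units, tax_returns, limit=0, max_limit=len(FLAGS), seed=0):
    """Return {SEO id: matched tax-return id, or None if no candidate exists}."""
    rng = random.Random(seed)

    # Index the tax returns by equivalence class so each SEO unit is
    # compared only with returns in its own class.
    by_class = defaultdict(list)
    for ret in tax_returns:
        by_class[equivalence_class(ret)].append(ret)

    matches = {}
    for unit in seo_units:
        pool = by_class.get(equivalence_class(unit), [])
        chosen = None
        # Start with the strict restriction and broaden it one step at a time.
        for allowed in range(limit, max_limit + 1):
            candidates = [r for r in pool if consistency_score(unit, r) <= allowed]
            if candidates:
                # Random draw, proportional to the tax-file sampling weight.
                weights = [r["weight"] for r in candidates]
                chosen = rng.choices(candidates, weights=weights, k=1)[0]["id"]
                break
        matches[unit["id"]] = chosen
    return matches
```

Each record here is a plain dictionary with an `id`, the two class fields, and the three flags; tax returns additionally carry a positive `weight`. A production match would use far richer class definitions and score components than this sketch assumes.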
