APPENDIX J
Content and Quality of Federal and State Administrative Records
The Bureau of the Census has established an Administrative Record Information System (ARIS) that provides current information about the content, nature, and availability of over 60 federal administrative record systems and more than 400 state systems (for more information on ARIS, see Gates and Palacios, 1993).
Table J.1 summarizes the subject content (population) available from a selected set of federal administrative records. These files are comprehensive in their coverage of specific universes and, taken together, are estimated or considered likely to include most if not all of the population usually enumerated in the census (excluding the homeless and the institutional populations that can be obtained from other administrative records). These files would play important roles in any census activities involving administrative records.
Not included in this summary are a number of special files with restricted universes, e.g., files of the Veterans Administration (disability and education files), the Office of Personnel Management, and the Indian Health Service, which may have some utility but lack the broad scope and appeal needed for present purposes.
FEDERAL FILES
The federal files summarized in Table J.1 include:
- Internal Revenue Service (IRS) Individual Master File. This is essentially the information available on each individual income tax return (Form 1040). Furthermore, matching to other information returns (1099s, W-2s, and other IRS documents) would provide links to employers and place of work and considerably increase population coverage.
- Master Beneficiary Record and Supplemental Security Record of the Social Security Administration (SSA). These files provide some income items not always reported in IRS returns as well as information for many nonfilers. Monthly benefit amounts for Social Security recipients and Supplemental Security Income (for very low income recipients only) are included.
- Numident File (SSA). This is the basic file of Social Security numbers (SSNs) assigned. Its information is obtained from the application for an SSN (Form SS-5). This is the key file for finding an individual's SSN and for other matching and linking purposes.
- Summary Earnings File (SSA). This is where individual earnings received under covered employment are posted and maintained as a permanent record for computing Social Security benefit entitlements.
- Health Insurance Master Entitlement File of the Health Care Financing Administration (HCFA). This is the "Medicare" file and provides comprehensive coverage for those 65 years and older.
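The linking role of the SSN described in the list above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy extracts, their field names, the placeholder SSNs, and the `link_by_ssn` helper are hypothetical and do not reflect the actual layouts of the IRS, SSA, or HCFA files.

```python
# Hypothetical toy extracts keyed by SSN. All values are invented for illustration
# and the field names are assumptions, not the real file layouts.
irs_master = {
    "000-00-0001": {"name": "A. Filer", "address": "12 Oak St"},
    "000-00-0002": {"name": "B. Filer", "address": "PO Box 9"},
}
ssa_beneficiary = {
    "000-00-0002": {"monthly_benefit": 640},
    "000-00-0003": {"monthly_benefit": 410},
}
hcfa_medicare = {
    "000-00-0003": {"address": "7 Elm Ave"},
}


def link_by_ssn(**files):
    """Merge per-file records that share an SSN into one combined record."""
    linked = {}
    for file_name, records in files.items():
        for ssn, fields in records.items():
            person = linked.setdefault(ssn, {"sources": []})
            person["sources"].append(file_name)
            for key, value in fields.items():
                # Keep the first value seen; later files only fill gaps.
                person.setdefault(key, value)
    return linked


linked = link_by_ssn(irs=irs_master, ssa=ssa_beneficiary, hcfa=hcfa_medicare)
for ssn, person in sorted(linked.items()):
    print(ssn, person["sources"], person.get("address", "(no address)"))
```

In this toy case the union of the three files covers three persons, one of whom (a nonfiler) appears only in the benefit and Medicare extracts. This mirrors, in miniature, how combining files could extend coverage beyond any single system.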
Some content issues (even for the limited amount available):
Address: Mailing address with major overlap with home address. It has been estimated that 10 to 20 percent of IRS addresses on individual tax returns may not be the address of residence. Work with information documents may reduce this percentage considerably. Similarly, addresses of beneficiaries include many financial institutions, but it may be possible to obtain home addresses from the Health Care Financing Administration (HCFA) files. There is the further issue of the reference date of the address relative to census day.
For census purposes the ability to geocode addresses to the smallest levels of geography is paramount, and this ability will be affected by the nature of addresses. Research carried out by the staff of the Population Division of the Census Bureau using a 1-in-1,000 sample of the 1988 individual income tax file (Form 1040) sheds light on this aspect of the problem. Specifically, the results showing the types of addresses in the IRS files were as follows: city style, 81.3%; rural routes, 9.0%; and P.O. boxes, 7.7%.
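A tabulation of address types like the one above could be produced along the following lines. This is only a rough sketch: the regular expressions, the category labels, and the `classify_address` and `address_type_shares` helpers are illustrative assumptions, not the rules actually used by the Census Bureau staff.

```python
import re
from collections import Counter

# Illustrative patterns only; real address parsing needs far more rules.
PO_BOX = re.compile(r"^\s*(p\.?\s*o\.?\s*box|post\s+office\s+box)\b", re.IGNORECASE)
RURAL_ROUTE = re.compile(
    r"^\s*(rural\s+route|rr|r\.r\.|rte|route|rfd|hc)\s*#?\s*\d+", re.IGNORECASE
)
CITY_STYLE = re.compile(r"^\s*\d+[A-Za-z]?\s+\S+")  # house number, then street name


def classify_address(line):
    """Assign a mailing-address line to one of four coarse types."""
    if PO_BOX.search(line):
        return "po_box"
    if RURAL_ROUTE.search(line):
        return "rural_route"
    if CITY_STYLE.search(line):
        return "city_style"
    return "other"


def address_type_shares(addresses):
    """Return the percentage of addresses falling in each type."""
    if not addresses:
        return {}
    counts = Counter(classify_address(a) for a in addresses)
    return {kind: 100.0 * n / len(addresses) for kind, n in counts.items()}


sample = ["123 Main St", "P.O. Box 45", "RR 2 Box 15", "General Delivery"]
print(address_type_shares(sample))
```

Checking the P.O. box and rural route patterns before the city-style pattern is a deliberate choice in this sketch, since only city-style addresses begin with a house number.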
The percentages varied significantly by state, with 9 states having 90 percent or more city-style addresses and, at the other extreme, 6 states having less than 50 percent city-style addresses. Furthermore, fewer than half the counties had more than 50 percent city-style addresses. Continuing research at the Census Bureau suggested that the TIGER/Address Control File (i.e., TIGER updated with the 1990 Census Address Control File) should be able to code 64.4 percent of all IRS addresses (Form 1040) in the United States. About 9 percent are not codable because of rural routes, 8 percent because of P.O. boxes, and 2 percent for other reasons; 17 percent were potentially codable but not coded, mainly because of street misspellings, bad abbreviations, or miskeying. Improved address standardization should reduce these problems to a minimum. Present work on enhancing TIGER and proposals for developing and maintaining a continuously updated master address file (MAF) should overcome present shortcomings due to the nature of addresses and in the coding systems (see Schneider, 1992; Sater, 1992, 1993).
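The coding percentages quoted above can be checked with a little arithmetic. The sketch below uses only the figures in the text; the dictionary keys and variable names are illustrative labels. The "ceiling" is an assumption-laden upper bound that supposes improved standardization recovers every potentially codable address.

```python
# Outcome shares (percent of IRS Form 1040 addresses) as quoted in the text.
geocoding_outcome_pct = {
    "coded": 64.4,
    "not codable: rural route": 9.0,
    "not codable: P.O. box": 8.0,
    "not codable: other reasons": 2.0,
    "potentially codable, not coded": 17.0,
}

# The quoted shares are rounded ("about 9 percent"), so they need not sum to exactly 100.
total_pct = round(sum(geocoding_outcome_pct.values()), 1)

# Upper bound on the coding rate if every potentially codable address were recovered.
ceiling_pct = round(
    geocoding_outcome_pct["coded"]
    + geocoding_outcome_pct["potentially codable, not coded"],
    1,
)

print(f"Sum of quoted shares: {total_pct}%")
print(f"Coding-rate ceiling with perfect standardization: {ceiling_pct}%")
```

Under these assumptions the ceiling is about 81 percent, close to the 81.3 percent share of city-style addresses found in the 1988 sample. The remaining addresses (rural routes, P.O. boxes, and others) would still need the TIGER and MAF improvements mentioned in the text.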
Race: Uncertain quality and poor coverage, especially for new birth cohorts. Before 1980, SSA obtained only three categories: white, black, and other. After 1980, the SS-5 calls for five categories: white (non-Hispanic), black (non-Hispanic), Hispanic, Asian and Pacific Islander, and American Indian or Alaskan Native. Furthermore, in recent years, race is not provided for those applying for an SSN for their children at birth using the birth record. Overall, the percentages of records in the SSA files for which race is not reported are as follows:
| SSA file or population | Race not reported |
|---|---|
| Current beneficiaries (approximately 42 million) | 1.5% |
| Supplemental Security Income (SSI) recipients (6 million) | 3.3% |
| Wage earners (130 million) | 3.0% |
| Social Security numbers issued 1980-1991 (90 million) | 15.2% |
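To give a sense of scale, the nonreporting rates above can be converted into approximate record counts. The sketch below uses only the rounded figures from the text, so the results are rough orders of magnitude; the `approx_missing` helper and the short group labels are illustrative. The groups presumably overlap (a wage earner may also hold an SSN issued after 1980, for example), so the counts should not be added together.

```python
# (group label, approximate number of records, percent with race not reported)
race_not_reported = [
    ("Current beneficiaries", 42_000_000, 1.5),
    ("SSI recipients", 6_000_000, 3.3),
    ("Wage earners", 130_000_000, 3.0),
    ("SSNs issued 1980-1991", 90_000_000, 15.2),
]


def approx_missing(records, pct_missing):
    """Approximate count of records lacking race, rounded to the nearest record."""
    return round(records * pct_missing / 100)


missing_counts = {
    label: approx_missing(records, pct) for label, records, pct in race_not_reported
}

for label, count in missing_counts.items():
    print(f"{label}: about {count / 1_000_000:.2f} million records without race")
```

On these figures, SSNs issued from 1980 to 1991 account for by far the largest number of records without race, in line with the text's point about poor coverage for new birth cohorts.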
Self-reporting, third-party reporting (birth and death records for example), and consistency of reporting over time also affect comparability of data between and within various records systems.
Relationship and household composition: Ability to reconstruct households and family composition from the records requires considerable research.
Occupation: Not clear on comparability with census classifications.
Industry and class of work: Presumably available from employer's name and employer identification number (EIN), but reference time not clear.
Note again that the summary is based on information extracted from the Census Bureau ARIS file. Discussions with program administrators may refine or modify some of the entries.
In general, little is known about the quality, consistency, and comparability of the various subject items in administrative records. A priori expectations vary between systems and particular characteristics. For example, income data from IRS records, age (date of birth) from birth records, or earnings information from the Summary Earnings Record would be expected to be most accurate; in fact, they represent standards against which we evaluate the accuracy of reporting of these items in other systems. Addresses of persons receiving benefits also fall into this category, except that, as noted earlier, mailing addresses do not necessarily reflect addresses of residence. Much research and evaluation would be required to fully understand the quality, including consistency and comparability, of other information in these federal record systems.
STATE FILES
A summary of the content status of various state administrative record systems is provided in Table J.2. Twelve systems are summarized ranging from the broad (perhaps comprehensive) coverage of state income tax files and driver's license records to the more limited universe of birth records and probation and parolee files. Not all states have record systems covering the types of programs indicated, and a large number of program agencies failed to respond to this survey. In general, state programs and their files are much more limited than the federal system in the percentage of the population covered.
In terms of content, census-type information (long form) is available on only a very limited basis and in many cases is only partially reported. Name, age, sex, and address (mail or residence) are almost universally available. Most of the other items are infrequently included. Income is reported most of the time in five record systems (income tax, AFDC, food stamps, unemployment insurance, and workers' compensation), but little is known about the available detail or its comparability with census data.
As stated, little is known specifically about the quality (loosely defined) of the individual record items, but the Census Bureau survey did attempt to elicit information from file managers on what is known about the quality of their data. Table J.3 provides an accuracy assessment of the state record systems and summarizes survey responses to a series of questions designed to shed light on quality aspects of the files. The table shows how many program managers answered "yes" to questions on whether studies had been carried out on record and file accuracy, comparability, and other topics. The survey did not ask for the results of the studies, but a "yes" response presumes that such information should be forthcoming from the originating agencies (see Figure J.1).
REFERENCES
Gates, G.W., and H.L. Palacios. 1993. ARIS: an administrative records information resource for statisticians. Pp. 189-193 in 1993 Proceedings of the Government Statistics Section. Alexandria, Va.: American Statistical Association.
Sater, D. 1992. Geographic Coding Research—Types of Addresses on Income Tax Returns. Memorandum to J. Knott dated February 6. Population Division, Bureau of the Census, U.S. Department of Commerce, Washington, D.C.
Sater, D. 1993. Geographic Coding of Administrative Records—Past Experience and Current Research. Technical Working Paper No. 2. Population Division, Bureau of the Census. Washington, D.C.: U.S. Department of Commerce.
Schneider, P.J. 1992. Year 2000 Census Research Administrative Records Geographic Coding Research. Memoranda to S. Miskura dated June 8, June 15, and June 29. Population Division, Bureau of the Census, U.S. Department of Commerce, Washington, D.C.
TABLE J.1 Content of Selected Federal Administrative Records (all files except the decennial census contain Social Security numbers)
| Census Subjects (population only) | IRS (including information documents) | Social Security Files: Master Beneficiary Record | Social Security Files: Numident File | Social Security Files: Summary Earnings Record | HCFA Health Insurance Master Record |
|---|---|---|---|---|---|
| Name | x | x | x | x | x |
| Address | x | x | — | — | x |
| Relationship and/or household composition | (partial) | — | — | — | — |
| Sex | — | x | x | x | x |
| Race (15 categories) | — | w, b, other | (2) | (2) | w, b, other |
| Age | (1) (primary taxpayer) | x | x | — | x |
| Marital status | x | x | — | — | — |
| Spanish (4 categories) | — | Surname in 5 states | (2) | Spanish | — |
| State or country of birth | — | — | x | — | — |
| Citizenship | — | — | — | — | — |
| Year of immigration | — | — | — | — | — |
| School enrollment | — | — | — | — | — |
| Level of education | — | — | — | — | — |
| Ancestry/ethnic origin | — | — | — | — | — |
| Place of residence (5 years ago) | — | — | — | — | — |
| Language spoken at home | — | — | — | — | — |
| Ability to speak English | — | — | — | — | — |
| Military service/veteran status | — | — | — | — | — |
| Disability | (1) | x (if receiving disability benefits) | x (if receiving disability benefits) | x (if disabled) | x (if disabled before age 65) |
TABLE J.2 Content of Twelve Major Administrative Record Systems Maintained by 52 Jurisdictions (50 states, the District of Columbia, and Puerto Rico)
TABLE J.3 Accuracy Assessment of 12 Major State Record Systems, 1992