Appendix G

2000 Census Basic (Complete-Count) Data Processing

The 2000 census, like every census since 1940, asked some basic items of every household and person in the nation, with other items asked only of a sample. In 2000, these basic data items were:

  • age and date of birth,

  • sex,

  • race,

  • ethnicity (Hispanic origin),

  • relationship to reference person (first person listed on the questionnaire), and

  • housing tenure (own or rent).

In addition, name (first, last, and middle initial) was captured as a basic item in 2000. (Vacant units were also included in the census, and a few items were collected for them.) The basic census items make up the short-form questionnaire and are included on the long-form



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
The 2000 Census: Counting Under Adversity

questionnaire administered to a fraction of households and residents of group quarters. The basic items are also called 100 percent or complete-count items.

This appendix describes the processing of the basic data items for households and persons, focusing on procedures to supply values for missing items (see Titan Corporation, 2003; Sheppard, 2003; and Alberti, 2003, respectively, for detailed information on the steps in data capture, coverage edit and telephone follow-up, and data processing).

The Census Bureau distinguishes between “imputation” (or “allocation” in the Bureau’s terminology), in which information from another person or household is used to supply a missing value, and “assignment,” in which a value is assigned on the basis of other information for the same person (e.g., supplying a value for sex on the basis of first name). There is also “editing,” in which inconsistent reported values are reconciled. Sometimes an inconsistent value is deleted and imputation or assignment is used to supply a value for the item. Finally, there is whole-household imputation (or “substitution”), in which the data for an entire household are duplicated from another household to stand in for an enumerated household that lacks sufficient information to be termed “data-defined.” For the complete count, a household is data-defined if at least one member has reported values for at least two basic items (counting name as an item).

G.1 DATA CAPTURE AND COVERAGE EDIT

For the 2000 census, information on the questionnaires was captured by optical mark and optical character recognition (OMR/OCR) after the questionnaires were scanned into computer files. Names were captured, along with write-in and checked values. Clerks keyed data items from the images when the automated technology could not read the responses.
Keying of long-form-sample information was set aside in the processing to permit the fastest possible keying of the basic information, which was captured from both short-form and long-form questionnaires to obtain complete-count records. After data capture, computer routines reviewed the complete-count records for mail returns (including the small number of Internet and Be Counted returns) to identify cases for telephone follow-up. The workload for the coverage edit follow-up operation totaled
2.5 million cases, or about 3 percent of mail return forms (Sheppard, 2003:vii). These cases included mail returns that reported household counts of seven people or more (55 percent of the workload); mail returns that reported more household members in question one (“How many people were living or staying in this house, apartment, or mobile home on April 1, 2000?”) than the number of members for whom at least two basic items were provided (27 percent of the workload); and mail returns that reported fewer household members in question one than the number for whom at least two basic items were provided (18 percent of the workload). The purpose of the edit and telephone follow-up was to obtain basic items for all members of large households and to resolve discrepancies in the household count for other households. However, there was no attempt to obtain responses for those members of a contacted household for whom some but not all items were filled out on the questionnaire; the only data collected were for additional people identified in the follow-up and for people for whom there had not been room on the form to provide basic data. The telephone effort obtained information for only 54 percent of the workload. There was no field follow-up for cases in which the telephone follow-up was unsuccessful or for cases that lacked telephone numbers. As a result of the telephone follow-up, 153 thousand people were added to the census count and 258 thousand were deleted because they duplicated another person or should not have been included in the count for some other reason (Sheppard, 2003:viii). Following data capture and coverage edit follow-up, census complete-count records for households and their members could fall into one of three categories (A, B, and C; see Box 4.2 in Chapter 4). Records for group quarters residents could fall into categories A or B, but not C.
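
The three coverage-edit criteria can be expressed as a small classifier built on the complete-count data-defined rule (at least two reported basic items, counting name). This is a hedged Python sketch, not the Bureau's edit program: the `Person` and `MailReturn` classes, the item names in `BASIC_ITEMS`, and the order in which the criteria are checked (large households first, so that each return lands in exactly one category) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names for the person-level basic items (name counts as an item).
BASIC_ITEMS = ("name", "sex", "age", "relationship", "hispanic_origin", "race")

@dataclass
class Person:
    items: dict = field(default_factory=dict)   # item name -> reported value

    def is_data_defined(self) -> bool:
        # Complete count: at least two reported basic items, counting name.
        reported = sum(1 for k in BASIC_ITEMS if self.items.get(k) is not None)
        return reported >= 2

@dataclass
class MailReturn:
    q1_count: int            # household count reported in question one
    persons: List[Person]    # person records captured from the form

    def is_data_defined(self) -> bool:
        # A household is data-defined if at least one member is.
        return any(p.is_data_defined() for p in self.persons)

def coverage_edit_reason(ret: MailReturn) -> Optional[str]:
    """Why a mail return goes to coverage edit follow-up, or None if it does not."""
    defined = sum(p.is_data_defined() for p in ret.persons)
    if ret.q1_count >= 7:
        return "large household"              # 55 percent of the 2000 workload
    if ret.q1_count > defined:
        return "count exceeds data-defined"   # 27 percent
    if ret.q1_count < defined:
        return "count below data-defined"     # 18 percent
    return None
```

For example, a return that reports three people in question one but has only one person with two or more basic items falls into the second category.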

G.2 ITEM IMPUTATION AND EDITING

G.2.a Imputation Methodology

Census records that meet the criteria for being data-defined (at least two reported items, counting name as an item for complete-count records) can still contain missing responses and responses that
are inconsistent with other reported data. The Census Bureau first attempts to provide values for missing data and reconcile inconsistent responses by edit and assignment processes that use other information for the same person. Often, however, there is no other information on which to base a reasonable assignment. In these cases, the Bureau uses hot-deck imputation, which supplies values for missing or irreconcilable responses using reported information from a neighboring household (sometimes the imputation can use reported information from other members of the same household). The imputation process begins with a “cold deck,” then passes through a “warm deck” (a step first used in 1990), and then uses a “hot deck” (see Stiller and Dalzell, 2003).

As a matter of history, the term “cold deck” derives from early data processing technology in which a set of punched cards (a deck) contained numeric values representing a known distribution of answers to a question in a previous census or survey. For example, to use a long-form item, a cold deck might contain a random sequence of values for veteran status such that a certain percentage of people who did not report veteran status would be assigned the value “served in the Armed Forces.” A collection of such decks, with specific distributions for, say, men and women of different ages, would form an imputation matrix. The values from the appropriate cell of the cold-deck matrix would be assigned sequentially to nonreporters and would be reused as many times as necessary without change. Cold decks were first used to impute age in the 1940 census. In the 1960 census, computerized routines employed both cold-deck and hot-deck imputation. The cold-deck process did not vary the distribution of values that was used for imputation.
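
To make the contrast with the hot deck concrete, a cold deck can be sketched as a fixed sequence of values per imputation cell that is dealt out in order to nonreporters and recycled without change. The cells, codes, and deck contents below are invented for illustration and are not census distributions; only the mechanism follows the description above.

```python
from itertools import cycle

# Hypothetical cold-deck matrix: one fixed sequence of veteran-status codes per
# cell (sex, age group). 1 = served in the Armed Forces, 2 = did not serve.
COLD_DECK = {
    ("M", "45-64"): [2, 1, 2, 2],
    ("F", "45-64"): [2, 2, 2, 2, 2, 2, 2, 2, 2, 1],
}

def cold_deck_impute(records):
    """Fill missing codes (None) from the fixed deck for each record's cell.

    The deck never changes: its values are dealt in order and reused from the
    start once exhausted, whatever other records in the cell report.
    """
    dealers = {cell: cycle(values) for cell, values in COLD_DECK.items()}
    out = []
    for sex, age_group, code in records:
        if code is None:
            code = next(dealers[(sex, age_group)])
        out.append(code)
    return out
```

Five consecutive male nonreporters aged 45–64 receive 2, 1, 2, 2, 2: the deck simply wraps around, and reported values never alter it.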

The hot-deck process, in contrast, continually updated the values in the distribution from the census data themselves: a missing entry was imputed from the latest stored value that fit other known characteristics of the person or housing unit. Because the census records were stored in a geographic hierarchy (block, census tract, county, state), the hot-deck method generally ensured that a donor record would be in the same small neighborhood as the record requiring imputation. It also reproduced the variability in the reported data. Over the decades, hot-deck imputation matrixes (and edit matrixes) have been refined and enhanced. The matrixes are much
more complex for long-form than for short-form items because so many more variables are available on the long form. Some hot-deck matrixes apply to specific items; other matrixes jointly impute values for groups of items. Subject specialists in the Population Division and the Housing and Household Economic Statistics Division specify the edit and imputation matrixes that apply to their areas of expertise. Unfortunately, while written specifications for edits and imputations are available, there is no documentation that makes the specifications readily interpretable.

In 2000, the hot-deck process operated separately for each item or group of items for housing units, household members, and group quarters residents. First, programmers implemented a cold-deck matrix of starting values, chosen to be the most likely distribution of valid responses for the item in question. Then, within each state, the records with valid reported values for an item were processed to find the first four valid values for each cell of the matrix, which replaced the cold-deck values (in 1990, each cell of an imputation matrix could contain up to 8 values). This procedure was called “warming the deck.” The entire state was processed because some cells of a matrix could contain very few cases.

Then all of the records in each state were processed again. If the first record required imputation, a value would be obtained from the first of the four values in the appropriate cell of the matrix. If the second through fifth records also required imputation, the second, third, and fourth values, followed by the first value again, would be used. If the sixth record had a reported value, that value would be entered into the matrix and the value most recently used as a donor would be discarded, and processing would continue in the same way. There was no limit on how often a donor record could be used to provide a value for imputation.
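
The warm-deck and hot-deck passes just described can be sketched for a single matrix cell. This is a minimal Python illustration, not Census Bureau code: the function name `hot_deck_cell` is hypothetical, and two details are inferred from Box G.1 rather than stated in the text, namely that a newly reported value goes to the front of the line of donors and that records already used to warm the deck are not entered a second time.

```python
from collections import deque

def hot_deck_cell(records, cold_deck):
    """Warm-deck and hot-deck imputation for one cell of an imputation matrix.

    records   -- codes in processing order, with None where the item is missing
    cold_deck -- starting values; its length sets the deck size (4 in 2000)
    Returns (filled, donors, deck): completed codes, the 1-based record number
    of each donor (None for reported values), and the final deck as
    (value, record_no) pairs with the next donor first.
    """
    size = len(cold_deck)
    # Cold deck: starting values with no donor record behind them.
    deck = deque((value, None) for value in cold_deck)
    # Warming pass: the first reported values in the cell replace the cold values.
    warm = [(v, i) for i, v in enumerate(records, start=1) if v is not None][:size]
    for slot, entry in enumerate(warm):
        deck[slot] = entry
    warm_records = {rec for _, rec in warm}

    filled, donors = [], []
    for rec_no, value in enumerate(records, start=1):
        if value is None:
            donor = deck.popleft()      # next donor in line supplies the value,
            deck.append(donor)          # then moves to the back (most recently used)
            filled.append(donor[0])
            donors.append(donor[1])
        else:
            filled.append(value)
            donors.append(None)
            if rec_no not in warm_records:
                deck.pop()                        # drop the back entry: the most recently used donor
                deck.appendleft((value, rec_no))  # the new report is next in line
    return filled, donors, list(deck)
```

With the 16 records of Box G.1 (records 1, 2, 5 through 8, and 13 through 15 missing; cold deck 1, 2, 1, 1), the sketch assigns donors 3, 4, 9, 10, 3, 4, 12, 11, and 9 to the missing records and ends with eleven values of 1 and five values of 2, matching the box.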

A simple illustration is given in Box G.1, using marital status (a short-form item in 1990 that was moved to the long form in 2000; see also the diagrams in Stiller and Dalzell, 2003). Very similar procedures were followed in 1990.

Box G.1 Simple Illustration of 2000 Census Hot Deck Imputation Process for a Single Cell of an Imputation Matrix

“Cold Deck” cell values (e.g., for imputing married = 1 or not married = 2 to women of a specified age): 1, 2, 1, 1

“Warm Deck” cell values (reported value and record number): 1 (rec. 3), 1 (rec. 4), 2 (rec. 9), 1 (rec. 10)

Record No.   Missing?   Value Assigned
1            Yes        1 (rec. 3)
2            Yes        1 (rec. 4)
3            No         Keep reported value (1)
4            No         Keep reported value (1)
5            Yes        2 (rec. 9)
6            Yes        1 (rec. 10)
7            Yes        1 (rec. 3)
8            Yes        1 (rec. 4)
9            No         Keep reported value (2)
10           No         Keep reported value (1)
11           No         Keep reported value (2)
“Hot Deck” cell values: 2 (rec. 11), 2 (rec. 9), 1 (rec. 10), 1 (rec. 3)
12           No         Keep reported value (1)
“Hot Deck” cell values: 1 (rec. 12), 2 (rec. 11), 2 (rec. 9), 1 (rec. 10)
13           Yes        1 (rec. 12)
14           Yes        2 (rec. 11)
15           Yes        2 (rec. 9)
16           No         Keep reported value (1)
“Hot Deck” cell values: 1 (rec. 16), 1 (rec. 10), 1 (rec. 12), 2 (rec. 11)

The resulting distribution at this point in the process would contain 11 values of “1” and 5 values of “2” for women in the specified age range.

G.2.b Example of Edit and Imputation Specifications: Housing Tenure

The edit and imputation specifications for housing tenure in 2000 (U.S. Census Bureau, 2002a) illustrate both the simplicity and the complexity of the process, depending on the extent of available information. Because housing tenure was the only short-form housing item in 2000, the imputation specifications for short-form records were simple: accept a reported value for tenure from an occupied housing unit; fill in a missing value by using the reported value for the preceding household that falls in the same cell as the nonreporting household. The imputation matrix included 5 cells: household size 1 person; household size 2 people (household with spouse present, other household); household size 3 or more people (household with spouse present, other household). There were no consistency edits for housing tenure on short forms because there were no other housing variables (e.g., reported rent) to employ.

In contrast, the edit and imputation specifications for housing tenure on long-form records were complex because additional relevant information was available. Variables used included not only household size and type, but also property value, monthly rent amount, whether there was a mortgage, amount of monthly mortgage payment, whether there was a second mortgage or home equity loan, amount of monthly second mortgage payment, type of building (e.g., mobile home, detached one-family, apartment building with 50 or more units), and whether there was a mobile home installment loan. The edit and imputation matrix had over 50 cells. Some cells specified edits for reported values: for example, change a reported tenure of 3 (rented for cash rent) to 1 (owned with a mortgage) when monthly rent is blank and a mortgage value is reported. Some cells specified edits for blank values: for example, set tenure to 1 (owned with a mortgage) when tenure is blank but there is a reported value for a first or second mortgage. Other cells specified hot-deck imputations when tenure is blank and other variables (e.g., mortgage) are blank, too, or when tenure is blank and other reported values may be contradictory. The imputation cells were the same as for the short-form records (5 cells based on household size and type).
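
The tenure rules above can be sketched as an edit step followed by a sequential hot deck over the five household size/type cells. This Python sketch is illustrative only: the dictionary keys, cell labels, and cold-deck starting codes are hypothetical; only the two example edits quoted above are implemented, not the full long-form matrix of more than 50 cells; and letting only households that reported tenure serve as donors is our reading of “the reported value for the preceding household.”

```python
# Tenure codes named in the text; other codes pass through unchanged.
OWNED_WITH_MORTGAGE = 1
RENTED_FOR_CASH_RENT = 3

def tenure_cell(size, spouse_present):
    """Return one of the five short-form tenure imputation cells."""
    if size == 1:
        return "1 person"
    band = "2 people" if size == 2 else "3+ people"
    return band + (", spouse present" if spouse_present else ", other")

def edit_tenure(tenure, rent=None, mortgage=None, second_mortgage=None):
    """The two example long-form edits; short-form records leave the rest None."""
    if tenure == RENTED_FOR_CASH_RENT and rent is None and mortgage is not None:
        return OWNED_WITH_MORTGAGE   # reported renter, but rent blank and mortgage reported
    if tenure is None and (mortgage is not None or second_mortgage is not None):
        return OWNED_WITH_MORTGAGE   # blank tenure, first or second mortgage reported
    return tenure

def impute_tenure(households, cold_deck):
    """Edit each household, then hot-deck any tenure that is still blank.

    households -- dicts with 'size', 'spouse', 'tenure' and, for long-form
                  records, optional 'rent', 'mortgage', 'second_mortgage'
    cold_deck  -- starting tenure code for each of the five cells
    """
    last_reported = dict(cold_deck)      # current donor value per cell
    result = []
    for h in households:
        cell = tenure_cell(h["size"], h["spouse"])
        tenure = edit_tenure(h["tenure"], h.get("rent"), h.get("mortgage"),
                             h.get("second_mortgage"))
        if tenure is None:
            tenure = last_reported[cell]      # value of the preceding reporter in the cell
        elif h["tenure"] is not None:
            last_reported[cell] = tenure      # a household that reported tenure becomes the donor
        result.append(tenure)
    return result
```

Note the asymmetry this choice creates: a blank tenure filled by the mortgage edit is kept for that household but does not become a donor for the next blank in its cell.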

Type of return (mail or enumerator) was not used to define imputation cells for housing tenure or for any other basic (or long-form-sample) item. This omission may have biased the imputed distributions if returns with missing values more closely resembled other returns of the same type (mail or enumerator) than returns as a whole.

G.3 PERSON IMPUTATION

Whole-person or type 1 imputation (see Box 4.2 in Chapter 4) refers to instances in which one or more people in a household, but not all household members, are not data-defined. In 2000, a large fraction of whole-person imputations (called “totally allocated people” by the Census Bureau) resulted from the decision to limit the space on the mailback questionnaire for recording basic items to only 6 members. The coverage edit and telephone follow-up operation attempted to obtain characteristics for additional household members but was not always successful (see Section G.1 above). Editing and imputation were employed to construct values for the basic items for non-data-defined people in enumerated households. The items were edited and imputed one at a time, using information about the other household members to construct a household that made sense in terms of the relationships, ages, sex, race, and ethnicity of all of its members.

In 2000, there were 2.33 million whole-person imputations for household members, or 0.9 percent of the household population (Schindler, 2001). By contrast, in 1990, there were only 373,000 such imputations, or 0.2 percent of the total population (including some imputations for people in group quarters; Love and Dalzell, 2001). In 1990, the questionnaire had room to report characteristics of 7 household members, and field follow-up was used in addition to telephone follow-up to obtain data for members of enumerated households who lacked basic information.
In 1980, the number of type 1 imputations was lower yet: about 152,000 persons, or less than 0.1 percent of the total population (including some imputations for people in group quarters; calculated from Love and Dalzell, 2001).

Whole-person imputations in 2000 were most likely to involve children. Such imputations accounted for 1.9 percent of children in households, compared with 0.6 percent of people aged 18–29 and 0.4 percent of older people. By race/ethnicity domain and housing tenure, whole-person imputations were most common among the following groups, accounting for 2.1 to 2.3 percent of each: American Indian and Alaska Native owners and renters on reservations, black owners and renters, and Native Hawaiian and Pacific Islander owners. Whole-person imputations were least common among white and other owners and renters, accounting for 0.6 percent and 0.4 percent of these two groups, respectively.¹

One question about the success of the imputation methodology is whether it reproduced family living patterns appropriately for different groups. For example, large multigenerational Asian families may have listed elderly parents rather than children last on the questionnaire and therefore not have reported characteristics for the parents. Table G.1 shows, for each domain/tenure group, the percentage of household members in four age categories (0–17, 18–29, 30–49, and 50 and older) who were whole-person imputations, and the ratio of the percentage for children under age 18 to the percentage for adults aged 50 and older. These ratios are lower for renters than for owners in all race/ethnicity domains, indicating a greater propensity to impute older people in large renter households than in large owner households. The lowest ratios are for black and Native Hawaiian and Pacific Islander renters. It is difficult to know what to make of these patterns without information on the age distributions of large households with characteristics reported for all members versus those lacking data for some members. Data are available at the census tract level that could be analyzed to compare the age distribution, by domain and tenure, of data-defined people with the age distribution of whole-person imputations. However, these data do not permit direct analysis of households that had whole-person imputations versus comparable households that did not.
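
The ratio column of Table G.1 and the two patterns read from it (renter ratios below owner ratios in every domain; the lowest ratios for black and Native Hawaiian and Pacific Islander renters) can be rechecked directly from the published percentages. The Python sketch below transcribes the table with abbreviated labels (the abbreviations are ours). Because the inputs are rounded to one decimal, it only asks whether each published ratio agrees with the recomputed one to within 0.05.

```python
# Table G.1: percent of household members with whole-person imputations by age
# (0-17, 18-29, 30-49, 50 and older), plus the published ratio column.
# Abbreviations: AIAN = American Indian/Alaska Native,
# NHPI = Native Hawaiian and Other Pacific Islander.
TABLE_G1 = {
    ("AIAN on reservation", "Owner"): ((3.8, 2.1, 0.7, 0.5), 7.6),
    ("AIAN on reservation", "Renter"): ((3.3, 1.6, 0.6, 0.8), 4.1),
    ("AIAN off reservation", "Owner"): ((1.8, 1.1, 0.4, 0.4), 4.5),
    ("AIAN off reservation", "Renter"): ((2.3, 0.9, 0.5, 0.7), 3.3),
    ("Black", "Owner"): ((4.1, 2.6, 0.9, 1.0), 4.1),
    ("Black", "Renter"): ((3.6, 1.7, 1.0, 1.5), 2.4),
    ("Hispanic", "Owner"): ((2.3, 1.7, 0.6, 0.5), 4.6),
    ("Hispanic", "Renter"): ((2.3, 1.1, 0.6, 0.8), 2.9),
    ("NHPI", "Owner"): ((4.0, 2.7, 0.9, 1.1), 3.6),
    ("NHPI", "Renter"): ((3.2, 1.4, 0.8, 1.4), 2.3),
    ("Asian", "Owner"): ((3.2, 2.1, 0.6, 0.8), 4.0),
    ("Asian", "Renter"): ((3.2, 1.1, 0.6, 0.9), 3.6),
    ("White and other", "Owner"): ((1.1, 0.6, 0.2, 0.2), 5.5),
    ("White and other", "Renter"): ((1.5, 0.5, 0.3, 0.4), 3.8),
    ("Total", "Owner"): ((1.6, 1.1, 0.3, 0.3), 5.3),
    ("Total", "Renter"): ((2.3, 0.9, 0.5, 0.6), 3.8),
}

def child_to_older_ratio(pcts):
    """0-17 imputation percent divided by the 50-and-older percent."""
    return pcts[0] / pcts[-1]

def ratio_mismatches(table, tol=0.05):
    """Rows whose published ratio is not the recomputed ratio to one decimal."""
    return [key for key, (pcts, published) in table.items()
            if abs(child_to_older_ratio(pcts) - published) > tol + 1e-9]

DOMAINS = [d for d in dict.fromkeys(d for d, _ in TABLE_G1) if d != "Total"]
renters_lower_everywhere = all(
    TABLE_G1[(d, "Renter")][1] < TABLE_G1[(d, "Owner")][1] for d in DOMAINS)
lowest_renter_ratios = sorted(DOMAINS, key=lambda d: TABLE_G1[(d, "Renter")][1])[:2]
```

All sixteen published ratios pass the check, renter ratios are below owner ratios in every domain, and the two lowest renter ratios are those for NHPI (2.3) and black (2.4) renters.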

¹ These and other characteristics of whole-person imputations were obtained from tabulations by panel staff of U.S. Census Bureau, File of Census Imputations by Poststratum, provided to the panel July 30, 2002 (Schindler, 2001).

Table G.1 Percent Whole-Person Imputations (Type 1) by Age and Domain/Tenure Category, Household Members, 2000 (Percent)

Age columns: percent whole-person imputations of category, by age in years. Ratio: 0–17 percent to 50 and older percent.

Domain/Tenure Category        0–17   18–29   30–49  50 and Older   Ratio
American Indian/Alaska Native on Reservation
  Owner                        3.8     2.1     0.7           0.5     7.6
  Renter                       3.3     1.6     0.6           0.8     4.1
American Indian/Alaska Native off Reservation
  Owner                        1.8     1.1     0.4           0.4     4.5
  Renter                       2.3     0.9     0.5           0.7     3.3
Black, non-Hispanic
  Owner                        4.1     2.6     0.9           1.0     4.1
  Renter                       3.6     1.7     1.0           1.5     2.4
Hispanic
  Owner                        2.3     1.7     0.6           0.5     4.6
  Renter                       2.3     1.1     0.6           0.8     2.9
Native Hawaiian and Other Pacific Islander
  Owner                        4.0     2.7     0.9           1.1     3.6
  Renter                       3.2     1.4     0.8           1.4     2.3
Asian, non-Hispanic
  Owner                        3.2     2.1     0.6           0.8     4.0
  Renter                       3.2     1.1     0.6           0.9     3.6
White and Other Race, non-Hispanic
  Owner                        1.1     0.6     0.2           0.2     5.5
  Renter                       1.5     0.5     0.3           0.4     3.8
Total
  Owner                        1.6     1.1     0.3           0.3     5.3
  Renter                       2.3     0.9     0.5           0.6     3.8

NOTE: Domain/tenure categories are those defined for the 2000 A.C.E. (see Table E.3).
SOURCE: Tabulations by panel staff of U.S. Census Bureau, File of Census Imputations by Poststratum, provided to the panel July 30, 2001 (Schindler, 2001).

G.4 HOUSEHOLD IMPUTATION

Four types of situations can occur in the census that require whole-household imputation (what the Census Bureau terms “substitution”) because nothing is known about the basic characteristics of any of the household members (see Box 4.2 in Chapter 4):

  • household size (number of persons) is known for an occupied unit, but the characteristics of the household members are not known (characteristics or type 2 imputation);

  • a housing unit is known to be occupied, but household size is not known (count or type 3 imputation);

  • a housing unit is known to exist, but its status as occupied or vacant is not known (occupancy or type 4 imputation); and

  • an address is recorded, but its housing unit status (occupied, vacant, or not a housing unit) is not known (housing status or type 5 imputation).

These situations can occur in field follow-up when repeated interview attempts fail to find a respondent at home or to obtain adequate information from a landlord or neighbor. They can also arise from processing problems that cause data to be lost or corrupted.

The procedure used in 2000 (and in 1990) for type 2 imputations was to duplicate the basic information from another housing unit record in the nearby area that had the same household size. For type 3 imputations, units were first classified as being at single-unit or multiunit addresses. Household size and basic items were then imputed from an occupied unit at a single-unit or multiunit address with a reported population count from an enumerator-completed form. (In 1990, mail returns were also included in the donor pool.) A similar process was followed for type 4 imputations, for which occupancy status and, if need be, occupied household size and basic items had to be imputed (the donor pool consisted of occupied and vacant units from enumerator-completed forms).
The same type of process was also followed for type 5 imputations, for which housing status had to be imputed first, followed by occupancy status, and, for occupied housing units, their size and characteristics (the donor pool consisted of occupied, vacant, and deleted units from enumerator-completed forms). A potential donor record could only be used once and, in general, was selected from the same census tract as the unit requiring imputation (see Griffin, 2001).
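
The four situations and the donor rule can be sketched as a classifier plus a once-only donor search. This Python sketch is illustrative: the `Address` record and its fields are hypothetical, and the type 2 donor search applies the once-only, same-tract rule described above but simply gives up when no same-tract donor remains, whereas the actual procedure selected donors from the same tract only “in general” and its fallback is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Address:
    unit_id: str
    tract: str
    is_housing_unit: Optional[bool] = None   # None: housing status unknown
    occupied: Optional[bool] = None          # None: occupancy unknown
    size: Optional[int] = None               # None: household size unknown
    has_characteristics: bool = False        # basic items known for members?

def whole_household_type(a: Address) -> Optional[int]:
    """Whole-household imputation type (2-5) for an address, or None."""
    if a.is_housing_unit is None:
        return 5            # housing status (occupied/vacant/not a unit) unknown
    if not a.is_housing_unit:
        return None         # not a housing unit
    if a.occupied is None:
        return 4            # occupancy status unknown
    if not a.occupied:
        return None         # vacant
    if a.size is None:
        return 3            # occupied, household size unknown
    if not a.has_characteristics:
        return 2            # size known, member characteristics unknown
    return None             # data-defined household: no substitution needed

def pick_type2_donor(target: Address, donors: List[Address],
                     used: Set[str]) -> Optional[Address]:
    """First unused donor in the same tract with the same household size."""
    for d in donors:
        if d.unit_id not in used and d.tract == target.tract and d.size == target.size:
            used.add(d.unit_id)   # each donor record may be used only once
            return d
    return None
```

The classifier checks the least-known situation first, so an address whose housing status is unknown is always type 5 regardless of any other field.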

Table 4.1 in Chapter 4 provides statistics on the numbers of people for whom 2000 census records were imputed under each of imputation types 2 through 5, with corresponding statistics for 1990 and 1980, when available. Table G.2 shows the distribution of people in whole-household imputation situations in 2000 among race/ethnicity and housing tenure groups by type of imputation required. It also shows the total percentage of people requiring whole-household imputation for each group.

Several patterns stand out. Renters have more than twice the whole-household imputation rate of owners: 2.0 percent compared with 0.9 percent. Most of the difference is due to a higher proportion of type 2 imputations (household size is known but not the characteristics of members): 72 percent of all whole-household imputations for renters are type 2, compared with 60 percent for owners. American Indians and Alaska Natives on reservations have the highest whole-household imputation rate of any race/ethnic group, 3 percent. This result is largely due to very high proportions of imputations for status as a housing unit (type 5): 61 percent and 55 percent, respectively, of whole-household imputations for American Indian owners and renters on reservations are type 5 imputations, compared with 12 percent for the nation as a whole.

Analysis of the geographic distribution of whole-household imputations finds wide geographic variation for most imputation types. Type 2 imputations (count is known but not characteristics) are clustered in large cities, such as New York and Chicago; however, they are not prominent in other large cities, notably Los Angeles. Type 3 imputations (size is not known for an occupied housing unit) follow the same general pattern as type 2 imputations.
Type 4 imputations are the least common and do not show a particular geographic pattern. Type 5 imputations, which are the least well-founded, are clustered in rural areas that were enumerated by list/enumerate techniques in which the enumerator developed the address list and obtained responses to the census questions at the same time.

Table G.2 Distribution of People Requiring Whole-Household Imputation by Type of Imputation, by Race/Ethnicity Domain and Housing Tenure, 2000 Census

Columns (2)–(5): percent of people requiring imputation, by type: characteristics (2), count (3), occupancy (4), housing (5). Types 2–5 and Types 3–5: percent of total household population requiring imputation.

Domain and Tenure Group        (2)     (3)     (4)     (5)   Types 2–5   Types 3–5
American Indian/Alaska Native on Reservation
  Owner                       15.7    20.5     3.2    60.6         3.0         2.5
  Renter                      17.8    25.4     2.2    54.6         2.8         2.3
American Indian/Alaska Native off Reservation
  Owner                       56.5    15.1     7.3    21.1         1.4         0.6
  Renter                      68.1    13.1     6.1    12.7         1.7         0.5
Hispanic Origin
  Owner                       59.2    18.2     5.8    16.8         1.5         0.6
  Renter                      70.1    15.7     5.1     9.2         1.9         0.6
Black (Non-Hispanic)
  Owner                       69.2    16.6     6.0     8.2         1.6         0.5
  Renter                      74.4    15.8     4.9     4.9         2.5         0.6
Native Hawaiian and Other Pacific Islander
  Owner                       65.2    16.6     3.9    14.3         1.4         0.5
  Renter                      71.6    14.9     3.5    10.0         2.0         0.6
Asian (Non-Hispanic)
  Owner                       67.5    13.6     6.3    12.6         0.8         0.3
  Renter                      75.7    13.3     4.9     6.2         2.0         0.5
White and Other Races (Non-Hispanic)
  Owner                       58.8    13.6    10.8    16.9         0.8         0.3
  Renter                      71.2    12.5     7.0     9.4         1.7         0.5
Total
  Owner                       60.4    14.7     9.1    15.7         0.9         0.3
  Renter                      71.9    14.1     5.9     8.2         2.0         0.6
Grand Total                   66.0    14.4     7.6    12.1         1.3         0.5

NOTES: See Box 4.2 in Chapter 4 for definition of imputation types; type 1 imputation is not included because it involves imputation for one or more people in a household with at least one data-defined person.
SOURCE: Tabulations by panel staff of U.S. Census Bureau, File of Census Imputations by Poststratum, provided to the panel July 30, 2001 (Schindler, 2001).
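
Because the type shares in each row of Table G.2 sum to 100 percent, the types 3–5 column should equal the types 2–5 rate times one minus the type 2 share, up to rounding. The Python sketch below transcribes the table (with abbreviated labels of our own) and checks that relation along with the renter-to-owner comparison in the text; the tolerances are our choice, reflecting one-decimal rounding.

```python
# Table G.2 rows: shares of imputed people by type (2, 3, 4, 5), then the percent
# of the household population needing types 2-5 and types 3-5 imputation.
TABLE_G2 = {
    ("AIAN on reservation", "Owner"): (15.7, 20.5, 3.2, 60.6, 3.0, 2.5),
    ("AIAN on reservation", "Renter"): (17.8, 25.4, 2.2, 54.6, 2.8, 2.3),
    ("AIAN off reservation", "Owner"): (56.5, 15.1, 7.3, 21.1, 1.4, 0.6),
    ("AIAN off reservation", "Renter"): (68.1, 13.1, 6.1, 12.7, 1.7, 0.5),
    ("Hispanic", "Owner"): (59.2, 18.2, 5.8, 16.8, 1.5, 0.6),
    ("Hispanic", "Renter"): (70.1, 15.7, 5.1, 9.2, 1.9, 0.6),
    ("Black", "Owner"): (69.2, 16.6, 6.0, 8.2, 1.6, 0.5),
    ("Black", "Renter"): (74.4, 15.8, 4.9, 4.9, 2.5, 0.6),
    ("NHPI", "Owner"): (65.2, 16.6, 3.9, 14.3, 1.4, 0.5),
    ("NHPI", "Renter"): (71.6, 14.9, 3.5, 10.0, 2.0, 0.6),
    ("Asian", "Owner"): (67.5, 13.6, 6.3, 12.6, 0.8, 0.3),
    ("Asian", "Renter"): (75.7, 13.3, 4.9, 6.2, 2.0, 0.5),
    ("White and other", "Owner"): (58.8, 13.6, 10.8, 16.9, 0.8, 0.3),
    ("White and other", "Renter"): (71.2, 12.5, 7.0, 9.4, 1.7, 0.5),
    ("Total", "Owner"): (60.4, 14.7, 9.1, 15.7, 0.9, 0.3),
    ("Total", "Renter"): (71.9, 14.1, 5.9, 8.2, 2.0, 0.6),
    ("Grand total", ""): (66.0, 14.4, 7.6, 12.1, 1.3, 0.5),
}

def implied_types_3_to_5(row):
    """Types 3-5 rate implied by the types 2-5 rate and the type 2 share."""
    type2_share, rate_2_to_5 = row[0], row[4]
    return rate_2_to_5 * (1 - type2_share / 100)

# Largest departure of a row's type shares from 100 percent.
share_gap = max(abs(sum(row[:4]) - 100) for row in TABLE_G2.values())
# Largest gap between the implied and published types 3-5 rates.
rate_gap = max(abs(implied_types_3_to_5(row) - row[5]) for row in TABLE_G2.values())
renter_vs_owner = TABLE_G2[("Total", "Renter")][4] / TABLE_G2[("Total", "Owner")][4]
```

For the owner total, for example, 0.9 × (1 − 0.604) ≈ 0.36 against a published 0.3, a gap that one-decimal rounding of the 0.9 input can explain; no row departs by as much as 0.1, and the renter rate is about 2.2 times the owner rate.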