The 2000 Census: Counting Under Adversity

CHAPTER 7
Assessment of Basic and Long-Form-Sample Data

THE CONTENT OF THE 2000 CENSUS, as in past censuses, included basic demographic items plus a wide range of social and economic characteristics. The basic items (or complete-count items) were asked of everyone, whether they received the short or the long form; the additional items (or sample items) were asked of people selected for the long-form sample of about one-sixth of the population (see Appendix B for the list of items). The demographic items have widespread use, particularly as they form the basis of small-area population estimates that the Census Bureau develops for years following each census (see Section 2-C). The additional long-form-sample items on such topics as income, employment, education, occupation and industry, transportation to work, disabilities, housing costs, and others are used extensively by federal, state, and local government agencies, the private sector, academic researchers, the media, and the public (see Section 2-D).

Users need to understand the quality of the basic and the sample data to interpret census results appropriately. The Census Bureau needs to understand data quality to determine how best to improve census processes to produce high-quality information and how to inform users about its strengths and weaknesses. Past censuses provided a rich array of basic and long-form-sample data quality
measures from studies of nonresponse, exact matches with surveys and administrative records, content reinterviews with samples of respondents, and experiments to determine response effects of alternative questionnaire formats and wording (see, e.g., Bureau of the Census, 1964, 1970, 1975a,b, 1982a,b, 1983a,b, 1984).

To date, data quality measures are somewhat sparser for the 2000 census. The panel requested and received detailed tabulations of basic and long-form-sample item imputation rates for the 2000 and 1990 censuses and more limited information on item nonresponse in the Census 2000 Supplementary Survey (C2SS).[1] (An analysis commissioned by the Census Bureau used these tabulations; see Schneider, 2003.) The panel also compared the consistency of basic characteristics for people in the census-based E-sample who matched cases in the independent P-sample of the 2000 Accuracy and Coverage Evaluation (A.C.E.) Program.[2] A Content Reinterview Survey, conducted primarily by telephone in June–November 2000 of 20,000 long-form recipient households, provided indexes of inconsistency between census and survey responses for most questionnaire items for one randomly chosen member of each household. Unlike the reinterview surveys of previous censuses, the 2000 Content Reinterview Survey did not try to measure systematic response biases by including probing questions to determine the most accurate response (see Singer and Ennis, 2003). A set of questionnaire experiments in 2000 examined forms design, listing of household members, and race and ethnicity questions (see Martin et al., 2003). At a later date, information on response variance and bias will be available from an exact match of long-form census records and the April 2000 Current Population Survey.
[1] The C2SS surveyed 700,000 households, or 1.8 million people, by mail with computer-assisted telephone and in-person follow-up; it is a precursor to the planned American Community Survey (see Appendix I.3).
[2] The 2000 P-sample surveyed 0.3 million households, or 0.7 million people, in about 11,000 block clusters, using computer-assisted telephone and in-person interviewing; the E-sample contained about the same number of households and people as the P-sample, drawn from the 2000 census records in the same block clusters (see Chapters 5 and 6).

In this chapter, we briefly discuss the usefulness of three types of available 2000 census data quality measures: imputation rates, consistency measures, and variability and sample loss for the long-form sample (7-A). We then review available data quality measures
for three population groups: all household members (7-B); household members in the long-form sample (7-C); and group quarters residents in the long-form sample (7-D). Each part concludes with a summary of findings and recommendations for 2010. Appendixes G and H describe the imputation and other data processing procedures that affect the basic and long-form-sample items, respectively. Appendix F reviews alternative item imputation procedures that may be more accurate than the current “hot-deck” procedures.

7–A AVAILABLE QUALITY MEASURES

7–A.1 Imputation Rates

The census enumeration will always have nonresponse: some households may not want to be found or are overlooked; respondents for participating households may not answer every question because they do not want to answer a particular item or do not know the answer; some responses are unintelligible and are voided in data processing. Since 1960, data processing for the complete-count census has used computer-based imputation for whole-household and item nonresponse (see Appendix C.5.d).[3] The long-form sample, like other household surveys, has used imputation for missing items; it accounts for household nonresponse by weighting respondent cases.

Imputation makes census data more useful because analysts do not have to discard cases with missing values.[4] Imputation by the Census Bureau is also more efficient, and leads to more consistent uses of the data, than if each analyst were to develop his or her own imputation procedure.

However, imputation is a source of error. Because imputation commonly uses reported values, the distribution of values after imputation will be inaccurate to the extent that cases requiring imputation differ from cases for which there are responses, in ways that are not or cannot be made part of the imputation procedure.
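To make the source of this error concrete, the following is a minimal sketch of the general idea behind a sequential hot-deck, in which a missing value is copied from the most recently processed reporting record in the same stratum. The field names (`age`, `tenure_class`), the single stratifying variable, and the function name are illustrative assumptions; the Census Bureau's actual procedures are described in Appendixes G and H and are not reproduced here.

```python
def hot_deck_impute(records, item, stratum_key):
    """Sequential hot-deck sketch (illustrative only, not the Census Bureau's code).

    A record missing `item` receives the value of the most recently seen
    reporting record ("donor") in the same stratum. Input records are not
    modified; copies are returned with an added "<item>_imputed" flag.
    """
    last_donor = {}  # stratum -> most recent reported value
    output = []
    for record in records:
        record = dict(record)  # work on a copy
        stratum = record[stratum_key]
        if record.get(item) is not None:
            last_donor[stratum] = record[item]
            record[item + "_imputed"] = False
        elif stratum in last_donor:
            record[item] = last_donor[stratum]
            record[item + "_imputed"] = True
        else:
            # No donor seen yet in this stratum: the value stays missing.
            record[item + "_imputed"] = False
        output.append(record)
    return output


# Hypothetical records, in processing order.
records = [
    {"id": 1, "tenure_class": "owner", "age": 40},
    {"id": 2, "tenure_class": "owner", "age": None},
    {"id": 3, "tenure_class": "renter", "age": None},
    {"id": 4, "tenure_class": "renter", "age": 25},
    {"id": 5, "tenure_class": "renter", "age": None},
]
result = hot_deck_impute(records, item="age", stratum_key="tenure_class")
```

In this sketch every imputed value is a reported value from a donor, so if nonrespondents within a stratum differ systematically from respondents, the imputed distribution inherits the respondents' pattern rather than the true one.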
[3] By “complete-count census,” we mean the 100 percent enumeration, including short forms and the basic-item responses on long forms.
[4] One long-form-sample item first asked in 1980 (namely, ancestry) is not imputed; instead, a “not reported” category is tabulated, as was common practice for census items prior to 1960.

Furthermore, the relationships of two or more variables may be distorted
if imputation levels are high and imputation techniques do not take account of these relationships. Consequently, except when cases with reported and missing values are similar in their characteristics or auxiliary information is available with which to improve the accuracy of the imputations, higher missing data rates will indicate poorer data quality.

For the 2000 census and the C2SS, codes on the data records distinguish imputations based on the use of another person’s or household’s information (“allocations”) from “assignments” based on known information for the specific record. For example, first name was used to assign values to a large fraction of the records with missing sex;[5] answers to the race question were used to assign values to some cases with missing Hispanic origin; and answers to questions on housing costs were used to assign values to a large fraction of long-form cases missing housing tenure. Codes on the 1990 census records did not distinguish between imputations and assignments, so our tables for the 2000 census sometimes show both types of rates: the imputation/assignment rate is comparable to 1990 and indicates nonresponse;[6] the imputation rate per se indicates the fraction of cases that required a donor record to supply missing values.

7–A.2 Consistency Measures

Comparing the consistency of responses to the same question in two or more data sources can help identify possible reporting biases, although it is often not possible to say which source is more accurate. Consistency measures can also indicate response variability if responses tend to differ according to such factors as data collection mode, question format, and who answered for the household.
[5] This type of assignment was not possible in the 1990 census, which did not capture names except for records in the PES E-sample.
[6] The imputation/assignment rate is not exactly the same as a nonresponse rate because reported values that are inconsistent with other reported values may be blanked and another value imputed or assigned.

7–A.3 Sample Loss and Variability (Long Form)

Estimates from the long-form sample, like other surveys, are subject to variability from sampling and also from unit nonresponse
that further reduces the effective sample size. The long-form records for respondents are weighted to agree with complete-count totals (see Appendix H.2). This weighting effectively adjusts for sampling rates, instances of whole-household nonresponse, and additional sample loss due to some households having provided only minimal data. The 1990 and 2000 long-form sampling probabilities varied, by design, from 1 in 2 households to 1 in 8 households, depending on the population size and type of geographic area. To receive a nonzero weight, households had to include at least one member who reported at least two basic items and two long-form items. The measures of variability, or variance, that are constructed for the long-form sample take account of the weighting (see Appendix H.2), but not of the variability introduced by item imputation.

7–B QUALITY OF BASIC DEMOGRAPHIC CHARACTERISTICS

The basic demographic characteristics in 2000, asked on both the short and long forms, were age; month, day, and year of birth; sex; ethnicity (Hispanic origin); race; relationship to household reference person (first person listed on the questionnaire); and housing tenure (own or rent). The 1990 census included additional basic items (see Appendix B).

7–B.1 Imputation Rates for Complete-Count Basic Items

Table 7.1 provides imputation rates for the basic demographic items from the 2000 and 1990 census complete counts (separately for short and long forms) for total household members, people in households headed by blacks, and people in households headed by Hispanics.[7] These rates include cases of whole-household imputation in addition to individual item imputation. In 2000 the combined whole-household and item imputation rates for all household members ranged from 2.3 percent for sex to 5.4 percent for ethnicity (Hispanic origin). The rates for long-form recipients were higher, ranging from 3 percent for sex to 9.3 percent for housing tenure.
[7] Complete-count data were made available in the P.L. 94-171 file for redistricting and in Summary Files 1 and 2 (see Box 2.1).

Table 7.1 Basic Item Imputation Rates, 2000 and 1990 Complete-Count Census, by Type of Form and Race/Ethnicity, Household Population

                                                             Ethnicity  Relationship
Population Group and Form Type   Age(a)   Sex  Race  (Hispanic Origin)       to Head  Tenure
2000 Census
Total Household Population          4.8   2.3   5.1                5.4           3.4     4.8
  Short-Form Recipients             4.6   2.1   5.1                5.3           3.2     3.9
  Long-Form Recipients              6.0   3.0   5.5                6.1           4.4     9.3
Black Non-Hispanic                  7.6   3.9   4.9               10.5           5.6     7.3
  Short-Form Recipients             7.1   3.5   4.6               10.3           5.2     6.0
  Long-Form Recipients             10.2   5.8   7.0               11.6           7.9    14.5
Hispanic                            7.5   4.2  17.2                6.5           6.3     5.9
  Short-Form Recipients             7.4   4.1  17.4                6.4           6.0     5.2
  Long-Form Recipients              8.4   4.9  15.9                7.6           7.5    10.3
1990 Census
Total Household Population          3.1   1.9   2.6               10.5           3.3     3.1
  Short-Form Recipients             3.0   1.9   2.8               11.7           3.4     3.1
  Long-Form Recipients              3.5   1.7   2.1                4.1           2.9     2.7
Black Non-Hispanic                  5.6   3.6   3.5               18.4           6.1     5.4
  Short-Form Recipients             5.4   3.6   3.5               20.3           6.2     5.6
  Long-Form Recipients              6.6   3.1   3.1                7.8           5.5     4.2
Hispanic                            3.9   2.6   9.3                7.8           5.9     4.3
  Short-Form Recipients             3.8   2.7  10.0                8.3           6.1     4.5
  Long-Form Recipients              4.6   2.4   5.5                4.8           5.0     3.1

NOTES: Rates include whole-household imputations (types 2–5; see Box 4.2); wholly imputed persons (type 1); and item imputations. Race and ethnicity population groups are defined by the response of the household reference person. Household population totals for 2000: 273.6 million total (83.4 percent on short forms, 16.6 percent on long forms); 32.4 million people in households headed by non-Hispanic blacks (84.8 percent on short forms, 15.2 percent on long forms); 33.4 million people in households headed by Hispanics (85.4 percent on short forms, 14.6 percent on long forms). Household population totals for 1990: 242 million total (83.1 percent on short forms, 16.9 percent on long forms); 28.0 million people in households headed by non-Hispanic blacks (84.8 percent on short forms, 15.2 percent on long forms); 21.2 million people in households headed by Hispanics (85.4 percent on short forms, 14.6 percent on long forms).
(a) Excludes imputation of age from date of birth.
SOURCE: Tabulations by U.S. Census Bureau staff from the 2000 and 1990 Household Census Edited Files (HCEF), provided to the panel spring 2003.

The rates for members of minority households were also higher. Of particular note are the response patterns to the ethnicity and race questions: while 5 percent of total household members did not respond to these items, 11 percent of members of households headed by blacks did not respond to the ethnicity question, and 17 percent of members of households headed by Hispanics did not respond to the race question, perhaps believing it did not apply. For the 10 percent of census tract neighborhoods with the highest percentages of basic item imputations, the imputation rates were one-half to three times higher than the total population rates shown (data not shown).

In 1990 the combined whole-household and item imputation rates for household members were generally below the corresponding 2000 rates (Table 7.1). A reason for generally lower basic item imputation rates in 1990 compared with 2000 was that the 1990 census used telephone and field follow-up for missing or inconsistent data content, but the 2000 census did not. The exception was the ethnicity item on the short form, for which imputation rates in 1990 were twice as high as the 2000 rates (except for Hispanics). A reason for higher short-form imputation rates for ethnicity in 1990 than in 2000 was that the content review and follow-up procedures for mailed-back short forms in 1990 were trimmed back for budgetary reasons, so that only a one-tenth sample of short forms was reviewed and sent to follow-up if necessary (see Appendix C.3.c). Reordering the race and ethnicity items so that ethnicity came before race in 2000 (and not after as in 1990) also contributed to lower item imputation rates for ethnicity in 2000 compared with 1990.

In 2000 about 1.3 percentage points of the total household population imputation rates shown in Table 7.1 were due to whole-household imputations (types 2–5; see Box 4.2).
For these cases, data from neighboring households were used to supply complete records of basic information for members of households for which information on members’ characteristics, and sometimes household size, was missing. Whole-household imputations contributed 1.2 percentage points to the imputation rates for short-form records and 1.8 percentage points to the imputation rates for long-form records. Whole-household imputation rates were highest for enumerator long forms (5.2 percent), followed by enumerator short forms (4.7 percent), with self-response forms by mail, telephone, Internet, or the Be Counted program including very few whole-household
imputations (less than 0.1 percent).[8] In 1990 whole-household imputations contributed 0.7 percentage point to the total household population imputation rates shown.

In 2000 about 0.9 percentage point of the total household population imputation rates shown in Table 7.1 was due to wholly imputed persons (type 1; see Box 4.2 and Table 4.1), which occurred when there was not room to report basic characteristics for all household members on the questionnaire. The corresponding figure for 1990 was 0.2 percentage point. Imputations for missing members of enumerated households were made item by item, using information about the other household members to construct a reasonable household composition (e.g., imputing race and ethnicity to be consistent with other household members).

Imputation rates for some basic items would have been higher in 2000 than shown in Table 7.1 if the 2000 imputation procedures had not been able to take advantage of names and other information for assigning rather than imputing missing values (see Section 7-A.1). For example, first names were used to assign sex for about 1 percent of household members (data not shown); if this procedure had not been feasible, the imputation rate for sex for total household members in 2000 would have been 3.3 percent, not 2.3 percent as in Table 7.1.

Basic item imputation rates from the complete count for the nation as a whole and for large population groups were reasonably low for the most part, but some small geographic areas and population groups required much more imputation, which users should consider in their analyses. For example, imputation rates for race at the county level reached as high as 17 percent, while imputation rates for ethnicity at the county level reached as high as 35 percent (see Section 8-C.2; see also Appendix H.3.b).
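The distinction between the imputation rate and the combined imputation/assignment rate used in this section (see Section 7-A.1) can be illustrated with a short sketch. The per-record status labels (`reported`, `assigned`, `allocated`) and the function name are assumptions made for illustration, not the codes on the census files, and the 1,000 records are hypothetical counts chosen to mirror the sex item discussed above.

```python
def imputation_rates(statuses):
    """Return both rates, in percent, for one item.

    `statuses` holds one label per person record (labels are illustrative):
      "reported"  - value supplied by the respondent
      "assigned"  - value derived from other information on the same record
      "allocated" - value taken from a donor record (imputation per se)
    """
    statuses = list(statuses)
    total = len(statuses)
    if total == 0:
        raise ValueError("no records supplied")
    assigned = sum(1 for s in statuses if s == "assigned")
    allocated = sum(1 for s in statuses if s == "allocated")
    return {
        "imputation_rate": 100.0 * allocated / total,
        "imputation_assignment_rate": 100.0 * (allocated + assigned) / total,
    }


# Hypothetical file of 1,000 records: 23 allocated, 10 assigned, 967 reported.
statuses = ["allocated"] * 23 + ["assigned"] * 10 + ["reported"] * 967
rates = imputation_rates(statuses)
```

With these hypothetical counts the two rates differ by the 1 percentage point attributable to assignment, which is the gap between 2.3 and 3.3 percent described in the text for sex.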
[8] From tabulations by U.S. Census Bureau staff of the 2000 and 1990 Household Census Edited Files, provided to the panel in spring 2003.

7–B.2 Missing Data Patterns for Basic Items

An analysis by Zajac (2003) provides information on patterns of missing responses in 2000 census records; that is, percentages of person records that are missing one, two, or three or more of the
five basic person items. We also computed these statistics for the 2000 census records in the A.C.E. E-sample and for the records in the independent P-sample. Table 7.2 provides these data from all three sources; the E-sample percentages exclude whole-household and whole-person imputations, as well as reinstated cases from the special Master Address File (MAF) unduplication operation, which could not be matched to P-sample cases.

More P-sample persons (95 percent) than census persons (87 percent) answered all five items; the corresponding figure for the E-sample (data not shown) is 89 percent. This result is not surprising because the interviewing for the P-sample was more carefully controlled than was the census enumeration. However, the census and the P-sample were much closer in the percentage of respondents who answered at least four items (97.6 percent P-sample, 96.1 percent 2000 census). The most commonly omitted basic items in the census were age and ethnicity (data not shown).

Rates of answering all five basic person items in the census varied by whether the household responded for itself or answered to an enumerator (Table 7.2, panel A). Members of self-responding households (mail, Internet, telephone, Be Counted) were more likely to answer all five items (90 percent) than were members of households visited by enumerators (79 percent). By the race/ethnicity and housing composition of the A.C.E. block cluster (Table 7.2, panel B), household members living in white and some other race owner and renter block clusters were most likely to answer all five questions (92 and 89 percent, respectively); household members living in Hispanic renter block clusters were least likely to answer all five questions (77 percent). These data are from the A.C.E. E-sample and underestimate the extent of nonresponse in the census; in contrast, the P-sample achieved a high level of reporting of all five basic person items for all neighborhood types (92 to 95 percent).

Table 7.2 Percentage of Household Members Reporting Basic Items, 2000 Census, 2000 A.C.E. E-Sample and Independent P-Sample (weighted)

PANEL A  Number of Items Reported

Sample                      All Five  Four  Three  Two or Fewer
Census Total(a)                 87.0   9.1    1.2           2.8
  Self Enumerations             89.8   7.9    1.1           1.1
  Interviewer Enumerations      79.4  11.9    1.6           7.1
P-Sample Total                  94.9   2.7    0.8           1.6

PANEL B  Household Members Reporting All Five Items by Neighborhood Type(b)

Neighborhood Type(b)  E-Sample(c)  P-Sample
American Indian and Alaska Native
  Owner                      91.7      94.9
  Renter                     85.4      93.2
Hispanic
  Owner                      80.6      94.5
  Renter                     77.2      94.1
Black
  Owner                      85.0      94.5
  Renter                     81.7      93.8
Native Hawaiian and Other Pacific Islander
  Owner                      84.7      94.0
  Renter                     85.4      92.2
Asian
  Owner                      86.7      93.9
  Renter                     80.4      93.9
White and Other
  Owner                      91.6      95.4
  Renter                     88.9      94.2

(a) Census percentages reporting two or fewer items include people in wholly imputed households (imputation types 2–5; see Box 4.2) and wholly imputed people (type 1); census percentages for interviewer enumerations include all people in households that were contacted in the coverage edit and follow-up operation of mail returns to obtain basic characteristics for missing household members.
(b) Neighborhood (A.C.E. block cluster) type determined by Census Bureau staff from 1990 characteristics (A.C.E. block clusters were defined as one or more contiguous blocks, intended to contain about 30 housing units on average; see Appendix E.1.a).
(c) The E-sample excludes whole-household and whole-person imputations and reinstated records due to the special summer 2000 unduplication operation (see Section 4-E).
SOURCE: For 2000 census, Zajac (2003:Table 34), adjusted to include people in wholly imputed households; for E-sample and P-sample, tabulations by panel staff of P-Sample and E-Sample Dual-System Estimation Output Files (U.S. Census Bureau, 2001b), provided to the panel February 16, 2001, weighted using TESFINWT.

7–B.3 Consistency of Responses to Basic Items

Comparing census cases in the E-sample that matched P-sample cases revealed low rates of inconsistent reporting of basic items for the household population as a whole. Thus, 4.7 percent of matched cases (unweighted) had conflicting values for housing tenure; 5.1 percent had conflicting values for age and sex group; and 3.9 percent had conflicting values for race/ethnicity domain (Farber, 2001a:Table 1). Reasons for conflicting values could include reporting error in one or both samples, differences in question format and mode and time of collection, different household respondents for the census enumeration and P-sample interview, different imputation methods, errors in imputation, and errors in matching.

Rates of inconsistency were higher for matched cases for which the characteristic in question was imputed in one or both samples than for nonimputed cases: thus, 12 percent of cases with imputed race or ethnicity, 22 percent of cases with imputed housing tenure, and 36 percent of cases with imputed age or sex were inconsistent between the E-sample and P-sample, compared with 3, 4, and 3 percent, respectively, of nonimputed cases (Farber, 2001a:Table 1). This result indicates that, at an individual level, imputations were often not accurate, which could be consequential for analyses of public use microdata samples and for small geographic areas. However, because overall imputation rates were low, the effects of imputation error were not large for the household population as a whole. Overall distributions by age and sex group, housing tenure, and race/ethnicity domain remained very similar in both the E-sample and P-sample (Farber, 2001a:Tables A-1, A-2, A-3).

When population groups were defined by multiple characteristics, instead of a single variable, then very high rates of inconsistency often occurred, particularly for imputed cases (see Farber, 2001a:Tables E-1 through E-64).
As a best-case example, people who were classified in one or both samples as non-Hispanic white owners in medium-sized mailout/mailback areas with high response rates in the Midwest were classified inconsistently in another poststratum 10 percent of the time overall, 6 percent for nonimputed cases, and 41 percent for imputed cases, which accounted for only 11 percent of this group. As a worst-case example, people who were classified in one or both samples as American Indians and Alaska Natives off reservations were classified inconsistently in another poststratum 59 and 57 percent of the time overall for owners and renters, respectively; 55 and 52 percent for nonimputed owner and renter cases, respectively; and 74 and 77 percent for imputed owner and renter cases, respectively. Imputations accounted for 21 percent of each of these two groups (owners and renters).
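The kind of comparison reported in this section can be sketched as a simple tabulation over matched cases, split by whether the characteristic was imputed in either source. This is an illustrative sketch only: the tuple layout, the function name, and the sample pairs are hypothetical, and the sketch is unweighted and does not reproduce the tabulations in Farber (2001a).

```python
def inconsistency_rates(matched_pairs):
    """Percent of matched cases with conflicting values, by imputation status.

    `matched_pairs` is an iterable of (census_value, survey_value, imputed)
    tuples, where `imputed` is True if the value was imputed in either source.
    Returns None for a group that has no cases.
    """
    counts = {True: [0, 0], False: [0, 0]}  # imputed -> [conflicting, total]
    for census_value, survey_value, imputed in matched_pairs:
        group = counts[bool(imputed)]
        group[1] += 1
        if census_value != survey_value:
            group[0] += 1

    def percent(conflicting, total):
        return 100.0 * conflicting / total if total else None

    all_conflicting = counts[True][0] + counts[False][0]
    all_total = counts[True][1] + counts[False][1]
    return {
        "imputed": percent(*counts[True]),
        "nonimputed": percent(*counts[False]),
        "overall": percent(all_conflicting, all_total),
    }


# Hypothetical matched cases for housing tenure.
pairs = (
    [("owner", "owner", False)] * 96
    + [("owner", "renter", False)] * 4
    + [("owner", "renter", True)] * 2
    + [("renter", "renter", True)] * 8
)
tenure_rates = inconsistency_rates(pairs)
```

Because imputed cases are a small share of the hypothetical file, the overall rate stays close to the nonimputed rate even though the imputed rate is several times higher, which is the pattern the text describes for the household population as a whole.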

Table 7.7 Whole-Household Nonresponse in the 2000 and 1990 Census Long-Form Samples

                                              2000 Long-Form Sample         1990 Long-Form Sample
                                              Total          Households in  Total          Households in
Measure                                       Households     Worst 10%      Households     Worst 10%
                                                             of Tracts                     of Tracts
Percent, Long Forms Received of Number
  Expected                                    98.5           96.6           97.8           94.0
Percent, Households Retained in Edited
  Long-Form Sample of Number of Forms
  Received                                    93.2           84.3           91.2           78.8
No. Long Forms Expected from Households
  (millions)                                  17.9           1.3            15.9           1.1

NOTES: Households not retained in the edited long-form sample include wholly imputed households from the complete-count processing (types 2–5; see Box 4.2) and households in which no person had at least two basic and two long-form items reported (i.e., they were not sample data-defined). Worst 10 percent census tracts were defined as those with the highest rates of basic item imputations.
SOURCE: Tabulations by U.S. Census Bureau staff from the 2000 and 1990 Sample Census Edited Files (SCEF), provided to the panel spring 2003.

person items for the first person, whereas the 1990 long form asked all of the basic items for each household member, followed by the housing items, followed by the sample person items. It was easier in 2000 to meet the criterion for being “sample-data-defined,” so long as the first person answered the basic items and, say, marital status and education level, which came first among the additional person items.

Table 7.8 provides another measure of loss for the 2000 long-form sample; it shows the percentages of non-sample-data-defined persons by race and ethnicity of the household reference person and type of return. For the total long-form population, the rates of non-sample-data-defined persons range from 1.6 percent of white self returns to 25.6 percent of black enumerator returns. In the 10 percent of census tracts with the highest percentage of basic item imputations, the rates range from 2.5 percent of white self returns to 39.3 percent of black enumerator returns.

Table 7.8 Whole-Person Nonresponse in the 2000 Long-Form Sample, by Race of Reference Person

                                                                             Non-Hispanic
Measure                                             Total Persons  Hispanic  Black     White
Percent Non-Sample-Data-Defined Persons of:
  Total Persons on Long Forms                       8.4            11.0      14.2      7.1
    Self Returns                                    2.1            4.7       3.7       1.6
    Enumerator Returns                              20.9           18.5      25.6      20.7
  Persons on Long Forms in Worst 10% Census Tracts  18.5           15.4      23.6      17.3
    Self Returns                                    4.8            6.8       4.8       2.5
    Enumerator Returns                              31.5           24.1      39.3      33.9
No. Persons on Long Forms Received (millions)
  Total                                             45.4           4.9       4.9       33.1
  Worst 10% Census Tracts                           3.7            1.3       1.0       1.1

NOTES: Non-sample-data-defined persons include persons in wholly imputed households and other non-sample-data-defined households (see text), plus wholly imputed persons (type 1) in enumerated households. Wholly imputed persons did receive a sample weight. Worst 10 percent census tracts were defined as those with the highest rates of basic item imputations.
SOURCE: Tabulations by U.S. Census Bureau staff from the 2000 and 1990 Sample Census Edited Files (SCEF), provided to the panel spring 2003.

For some groups and geographic areas, the levels of sample loss are large, adding to the variability, and, hence, uncertainty, of long-form-sample estimates. Moreover, because the Census Bureau provides variability estimates that do not account for item imputation and, separately, provides item imputation rates based on the people who were retained in the sample, it is easy to overlook the fact that weighting and item imputation are both methods of dealing with sample loss. Ideally, the two kinds of sample loss should be considered together.
For example, if income imputation rates are combined with sample loss due to households with no or minimal response, then the effective sample size for a poverty estimate for the total household population in 2000 could be only about 60 percent of the original sample size (8 percent sample loss of persons from Table 7.8 plus 30 percent nonresponse to one or more income items from Table 7.5).

7–C.6 Long-Form-Sample Data Quality: Summary of Findings and Recommendations

The additional long-form-sample items exhibited higher item imputation rates in 2000 than the basic items. More serious, the long-form-sample data quality, as measured by nonresponse, deteriorated in 2000 compared with 1990. While sample loss was not quite as high as in 1990, imputation rates for many items were considerably higher, reaching levels as great as 32 percent for property taxes and 30 percent for some or all income items (compared with 12 and 13 percent, respectively, for these two items in 1990). A major reason for the disparities in rates was the effort devoted to telephone and field follow-up for missing data items in 1990; such effort was almost nonexistent in 2000.

When sample loss is considered together with item imputation, the variability in the 2000 long-form-sample estimates could be much more than expected from the original sample selection probabilities. However, the Census Bureau’s variance estimates for the long-form sample took account of sample selection rates and whole-household sample loss, but not item imputation. What we do not know is the extent of bias in the 2000 long-form-sample estimates that might be attributable to the high rates of imputation.

With regard to measures of response variance or consistency of reporting for the long-form items, the 2000 Content Reinterview Survey showed a wide range of values for an index of inconsistency, as did a similar survey in 1990. A total of 8 of the 2000 items and 7 of the 1990 items had index values greater than 50, indicating that the data were not reliably measured; another 13 items in 2000 and 14 items in 1990 had index values between 20 and 50, indicating that the data were only moderately reliable.
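For readers unfamiliar with the measure, the sketch below computes one commonly used form of the index of inconsistency for a single response category: the share of cases classified differently in the two interviews, divided by the share expected if the two classifications were unrelated, times 100. This form and the function names are assumptions made for illustration; the exact computation used for the Content Reinterview Survey (see Singer and Ennis, 2003) may differ, and the cell counts are hypothetical. The labels for values above 50 and between 20 and 50 follow the text; the text gives no label for lower values.

```python
def index_of_inconsistency(both, census_only, reinterview_only, neither):
    """Index of inconsistency (0-100 scale) for one response category.

    Arguments are counts from a 2x2 cross-classification of the same people:
      both             - in the category in census and reinterview
      census_only      - in the category in the census only
      reinterview_only - in the category in the reinterview only
      neither          - in the category in neither source
    """
    n = both + census_only + reinterview_only + neither
    if n == 0:
        raise ValueError("no cases supplied")
    p_census = (both + census_only) / n
    p_reinterview = (both + reinterview_only) / n
    gross_difference = (census_only + reinterview_only) / n
    # Disagreement expected if the two classifications were independent.
    expected = p_census * (1 - p_reinterview) + p_reinterview * (1 - p_census)
    if expected == 0:
        raise ValueError("category is empty or universal; index undefined")
    return 100.0 * gross_difference / expected


def reliability_label(index):
    """Apply the ranges quoted in the text to an index value."""
    if index > 50:
        return "not reliably measured"
    if index >= 20:
        return "moderately reliable"
    return "below the ranges flagged in the text"


# Hypothetical cross-classification of 100 reinterviewed people.
example_index = index_of_inconsistency(40, 10, 10, 40)
example_label = reliability_label(example_index)
```

Dividing by the expected disagreement is what makes the index comparable across items: a rare category can show a low raw disagreement rate and still have a high index.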
Little information is available with regard to possible response bias for the long-form-sample items (e.g., systematic overreporting or underreporting of income). Because the 2000 Content Reinterview Survey did not ask probing questions to try to obtain an accurate response, it did not provide measures of response bias. Comparisons of long-form-sample estimates with other sources have been performed for only a few variables thus far. Aggregate comparisons with the 2000 April Current Population Survey (CPS) and the C2SS found relatively consistent estimates of the percent poor in 1999: the estimated poverty rates were 12.4 percent for the long-form sample, 11.9 percent for the CPS, and 12.2 percent for the C2SS (Schneider, 2004:16). However, aggregate comparisons with the 2000 April CPS found sizeable discrepancies in estimates of employed and unemployed people (Clark et al., 2003). Specifically, the census estimate of the number of employed people was 5 percent lower than the 2000 April CPS estimate, while the census estimate of the unemployment rate was 2.1 percentage points higher than the CPS rate, representing a 50 percent larger number of unemployed people (7.9 million in the census compared with 5.2 million in the CPS). The differences were much more pronounced for blacks and Hispanics than for other groups and much larger than those found in similar comparisons for 1990. They appear to be due in part to changes in question wording and imputation procedures from those used in 1990.

Finding 7.2: For the household population, missing data rates were at least moderately high (10 percent or more) for over one-half of the 2000 census long-form-sample items and very high (20 percent or more) for one-sixth of the long-form-sample items. Missing data rates also varied widely among population groups and geographic areas. By comparison with 1990, missing data rates were higher in 2000 for most long-form-sample items asked in both years and substantially higher (by 5 or more percentage points) for one-half of the items asked in both years. In addition, close to 10 percent of long-form-sample households in 2000 (similar to 1990) provided too little information for inclusion in the sample data file.
When dropped households and individually missing data are considered together, the effective sample size that is available for analysis for some characteristics is 60 percent or less of the original long-form-sample size. Many long-form-sample items had moderate to high rates of inconsistent reporting, as measured in a content reinterview survey. Few assessments have yet been made of systematic reporting errors for the long-form-sample items, although aggregate comparisons of employment data between the 2000 census and the Current Population Survey (CPS) found sizeable discrepancies in estimates of employed and unemployed people, much larger than the discrepancies found in similar comparisons for 1990. No analysis of the effects of item imputation and weighting on the distributions of characteristics or the relationships among them has yet been undertaken, although analysis determined that changes in imputation procedures contributed to the 50 percent higher unemployment rate estimate in the 2000 census compared with the April 2000 CPS.

Recommendation 7.1: Given the high rates of imputation for many 2000 long-form-sample items, the Census Bureau should develop procedures to quantify and report the variability of the 2000 long-form estimates due to imputation, in addition to the variability due to sampling and whole-household weight adjustments. The Bureau should also study the effects of imputation on the distributions of characteristics and the relationships among them and conduct research on improved imputation methods for use in the American Community Survey (or the 2010 census if it includes a long-form sample).

Recommendation 7.2: The Census Bureau should make users aware of the high missing data rates and measures of inconsistent reporting for many long-form-sample items, and inform users of the 2000 census long-form-sample data products (Summary Files 3 and 4 and the Public Use Microdata Samples) about the need for caution in analyzing and interpreting those data. In particular, users should review Census Bureau documentation of imputation and weighting procedures, examine imputation rates and estimates of standard errors provided by the Bureau, be alert for User Notes from the Bureau about data errors and other reports on data quality, and inform Census Bureau staff of data anomalies for investigation.

7–D QUALITY OF GROUP QUARTERS DATA

Residents of group quarters accounted for 7.8 million people in the 2000 census, up from 6.7 million in 1990. The census is the only source at present of detailed information for residents of all types of group quarters, including prisons, juvenile institutions, nursing homes, hospitals and schools for the handicapped, military quarters, shelters, group homes, and other group quarters.

The quality of the data for group quarters residents in the 2000 census long-form sample was poor in comparison with the data for household residents and also in comparison with the group quarters data in 1990. Table 7.9 shows 2000 imputation/assignment rates and comparable 1990 imputation rates for selected person data items for the total group quarters population, prison inmates, and students in college dormitories.[14] In 2000 missing data rates for the items shown reached as high as 50 percent for all group quarters residents and as high as 75 percent for prison inmates. Generally, missing data rates were highest for inmates of institutions (prisons, juvenile institutions, nursing homes, hospitals and schools for the handicapped) and lowest for college students and the military. The particularly high rates for institutional residents were probably due in part to the high rates of use of administrative records to provide information instead of enumeration of residents. The Census Bureau was not prepared for the widespread resort to administrative records by institutions (see Section 4-F.2), and, very often, the available records did not contain long-form-type information, or institutions were unwilling to provide such information.
The great extent of missing data among group quarters residents in 2000 raises a question as to whether the Census Bureau should have published long-form-sample estimates for some or all types of group quarters. Given the decision to publish, it was unfortunate that many tabulations in census data products (e.g., age, employment, income) combined group quarters and household residents.

[14] See Appendix H (Table H.8) for rates for all of the basic and additional items for group quarters residents in nine categories of type of facility.

Table 7.9 Imputation/Assignment Rates (percent) for Selected Person Items, 2000 and 1990 Census Long-Form Samples, by Type of Residence, Group Quarters Population (weighted)

                              ------------- 2000 -------------    ------------- 1990 -------------
                                 Total     Inmates Students in       Total     Inmates Students in
                                 Group          of     College       Group          of     College
Item                          Quarters     Prisons Dormitories    Quarters     Prisons Dormitories
Age (a)                            3.8         5.5         3.4         1.5         2.1         1.3
Sex                                3.0         2.7         1.9         0.6         1.1         0.2
Race                               4.5         5.4         5.4         1.8         2.7         1.4
Ethnicity                          8.0        11.8         7.1         7.6        16.8         3.4
Marital Status                    18.0        30.9         8.1         4.2        11.1         1.2
Educational Attainment            39.3        53.8        19.2        17.9        24.6         2.8
English-Speaking Ability          33.9        56.8        16.3        22.1        29.8        11.0
Place of Birth                    40.2        54.0        22.2        19.2        31.7         6.7
Citizenship                       36.5        53.0        19.9        14.0        24.7         3.9
Residence 5 Years Ago             44.9        70.6        23.7        18.1        33.5         4.7
Mobility Disability               46.9        66.2        22.3        16.7        31.5         6.3
Work Disability (b)               47.7        66.7        22.7        18.1        34.2         6.0
Grandchildren at Home             30.0        36.5         0.5        n.a.        n.a.        n.a.
Veteran Status                    39.6        57.5        21.6        18.0        29.2         5.7
Occupation Last Year              46.9        75.4        30.7        21.3        44.2        11.1
Weeks Worked Last Year            42.8        72.5        29.1        21.4        40.2        13.2
Wages Last Year                   50.1        74.3        34.7        27.4        49.7        15.0
Population (millions)             7.78        1.98        2.06        6.66        n.a.        n.a.

NOTES: n.a., not available.
(a) Excludes imputation of age from date of birth.
(b) In 1990, “work disability” refers to a disability that prevents working; in 2000, the term refers to a disability that makes it difficult to work.
SOURCE: Tabulations by U.S. Census Bureau staff from the 2000 and 1990 Sample Census Edited Files (SCEF), provided to the panel spring 2003.
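As a rough illustration of how the deterioration between 1990 and 2000 can be read off Table 7.9, the sketch below computes the percentage-point change in imputation rates for a handful of items. The rates are transcribed from the table; the variable and function names are hypothetical and chosen only for this example.

```python
# Rates (percent) transcribed from Table 7.9 for a few items, in the order:
# 2000 total group quarters, 2000 prison inmates, 2000 college dormitories,
# 1990 total group quarters, 1990 prison inmates, 1990 college dormitories.
TABLE_7_9 = {
    "Marital Status": (18.0, 30.9, 8.1, 4.2, 11.1, 1.2),
    "Educational Attainment": (39.3, 53.8, 19.2, 17.9, 24.6, 2.8),
    "Residence 5 Years Ago": (44.9, 70.6, 23.7, 18.1, 33.5, 4.7),
    "Wages Last Year": (50.1, 74.3, 34.7, 27.4, 49.7, 15.0),
}

# Hypothetical labels for the three population groups in the table.
GROUPS = ("total_group_quarters", "prison_inmates", "college_dormitory_students")


def change_1990_to_2000(rates):
    """Return the percentage-point change (2000 minus 1990) for each group."""
    return {
        group: round(rates[i] - rates[i + 3], 1)
        for i, group in enumerate(GROUPS)
    }


changes = {item: change_1990_to_2000(rates) for item, rates in TABLE_7_9.items()}

for item, by_group in changes.items():
    print(item, by_group)
```

For these four items the rates rose for every group, and the increases for the total group quarters population range from about 14 to 27 percentage points.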

Combining the data makes it harder for users to compare census results for such statistics as the poverty rate and unemployment rate with other household surveys, which typically do not include the military or institutional residents. Combining the data also obscures the differences in data quality between group quarters residents and household members. The difference is further obscured because published item imputation (allocation) rates that accompany the long-form-sample data products combined group quarters and household member rates.

No systematic investigation has yet been undertaken of the effects on distributions of characteristics of the high rates of missing data and the imputation procedures used. However, the discovery by users of very high unemployment rates in some communities, such as college towns, led to a determination by the Census Bureau that a particular combination of missing responses to some questions and reports of availability for work by residents of group quarters resulted in an inappropriate imputation of unemployed status to many such residents. The problem affected residents of noninstitutional group quarters, such as students in college dormitories, people living in group homes, and others. Residents of institutions showed similar reporting patterns, but the imputation did not allow an unemployed status for them. The problem is described in User Note 4 for Summary Tape File 3 (U.S. Census Bureau, 2003d:Data Note 4),[15] and the magnitude is such that the Census Bureau reissued employment status tabulations for states, counties, and places in late 2003 to exclude group quarters residents, limiting the tabulations to household residents only.[16]

Finding 7.3: For group quarters residents, missing data rates for most long-form-sample items were very high in 2000 (20 percent or more for four-fifths of the items and 40 percent or more for one-half of the items).
The 2000 rates were much higher than missing data rates for household members and considerably higher than missing data rates for group quarters residents in 1990. The 2000 missing data rates were particularly high for prisoners, residents of nursing homes, and residents of long-term-care hospitals, perhaps because of heavy reliance on administrative records for enumerating them. Few assessments have yet been made of systematic reporting errors for group quarters residents for long-form-sample items, nor of the effects of imputations on the distributions of characteristics or the relationships among them. However, a systematic error was found in the imputation of employment status for people living in noninstitutional group quarters because of a particular pattern of missing data. The result was a substantial overestimate of unemployment rates for these people, so much so that the Census Bureau reissued employment status tabulations for household members only, excluding group quarters residents.

[15] The note is also available at http://www.census.gov/prod/cen2000/doc/sf3.pdf [2/25/04].
[16] Available at http://www.census.gov/population/www/census2000/phc-28.html [2/25/04].

Earlier we called for a complete redesign of the enumeration procedures for group quarters residents in the 2010 census (see Recommendation 4.4 in Section 4-F.3). That redesign should include consideration of changes to questionnaire content as well. If the American Community Survey is fully funded as a replacement for the census long-form sample, then it is likely to provide, and should provide, detailed data for group quarters residents. (At present, the C2SS and other precursors to the ACS include only household residents.) The designers for the ACS should consider how best to obtain long-form-type data for different types of group quarters. For institutions, the use of administrative records may make most sense provided the cooperation of the facility staff can be obtained. It may also be that accurate responses to some of the long-form-sample questions (e.g., income and employment last year) are too difficult to obtain in institutional settings, either from records or the residents themselves, at least without special training of interviewers and other measures to elicit responses.
For other types of group quarters, a household-type questionnaire may work best. This small, but important, population merits dedication of sufficient resources for research and testing on questionnaire design and content and enumeration procedures that can produce useful, high-quality information for policy making, program planning, and other purposes.

Recommendation 7.3: The Census Bureau should publish distributions of characteristics and item imputation rates, for the 2010 census and the American Community Survey (when it includes group quarters residents), that distinguish household residents from the group quarters population (at least the institutionalized component). Such separation would make it easier for data users to compare census and ACS estimates with household surveys and would facilitate comparative assessments of data quality for these two populations by the Census Bureau and others.
