Introduction and Overview
CONDUCTING A POPULATION CENSUS of the United States is a task of awesome scope and difficulty. It requires massive effort—in a short amount of time—to enumerate and collect data on over 280 million residents at their precise geographic location, striving always to count every resident once and only once. In order to meet this challenge, the decennial census-taking process has evolved into an extremely complex set of operations and components; each new census includes some techniques that have been integral parts of the process for decades and others that are introduced as the newest and possibly best means of improving the count.
The decennial census is formidable in its operations, and the importance of the information generated by the census is similarly impressive. Census data today serve not only the constitutionally mandated purpose of reapportioning seats in the U.S. House of Representatives, but also such purposes as redrawing legislative district boundaries, allocating federal and state program funds, planning and evaluating government programs, informing private- and public-sector decisions, and supporting a wide range of research. Census data not only provide the means of periodically recalibrating the political mechanisms of democracy, but also contribute to an informed citizenry; these data inform people of the civic health of the nation as a whole and allow them to see how their state, city,
town, and neighborhood compare with other areas and how they have changed over time.
Given the scope of census operations and the demands on census data, evaluating a decennial census is a challenging but essential mission, requiring careful scrutiny of every procedure and assessment of each procedure’s effect on the quality of the resulting data. The ultimate benchmark against which the results of a census could be compared—namely, an unambiguously true count of the population—is as unknown and elusive to census evaluators as it is to census collectors.
1–A THE PANEL AND ITS CHARGE
In 1998, the U.S. Census Bureau asked the National Research Council’s Committee on National Statistics (CNSTAT) to convene a Panel to Review the 2000 Census in order to provide an independent assessment of the 2000 census. A companion CNSTAT Panel on Research on Future Census Methods was convened in 1999 to observe the 2000 census and its evaluation process in order to assess the Bureau’s plans for the 2010 census (see National Research Council, 2000a, 2001c, 2003a, 2004).
The charge to our panel is broad:
to review the statistical methods of the 2000 census, particularly the use of the Accuracy and Coverage Evaluation Program and dual-systems estimation, and other census procedures that may affect the completeness and quality of the data. Features the panel may review include the Master Address File, follow-up for nonresponse, race and ethnicity classifications, mail return rates, quality of long-form data, and other areas.
We conducted a variety of activities to carry out our charge: making observation visits to census offices during 2000; convening three open workshops on issues of coverage evaluation and adjustment; commissioning jointly with the Panel on Research on Future Census Methods a group of local government representatives to evaluate the Local Update of Census Addresses (LUCA) Program; commissioning a paper on race and ethnicity reporting in the census (Harris,
2003); reviewing voluminous evaluation reports and other documents that are publicly available from the Census Bureau; reviewing reports of other groups monitoring the census; and conducting original analysis with aggregate and microdata files provided to the panel by the Bureau under special access arrangements.
Prior to this, our final report, we published three workshop proceedings (National Research Council, 2001e,f,g), the report of the LUCA Working Group (Working Group on LUCA, 2001), three letter reports to the Census Bureau (National Research Council, 1999a, 2000b, 2001d), and an interim assessment of 2000 census operations (National Research Council, 2001a). Appendix A contains a summary of our activities and reports.
1–B OVERVIEW OF THIS REPORT
Our report contains ten chapters and nine appendixes. The remainder of this first chapter describes sources of error in a census and methods of evaluating the completeness of census coverage and the quality of census data (1–C) and presents 11 summary findings about the 2000 census (1–D).
In our view, it is crucially important that census information be interpreted and evaluated in the context of the goals and uses for the data. Chapter 2 describes major census goals and uses, including the use of census data in congressional reapportionment and redistricting as well as the broader uses of the basic demographic data and the additional data on the long-form sample.
We have divided our analysis of the census as a whole into two major pieces. One is general census operations, which progress from the development of an address list through the completion of census evaluations. The second is coverage evaluation—attempts to assess undercount or overcount in the census and the debate over statistical adjustment of census figures. We present each of these major pieces in two-chapter segments that are parallel in structure. Chapter 3 describes the road to 2000—a summary of the planning cycle of the 2000 census and the complex political and operational environment that surrounded it. Chapter 4 presents our assessment of overall census operations, including the contributions of major procedural innovations to mail response, timeliness, cost, and completeness of response, as well as the enumeration of group quarters residents. Similarly, Chapter 5 introduces general concepts in coverage measurement and statistical adjustment and the evolution of plans for 2000 census coverage evaluation over the 1990s, and Chapter 6 presents our assessment of coverage evaluation in 2000.
In Chapters 7 and 8, we focus on the quality of the census data content. Chapter 7 reviews the quality of the 2000 census basic demographic data and additional long-form-sample data, relative to alternative sources and to past censuses. Chapter 8 addresses the measurement of race and ethnicity (Hispanic origin) in 2000. In Chapter 9, we offer brief comments of a general nature on the organization and management of census operations and the nature of an appropriate research program for 2010. The panel’s detailed findings and recommendations are collected and restated for the reader’s convenience in Chapter 10.
Appendixes provide information on nine topics: the panel’s activities and prior reports (A); basic questionnaire items and additional long-form items in the 2000 census, compared with questionnaire content in the 1990 census and the Census 2000 Supplementary Survey (B); 2000 census operations and major differences from 1990 census operations (C); quality of mail and enumerator returns (D); operations of the 2000 Accuracy and Coverage Evaluation Program (E); the theory and statistical basis of imputation for nonresponse (F); basic (complete-count) data processing (G); long-form-sample data processing (H); and topics covered in experiments and evaluations by the Census Bureau of 2000 census operations and data content (I). A Glossary defines technical terms and acronyms, and a Bibliography provides citations for references in the text and other pertinent references.
1–C EVALUATING A CENSUS
Our focus in this report is the quality of the 2000 census results—not only the completeness of population coverage, which was the primary concern of the U.S. Congress and others leading up to the census, but also the accuracy of the data on population characteristics. Such data include the basic items collected for all persons (about 281 million people) and additional items asked on the long
form of about a one-sixth sample of the population (about 45 million people). Where possible, we consider the quality of census results not only at the national level, but also for population groups and subnational areas. We consider how aspects of the design and operation of the census may have affected data quality in 2000 and suggest priority areas for research to improve procedures for the 2010 census. Cost and management issues are also important for a full assessment, and we consider some aspects of them, but our review is not systematic because our charge did not include a management review. More generally, we evaluate some aspects of the 2000 census more fully than others, depending on the importance of the topic, the availability of relevant data, and the expertise of the panel.
In this section, we briefly describe our general approach to the task of evaluating the 2000 census and describe some general principles regarding sources of errors and types of evaluation.
1–C.1 Errors in the Census
At the most basic level, an ideal census evaluation would measure the differences between census-based counts or estimates and their associated true values. An estimated count greater than the true value would be considered a net overcount, and an estimated count less than the truth would be a net undercount. These differences between estimates and (unknown) truth are errors, in the statistical sense. Despite the word’s colloquial meaning, these errors are not necessarily an indication that a mistake has been made.
Another measure of error would be the sum of the positive deviations and the sum of the negative deviations from the true values for a population group or geographic area. This measure is the gross coverage error, comprising gross overcount and gross undercount, each of which comprises several components. For example, one type of gross error occurs for geographic areas when households and people are assigned to an incorrect location (geocoding error); another type of gross error is duplicate census enumerations. There is not complete agreement on all of the components of gross error—for example, whether to count as errors those people whose status as a correct or erroneous enumeration could not be
determined. Furthermore, the definition of gross errors depends on the level of geographic aggregation—in particular, some geocoding errors affect only individual census blocks or tracts but not larger areas. Nonetheless, gross coverage errors are important to examine because they may indicate problems that are not obvious from a measure of net error. For example, net coverage error could be zero for the total population, but large errors of omission could just balance erroneous enumerations. Moreover, to the degree that these two types of coverage error differ among population groups and geographic areas, there could be different net undercounts for population groups and geographic areas even when total net error is zero. Even when gross omissions and erroneous enumerations happen to balance, examination of their components can help identify sources of error that would be useful to address by changing enumeration procedures or other aspects of the census.
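The distinction between net and gross error can be made concrete with a small sketch. The block-level figures below are invented for illustration, not census data:

```python
# Hypothetical deviations (census count minus true count) for a set of blocks.
# Positive values represent erroneous enumerations (overcounts);
# negative values represent omissions (undercounts).
deviations = [+3, -4, +2, -3, +1, -2, +3]

net_error = sum(deviations)  # omissions and overcounts cancel in the net
gross_overcount = sum(d for d in deviations if d > 0)
gross_undercount = -sum(d for d in deviations if d < 0)
gross_error = gross_overcount + gross_undercount

print(net_error)    # zero net error despite many individual errors
print(gross_error)  # gross error reveals the underlying problems
```

Here the net error is zero even though 18 person-level errors occurred, which is exactly why examining the components of gross error matters.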
Errors in the census can generally be categorized as involving one of two broad types of problems. First, they may result from problems of coverage—that is, some addresses fail to appear once and only once in the Census Bureau’s address list, or some individuals fail to be included once and only once in the enumeration. Second, errors may arise due to problems of response, in that responses on a collected questionnaire may be incomplete or inaccurate. Other types of error occur in the census as well. For example, the census long form is distributed to only a sample (17 percent) of the population, and so estimates based on the long form—like all survey results—are subject to sampling error.
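Because the long form went to roughly a one-sixth sample, its estimates carry sampling error that complete-count items do not. A minimal sketch of how such error might be approximated for a small area, using invented figures and a simple random-sampling assumption (the actual long-form sample design was more complex):

```python
import math

# Assumed figures for illustration only, not published census values.
N = 1000          # persons in a hypothetical small area
n = N / 6         # roughly one-sixth receive the long form
p = 0.10          # estimated proportion with some characteristic

# Standard error of the estimated proportion, with a finite
# population correction since the sample is a large share of N.
fpc = math.sqrt((N - n) / (N - 1))
se = fpc * math.sqrt(p * (1 - p) / n)
print(round(se, 3))
```

Even this simplified calculation makes clear why long-form estimates for small areas are less precise than complete-count figures.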
Examples of potential errors in coverage are numerous. One natural source of such errors is the list of mailing addresses used by the Census Bureau to deliver census questionnaires, known as the Master Address File (MAF) in the 2000 census. We will discuss the MAF often and in detail later in this report, but for the moment we consider it in the abstract. The MAF was constructed in order to be as thorough as possible, but the dynamic nature of the U.S. population and its living arrangements make it all but impossible for the list to be completely accurate. It is difficult to fully capture additions and deletions to the list that result from construction of new residences, demolition of former residences, and restructuring of existing residences. Identifying all residences in remote rural
areas and in multiunit structures is also a major challenge. In the 2000 census, the MAF was built in part with sources that were not used in previous censuses, such as the Delivery Sequence File used by the U.S. Postal Service to coordinate its mail carriers and the direct addition of new addresses from local and tribal governments. Again, the intent was to make the MAF as complete as possible and improve coverage; however, these sources are not guaranteed to be complete, and they may have inadvertently added duplicate addresses to the final address list. All of these complications—representing possible gaps or overages in the address list—may result in either undercounts or overcounts.
Errors in response are similarly numerous and impossible to avoid. Many people simply do not fully cooperate in filling out census forms; this is a particular concern for the census long form, parts of which some respondents may believe violate their privacy. Some households and individuals do not respond to all the questions on the questionnaire; there may also be some degree of intentional misresponse. Some error is also introduced—often unintentionally—by members of the massive corps of temporary census field workers assigned to follow up on nonresponding residents. Another source of census response error is confusion over the questionnaire itself; language difficulties may deter some respondents, while others may not understand who should and should not be included, such as students on temporary visas or away at college. Many individuals have more than one home (examples include “snowbirds” from cold weather states with winter homes in Florida or the Southwest and children in joint custody arrangements), while many others are homeless most or all of the time.
Any evaluation of a decennial census must confront a common-sense but nevertheless critical reality: error is an inevitable part of the census, and perfection—the absence of all error—is an unrealistic and unattainable standard for evaluation. The sources of census error are numerous, and many are difficult to control. In this light, census evaluators must be aware of potential sources of error, estimate their potential effects, and develop strategies to more fully measure and (when possible) minimize errors.
1–C.2 Methods of Evaluation
Acknowledging the inevitability of error is one important point in framing an evaluation of a census; so too is realizing that “evaluation” means many things and spans many different approaches. In this section, we briefly describe several general approaches to evaluation; each of these approaches plays some role in the arguments laid out in the remainder of this report.
In such a large and complex operation as the decennial census, it is important to assess the conduct of specific procedures as they contribute to the success of the overall effort. Real-time process evaluations (quality control or quality assurance efforts) are needed to determine that a procedure is being executed as originally specified, with minimal variation, and on the needed time schedule. If schedule delays or more-than-minimal variation (e.g., different application of rules for accepting proxy enumerations across local census offices) are identified promptly, there should be sufficient time to correct the problems. For example, sound census practice is to reinterview a sample of each enumerator’s work in nonresponse follow-up to be sure that he or she understands the job and is carrying it out in a conscientious manner. An interviewer who fails the reinterviewing standards will be replaced and his or her workload reenumerated as needed.
After-the-fact evaluations of individual procedures are important to determine their effects—positive and negative—on the timing and cost of the census and on data quality. After-the-fact process evaluations cannot assess data quality directly, but they can help answer such questions as whether a procedure may have introduced a systematic bias in an estimate (e.g., a systematic overestimate or underestimate of people who reported being of more than one race in follow-up due to inappropriate cues from enumerators). Such evaluations can also indicate whether a procedure may have introduced variation in data quality (e.g., variation in rates of missing long-form information due to variation in local census office instructions to enumerators on how aggressively to push for the information).
After-the-fact process evaluations can suggest important areas for research to improve operations in future censuses.
Having effective real-time quality control and assurance procedures in place for key census operations will contribute to but not ensure high-quality data. First, processes may have been well executed but poorly designed to achieve their goal. Furthermore, even when well-designed operations are carried out as planned, respondent errors from such sources as nonreporting, misreporting, and variability in reporting may result in poor-quality data. As signals of possible problems that require further research, it is common to construct and review a variety of census quality indicators, such as mail return rates, household (unit) nonresponse rates, item nonresponse rates, inconsistency in reporting of specific items, and estimates of duplicates in and omissions from the census count. For census long-form-sample estimates, it is also important to look at variability from sampling and imputation for missing data.
Each of these quality measures must itself be evaluated and refined. For example, the initial measure of duplicates and other erroneous enumerations from the Census Bureau’s 2000 Accuracy and Coverage Evaluation (A.C.E.) Program—issued in March 2001—turned out to be substantially lower than a revised estimate released in October 2001, which in turn was lower than the final estimate issued in March 2003. As another example, it is generally assumed that a high nonresponse rate to a questionnaire item impairs data quality. The reason is the twofold assumption that reported values are likely to differ from the values that would have been reported by the nonrespondents and that imputations for missing responses will tend to reflect the values for reporters more than the true values for nonrespondents (as well as add variability). While often true, neither part of this assumption is always or necessarily true (see Groves and Couper, 2002).
Quality indicators (and process evaluations) also need a standard for comparison—for example, what is a “high” versus a “low” nonresponse rate? Such standards are commonly drawn from prior censuses; they may also be drawn from other surveys and administrative records. It is important to evaluate the applicability and quality of the source(s) for the standards. Differences in 1990 and 2000 census procedures complicate the task of comparative evaluation, as do differences in survey procedures from census procedures.
Comparisons of Estimates With Other Sources
Comparison of specific estimates from the census with the same estimates from other sources is another important form of data quality evaluation. For evaluation of the completeness of population coverage, the two traditional comparison sources have been demographic analysis and dual-systems estimation from an independent survey matched with a sample of census enumerations. For evaluation of the accuracy of such estimates as median household income, percentage multirace reporting, or the unemployment rate for specific population groups, comparison sources include household surveys, such as the Current Population Survey and the Census 2000 Supplementary Survey, and administrative records, such as Social Security records. The independent A.C.E. survey is also a comparison source for some estimates, such as people who reported more than one race.
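The basic logic of dual-systems estimation can be sketched as simple capture-recapture. The actual A.C.E. applied weighted, poststratified refinements of this idea; the counts below are invented for illustration:

```python
# Two independent "captures" of the same (unknown) population.
census_count = 950   # correct enumerations in the census
survey_count = 900   # persons found by the independent coverage survey
matched = 870        # persons found in both systems

# If the two systems are independent, the match rate estimates the
# census coverage rate, giving the dual-systems estimate of the total:
dse = census_count * survey_count / matched
print(round(dse))
```

Here the estimated total exceeds both counts, reflecting people missed by both systems; violations of the independence assumption (correlation bias) are a central concern in evaluating such estimates.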
Neither the census nor any comparison source can be viewed as the “truth,” and it can often be difficult to establish which sources, if any, may be “better” than the census. A careful assessment of the quality of each source is needed to help determine the relative superiority or inferiority of different sources used for comparisons with census estimates. Also, a careful examination of concepts and procedures for various sources and the census is needed to identify important differences that could affect the validity of comparisons of estimates.
Comparisons With Previous Censuses
An almost inevitable standard of comparison for a census is to compare its execution and outcomes—costs, timing, completeness of population coverage, and other criteria—with past censuses. Typically, the design for a census is shaped by perceptions of the major problems for the preceding census. However, differences in procedures and in the social and political context may affect the
appropriateness of comparisons with previous censuses, and such differences need to be interpreted carefully.
Explaining Data Quality Strengths and Weaknesses
An assessment of census data quality is important not only for data users but also for the Census Bureau to help plan future censuses. For this purpose, it is important to conduct analyses to determine one or more underlying explanations for quality successes and problems so that changes can address relevant causal factors.
The ideal method for causal analysis is experimentation in which subjects are randomly assigned to treatment and control groups. Such experimentation is of limited use for understanding census data quality. Experiments were conducted in the 2000 census, but they were few in number, small in scale, and narrow in scope.1 Census planning tests conducted between census years are often experimental in nature, but they, too, are necessarily limited in scope. Moreover, census tests cannot be conducted under the same conditions as an actual census with regard to publicity, scale of operations, and other features.
So, while experimentation can help identify some reasons for the levels of census data quality achieved nationwide and for areas and population groups, the search for underlying explanations must rely primarily on nonexperimental methods, such as multivariate analysis of quality indicators. We used such methods for analyzing mail return rates as a function of neighborhood characteristics and imputation rates as a function of geographic location and type of census enumeration. While these types of analysis will probably not produce definitive conclusions regarding cause and effect, they are important to pursue for clues to alternative census procedures that merit testing for 2010.
1–D SUMMARY OF FINDINGS: OVERALL ASSESSMENT
To convey our overall assessment of the 2000 census, we provide 11 major findings and a summary conclusion. These findings cover the successes of the census in the face of considerable adversity,
problem areas in census operations and data quality, decisions about adjustment of census population counts for coverage errors, and the Census Bureau’s two major evaluation programs for 2000 (one on coverage, the other on census processes and content).
The decennial census is the federal government’s largest and most complex peacetime logistical operation. To be most effective and efficient, design decisions, budget decisions, detailed planning, and operational trials must be completed well before the census is conducted. Yet as we relate in Chapters 3 and 5, the design and planning process for the 2000 census was unusually contentious. Partisan criticism of censuses has occurred in the past, but never before had the Census Bureau faced the need to develop several different operational plans for conducting the census, a budgetary compromise that occurred so late in the planning process, and an atmosphere in which the objectivity of the Bureau itself was under strong attack. In these circumstances, and given the enormous complexity and scale of census operations, the 2000 census was a considerable success. It was carried out in a completely objective manner in which decisions were made on the basis of professional judgment using the best available information.
Finding 1.1: The 2000 census was generally well executed, even though the Census Bureau had to contend with externally imposed last-minute changes in design, delayed budget decisions, consequent changes in plans, and insufficient time for operational trials. The dedication of the Census Bureau staff made possible the success of several operational innovations. Because of the many delays and last-minute changes in census design, the ample funding appropriated in spring 1999 was essential for the successful execution of the census.
The Bureau made innovations in the design of census questionnaires and mailing materials and in the scope and content of advertising and outreach programs that boosted overall mail response rates, even as the population had become more resistant to government requests for information and more concerned about privacy
issues. The Bureau also succeeded in hiring the massive workforce required for census field operations, completed follow-up of nonresponding households in a timely manner, and made effective use of new technology for data input (see Chapter 4).
Finding 1.2: The decline in mail response rates observed in the 1980 and 1990 censuses was successfully halted in the 2000 census, and most census operations were well executed and completed on or ahead of schedule.
Finally, the Bureau made progress toward one of its major goals, which was to reduce differential net undercount rates between historically less-well-counted groups (minorities, renters, and children) and others compared with 1990 (see Chapter 6). Such reductions generally imply reductions as well in differential net undercount rates among states and other geographic areas. This assessment is based on comparing the results from the 1990 Post-Enumeration Survey (PES) and the 2000 A.C.E. Program as originally estimated. Subsequent revisions of the A.C.E. estimates are difficult to compare with the PES, but they do not contradict this assessment (see Chapter 6).
Finding 1.3: Although significant differences in methods for estimating net undercount in the 1990 Post-Enumeration Survey and the most recent revision of the 2000 Accuracy and Coverage Evaluation Program make it difficult to compare net undercount estimates, there is sufficient evidence to conclude that the 2000 census was successful in reducing the differences in national net undercount rates between historically less-well-counted groups (minorities, renters, children) and others.
Although the census was successful in achieving its overall goals for mail response, timely execution of operations, and reduction in differential net undercount, it also experienced problems. They included large numbers of duplicate enumerations—because of undetected duplications in the MAF and people being reported by two households—and high rates of missing data for many content items
on the long-form questionnaire (see Chapter 7). Rates of missing data were even higher for residents of group quarters (e.g., college students residing in dormitories, prisoners, nursing home residents, and others).
Finding 1.4: The 2000 census experienced four major problems of enumeration: (1) errors in the Master Address File; (2) large numbers of duplicate and other erroneous enumerations in the census; (3) high rates of missing data for many long-form items; and (4) inaccuracies in enumerating residents of group quarters.
1–D.3 Adjustment Decisions
A decision by the U.S. Supreme Court in 1999 prohibited the use of sampling-based methods for generating census counts used to reapportion the U.S. House of Representatives. But the Census Bureau faced three major decision points as to whether it would use the results of the A.C.E. Program to statistically adjust 2000 census data to reflect estimated coverage errors. In March 2001 the Census Bureau weighed whether to adjust census data to be used for legislative redistricting; in October 2001 the question was whether to adjust census data for such uses as allocating federal funds to states and localities; and in March 2003 the issue was whether to use a revised set of adjusted totals as the base for postcensal population estimates. Each of these decisions was preceded by a wave of intensive evaluation studies dissecting the A.C.E. Program in great detail; at each of these decision points, the Census Bureau ultimately elected not to adjust the census data.
Finding 1.5: In March 2001, October 2001, and March 2003, the Census Bureau announced that it would not adjust the 2000 census results for incomplete coverage of some population groups (and overcounting of other groups). In our judgment, all three of the Bureau’s decisions are justified, for different reasons. The March and October 2001 decisions are justified given (1) the Bureau’s conclusion in March that evaluation studies were not sufficient to determine the quality of the A.C.E. population estimates and (2) its conclusion in October, after
further study, that the A.C.E. population estimates were too high and an adjustment using A.C.E. results as originally calculated would have overstated the population.
We view the Bureau’s March 2003 conclusion as justified because the final A.C.E. estimates that it produced from further analysis, while reflecting high-quality, innovative work, had to make use of incomplete data and incorporate assumptions that cannot be well supported.
Although the A.C.E. Program experienced problems, the program and its supporting research provided important insights into the census process. Moreover, the 2000 A.C.E. experience reaffirmed the utility of in-depth assessment of both coverage gaps and duplications.
Finding 1.6: The Accuracy and Coverage Evaluation Program provided invaluable information about the quality of the 2000 census enumeration. The A.C.E. was well executed, although problems in reporting Census Day residence led to underestimation of duplicate census enumerations in the original (March 2001) A.C.E. estimates.
In the panel’s assessment, the 2000 census and A.C.E. experiences suggest three basic findings (or conclusions): first, on the need to examine components of gross error, as well as net coverage error; second, on the need for adequate time to evaluate the completeness of population coverage and the quality of data content; and, third, on the inherent variability of census totals at the finest-grained geographic level.
Finding 1.7: The long-standing focus of key stakeholders on net undercount distracts attention from the components of error in the census, which include duplications, other erroneous enumerations, and omissions. Although the most recently released national net undercount estimate for 2000 is a small net overcount of 0.5 percent of the population (1.3 million additional people counted), there were large numbers of gross errors, almost as many as in
1990. These errors, which would not have been detected without the A.C.E., have the greatest impact on estimates for population groups and subnational geographic areas.
Finding 1.8: The experience with the 2000 Accuracy and Coverage Evaluation Program and the evaluation of census processes and data content make clear that useful evaluation requires considerable time. In particular, it appears difficult to complete sufficiently comprehensive assessments of population coverage and the quality of basic characteristics by the currently mandated schedule for releasing block-level census data for use in redistricting (which is 12 months after Census Day).
Finding 1.9: Census counts at the block level—whether adjusted or unadjusted—are subject to high levels of error and hence should be used only when aggregated to larger geographic areas.
Our last two major findings concern the vitally important evaluation programs for the 2000 census, which addressed population coverage and census processes and content.
Finding 1.10: Under tight time constraints, the Census Bureau’s coverage evaluation technical staff conducted comprehensive and insightful research of high quality on the completeness of coverage in the 2000 census. Their results for the A.C.E. and demographic analysis were well documented and provided useful information to 2010 census planners, stakeholders, and the public.
Finding 1.11: The Census Bureau’s evaluations of census processes and the quality of the data content were slow to appear, are often of limited value to users for understanding differences in data quality among population groups and geographic areas, and are often of limited use for 2010 planning.
1–D.5 Summary Assessment
Because the census is a complex, multifaceted operation, our major assessments (and our more detailed assessments in subsequent chapters) are difficult to reduce to one or two sentences. Overall, we conclude that the 2000 census was at least as good as and possibly better than previous censuses in completeness of population coverage and the quality of the basic demographic data. The census also experienced problems that made some of the data—principally some of the long-form-sample information and the data for group quarters residents—no better and in some cases worse than in previous censuses.
The decennial census is the largest and most difficult of all the nation’s statistical programs. The fact that our assessment found areas that need improvement should not detract from the success of the 2000 census in providing relevant, timely data for many uses. Yet it should also make clear the importance of evaluating and documenting the quality of census data and operations not only to facilitate the appropriate use of the data but also to target research and development efforts most effectively for future censuses.