Conducting a population census of the United States is a task of awesome difficulty; it requires massive effort—in a short amount of time—to enumerate and collect data on over 280 million residents at their precise geographic location, striving always to count every resident once and only once. In order to meet this challenge, the decennial census-taking process has evolved into an extremely complex set of operations and components, some of which have been integral parts of the process for decades and others of which were introduced in 2000 as the newest and possibly best means of improving the count.
Evaluating a decennial census is a similarly daunting mission, requiring careful scrutiny of every procedure and careful assessment of the effect of each procedure on the quality of the resulting data. The ultimate benchmark against which the results of a census could be compared—namely, an unambiguously true count of the population—is as unknown and elusive to census evaluators as it is to census collectors. Thus, the task of rendering a summary judgment on the quality of a particular decennial census is a very complicated undertaking.
The overall charge of the Panel to Review the 2000 Census is to assist the Census Bureau in evaluating the 2000 census, and this report constitutes the panel’s preliminary findings. Hence, the panel believes that it is appropriate to begin this report by explaining the manner in which it conceptualizes its primary objective—that is, describing how it defines the job of evaluating the decennial census. In this chapter we discuss two general principles that shape our approach to the problem: the necessity of assessing data quality in the context in which the data are used and the inevitability of error in the decennial census. We then outline a basic program for evaluating the 2000 census. This program is necessarily ambitious, and this interim report is decidedly not intended to be a full realization of that program. Rather, this chapter illustrates the panel’s general orientation toward census evaluation and provides the context for the panel’s own work and its recommendations for evaluation to the Census Bureau.
CENSUS DATA IN CONTEXT
The result of a decennial census is a collection of data products, which can generally be classified into two broad categories: basic population counts and summaries of the characteristics of areas or groups.1 Collectively, these data products are used for a wide variety of public and private needs; they are examined in a myriad of contexts and interpreted in many different ways.2 Proper evaluation of a census demands assessment of the quality and usefulness of its results in the context in which those results are actually used.
The first type of census data product—population counts for the nation as a whole and for subnational areas—satisfies the constitutional mandate for the census by providing the state-level counts required to reapportion the U.S. House of Representatives. Likewise, small-area population counts are the essential building blocks used for redistricting within states and localities. Basic population counts, sometimes differentiated by demographic group, are crucial for a variety of other uses, including:
calibration of data from other collection and survey programs, such as the Current Population Survey, the Vital Statistics of the United States, and the Uniform Crime Reports;
determination of eligibility for federal and state government funding programs;
comparison and ranking of areas (such as cities and metropolitan areas) for such purposes as advertising, marketing, and public information; and
benchmarking of intercensal population estimates.
Census count data are used to estimate both the level (raw count) and the share (proportion) of total population across different geographic areas; they are also used to compute change over time for either levels or shares.
The second type of census data product (local area or group characteristics) includes the various counts and averages that result from detailed cross-classification by geographic, demographic, and socioeconomic variables, particularly those collected on the decennial census long form. Examples of these characteristics include per capita income by census tract and state-level counts by demographic group, educational level, and employment type. Data sets of this type form the cornerstone of basic and applied socioeconomic research. They play central roles in the evaluation of equal employment opportunity and other programs, and they are used in fund allocation formulas.
Both census data types are vital ingredients for general planning, analysis, and decision making by both governmental and nongovernmental (commercial) entities of all sizes. State and local governments rely on these data for such purposes as assigning personnel to police and fire precincts, identifying the areas of a city in greatest need of service facilities, and conducting traffic planning studies. Likewise, business plans and decisions depend on census count and characteristics data: applications include locating retail outlets, comparing the market potential of different cities, and assessing the availability of needed occupational skills in different labor market areas. Both census data types are essential to many academic and private-sector researchers whose work depends on charting population differences and their changes over time.
There is no single, dominant use of census data. The significance of this fact for census evaluation is that there is no single, dominant metric against which census results can be compared in order to unequivocally determine that they are either good or bad. For example, a census could provide outstanding count data but subpar characteristics data: this could happen if serious problems occurred with the census long form. The data from such a census would be perfectly adequate for some uses but would fail to satisfy others. Similarly, the representation of the data as levels or shares, or the level of geographic aggregation, might affect one’s judgment of the quality of the data. For instance, a purely hypothetical census that, for some reason, did an excellent job of collecting information from males but not from females could still produce reasonably accurate inferences when the data are presented as shares across different geographic areas, but would suffer badly when used as count data. Similarly, changes in census processes could improve the precision of counts while hurting the use of the same data to represent changes in counts over time. For example, the change to allow multiple responses to race and ethnicity questions in the 2000 census may make it possible to capture data on more focused demographic groups, but complicate inferences about how the sizes of minority groups have changed relative to past censuses. A comprehensive evaluation of a census must, therefore, strive to interpret census results in the context of all of their possible uses.
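The contrast between levels and shares in the hypothetical male/female example can be made concrete with a small sketch. All numbers below are invented for illustration; the area names, the 80 percent coverage rate for females, and the helper `shares` are assumptions, not figures or methods from the census.

```python
def shares(counts):
    """Return each area's proportion of the total count."""
    total = sum(counts.values())
    return {area: n / total for area, n in counts.items()}

# Hypothetical true populations by area: (males, females).
true_pop = {
    "A": (490_000, 510_000),
    "B": (245_000, 255_000),
    "C": (123_000, 127_000),
}

# Assumption: every male is counted, but only 80 percent of females are.
FEMALE_COVERAGE_PCT = 80

true_counts = {area: m + f for area, (m, f) in true_pop.items()}
census_counts = {
    area: m + f * FEMALE_COVERAGE_PCT // 100 for area, (m, f) in true_pop.items()
}

# Error in levels (raw counts) and in shares (proportions of the total).
level_error = {a: census_counts[a] - true_counts[a] for a in true_counts}
true_shares = shares(true_counts)
census_shares = shares(census_counts)
share_error = {a: census_shares[a] - true_shares[a] for a in true_counts}

for area in true_counts:
    print(area, level_error[area], f"{share_error[area]:+.5f}")
```

Under these assumptions the counts are off by roughly 10 percent in every area, while the shares move very little, because the sex composition is similar across areas. If the proportion of females differed sharply between areas, the shares would be distorted as well.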
At the most basic level, an ideal census evaluation would measure the differences between census-based counts or estimates and their associated true values. An estimated count greater than the true value would be considered a net overcount, and an estimated count less than the truth would be a net undercount. These differences between estimates and (unknown) truth are errors, in the statistical sense. Despite the word’s colloquial meaning, these errors are not necessarily an indication that a mistake has been made.
Another measure of error would be the sum of the absolute deviations from the true values for a population group, which would be the gross coverage error, comprising gross overcount and gross undercount. Gross error is a useful quality indicator since it may indicate problems that are not obvious from a measure of net error. For example, net coverage error could be zero for the total population, but there could be large gross errors of omission and erroneous enumeration. To the extent that these two types of error differ among population groups, there could be different net undercounts for population groups even when total net error is zero. Moreover, even when gross omissions and erroneous enumerations balance, examination of them could help identify sources of error that would be useful to address by changing enumeration procedures or other aspects of the census.
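A minimal sketch of the distinction between net and gross error follows. The group names and counts are hypothetical, and the class `GroupCoverage` is an illustrative construct, not a Census Bureau data structure. Net error is taken here as erroneous enumerations minus omissions, and gross error as their sum.

```python
from dataclasses import dataclass

@dataclass
class GroupCoverage:
    omissions: int               # people who should have been counted but were not
    erroneous_enumerations: int  # duplicates and others counted in error

    @property
    def net_error(self) -> int:
        # Positive means net overcount, negative means net undercount.
        return self.erroneous_enumerations - self.omissions

    @property
    def gross_error(self) -> int:
        # Omissions and erroneous enumerations both add to gross error.
        return self.erroneous_enumerations + self.omissions

# Hypothetical figures for two population groups.
groups = {
    "group_1": GroupCoverage(omissions=30_000, erroneous_enumerations=10_000),
    "group_2": GroupCoverage(omissions=5_000, erroneous_enumerations=25_000),
}

total_net = sum(g.net_error for g in groups.values())
total_gross = sum(g.gross_error for g in groups.values())

for name, g in groups.items():
    print(name, "net:", g.net_error, "gross:", g.gross_error)
print("total net:", total_net, "total gross:", total_gross)
```

In this invented case the total net error is zero even though one group has a net undercount of 20,000, the other has a net overcount of 20,000, and 70,000 individual errors were made in all.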
Any evaluation of a decennial census must necessarily attempt to get some reading of the level of various types of error in the census, even though those errors cannot be computed directly. Census evaluations must confront a commonsense but nevertheless critical reality: error is an inevitable part of the census, and perfection—the absence of all error—is an unrealistic and unattainable standard for evaluation. The sources of census error are numerous, and many are simply uncontrollable. In this light, census evaluators must be aware of potential sources of error, gauge their potential effects, and develop strategies to measure and (when possible) minimize errors. We do not attempt in this section an exhaustive study of each of these topics, but intend only to provide a general flavor of the problems related to census error.
Sources of Error
Errors in the census can generally be categorized as one of two broad types. First, they may result from problems of coverage—that is, each address does not appear once and only once in the Census Bureau’s address list, and each individual is not included once and only once in the enumeration. Second, errors may arise due to problems of response, in that responses on a collected questionnaire may be incomplete or inaccurate.3
Examples of potential errors in coverage are numerous; one natural source of such errors is the list of mailing addresses used by the Census Bureau to deliver census questionnaires. This list, the Master Address File (MAF) in the 2000 census, was constructed in order to be as thorough as possible. The dynamic nature of the U.S. population and its living arrangements makes it all but impossible for the list to be completely accurate: it is difficult to fully capture additions and deletions to the list that result from construction of new residences, demolition of former residences, and restructuring of existing residences. Identifying all residences in remote rural areas and in multi-unit structures is also a major challenge. Many individuals have more than one home (examples include “snowbirds” from cold-weather states with winter homes in Florida or the Southwest and children in joint custody arrangements), while many others are homeless most or all of the time. In the 2000 census, the Master Address File was built in part with sources that were not used in previous censuses, such as the Delivery Sequence File used by the U.S. Postal Service to coordinate its mail carriers and the direct addition of new addresses from local and tribal governments. The intent was to make the MAF as complete as possible and improve coverage; however, these sources are not guaranteed to be complete, and they may have inadvertently added duplicate addresses to the final address list. All of these complications, which represent possible gaps or overages in the address list, may result in either undercounts or overcounts.
Errors in response are similarly numerous and impossible to avoid. Many people simply do not fully cooperate in filling out census forms; this is a particular concern for the census long form, parts of which some respondents may believe violate their privacy or confidentiality.4 Some households and individuals do not respond to all the questions on the questionnaire; there may also be some degree of intentional misresponse. Of course, not all of the census response errors result from the actions of respondents; some error is also introduced—often unintentionally but sometimes deliberately—by members of the massive corps of temporary census field workers assigned to follow up on nonresponding residents. Although steps are taken to prevent the fabrication of a census response by filling in information for a housing unit without actually visiting it and conducting an interview—a practice known as curbstoning—this practice remains a well-documented source of survey error. Another source of census response error is confusion over the questionnaire itself; language difficulties may deter some respondents, while others may not understand who should and should not be included, such as students on temporary visas or away at college.
Consequences of Error
The potential effect of different levels of census error is difficult to quantify succinctly, primarily because the effects of error depend greatly on the use to which census data are put and on the fineness with which the data are aggregated geographically.5 To date, the focus of research on the consequences of census error has been on its effect on three major uses of census data: reapportionment of the U.S. House of Representatives (as mandated in the U.S. Constitution), legislative redistricting, and formula allocation programs. In all these cases, high levels of net undercount or overcount can have major consequences, but there is no easy way to determine a threshold beyond which the level of error in a census is somehow unacceptable.
Concern over the possible effect of census coverage error on congressional reapportionment—and, more specifically, the effect of competing statistical adjustments to try to correct for said error—has fueled debate over census methodology for years. Although the U.S. Supreme Court’s 1999 decision prohibited the use of sampling-based census estimates for reapportionment, the potential effects of error and adjustment on apportionment are still viable concerns, given the prominence of reapportionment as a use of census data. In studies related to the 1990 census, census data adjusted to reflect estimated net undercount produced different results when input into the “method of equal proportions” formula used for reapportionment than did unadjusted counts; specifically, one or two seats would have shifted between states if adjusted counts had been used. The sensitivity of the apportionment formula to small shifts in population counts was cited by the then Commerce Secretary Robert Mosbacher in his decision not to adjust the 1990 census. The change in political clout that can result from a shift of even one seat defies estimation; moreover, the mere fact that different levels of census error and adjustment strategies can alter apportionment opens the door to the unfounded but damaging assertion that the census can be manipulated to produce a desired political effect.
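The sensitivity of the method of equal proportions to small population shifts can be illustrated with a short sketch. The method gives each state one seat and then awards each remaining seat to the state with the highest priority value, population divided by the square root of n(n + 1), where n is the state's current number of seats. The three states, their populations, the five-seat house, and the function name `equal_proportions` are all hypothetical choices made for illustration.

```python
import heapq
from math import sqrt

def equal_proportions(populations, house_size):
    """Apportion seats by the method of equal proportions."""
    if house_size < len(populations):
        raise ValueError("house_size must allow one seat per state")
    # Every state starts with one seat.
    seats = {state: 1 for state in populations}
    # Max-heap (via negated priorities) of each state's claim to its next seat.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(house_size - len(populations)):
        _, state = heapq.heappop(heap)
        seats[state] += 1
        n = seats[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return seats

# Hypothetical unadjusted counts for three states.
unadjusted = {"A": 2_600_000, "B": 1_500_000, "C": 900_000}
# Hypothetical adjusted counts: +1 percent for A and C, +2 percent for B.
adjusted = {"A": 2_626_000, "B": 1_530_000, "C": 909_000}

seats_unadjusted = equal_proportions(unadjusted, house_size=5)
seats_adjusted = equal_proportions(adjusted, house_size=5)
print(seats_unadjusted)
print(seats_adjusted)
```

With these invented numbers, a difference of one percentage point in the adjustment applied to state B is enough to move one seat from A to B. The example is contrived so that the two states are in a close race for the last seat; most small shifts in counts would leave an apportionment unchanged.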
The potential effect of census error on legislative redistricting is particularly hard to assess, given the intensely political nature of the process. The shrewdness of a mapmaker in piecing together blocks into districts arguably has more effect on any perceived bias in the district than do block-level census errors. However, it is certainly possible that high levels of error in the census could have major effects on districts within states. For instance, errors in the census might affect the urban-rural balance within a state, and any resulting district map could dilute the vote of urban residents to the benefit of rural residents, or vice versa. Such outcomes would depend on the average size of the districts, the differential undercoverage rates of major population groups, the proportionate distribution among areas of these population groups, and the number of contiguous districts with high rates of census undercoverage.
The large amount of federal funds distributed per year to states and localities on the basis of population counts or characteristics (about $180 billion) raises the potential for considerable effects from census errors. However, the large number of complicated (and, at times, undocumented) formulas makes it extremely difficult to carry out a comprehensive analysis. The research to date on a small subset of programs suggests two basic effects. First, the effects on allocations are larger for programs that distribute funds on a per capita basis than for programs that allocate shares of a fixed total, since in the latter situation, it is not total net undercoverage that determines additional funds, but differential undercoverage. Second, when formulas are based on factors in addition to population counts, such as per capita income, errors in census coverage often have less effect on fund distribution than the errors in the other factors (see National Research Council, 1995:Ch.2; Steffey, 1997).
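The first of these effects can be illustrated with a simplified sketch of two stylized formulas: one that pays a fixed amount per counted person and one that divides a fixed total in proportion to counts. Real allocation formulas are far more complicated; the two areas, the dollar amounts, the undercount rates, and the function names below are all assumptions made for illustration.

```python
def per_capita_allocation(counts, dollars_per_person):
    """Each area receives a fixed amount for every person counted."""
    return {area: n * dollars_per_person for area, n in counts.items()}

def fixed_total_allocation(counts, total_dollars):
    """A fixed pot is divided in proportion to each area's count."""
    total = sum(counts.values())
    return {area: total_dollars * n / total for area, n in counts.items()}

# Hypothetical true populations and two hypothetical census outcomes.
true_counts = {"A": 1_000_000, "B": 500_000}
uniform_undercount = {"A": 980_000, "B": 490_000}       # 2 percent missed in both
differential_undercount = {"A": 990_000, "B": 480_000}  # 1 percent vs. 4 percent

RATE = 100         # dollars per person (assumed)
POT = 150_000_000  # fixed total to distribute (assumed)

scenarios = {
    "true": true_counts,
    "uniform": uniform_undercount,
    "differential": differential_undercount,
}
per_capita = {k: per_capita_allocation(v, RATE) for k, v in scenarios.items()}
fixed_total = {k: fixed_total_allocation(v, POT) for k, v in scenarios.items()}

for name in scenarios:
    print(name, per_capita[name], {a: round(x) for a, x in fixed_total[name].items()})
```

In this sketch a uniform undercount reduces every per capita payment but leaves the fixed-total allocation unchanged. Only the differential undercount shifts money between areas under the fixed-total formula, here away from the area with the higher undercount rate.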
Methods to Measure Error
Because true values for population counts for geographic areas are never available, estimates of the level of error in those data must necessarily be carried out indirectly. The most common measures of census error are based on external validation: comparison values from other (partly or completely independent) sources are obtained and used as substitutes for the true values, and census errors are then approximated. For the 2000 census, such comparison values either are or will be available from a host of sources: the Accuracy and Coverage Evaluation (A.C.E.) Program; demographic analysis; the Census 2000 Supplementary Survey (the pilot American Community Survey); other household surveys, such as the Current Population Survey; and administrative records such as those collected by the Internal Revenue Service. It is important to note, however, that the use of these comparison values does not guarantee accurate approximations of census error. All of these comparison values are themselves subject to error that is only partially understood, and it is possible that they are more subject to error than are the census data to which they are compared. Still, external validation can provide useful insights.
Another method is to scrutinize and study individual components of the census process for their potential effect on census undercoverage or overcoverage. For instance, when there are unresponsive households, census enumerators collect information on those households from proxies, such as neighbors or landlords. Since these proxy responses may contain inaccurate or incomplete data, the rate of proxy responses in the census is an important barometer of census response error. Similarly, such operations as the “Be Counted” Program (which made census forms available in public places) and the effort to enumerate people in soup kitchens and homeless shelters may mistakenly duplicate people already included in the mailout/mailback component of the 2000 census. Careful attention to each individual component of the decennial census process produces a long list of measures that, when viewed as a whole, can inform a judgment on census error. Such a list can also be helpful in crafting strategies—such as techniques to impute for nonresponse—to try to curb census error.
AN EVALUATION PROGRAM
Bearing the preceding concepts in mind, we now outline some primary steps of a comprehensive census evaluation program. This sketch will evolve as the panel continues its investigations and discussions. Over its lifetime, the panel will undertake as much of this evaluation program as its resources permit. In many cases, though, the Census Bureau must play a major role in compiling and analyzing relevant evaluation data from the confidential census records and voluminous ancillary data sets (e.g., performance records for census operations). Accordingly, the panel awaits results from the Bureau’s own evaluation program and reports. The panel urges the Census Bureau to continue its efforts to provide complete documentation, supporting data, and evaluation results to the research community.
Assessments of Primary Components
A central part of the panel’s evaluation will be an examination of the major components of the 2000 census and their subcomponents. These major components include, but are not necessarily limited to, the following (each of which is described more fully in Chapter 3 and, subsequently, in Appendix A):
development of the Master Address File, including: updates from the Postal Service Delivery Sequence File, block canvassing, and the Local Update of Census Addresses Program; special efforts to list group quarters and special places; and operations to filter duplicate addresses;
delivery and return of the census questionnaires by mail or in person by census enumerators, including: the procedures for identifying geographic areas in which different enumeration strategies would be used; analysis of respondent cooperation in areas enumerated through the mailout/mailback and update/leave operations; and analysis of mail response and return rates for both the short and the long forms;
conduct of auxiliary operations, such as advertising and local outreach, to enhance mail response and follow-up cooperation;
field follow-up of addresses that failed to report; and
data processing, including the optical scanning and reading of questionnaires and the techniques used to impute missing or erroneous responses.
For each of these components and their constituent procedures, there are five major questions for which answers are needed:
What sorts of error (either undercount or overcount) might be induced by the procedure, and what evidence exists as to the actual level of error the procedure added to the 2000 census?
In what ways did the procedures differ from those used in the 1990 or earlier censuses? In particular, what effects did new additions to the census process have on the level of census error?
Were the procedures completed in a timely fashion?
Did any evidence of systematic problems arise during their implementation?
What parts of the procedure, if any, should be changed in order to improve the 2010 and later censuses?
Assessment of External Validity Measures
Demographic analysis, the Accuracy and Coverage Evaluation (A.C.E.) Program, and possibly the Census 2000 Supplementary Survey are the primary external measures for comparison with 2000 census estimates to gauge the overall level of error. Accordingly, the quality of these sources of information must be assessed prior to their use. Particular attention must be paid to the types of error intrinsic to these measures, to their underpinning assumptions and their validity, and to the possible interpretations of discrepancies between these measures and census counts. Since these external validity measures are the result of a complex set of operations and procedures—like the census itself—the effectiveness of those procedures should be subjected to the same scrutiny as the census process, as outlined in the preceding section.
Assessment of Types of Errors
Using external validation, a crucial task is to assess the amount of net undercount and gross coverage error for various demographic groups at various levels of geographic aggregation. An important question is whether any patterns of net undercount are affected when the census results examined are used as levels (counts), as shares (proportions), or as changes in counts or shares. It is also important to assess the error in estimates on the basis of the characteristics information collected on the census long form and how that error varies with level of geographic aggregation. One technique for this latter analysis is external validation from administrative records and other sources; another is a detailed component error analysis, attempting to sort out errors due to such sources as proxy response, imputation for item nonresponse, and sampling error.
Geographic Patterning of Error and Systematic Bias
The census data need to be carefully examined to identify patterns or clusters of either net undercount or net overcount, both to inform data users and to consider remedies for future censuses. Such patterns, if they correspond to areas that share common characteristics (such as demographic composition or level of income), may be indicative of a census process that is biased for or against those types of areas. Should any such patterns emerge, they should be checked against the results of previous censuses as a confirmatory measure.
Evaluation of a decennial census is not an easy task, and it does not lend itself to snap summary judgments. That is, it is both futile and unfair to try to render verdicts like a “good census” or a “bad census,” a “perfect census” or a “failed census.” A thorough evaluation of a census must measure the quality of all of its various outputs, interpreting them in the context of their many possible uses; it must examine all procedures for the types of error they may introduce and use appropriate techniques to estimate the total level of error in the census. In this chapter the panel has sketched out its basic objectives and guidelines; in the remainder of this interim report, we begin this program of evaluation.