Back To Basics: What Are Census Errors and How Can They Be Measured?
As a result of the issues that the Census Bureau has raised regarding census coverage measurement, it is useful to go over some concepts needed for the remainder of this report. First, we address the various types of census error that can occur, defining omissions, duplications, erroneous enumerations, and errors of geography and demographic characteristics, and consider the adequacy of these terms to categorize types of error.
We next describe and assess the two basic approaches to error measurement currently used by the Census Bureau, dual-systems estimation (DSE) and demographic analysis. We then discuss the use of coverage measurement for assessment of census quality, to support census process improvement, and for adjustment of census counts.
TYPES OF CENSUS ERRORS
Coverage errors in census enumerations are of two types: inclusion of people in the enumeration who should not have been included, and omission of people who should have been included. People mistakenly included in the census fall into two categories. First, erroneous enumerations are people who should not have been included in the census because they were not residents of the United States on Census Day, such as babies born after Census Day, people who died before Census Day, temporary visitors, and fabricated people. Second, there are duplicates of correct enumerations, representing people who appear more than once in the list of census enumerations. Duplicates can be repeat enumerations of the same individual at the same address, either as a result of the multiple opportunities for being enumerated in the census or from an address being represented in more than one way on the Census Bureau’s Master Address File (MAF). Duplicates can also result from the inclusion of an individual at two different residences, possibly both of which are part-time residences. (We do not consider whole-person imputations or whole-household imputations, used either when an enumeration has fewer than two reported characteristics or when the number of persons living at a housing unit is estimated, to be a source of either duplications or erroneous enumerations, but rather to be a means for
producing counts that are as accurate as possible when aggregated to various levels of geography.)
People who were not included in the list of census enumerations but should have been are census omissions. Omissions can result from a missed address on the MAF, a missed housing unit in a multiunit residence in which other residences were enumerated, a missed individual in a household with other enumerated people, and people with no residence.
In addition to omissions, erroneous enumerations, and duplications, enumerations in the wrong location can also affect the accuracy of census counts. A count in the wrong location can result from (1) a misunderstanding of the census residence rules and the resulting reporting of someone in the wrong residence—for example, having an enumerator assign someone to the wrong choice from among several part-time residences, and (2) placing an address in the wrong census geographic location (called a geocoding error).
Furthermore, demographic errors, which occur when a person’s demographic characteristics are incorrectly reported or assigned and which can also result from an improper imputation of an individual’s demographic characteristics, can add error to census counts. For example, if someone’s age is misreported on the census form, this adds one tally in error to the count for one age group and subtracts one tally in error for another. However, this does not impact census counts that are not disaggregated by age group.
Erroneous enumerations and omissions contribute to errors in census counts for any geographic aggregate that includes the addresses of the persons involved with those errors. Whether or not errors in geographic or demographic characteristics result in errors in census counts depends on the level of demographic and geographic aggregation for which the census counts are used. The more detailed the geographic and demographic domain of interest, the greater the chance that errors in geographic and demographic detail will affect the quality of the associated counts. For example, placing a person in the wrong census tract but in the right county is not an error for census applications except when one uses census counts below the county level. However, placing someone in the wrong state affects most uses of census counts. Similarly, attributing someone to the wrong age group does not affect overall population counts at any level of geographic aggregation, but it will result in an error for counts by age group.
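To make the point concrete, here is a minimal sketch (with made-up geography labels) of how a misplacement within a county leaves county-level counts untouched while tract-level counts are in error:

```python
# Illustration of the aggregation point above: a person geocoded to the
# wrong tract but the right county changes tract-level counts without
# changing the county total. All geography labels are hypothetical.

from collections import Counter

# Each record is (county, tract) for one enumerated person.
correct = [("CountyA", "Tract1"), ("CountyA", "Tract1"), ("CountyA", "Tract2")]
# Same people, but the third person is geocoded to the wrong tract.
observed = [("CountyA", "Tract1"), ("CountyA", "Tract1"), ("CountyA", "Tract1")]

county_correct = Counter(c for c, _ in correct)
county_observed = Counter(c for c, _ in observed)
tract_correct = Counter(correct)
tract_observed = Counter(observed)

print(county_correct == county_observed)  # True: county counts unaffected
print(tract_correct == tract_observed)    # False: tract counts are in error
```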
As touched on above, errors in census counts can result from missing information and the resulting use of imputation for item and unit nonresponse. For example, missing information on the total number of residents in a housing unit can result in imputation of this number, which can add to errors in census counts of the total population for areas containing that housing unit. As described in National Research Council (2004b: 128, Box 4.2) the 2000 census used item imputation, whole-person imputation, and four types of whole-household imputation to complete responses with missing information. The procedure used depended on which persons in a household were and were not census
data-defined. (A person’s enumeration was data-defined if at least two basic data items were reported, with name counting as an item.) Item imputation was used when all members of a household were data-defined but some basic items were not reported. Whole-person imputation was used when at least one member of a household was data-defined but at least one other member was not. (Therefore, any enumeration that is not data-defined results in a whole-person imputation.) For the members of a household who were not data-defined, all basic information was imputed, using characteristics of other household members. Finally, four types of whole-household imputation were used, depending on whether (1) the number of residents was known, (2) the number of residents was not known but the occupancy status was known, (3) the occupancy status was not known but it was known that the address was a housing unit, or (4) it was not known whether the address was a housing unit.
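The decision rules just described can be sketched in code. The sketch below is illustrative only; the record structure and function names are our own, not the Census Bureau's actual implementation:

```python
# Hypothetical sketch of the 2000 census imputation decision rules
# described above. Data structures and names are illustrative.

def is_data_defined(person):
    """A person is data-defined if at least two basic items
    (name counting as one) were reported."""
    return len(person["reported_items"]) >= 2

def imputation_needed(household):
    """Classify which imputation procedure the household would receive."""
    persons = household.get("persons")
    if persons is None:
        # Nothing known about residents: one of the four types of
        # whole-household imputation applies, depending on what is known
        # about the address (resident count, occupancy status, whether
        # the address is a housing unit).
        return "whole-household"
    defined = [is_data_defined(p) for p in persons]
    if all(defined):
        # All members data-defined; fill in any missing basic items.
        return "item"
    if any(defined):
        # At least one member data-defined, at least one not: impute all
        # basic items for the non-data-defined members.
        return "whole-person"
    # No member is data-defined at all.
    return "whole-household"

# Example: a two-person household where only one member reported
# enough items to be data-defined.
hh = {"persons": [
    {"reported_items": ["name", "age", "sex"]},
    {"reported_items": ["name"]},
]}
print(imputation_needed(hh))  # whole-person
```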
As discussed in Chapter 3, the result of an individual imputed enumeration should not be considered to be correct or incorrect, but rather, one should assess an imputation algorithm based on its contributions to the bias and variance of estimated counts for various geographic areas and demographic groups. Therefore, whole-person and whole-household imputations can increase the errors in census counts for any demographic and geographic domains containing the people in question. Furthermore, imputation of characteristics can impact the quality of the counts for the associated demographic groups.
Two approaches have been taken to date to assess the overall (coverage) quality of census counts. One view is that census quality should be measured, separately by domain, by estimating the percentage net coverage error for each domain, for example, for each state. A second view is that census quality for a given domain should be measured instead by the percentage of census error—by census error we mean the totality of omissions, erroneous enumerations, duplications, and errors in the wrong location, with all errors receiving the same weight. This statistic is often referred to as the rate of gross census error.
As explained in Chapter 1, the Census Bureau is moving away from the view that the primary measure of census quality should be net error, because net error masks omissions, erroneous enumerations, and duplications that balance out at some levels of aggregation. On one hand, such errors that cancel at some level might contribute to error in measures at a lower level at which they do not cancel. On the other hand, the rate of gross census error is also deficient as a summary measure, in that many enumerations in the wrong location will affect only the more detailed aggregates. This argues for separate treatment of enumerations in the wrong location. Furthermore, since component coverage errors have partially distinct causes, it is important to separate the summaries of these various components so that their magnitudes can be assessed individually, rather than trying to place them into a single error measure. These last two points argue for separate measures of the four components of census coverage error: duplicates, erroneous enumerations, omissions, and enumerations in the wrong location. In addition, for enumerations in the wrong location, rather than a percentage error measure, which would be appropriate for omissions, erroneous enumerations, and duplications, a
summary assessment would require a representation of the distribution of the size of the geographic errors to assess which applications of the counts are likely to be affected by various magnitudes of errors.
Measures of component census error consistent with the above considerations will provide useful information in support of a feedback loop for identifying alternative census processes that are preferred to current ones. However, this does not mean that the Census Bureau should not also continue to provide estimates of net coverage error. Such measures still have importance since (1) they can be compared with previously published estimates for historical comparisons of census quality, (2) as discussed in Chapter 3, net error measures are needed for estimating census omissions, and (3) users find net error measures useful for evaluating the utility of estimates for some applications.
HOW CENSUS ERRORS ARE MEASURED
In this section we provide some additional detail concerning the two main approaches to coverage measurement that were outlined in Chapter 1: DSE and demographic analysis.
Dual-Systems Estimation

A detailed description of DSE, which has been used as the primary methodology for coverage measurement for the last three censuses, can be found in National Research Council (2004b: 159-163 and Chapter 6) and in U.S. Census Bureau (2003). A history of DSE can be found in National Research Council (1985: Chapter 4) and in Cohen (2000). We provide a brief outline here.
A postenumeration survey (PES) is conducted following the census data collection in any given housing unit, although possibly partially overlapping in terms of the overall schedule. This is a survey of the residents in a sample of census block clusters, who are referred to collectively as the P-sample. The addresses in those blocks are listed independently of (that is, not using any information from) the Census Bureau’s MAF, which is the address list used to take the decennial census. Then the residents of the housing units at those addresses are interviewed to establish who was resident on Census Day. Additional information is also collected to support matching to the census and to assign the persons to poststrata, which are defined by demographic characteristics, as well as household and area characteristics. For example, mailback rates and whether someone is an owner or renter, along with demographic and other characteristics, are used to define poststrata. The characteristics used to define poststrata are those that have been associated with the propensity to be missed in past censuses. Given the heterogeneous coverage properties across the poststrata, the coverage measurement described here is carried out separately by poststratum.
The P-sample enumerations are then matched to the census enumerations to determine who in the P-sample was also counted in the census. Persons who failed to match to the census are reinterviewed, to determine the reason for the failure to match,
and to make any needed corrections due to discovered errors in the data collection or matching.
The census enumerations in the P-sample block clusters are referred to as the E-sample. The E-sample is used to determine the percentage of the census enumerations resident in the P-sample block clusters that are correct. This is accomplished by visiting the E-sample people who fail to match P-sample records to determine whether each individual was correctly enumerated in the census or was enumerated in error.
The independence of the P-sample enumerations and the census enumerations is crucial to support the estimation of census undercoverage for the following reason. The fundamental relationship underlying this approach to the estimation of census net undercoverage is that, poststratum by poststratum, the following approximate equation should obtain:

M/P ≈ C/DSE,

where

M stands for the estimate of the number of P-sample persons who match with an E-sample person,
P stands for the estimate of the number of all valid P-sample persons,
C stands for the number of census enumerations, and
DSE stands for the dual-systems estimate of the total number of residents, that is, the estimated true count.
This approximate equation should hold because, ignoring some complications, the first ratio (M/P) is an estimate of the percentage of census enumerations in the subpopulation of P-sample enumerations, that is, an estimate of the census “capture” rate within the P-sample population, and the second ratio (C/DSE) is an estimate of the percentage of census enumerations in the full population (all within some poststratum). If the P-sample selection and field measurement processes are independent of the census processes, and if the operational independence of the census and the PES also engenders statistical independence, then the fact of enumeration in the P-sample should provide no information as to whether a person was or was not enumerated in the census. Therefore, the subpopulation of P-sample enumerations should have the same underlying probability, conditional on poststratum, of being enumerated in the census as the full population. Given that, and temporarily ignoring erroneous enumerations, duplications, and whole-person imputations in the census, these two ratios should be approximately equal (except for sampling and other variation). The above relationship can be reexpressed as DSE = (C × P)/M, that is, the estimate for the total population size is the product of the number of census enumerations times the number of P-sample enumerations, divided by the number of matches. The calculation of estimates within poststrata is motivated by the additional assumption that the census and PES enumeration propensities are uncorrelated, which is supported by the homogeneity of coverage properties within poststrata. Failure of this assumption results in correlation bias.1
This derivation ignores the key role of the E-sample, which provides a needed correction to the above, given that a percentage of census enumerations are either duplicates, erroneous (including in the wrong location), or whole-person imputations and therefore not able to be matched to the P-sample. To address this, C, the census count, in the above formula is replaced by (C − II)(CE/E), where II represents the number of people lacking sufficient information for matching, CE represents an estimate of the number of E-sample persons correctly enumerated in the census, and E represents an estimate of the number of E-sample enumerations in the P-sample block clusters. (Note that E is a sample-weighted quantity, whereas C is not.) The number of people lacking sufficient information for matching, II, is subtracted from the census count since their match or correct enumeration status cannot be determined. The assumption is that their net coverage error is the same as that for the remaining census enumerations. The number of matchable persons, C − II, is multiplied by CE/E to estimate the percentage of matchable persons that are correct enumerations, that is, we multiply the matchable count by the percentage of correct census enumerations. The resulting DSE formula is

DSE = (C − II) × (CE/E) × (P/M).
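As a numerical illustration of the corrected formula, DSE = (C − II)(CE/E)(P/M), the following sketch uses purely hypothetical inputs for a single poststratum:

```python
# Numerical sketch of the corrected dual-systems estimator described
# above, DSE = (C - II) * (CE / E) * (P / M). All inputs are
# hypothetical, chosen only to illustrate the arithmetic.

def dual_systems_estimate(C, II, CE, E, P, M):
    """Dual-systems estimate of the true population for one poststratum.

    C  : census count
    II : census enumerations lacking sufficient information for matching
    CE : estimated number of E-sample persons correctly enumerated
    E  : estimated number of E-sample enumerations
    P  : estimated number of valid P-sample persons
    M  : estimated number of P-sample persons matching an E-sample person
    """
    matchable = C - II             # enumerations eligible for matching
    correct_rate = CE / E          # share of matchable enumerations correct
    capture_rate = M / P           # census "capture" rate in the P-sample
    return matchable * correct_rate / capture_rate

# Hypothetical poststratum: 100,000 census enumerations, 2,000 with
# insufficient information for matching, a 95% correct-enumeration rate,
# and a 92% capture rate estimated from the P-sample.
dse = dual_systems_estimate(C=100_000, II=2_000, CE=9_500, E=10_000,
                            P=10_000, M=9_200)
print(round(dse))  # 101196
```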
This derivation still ignores several additional nontrivial complications, including the treatment of other forms of missing data, the treatment of data from movers, and the precise area of search for matches of census enumerations outside the P-sample blocks. Other complications arise in Revisions I and II of Accuracy and Coverage Evaluation (A.C.E.) due, among other causes, to incorporation of information from the Matching Error Study, the Evaluation Follow-up Study, and the Person Duplication Study.
Problems with Dual-Systems Estimation
Both the decennial census data collection and the data collection for the PES inevitably involve errors and unit and item nonresponse. As a result, matching errors are made, quite likely more in the direction of false nonmatches than false matches. Furthermore, while the use of poststrata is intended to partition the population into subgroups that have relatively homogeneous propensities to be enumerated in the census (in order to reduce correlation bias), the poststrata are still not completely homogeneous.
Missing data complicate the application of DSE in the following ways. Erroneous enumerations in the E-sample that have sufficient information for matching are typically identified as erroneous (though there are cases for which correct enumeration status has to be imputed). However, as discussed above, erroneous enumerations that have insufficient information for matching (or are non-data-defined; see the discussion of what are called KEs in Chapter 3) are removed from the computation and are assumed to behave like the remaining E-sample enumerations in their poststratum through an implicit reweighting adjustment. This assumption can be examined using studies like the Evaluation Follow-Up Study in 2000. For duplicates, data-defined enumerations with name and date of birth in the E-sample within the P-sample block clusters are typically discovered, but until 2010, duplicates outside the P-sample blocks were categorized as erroneous enumerations. These cases will be identified as duplicates in 2010, assuming the national search for duplicates is implemented.
P-sample persons with sufficient information for matching that are missed in the census are typically correctly identified as census omissions. However, cases with insufficient information for matching are accounted for by giving additional weight to those cases with sufficient information that are similar on available characteristics thought to be predictive of match status. The validity of these weighting models can also be examined using such studies as the Evaluation Follow-Up Study in 2000. In previous censuses, when the corresponding census enumeration was located outside the P-sample block, a number of P-sample persons that were not omissions were designated as such, resulting in overestimation of the number of omissions. However, this should be addressed in 2010 with the implementation of the national search for matches. Finally, the number of P-sample persons missed both in the P-sample and the census is estimated assuming both independence of the two enumerations and homogeneity of enumeration propensities (the absence of which engenders correlation bias). However, since no data are collected for this group, it is unclear how well this group is estimated (although merged administrative records might be used for this purpose). The general expectation is that this group of census omissions is underestimated.
In addition, A.C.E. and its predecessors in 1980 and 1990 were not designed to distinguish among different types of census errors. An important limitation in this regard arises from the restriction of searches for E-sample matches to P-sample enumerations to be either in the P-sample block cluster or sometimes slightly outside in a (targeted) extended search area. Given this, a failure to match a P-sample enumeration to the census could result from any of several types of error, including (a) a person’s name and date of birth were captured with substantial error (possibly by the optical character recognition used in scanning the census form), (b) a housing unit was erroneously geocoded a few blocks outside the search area, or (c) the census enumeration was mistakenly located at a 3-month winter residence rather than at the 9-month residence during the remainder of the year. These situations are all represented as an omission in the census of the associated P-sample enumeration, along with an erroneous enumeration in the census either (a) at the correct residence, (b) a few blocks away, or (c) possibly hundreds of miles away. For some applications of census counts, these errors will cancel
each other, and for others they will not. For example, in counting the population for states, geocoding errors of short distances are unlikely to matter.
The local restriction of the search for matches in the 1980, 1990, and 2000 censuses was intended to overstate the number of census omissions and the number of erroneous enumerations by the same amount. If one is interested only in net error, the intention is that these errors would balance out, resulting in a zero net effect. (There is, however, an increase in the variance of the estimate of net coverage error due to the need for these random amounts to balance out for various domains.) Furthermore, various deficiencies in the operation of the field processes will cause the balance to be inexact even in expectation, leading to “balancing error.” However, for the new objective of assessment of census component coverage errors, the (necessarily) restricted search area results in a substantial increase in the estimated rates of omission and erroneous enumeration, much of which is due to counting someone in the wrong location, which may not be an error for many applications of census data.
Finally, errors in geography or demographics can also result in the placement of individuals in the wrong poststratum, which can also bias the estimation of net coverage error.
Demographic Analysis

The Census Bureau has made substantial use of demographic analysis for several censuses, going back to 1940 (see, for example, Price, 1947; Coale, 1955; Coale and Zelnik, 1963; Coale and Rives, 1972). We present a short overview here; for a more detailed treatment relevant to the 2000 application, see Robinson (2001).
Demographic analysis makes use of the following “balancing equation” to estimate the population in an age group from historical data sources:

P = B − D + I − E,

where

P = the population in the age group at the census date;
B = births (or the population at a previous census date);
D = deaths to the group occurring from the initial date to the census date;
I = immigration to the group occurring from the initial date to the census date; and
E = emigration from the group occurring from the initial date to the census date.
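For illustration, the balancing equation P = B − D + I − E can be evaluated directly; the component values below are hypothetical:

```python
# Minimal sketch of the demographic-analysis balancing equation,
# P = B - D + I - E, for a single age group. Component values are
# purely hypothetical.

def balanced_population(B, D, I, E):
    """Population at the census date from the balancing equation."""
    return B - D + I - E

# Hypothetical cohort: 4.0 million births, 30,000 deaths, 250,000
# immigrants, and 40,000 emigrants between the initial and census dates.
P = balanced_population(B=4_000_000, D=30_000, I=250_000, E=40_000)
print(P)  # 4180000
```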
Given their high quality, Medicare enrollment data are now used to estimate the population over 65 without resorting to this accounting equation.2
Problems with Demographic Analysis
The logic of demographic analysis requires that the population estimate constructed from the basic demographic accounting relationship be comparable with the population measured by the census. As a result of demographic changes in the U.S. population over the past generation, many of the assumptions made by demographic analysis have become more problematic than they were for the censuses of 1940 through 1980. Specifically, immigration and emigration have become much more important sources of population growth. Also, as a result of immigration, intermarriage, and larger societal trends, the current definition and measurement of racial and ethnic groups have become less consistent with historical definitions in the data used to construct the demographic estimates.
While historical data on the numbers of births and deaths are of relatively high quality, data on international migration are more problematic. Estimates of the number of emigrants are subject to considerable variability; in addition, undocumented immigration has become as important numerically as legal immigration, but the available measures are not very exact. Demographic analysis has generally been restricted to national estimates of age, sex, and race groups, since the available measures of subnational migration are not sufficiently reliable to support production of estimates at the state or lower levels of geographic aggregation. Furthermore, because ethnicity has been captured in vital records on a national scale only since the 1980s, demographic analysis has not been used to estimate net undercoverage for Hispanics. Finally, with the introduction of multiple-race responses in the 2000 census, it has become necessary to map the census race categories into historical single-race categories (or vice versa) with the attendant introduction of additional variability into the demographic analysis estimates. This introduction of multiple-race responses is part of the growing complexity of racial classification, which is likely to increase discrepancies between birth certificate reporting and self-reporting of race by adults.
Having pointed out some of the deficiencies of demographic analysis, it is important to emphasize its continuing value in coverage measurement. Demographic analysis places the census results within the well-defined, consistent, and essentially tautological framework of demographic change. The realities of the balancing equation shown above place severe limits on certain results from other studies. Thus, for example, with the passage of one year, all living people get exactly one year older; or, for every boy baby born, there will be approximately one girl baby born. If the results from other coverage measurement studies give results outside the bounds implied by such demographic realities, the departures need to be explained. Some explanations may be demographic—for example, higher levels of immigration or emigration than included in the demographic estimate. But they may also point to statistical or measurement issues— for example, the persistent correlation bias that affects DSE measures of adult black men.
The current demographic analysis program at the Census Bureau also links the measures from the current census with past censuses back to 1940. Each matching study stands alone as a measure of the particular census, but the demographic analysis program
grounds its current results in the historical data series, so that it is possible to assess one census relative to others.3 This linkage can place limits not only on the demographic analysis measures but also on the plausibility of results from other studies.
Due to the relatively deterministic nature of estimates of net coverage error from demographic analysis, estimates of its error or uncertainty are difficult to justify. However, there have been a few attempts to provide error estimates, in particular Robinson et al. (1993).
It is important to recognize that some results from demographic analysis are more robust than the overall results, so that they may be incorporated into a comprehensive coverage measurement program. While immigration has become an increasingly important component of population change, it has very little impact on the youngest age groups. Thus, it is essential that DSE results for children be consistent with demographic analysis results. Many population ratios, including sex ratios, are much less sensitive to assumptions about problematic components (such as undocumented immigration) than the measured population size. As a result, it may be possible to incorporate demographic analysis results into overall measures of coverage. In 2000, demographic analysis proved to be very useful in the coverage measurement program, notwithstanding the noted deficiencies. Demographic analysis provided an early indication that the initial estimates of the total U.S. population from A.C.E. may have been too high. Demographic analysis may yet provide input to correct for correlation bias in DSE. (See Bell, 1993, for more discussion of this.)
USES OF COVERAGE MEASUREMENT
Coverage measurement serves multiple, not fully complementary purposes. These are, not necessarily in order of importance: (1) assessment of coverage, (2) process improvement, and potentially, (3) adjustment.
Assessment of Coverage Quality
Careful, thorough assessment of the quality of a decennial census is extremely important. Census counts serve a variety of important purposes for the nation, including apportionment, legislative redistricting, fund allocation, governmental planning, and many private uses, such as for business planning. It is important for users of census data to know how accurate the counts are to determine how well they can support various applications. Given that the census could never count every resident exactly once and in the correct location, users need to be able to assess the extent to which the census falls short, the extent to which the accuracy of census coverage differs by location or by
demographic group, and the extent to which progress has been made in comparison to the previous census.
The total population count of the United States is probably the most visible output of a census, so one obvious measure of coverage accuracy for the census is the error in the count for the entire United States over all demographic groups. However, almost all uses of the census depend on population counts at various levels of geographic and demographic detail (notably racial/ethnic). Uses of the counts, such as for redistricting and local planning, depend on the accuracy of the population counts at detailed levels of geography and for some demographic detail as well. Furthermore, many uses of census data (e.g., apportionment, fund allocation) depend on the counts only in the form of proportional shares of the population. Given this, rates of net undercoverage by various geographic or demographic domains and the impact on population counts, population shares, or both matter a great deal to many users of census data. A key issue has been the differential net undercount of blacks and Hispanics, which has persisted over several decades (see, e.g., Ericksen et al., 1991).
While it is important to assess census coverage, it would also be extremely helpful to use that assessment to improve the quality of subsequent censuses. Consequently, a valuable use of coverage measurement is to help to identify sources of census coverage errors and to suggest alternative processes to reduce the frequency of those errors. Although drawing a link between census coverage errors and deficient census processes is a challenging task, the Census Bureau thinks that substantial progress can be made in this direction, since its objective going into the 2010 census is to use, to the extent feasible, the 2010 coverage measurement programs to help indicate the sources of common errors in the census counts. This information can then be used to allocate resources toward developing alternative census designs and processes that will provide counts with higher quality in 2020. It is conceivable that use of such a feedback loop could provide sufficient savings in census costs, in addition to improvement in census quality, to more than fund the census coverage measurement program. The panel fully supports this modification of the objectives of coverage measurement in 2010.
Consider, for example, the finding from demographic analysis of the 2000 census that there was a substantial undercount of young children relative to older children. Specifically, the net undercount rates (i.e., (DSE – C)/DSE, where DSE indicates the adjusted count, and C indicates the corresponding census count), by demographic group in 2000 based on the revised demographic analysis estimates (March 2001) were as follows (National Research Council, 2004b: Chapter 6):
One hypothesis is that this undercoverage was at least in part due to the imputation of age for those left off the census form in households exceeding six members (given that the 2000 census forms collected characteristics data for at most six household members). For households that reported more than six members, characteristics data for the additional members either were collected by phone interview (for households that provided a telephone number) or were imputed on the basis of characteristics of other household members and the responses for other households. The hypothesis is that these imputations systematically underrepresented young children since they were underrepresented in the pool of “donor” households.4
While demographic analysis can measure the undercoverage of this group, it cannot shed further light on the validity of this hypothesis. However, A.C.E. data are useful in this regard, because characteristics data were collected for most residents counted in the PES, and those data allow an assessment of the extent to which imputations in large households distorted the age distribution. Support for this hypothesis would imply a need to improve the collection or imputation of data (or both) for members of large households in 2010.
While coverage measurement results should be used in support of census improvement whenever possible, coverage measurement will not always uniquely determine a deficient census process. On one hand, if people ages 18-21 have an extremely high duplication rate, one might surmise that it is at least partially due to the enumeration of college students both in their parents’ households and at their residences at college. Here the process in need of modification is clear. On the other hand, a housing unit might be placed in the wrong location for many reasons, including an incorrect address in the MAF, a geocoding error using the TIGER geographic database, or an incorrect address entered by the respondent on a Be Counted form.5 The extent to which coverage measurement programs can specifically discriminate between different sources of census errors depends on the situation.
Specific reasons for errors are more likely to be determined if the Census Bureau saves as much contextual information as possible from the 2010 census. It will need assessments of which individuals and households were enumerated in error, along with various characteristics of persons and households, and of the census processes that gave rise to each enumeration. In Chapter 4 we present some initial ideas on what data might be useful to save, and we plan to provide more specific guidance on what data to save in 2010 for this purpose in our final report. One possibility is to design a comprehensive master trace sample database (see National Research Council, 2004a: Chapter 8).
As we have pointed out, the 1999 Supreme Court decision (Department of Commerce v. United States House of Representatives, 525 U.S. 316) precluded the use of adjustment based on a sample survey for congressional apportionment, and time constraints strongly argue against the feasibility of using adjusted counts (based on a PES) for redistricting (see National Research Council, 2004a: p. 267). Furthermore, the current approach to adjustment estimation has a number of remaining complications that continue to present a challenge to the production of high-quality estimated counts, including the treatment of movers, matching errors, the treatment of missing data, and the heterogeneity remaining after poststratification of match and correct enumeration status (resulting in correlation bias).
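For readers unfamiliar with the estimator underlying these adjustments, the dual-systems estimate in its simplest form is the classic capture-recapture (Lincoln-Petersen) formula applied within a poststratum; the sketch below uses hypothetical counts and omits the real-world complications just listed (movers, matching error, missing data, correlation bias):

```python
def dual_systems_estimate(census_correct: float, pes_total: float,
                          matches: float) -> float:
    """
    Simplest-form dual-systems (capture-recapture) estimate for one
    poststratum: N_hat = (N1 * N2) / M, where N1 is the number of correct
    census enumerations, N2 is the independent PES count, and M is the
    number of people found in both systems. The formula assumes the two
    systems capture people independently; violations of that assumption
    produce the correlation bias noted in the text.
    """
    return census_correct * pes_total / matches

# Hypothetical poststratum: 95,000 correct census enumerations, a PES that
# counted 10,000 people, 9,200 of whom match a census record.
est = dual_systems_estimate(95_000, 10_000, 9_200)
print(round(est))  # about 103,261 estimated total population
```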
In addition, the use of adjustment is also complicated by the multitude of numbers needed, since one needs adjusted counts at relatively low levels of demographic and geographic aggregation. A decision whether to use adjusted counts for any purpose must therefore rest on an assessment of the relative accuracy of the adjusted counts compared with the census counts at the relevant level of geographic and/or demographic aggregation. The Census Bureau’s decision not to adjust the redistricting data, due for release by April 1, 2011, was based on the difficulty of making this assessment within the required time frame.