National Academies Press: OpenBook

Coverage Measurement in the 2010 Census (2009)

Chapter: 2 Fundamentals of Coverage Measurement

Suggested Citation:"2 Fundamentals of Coverage Measurement." National Research Council. 2009. Coverage Measurement in the 2010 Census. Washington, DC: The National Academies Press. doi: 10.17226/12524.


2 Fundamentals of Coverage Measurement

The decennial census is used for a wide variety of purposes by federal, state, and local governments, by businesses, and by academe. However, the constitutional goal of the census is to allocate the population to the states and local areas to support apportionment and congressional redistricting. This use of the census counts makes determination of the correct location of enumerations especially important and also focuses attention on racial differentials. Further, because of racial segregation, differential net undercoverage by race is likely to produce geographic differentials in undercoverage.

Clearly, the broad goal of measuring the quality of the coverage of the census is to assess the extent of census coverage error by domain and by demographic group. Coverage measurement is a collection of techniques that measure the differences between census enumerations and the corresponding true counts for groups or areas. It is the quantitative aspect of coverage evaluation, which also encompasses more qualitative techniques, such as ethnographic observation. The differences between census counts and the corresponding true counts at the level of the individual (or the household) are referred to collectively as census coverage errors. In this chapter we categorize types of census coverage error and indicate methods that can be used for their summarization. We then detail the three primary (potential) uses of census coverage measurement that rely on such summaries. Finally, we provide a brief overview of the methods currently used in the U.S. census for coverage measurement.

TYPES OF CENSUS ERRORS

There are two obvious ways in which the census count for an individual can be in error: a person could be included in the census as an enumeration when he or she should have been omitted (an overcount), or a person could be omitted from the census when he or she should have been included (an undercount). In addition, since the primary applications of census counts are apportionment of the states and the redrawing of congressional districts, it is important that each individual be counted in the appropriate location. When a person is counted somewhere other than the correct location, the effect of the error depends both on the distance between the recorded location and the true location and on the intended application of the counts (see below).

Given that, we decided in this report to distinguish undercounts and overcounts, which are errors regardless of the location of the enumeration, from enumeration errors that result from counts in the wrong location. This approach does not reflect any sense that the latter errors are less important; rather, first, they have different causes and therefore different solutions, and, second, they differ in kind according to the degree of displacement.

This classification of census coverage error differs from the classification that has been typical up to now. In that scheme, an overcount was any erroneously included enumeration, including enumerations in the wrong location, regardless of whether the error was a matter of a few blocks or of hundreds of miles. Similarly, an undercount was any erroneously omitted enumeration, including people who were in the census but were attributed to another (incorrect) location. As a result, in the previous scheme, an enumeration in the wrong location was counted as two errors: an overcount for the recorded location and an undercount for the correct location. The approach adopted here for classifying coverage error is consistent with a framework developed by Mulry and Kostanich (2006), which is described in Chapter 5.

We now provide more detail on the nature and causes of these various types of census coverage error.

Undercounts

Omissions result from a missed address on the decennial census's Master Address File (MAF), a missed housing unit in a multiunit residence in which other units were enumerated, a missed individual in a household in which other people were enumerated, or people missed because they have no usual residence.

Overcounts

Overcounts result from including enumerations that should not have been included in the census and from counting people more than once. Enumerations that should not have been included in the census are those of people who were not residents of the United States on Census Day, including those born after Census Day and those who died before Census Day; people in the United States temporarily; and fictitious people. As explained above, we restrict the term "erroneous enumerations" to enumerations that should not have been included in the census anywhere at all, thereby excluding duplicates and people counted in the wrong location.

Duplicates

Duplicates can result from (1) repeat enumerations of a subset of the individuals in a household, sometimes as a result of the multiple opportunities for being enumerated in the census; (2) an address being represented in more than one way on the MAF, resulting in the duplication of all residents; and (3) the inclusion of a person at two distinct residences, possibly because both are part-time residences or because of a move shortly before or shortly after Census Day.

Counts in the Wrong Location

The two fundamental types of census coverage error, overcounts and omissions (undercounts), reduce the accuracy of the total count for the people in any geographic area that contains, or should contain, the individual counted in error. In addition, as mentioned above, there can also be errors in the geographic location of an individual or an entire household, which can likewise affect the accuracy of census counts.

Counting a person in the wrong location can result from a misunderstanding of the census residence rules and the consequent reporting of someone at the wrong residence. It can also result from an enumerator's assigning a person to the wrong choice among several part-time residences or from the Census Bureau's placing an address in the wrong census geographic location (called a geocoding error). Placing a person in the wrong geographic area lowers the count for the correct geographic area and raises it for the incorrectly designated area. Therefore, whether there is an effect on census accuracy depends on the distance between the correct and incorrect locations and on the summary tabulation in question: the more detailed the tabulation is with respect to geography, or the greater the displacement, the greater the chance that geographic errors will affect the quality of the associated counts. Placing a person in the wrong location can therefore result in zero additional errors or two additional errors. (One additional error is also possible, by placing a duplicate enumeration in the wrong location.)

Demographic Errors

A similar outcome occurs when a person's demographic characteristics are recorded in error. This happens when a person is assigned to the wrong demographic group through a reporting error or through imputation of demographic characteristics that were not provided by the respondent. Again, placing a person in the wrong demographic group lowers the count for the correct demographic group and raises it for the incorrectly designated group. Whether this error affects decennial census counts depends on the aggregate of interest: as above, the more detailed the tabulation demographically, the greater the chance that demographic errors will affect the quality of the associated counts.

Imputations

In addition to census coverage errors that result from the data collected in the census, there are also enumeration errors that result from the methods, typically imputation, used to address census nonresponse. As noted in National Research Council (2004a), in addition to item imputation (which is used to address missing characteristics for so-called data-defined enumerations), there are five different degrees of "missingness" for the residents of a housing unit, which can result in five different types of whole-person or whole-household imputation (count imputation). Imputation methods used to address whole-household nonresponse will often produce counts for a housing unit that do not agree with the true number of residents of that housing unit, and these differences contribute to coverage error.
However, we assert that the discrepancies that result from the application of an imputation technique are not errors of either omission or overcoverage and therefore should not contribute to assessments of the magnitudes of the components of census coverage error. Whole-household imputation is simply a means of producing counts that are as accurate as possible when aggregated over various domains of interest. (We use the term "domain" to refer to any demographic or geographic aggregate of interest.) Thus, the effectiveness of an imputation algorithm should be assessed by its aggregate performance (e.g., bias, variance, mean-square error for domains of interest) and should not be judged correct or incorrect at the level of the household.

To sum up, then, there are four basic types or components of census coverage error: omissions, duplicates, erroneous enumerations, and enumerations in the wrong location.

COVERAGE ERROR METRICS FOR AGGREGATES

Since census coverage errors can be positive (overcounts) or negative (undercounts), they can partially cancel each other out when census counts are aggregated over a domain. Specifically, the difference between the census count and the true count for a domain is equal to the number of overcounts minus the number of undercounts, plus the net from enumerations in the wrong location for the residents of the housing units in that domain. The net coverage error, or net undercount, defined as the difference between the census count and the true count for a domain, is therefore a useful assessment of the effect of census coverage error on an aggregate of interest.

Net coverage error has two benefits: (1) it directly assesses the utility of census counts for aggregates of interest, and (2) it can be compared with previously published estimates of net coverage error for historical comparisons of census quality. Percent net undercount expresses net coverage error as a percentage of the true count and therefore facilitates comparison of the net coverage error between domains. Differential net undercount, the difference between the percentage net undercount for a specific domain and the percentage net undercount for another domain (or for the nation), is therefore a useful measure of the degree to which one domain is (net) undercounted relative to another. To be precise, let C_i be the census count for the ith domain, and let C_+ be the census national total. Similarly, let T_i and T_+ be, respectively, the true count for the ith domain and the true national total. Then the differential net undercount is

    \frac{C_i - T_i}{T_i} - \frac{C_+ - T_+}{T_+} = \frac{C_i}{T_i} - \frac{C_+}{T_+}.

Many uses of census data (e.g., apportionment and fund allocation) depend on census counts as proportional shares of the population, rather than as population counts, and in those situations a measure of the quality of the counts for a domain of interest is

    \frac{C_i}{C_+} - \frac{T_i}{T_+}.

For comparison of the quality of two sets of estimated counts used as counts, a common yardstick is the sum of squared net errors over domains. When comparing the quality of two sets of estimated counts used as population shares, the error of the shares is again commonly summarized, with errors defined as the differences between estimated and true population shares, by adding squared errors over domains, but now weighted by the population size T_i, since otherwise one would be equating a given error in population shares for a small domain and a large one. Specifically, the following loss function would be reasonable to use:

    \sum_i \left( \frac{C_i}{C_+} - \frac{T_i}{T_+} \right)^2 T_i.

Although it is clearly very useful, net census error, or net undercount, is an inappropriate summary assessment of census coverage error when the objective is census improvement, because a substantial number of overcounts and undercounts may cancel each other for a given domain, which may obscure problems with census processes. Also, while these errors may balance each other for a given domain in a given census, they may not balance to the same extent either in more detailed aggregates or in subsequent censuses. To address this possible imbalance, some have argued for tabulating census gross error, the sum of the numbers of overcounts, undercounts, and errors in the wrong location, relative to domains of interest.

However, there are two problems with gross error as a summary measure of the quality of the census enumeration process. First, as noted above, enumerations in the wrong location matter only when the degree of displacement and the tabulation in question are such that the displacement places someone in the wrong tabulation cell. Therefore, enumerations in the wrong location should not be treated as equivalent to overcounts or omissions. Furthermore, the census coverage errors that we classify as erroneous enumerations, duplicates, omissions, and counts in the wrong place all have somewhat different causes.
Given the current objective of supporting a feedback loop for census improvement, it is important to separate the summaries of these various components so that their magnitudes can be assessed individually, rather than trying to aggregate them into a single error measure. Second, for counts in the wrong location, rather than a percentage error measure (which is an appropriate summary for omissions, erroneous enumerations, and duplications), a more useful summary assessment would represent the frequency of enumerations in the wrong location as a function of some representation of the degree of displacement (so that location error rates would diminish as the displacement increases). This approach would facilitate the assessment of the degree to which errors from enumerations in the wrong place affect various applications of the counts.
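As a numerical sketch of the aggregate metrics defined in this section, the short program below computes net coverage error, percent net undercount, differential net undercount, share error, and the population-weighted share loss for three hypothetical domains. All counts are invented for illustration; in practice the true counts T_i are unknown and must themselves be estimated.

```python
# Hypothetical census counts C_i and "true" counts T_i for three domains.
# All numbers are invented for illustration only.
census = {"A": 980, "B": 1050, "C": 2000}   # C_i
true   = {"A": 1000, "B": 1030, "C": 2000}  # T_i

C_tot = sum(census.values())  # C_+
T_tot = sum(true.values())    # T_+

for d in sorted(census):
    net_error = census[d] - true[d]  # net coverage error, C_i - T_i
    # percent net undercount: positive when the domain is undercounted
    pct_net_undercount = 100 * (true[d] - census[d]) / true[d]
    # differential net undercount relative to the nation: C_i/T_i - C_+/T_+
    differential = census[d] / true[d] - C_tot / T_tot
    # share error: C_i/C_+ - T_i/T_+
    share_error = census[d] / C_tot - true[d] / T_tot
    print(f"{d}: net={net_error:+d}, pct undercount={pct_net_undercount:+.2f}%, "
          f"differential={differential:+.4f}, share error={share_error:+.5f}")

# Weighted squared-error loss over shares: sum_i (C_i/C_+ - T_i/T_+)^2 * T_i,
# which weights a given share error by the size of the domain.
loss = sum((census[d] / C_tot - true[d] / T_tot) ** 2 * true[d] for d in census)
print(f"share loss = {loss:.5f}")
```

Note that the totals here happen to be equal (C_+ = T_+), so domain A's net undercount coexists with domain B's net overcount, illustrating the cancellation that makes net error a poor guide for process improvement.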

The term components of (census coverage) error conveys this idea of separating enumeration errors into the categories of duplications, erroneous enumerations, omissions, and geographic errors so that their individual causes can be better analyzed.

For completeness, we again note that there are also errors in counts attributable to errors in a person's demographic characteristics, and there can, of course, also be errors in a person's other characteristics, for example, whether the residents own or rent their housing unit. These are ignored in this discussion, although errors in characteristics used to model net coverage error can negatively affect its estimation, and it is therefore important to reduce the frequency of such errors.

Whether one uses net coverage error or rates of components of census coverage error to represent the quality of the census counts for a domain clearly depends on the analysis one has in mind. To support as much flexibility in summarization and analysis as possible, information on census coverage error needs to be retained at as basic a level as possible, in addition to the summary tabulations that the Census Bureau provides. This retention would have two advantages. First, it would permit a more precise assessment of the effect of census errors on any specific application of the counts. For instance, one could assess the impact of omissions (ignoring the extent to which they are offset by overcoverage errors) on a specific domain of interest that is not provided in the standard Census Bureau tabulations from the coverage measurement program. Second, and more important, retention of information on census coverage error at the level of the individual allows the examination of (causal) associations using statistical models that relate whether or not a coverage error was made to the census enumeration processes used and to individual and housing-unit characteristics. Such an analysis could also include correlates of whole-household omissions, correlates of omissions that affected only some residents of a household, correlates of whole-household duplications, correlates of partial-household duplications, or correlates of counting individuals in the wrong place (for various degrees of misplacement).

In sum, there are various components of census error that have various applications, and there is therefore a need for access to those errors at an individual level, and for linking those errors to potential causal factors, to support various descriptive and analytic needs. By "descriptive" we mean summary assessments of the quality of census counts for domains; by "analytic" we mean the development of statistical models that attempt to discriminate between individuals and households that are and are not counted in error.
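The "analytic" use can be sketched in code. The toy model below relates a binary omission indicator to a single hypothetical housing-unit covariate (an indicator for living in a multiunit structure) via logistic regression. The covariate, the synthetic data, and the plain gradient-ascent fit are all invented for illustration; they are not the Census Bureau's methodology, which would involve many covariates, survey weights, and a carefully built person-level file.

```python
import math
import random

# Synthetic data: each record is (x, y), where x = 1 indicates a hypothetical
# multiunit-structure resident and y = 1 indicates the person was omitted.
# The assumed data-generating model (intercept -2.0, effect +1.2) makes
# omission more likely in multiunit structures; these numbers are invented.
random.seed(0)
data = []
for _ in range(1000):
    x = 1.0 if random.random() < 0.4 else 0.0
    p_true = 1 / (1 + math.exp(-(-2.0 + 1.2 * x)))
    data.append((x, 1 if random.random() < p_true else 0))

# Fit logistic regression by gradient ascent on the average log-likelihood.
b0, b1 = 0.0, 0.0
for _ in range(2000):
    g0 = g1 = 0.0
    for x, y in data:
        resid = y - 1 / (1 + math.exp(-(b0 + b1 * x)))  # y - p_hat
        g0 += resid
        g1 += resid * x
    b0 += 0.5 * g0 / len(data)
    b1 += 0.5 * g1 / len(data)

# The fitted effect should come out positive, echoing the assumed model.
print(f"intercept = {b0:.2f}, multiunit effect = {b1:.2f}")
```

The point of retaining person-level error records is exactly this: with individual outcomes in hand, one can estimate which characteristics and processes are associated with being missed, duplicated, or misplaced, rather than observing only net aggregates.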

PURPOSES

Coverage measurement has historically served multiple purposes. Since its inception in the 1950 census, it has had the goal of evaluating the accuracy of census counts for geographic and demographic domains, with a focus on assessing net error for domains. The primary goal was to inform users of the quality of the census counts for various applications. In addition, but to a much lesser extent, coverage measurement has also been used to provide information relevant to developing a better understanding of census process inadequacies, leading to improvements in the design of the subsequent census.

The estimation of net error has also raised the possibility of providing alternative counts for use in formal applications, known as census adjustment. We know of only one formal use of adjusted census counts to date, namely, the use of adjusted counts during the 1990s to modify the population controls used for the Current Population Survey (CPS), the National Health Interview Survey, the National Crime Victimization Survey, and the Survey of Income and Program Participation, which in turn affected the Bureau of Labor Statistics' estimates of the number of people unemployed during the 1990–2000 intercensal period. However, the primary focus of coverage measurement in both 1990 and 2000 was to produce adjusted census counts for official purposes, assuming that it could be demonstrated that the adjusted counts would be preferable to the unadjusted census counts for apportionment and redistricting.

The Census Bureau's stated plan that the primary purpose of the 2010 coverage measurement program would be to measure the components of census coverage error, in order to initiate a feedback loop for census process improvement, is a substantial innovation. An interesting question is the extent to which a coverage measurement program can be used for this purpose, and a major charge to this panel was to determine the extent to which this new focus should affect the design of the coverage measurement program and the resulting output and analyses.

Evaluation of the Accuracy of the Census Counts

Census counts serve a variety of important purposes for the nation, including apportionment, legislative redistricting, fund allocation, and governmental planning, and they support many private uses, such as business planning. Users of census data need to know how accurate the counts are in order to determine how well they can support these various applications. The needed information includes an understanding of the extent to which the accuracy of census counts differs by location or by demographic group and the extent to which accuracy has improved from one census to the next.

The total population count of the United States is probably the most visible output of a census, so one obvious measure of coverage accuracy is the error in the count for the entire United States over all demographic groups. However, essentially all applications of the census (e.g., redistricting and local planning) use population counts at various levels of geographic and demographic detail. Consequently, it is important to assess rates of net undercoverage for various geographic or demographic domains.

Historically, a key issue has been, and remains, the differential net undercount of blacks, Hispanics, and Native Americans, which has resulted in the repeated underrepresentation of areas in which those groups make up a large fraction of the residents. In particular, the differential net undercount of these groups has led to their receiving less than their share of federal funds and political representation (see, e.g., Ericksen et al., 1991, for more details). Given this, it is as important as ever for the Census Bureau, in evaluating possible alternative designs for the decennial census, to assess not only the likely impacts on the frequency of components of census coverage error, but also the impacts on differential net coverage error for historically undercounted minority groups.

Census Adjustment

The 1999 Supreme Court decision (Department of Commerce v. United States House of Representatives, 525 U.S. 316) precluded the use of adjustment based on a sample survey for congressional apportionment.
In addition, the Census Bureau concluded that time constraints currently preclude the computation and evaluation of adjusted counts (based on a postenumeration survey) by April 1 of the year after a census year, thereby preventing the use of adjusted counts for purposes of redistricting (see National Research Council, 2004a:267). Furthermore, the current approach to adjustment has a number of complications that continue to challenge the production of high-quality estimated counts, including the quality of the data for movers (often missing or collected by proxy), matching errors, the treatment of missing data for nonmovers, the estimation of the number of people missed by both the census and the postenumeration survey, and the heterogeneity remaining after poststratification of the match rate and the correct enumeration rate (resulting in correlation bias). This last problem will be reduced, but not eliminated, by the likely shift to the use of logistic regression instead of poststratification in 2010 (discussed below).

24 COVERAGE MEASUREMENT IN THE 2010 CENSUS In addition, the use of adjustment is complicated since for some important applications one needs adjusted counts at low levels of demo­ graphic and geographic aggregation, and a sample survey, by design, is intended to make estimates at more aggregate levels. A decision whether to use adjusted counts for any purpose must therefore rest on an assess­ ment of the relative accuracy of the adjusted counts compared with the census counts at the needed level of geographic or demographic aggrega­ tion. One key issue that depends on the application is whether to base this assessment on population shares or population magnitudes. The Census Bureau’s decision not to adjust the redistricting data for the 2010 census, due for release by April 1, 2011, was based on the difficulty of making this assessment within the required time frame. Census Process Improvement Although it is important to assess census coverage, it would also be extremely helpful to use that assessment to improve the quality of sub­ sequent censuses. Consequently, an important use of coverage measure­ ment is to help to identify important sources of census coverage errors and possibly to suggest alternative processes to reduce the frequency of those errors in the future. Although drawing a link between census coverage errors and deficient census processes is a challenging task, the Census Bureau thinks that substantial progress can be made in this direc­ tion. Therefore, the 2010 coverage measurement program has the goal of identifying the sources of frequent coverage error in the census counts. This information can then be used to allocate resources toward develop­ ing alternative census designs and processes that will provide counts with higher quality in 2020. 
It is conceivable that use of such a feedback loop could also provide substantial savings in census costs, in addition to improvements in census quality, because the tradeoff between the effects on accuracy and on census process costs might then be better understood. The panel fully supports this modification of the objectives of coverage measurement in 2010. To see the value of this shift in the objective of coverage measurement, consider, for example, the findings from demographic analysis for the 2000 census, which showed that there was a substantial undercount of young children relative to older children. Specifically, Table 2-1 shows the net undercount rates, (DSE − C)/DSE (where DSE indicates the adjusted count and C the corresponding census count), by demographic group in 2000 on the basis of the revised demographic analysis estimates (March 2001) (see National Research Council, 2004b:Chapter 6). One hypothesis is that the undercoverage for children aged 10 and under was at least in part due to the imputation of age for those left off the census

form in households exceeding six members; this hypothesis is examined in Keller (2006). The 2000 census forms collected characteristics data for only up to six household members. For households that reported more than six members, characteristics data for the additional members either were collected by phone interview (for households that provided a telephone number) or were imputed on the basis of the characteristics of other household members and the responses for other households. The hypothesis is that these imputations systematically underrepresented young children since they were underrepresented in the pool of "donor" households.

TABLE 2-1  Net Undercount in 2000

                     Age Group
Demographic Group    0–10     10–17
Black male           3.26     –1.88
Black female         3.60     –1.20
Nonblack male        2.18     –2.01
Nonblack female      2.59     –1.55

NOTE: The undercount is as measured by demographic analysis.
SOURCE: Data from National Research Council (2004b:Chapter 6).

Although demographic analysis can measure the net undercoverage of these groups, it cannot currently shed further light on the validity of this hypothesis. Data from a postenumeration survey might be useful in this regard, because characteristics data are collected for most of the respondents of the postenumeration survey, and those data would likely allow an assessment of the extent to which imputations in large households distorted the age distribution. Potential alternatives that could be considered for the 2010 census include changes to the collection of data for members of large households and improved imputation techniques. The panel is optimistic that the use of coverage measurement can strongly support the improvement of census methods, but the operation of this feedback loop will not be straightforward. Coverage measurement results will sometimes provide strong indications of the likely source of some errors; for other errors, the source will often remain unclear.
We note that even if it were determined that increasing this limit from six to seven would reduce the rate of omission of young children in large households, other considerations involving the rate of nonresponse and the quality of the collected information would have to be evaluated before making such a change.

An example of the former is for people aged 18–21, who have a duplication rate that is extremely high: one might surmise that it is at least partly

due to the inclusion of college students in their parents' households as well as at college. For this situation, the process in need of modification is clear. In contrast, a housing unit might be placed in the wrong location for many reasons, including erroneous coordinates for a MAF spot or a geocoding error using TIGER (the Topologically Integrated Geographic Encoding and Referencing database), and it may be difficult to specify the cause of the error. Many times such difficulties can be at least partly resolved through more detailed data analysis, assuming that additional information on the process history for the addresses in question is retained. Therefore, in moving toward the goal of process improvement, it is extremely important for the Census Bureau to save as much information on the procedural history for each housing unit and each individual within each housing unit, and as much contextual information as possible, to develop useful statistical models linking enumeration errors to their possible causes. One possibility is to develop a comprehensive master trace sample database (see National Research Council, 2004a:Chapter 8) that is directly linked to the coverage measurement sample.

However, it is important to accept that there will always be limits to the attribution of errors to specific origins and therefore to the functioning of such a feedback loop. In particular, determining which alternative processes would best address a recognized deficiency would remain a challenge. For example, knowing that a geocoding error was the result of an error in TIGER does not necessarily tell one how to improve TIGER in a cost-effective way to eliminate that type of error. There are limitations to how well the feedback loop on census improvement can operate. In Chapter 5 we present some initial ideas on what data might be useful to save.
In sum, the Census Bureau has made an important shift in its focus for coverage measurement in 2010 from that of estimating net coverage error, potentially in support of census adjustment, to that of developing portions of a feedback loop for census improvement. However, as can be seen in detail in subsequent chapters, there remain vestiges of the previous goals in the design of, and the outputs produced for, the coverage measurement program in 2010. They include the sample design for the postenumeration survey in 2010, the current focus on the release of census tabulations as the main products of census coverage measurement rather than analytic uses of the collected data, and the continued high priority of statistical models for net coverage error (the logistic regression modeling) in coverage measurement research.

Recommendation 1: The Census Bureau should more completely shift its focus in coverage measurement from that of collecting data

and developing statistical models with the objective of estimating net coverage error to that of collecting data and developing statistical models that support the improvement of census processes.

In order to ensure that the variety of issues identified in this report is addressed in support of improvements to the 2020 census design, it will be critical to have a team of high-quality researchers exclusively devoted to intercensal research on decennial census improvement and for this research program to be protected from year-to-year funding fluctuations and pressures. The activities of such a group would be focused on analyzing the data collected from the census, the census coverage measurement program, and the various predictors discussed below.

Recommendation 2: The Census Bureau should allocate sufficient resources, including funding and staff, to assemble and support an ongoing intercensal research program on decennial census improvement. Such a group should focus on using the data from the census and the census coverage measurement programs to identify deficient census processes and to propose better alternatives. The work of this group should be used to help design the census tests early in the next decade.

DESCRIPTION AND HISTORY

There are two primary methods that have been used for coverage measurement of the census: dual-systems estimation (DSE), supported by a postenumeration survey, and demographic analysis. To keep this document self-contained, we provide a brief description of these techniques and their history of use. For a more detailed description of DSE, see National Research Council (2004b:159–163, Chapter 6) and U.S. Census Bureau (2003). For a history of the U.S. census coverage measurement programs from 1950 through 1980, see National Research Council (1985:Chapter 4).
Dual-Systems Estimation

DSE uses both the data from the census and an additional enumeration to estimate the amount of net undercoverage in the census. The additional enumeration is typically a postenumeration survey (PES), which is a survey of the residents in a sample of census block clusters, who are referred to collectively as the P-sample. The PES, as its name indicates, is conducted on a housing-unit-by-housing-unit basis after the main part of the census is completed. In some cases, the census may still be

ongoing for some households while the PES has initiated data collection for others. The first step of a PES, taking place in recent times about a year prior to census day, is that the addresses in the PES blocks are independently listed, using no information from the MAF. (However, information on the differences between MAF counts and PES listing counts could be used in the sample design of the PES, since blocks in which the independent listing differed greatly from the count from the MAF might have been subject to a lot of recent construction or other recent dynamics that might have been a challenge to the census enumeration.)

After the census is completed, interviewers visit the P-sample housing units to establish which people were residents on census day. Additional information is also collected to support matching the P-sample results to the census and either to assign the persons to poststrata (in the 1980, 1990, and 2000 coverage measurement programs) or to provide predictors for statistical models to estimate the total population size (see below). This additional information includes demographic and household characteristics and some area characteristics. For example, mailback rates and whether someone is an owner or renter, along with demographic characteristics, were used to define the poststrata in previous censuses. The purpose of the poststrata is to partition the P-sample into more homogeneous groups in terms of their coverage properties, so that when the coverage measurement is carried out separately by poststrata, the result is a reduction in a type of bias referred to as correlation bias (see below). Once this initial data collection has been concluded, the P-sample enumerations are matched to the census enumerations to determine who in the P-sample was also counted in the census.
People who cannot be matched to the census are reinterviewed to make any needed corrections due to discovered errors either in the data collection or the matching.

Estimation of Net Coverage Error

We start from the implementation of DSE in the 1980 and 1990 censuses, which formally used the construct of a 2 × 2 contingency table as shown in Table 2-2:

• M is the estimate of the number of P-sample persons who match to a census enumeration (within the defined search area) in a poststratum; such a person will typically also be in the E-sample (the census enumerations in the P-sample block clusters),

Poststrata are broad population groups defined by demographic or other characteristics. For a more comprehensive treatment of this subject, see Mulry and Spencer (1991).

TABLE 2-2  Diagram of the 2 × 2 Contingency Table Underlying Dual-Systems Estimation

                     Counted in PES    Missed in PES    Total
Counted in Census    M                                  C
Missed in Census                       Fourth Cell
Total                P                                  DSE

NOTE: See text for discussion.

• P is the estimate of the number of all valid P-sample persons within a poststratum,
• C is the number of census enumerations in a poststratum, and
• DSE is the dual-systems estimate of the total number of residents within a poststratum, in other words, the estimated true count.

The cell designated as the fourth cell is the only cell in the above table that can be neither directly measured nor obtained by subtraction from other directly estimated quantities; it therefore has to be modeled and is assumption dependent, the assumption being that correlation bias, defined below, is acceptably small. It makes sense that this cell would be the problematic one, since it counts those missed by both the census and the PES enumeration processes.

We assume initially for purposes of exposition that there are only omissions and therefore no sources of overcoverage. In addition, we assume that there are no missing data. Then the estimation of census undercoverage using DSE relies on two additional assumptions: (1) that the P-sample enumerations and the census enumerations are independent events, and (2) that all individuals within a poststratum have equal enumeration propensities. When these assumptions obtain to a reasonable extent, the following approximate equation will hold:

    M/P ≅ C/DSE.        (1)

The above approximate equation can also be expressed as

    DSE ≅ C · P / M,

which suggests that one can estimate the total population size by taking the product of the number of census enumerations and the number of P-sample enumerations, divided by the number of matches.

The argument in support of the above approximate equation is as follows. Again ignoring some complications (discussed below), in a given poststratum, the first ratio in (1), M/P, is an estimate of the percentage of census enumerations in the P-sample population, that is, an estimate of the census "capture" rate in the P-sample population (which, again, is assumed to be constant). The second ratio in (1), C/DSE, is an estimate of the percentage of census enumerations in the full population, again an estimate of the census "capture" rate, but this time in the full population. If the P-sample selection and field data collection processes are independent of the census processes, and if the operational independence of the census and the PES also engenders statistical independence, then the fact of P-sample membership should provide no information as to whether a person was or was not enumerated in the census. Therefore, the population of P-sample enumerations should have the same underlying probability, conditional on poststratum membership, of being enumerated in the census as the full population. Given that, these two ratios should be approximately equal.

As mentioned, the above argument assumes that each individual has the same chance of being enumerated in the census. However, the census and the P-sample enumeration probabilities are likely to be heterogeneous, that is, dependent on various characteristics of the housing units and their residents. As a result, a bias (correlation bias) in the dual-systems estimate occurs, and its magnitude is a function of the degree to which the individual census enumeration propensities and the individual PES enumeration propensities are correlated.
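As a numerical sketch of the identity DSE ≅ C · P / M, the basic dual-systems estimate can be computed directly; the counts below are invented for illustration, not census figures:

```python
def dual_systems_estimate(c, p, m):
    """Basic dual-systems estimate: census count times P-sample count,
    divided by the number of matched persons (no E-sample corrections)."""
    return c * p / m

# Fictional poststratum: 900 census enumerations, 800 P-sample persons,
# 720 of whom match to a census enumeration.
dse = dual_systems_estimate(c=900, p=800, m=720)
# The implied census capture rate is M/P = 720/800 = 0.9, so
# DSE = 900 / 0.9 = 1000, an estimated net undercount of 100 persons.
```

The same arithmetic applies within each poststratum separately; the poststratum estimates are then summed to obtain an overall total.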
This correlation is small when either the census or the PES enumeration frequencies are relatively constant; therefore, if DSE is restricted to poststrata in which the people are relatively homogeneous with respect to their enumeration propensities, correlation bias will be relatively minor (to see how enumeration heterogeneity can be related to dependence and therefore correlation bias, see Box 2-1).

To address the complication of overcounts requires an additional data collection, which amounts to checking the validity of the census enumerations. The E-sample, the census enumerations in the P-sample block clusters, is used to estimate the percentage of the census enumerations that are correct and not duplicated. This validation operation is carried out by visiting the E-sample people who fail to match to the P-sample to determine whether they were enumerated in error or in

For a more detailed exposition of the errors in dual-systems estimation, see Alho and Spencer (2005:Chapter 10).

BOX 2-1
Example of the Relationship Between Enumeration Heterogeneity and Correlation Bias

To see how enumeration heterogeneity can be related to dependence and therefore correlation bias, consider three 2 × 2 tables: one for Area A, one for Area B, and the third the cell-by-cell sum of the first two.

Area A             Census In    Census Out    Total
PES In             25           15            40
PES Out            15           9             24
Total              40           24            64

Area B             Census In    Census Out    Total
PES In             24           3             27
PES Out            8            1             9
Total              32           4             36

Area A and Area B  Census In    Census Out    Total
PES In             49           18            67
PES Out            23           10            33
Total              72           28            100

The first two tables represent independence of the enumeration in the PES and the census. This is because, in each case, the fourth cell count is equal to the product of the probability of a census omission and the probability of a PES omission, multiplied by the total number of individuals. For Area A, this is the equality 9 = (15/40)(15/40)(64), and for Area B, this is the equality 1 = (3/27)(8/32)(36). However, the equality does not hold for the combined area, since 10 ≠ 8.6 = (18/67)(23/72)(100). So capture heterogeneity between Areas A and B has resulted in correlation bias for the combined area.
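The arithmetic in Box 2-1 can be checked with a short script; the `implied_fourth_cell` helper below is ours (not Census Bureau code), and it follows the box's convention of estimating the omission probabilities from the observed margins:

```python
def implied_fourth_cell(in_in, in_out, out_in, out_out):
    """Fourth-cell count implied by independence of the PES and the census,
    as in Box 2-1: the census omission rate among PES enumerations, times
    the PES omission rate among census enumerations, times the total N.
    Arguments are the four cells; rows are PES in/out, columns census in/out."""
    n = in_in + in_out + out_in + out_out
    census_omission = in_out / (in_in + in_out)  # among those counted in the PES
    pes_omission = out_in / (in_in + out_in)     # among those counted in the census
    return census_omission * pes_omission * n

area_a = implied_fourth_cell(25, 15, 15, 9)     # 9.0: matches the observed cell
area_b = implied_fourth_cell(24, 3, 8, 1)       # ≈ 1: matches the observed cell
combined = implied_fourth_cell(49, 18, 23, 10)  # ≈ 8.6, not the observed 10
```

The combined table's implied fourth cell (about 8.6) disagrees with the true count of 10, which is exactly the correlation bias the box illustrates.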

the wrong place and by matching the E-sample records to each other to check for duplicates. (The E-sample cases that match to the P-sample are validated as correct enumerations.) This field operation will be augmented in 2010 with national computer matching to identify duplicate enumerations that are more remote geographically than those that jointly reside in the P-sample block cluster. This plan represents an important improvement when the goal is census improvement, since without it a PES cannot distinguish between a duplicate pair, on one hand, and a correct enumeration paired with another enumeration in the wrong place, on the other, when one of the pair resides outside the P-sample blocks. In sum, the E-sample and the associated fieldwork (and, soon, national matching) provide estimates of the percentage of people counted incorrectly in the census, the percentage counted more than once, and the percentage counted in the wrong place, which are used to adjust the above approach to accommodate overcoverage.

Finally, there are census enumerations that cannot be matched to the P-sample enumerations because they are not data defined. (For the specific definition of enumerations that are not data defined, see Chapter 4.) Since those enumerations cannot be matched, one cannot determine whether those individuals were or were not counted in the P-sample, and thus one cannot determine whether the associated P-sample people were census omissions or correct census enumerations. These cases also need to be accommodated in DSE.

Using both the E-sample-based estimates of the rate of correct enumeration and the number of census enumerations that are not data defined, one can compute the following modification of the above dual-systems estimate.
C, the census count in the above formula, is now replaced by

    (C − II)(CE/E),

where II represents the number of non-data-defined enumerations, CE represents the number of (survey-weighted) E-sample persons correctly enumerated in the census, and E represents the (survey-weighted) estimate of C − II. The number of people that are not data defined, II, is subtracted from the census count since their match or correct enumeration status cannot be determined. The implicit assumption is that their net coverage error is the same as that for the remaining census enumerations (an assumption that may not be justified). Therefore, the number of matchable persons, C − II, is multiplied by CE/E, the percentage of correct census enumerations, to estimate the number of data-defined census enumerations that are correct census enumerations. The resulting DSE formula is

    DSE = (C − II)(CE/E)(P/M).        (2)

This discussion ignores the additional complication raised by the treatment of any late census additions, which are census enumerations that are too late to be included in the dual-systems computations. II can be interpreted to be the number of cases with insufficient information for matching, but as discussed in Chapter 4, many of those cases will be matched to the census in 2010.

This derivation still ignores several additional nontrivial complications, including the treatment of other forms of missing data, the treatment of data from movers, and the precise area of search for matches of census enumerations outside the P-sample blocks. Additional complications arose in computing various revisions of the original estimates of undercoverage in the 2000 coverage measurement program (the Accuracy and Coverage Evaluation Program [A.C.E.]), due to the incorporation of information from various follow-up studies, though that is nonstandard and would not typically factor into the estimates of net coverage error. A description of the effect of these additional complications on the formulas used to produce the 2000 undercoverage estimates can be found in U.S. Census Bureau (2003).

In 1980 and 1990, the corrections generated by the E-sample were implemented by subtracting the number of cases with insufficient information for matching, duplicates, and erroneous enumerations directly from the census total, C, prior to use of formula (1). In the coverage measurement program used in 2000 and in the program planned for 2010, the estimation is no longer based on a 2 × 2 contingency table; instead, the objective has become simply to provide estimates of the fractions CE/E and P/M for input into formula (2). The estimates of population totals that result, assuming the use of poststrata, exist at the level of the individual poststrata. As noted above, these are groups of people that have similar characteristics related to enumeration propensity.
For example, all have similar demographic characteristics, all are either renters or owners, all have the same mailback rate, and all may share some high-level geography (e.g., the same census region). Given the minimal representation of geography in poststrata, DSE provides very little information about where people are missed. What is needed for many applications, rather than estimates of the net coverage error for poststrata, are estimates of the population for small political jurisdictions that are much smaller than the geographic level of the poststrata. In the 1990 and 2000 censuses, synthetic estimation was used for these estimates.
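Before turning to synthetic estimation, the corrected estimator in formula (2) can be sketched numerically; all inputs below are invented for illustration:

```python
def dual_systems_estimate(c, ii, ce, e, p, m):
    """Dual-systems estimate with E-sample corrections, formula (2):
    DSE = (C - II) * (CE / E) * (P / M)."""
    return (c - ii) * (ce / e) * (p / m)

# Fictional poststratum: 1,000 census enumerations, 40 of them not data
# defined; of the weighted E-sample estimate E = 960, CE = 912 are
# estimated to be correct enumerations (a 95 percent correct-enumeration
# rate); 800 P-sample persons, 720 of whom match to the census.
dse = dual_systems_estimate(c=1000, ii=40, ce=912, e=960, p=800, m=720)
# (1000 - 40) * 0.95 * (800/720) ≈ 1013.3
```

Relative to the uncorrected C · P / M, the CE/E factor removes the estimated erroneous enumerations before the match-rate inflation is applied.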

The basic idea of synthetic estimation is that the coverage correction factors,

    ((C − II)/C)(CE/E)(P/M),

which are simply the dual-systems estimates divided by the corresponding census counts at the level of the poststrata, are used to weight the census counts for any defined subpopulation of individuals within a given poststratum in order to produce an adjusted count for that specific subpopulation within that poststratum. Examples of subpopulations might be people of a specific geographic jurisdiction, or those from a specific demographic age-sex-race combination, or people belonging to both subpopulations. To arrive at an adjusted count for a geographic area of interest, one would produce the adjusted count for that area for all relevant poststrata and then add those estimates together. An example of synthetic estimation is shown in Box 2-2.

Unresolved Aspects of Dual-Systems Estimation

The success of DSE depends to a great extent on the quality of the data collected, especially as to an individual's correct census residence. Both the decennial census data collection and the data collection for the PES inevitably involve misresponse and nonresponse. Misresponse, which might stem from a misunderstanding of census residence rules, can cause errors in determining both match status and correct enumeration status. The matching errors that are made are quite likely more in the direction of false nonmatches than false matches. Missing data complicate the application of DSE in the following ways.
Although erroneous enumerations in the E-sample that have sufficient information for matching are typically identified as erroneous (though there are cases for which correct enumeration status has to be imputed), erroneous enumerations (and correct enumerations) that have insufficient information for matching are removed from the computation and are assumed to behave like the remaining E-sample enumerations in their poststratum through an implicit reweighting adjustment. The extent to which this assumption is valid can be examined using follow-up studies.

P-sample persons with sufficient information for matching who are missed in the census are typically correctly identified as nonmatches. However, P-sample cases with insufficient information for matching are accounted for by giving additional weight to those cases with sufficient information that are similar on available characteristics thought to be

predictive of match status. The validity of these weighting (or possibly imputation) models can also be examined using follow-up studies.

As described above, the number of persons missed in both the P-sample and the census is estimated based on the assumption of independence of the two enumeration processes and of homogeneity of the enumeration propensities in poststrata (the absence of which engenders correlation bias). Although the use of poststrata is intended to partition the population into subgroups that have relatively homogeneous enumeration propensities, the poststrata are not completely homogeneous. Since no data are collected for this "fourth cell," it is unclear how well this subpopulation is estimated; merged administrative records might be used for this purpose. The general expectation is that this category of census omissions is underestimated. The difficulty in achieving homogeneity not only reduces the quality of the estimation of the fourth cell, but it also reduces the quality of any small-area estimates of net coverage error produced using synthetic estimation.

As for duplicates, data-defined enumerations with name and date of birth in the E-sample in the P-sample block clusters are typically discovered, but duplicates outside the P-sample blocks were either undiscovered and erroneously treated as nonduplicate correct census enumerations or were categorized as erroneous enumerations. Assuming the national search for duplicates is implemented in 2010, these cases will now have a chance of being identified as duplicates.

A limited search for matches also raises other difficulties. In previous censuses, when the corresponding census enumeration was located outside the P-sample block due to geocoding errors or other reasons, P-sample people who were not census omissions were designated as such.
The frequency of this situation also should be reduced in 2010 with the implementation of the national search for matches. In the 1980, 1990, and 2000 censuses, the local restriction of the search for matches was designed to overstate the number of census omissions and the number of erroneous enumerations by the same amount. With a focus on estimation of net coverage error, the intention was that these errors would balance out, resulting in a zero net effect. However, various deficiencies in the operation of the field processes may have upset this balance, leading to "balancing error." With the new objective of estimating the frequency of the various components of census coverage error, relying on a restricted search area for matches would result in a substantial increase in the estimated rates of omission and erroneous enumeration, much of which would be due instead to counting someone in the wrong location. Therefore, the current

There is, however, an increase in the variance of the estimate of net coverage error due to the need for these random amounts to balance out for various domains.

BOX 2-2
Example of Synthetic Estimation

We present here a numerical example of synthetic estimation. Consider first the following fictional census count data:

Area                         Demo Group I    Demo Group II    Demo Group III    Total
Area I                       200             100              100               400
Area II                      300             50               50                400
Area III                     100             150              50                300
Total                        600             300              200               1,100
Adjusted Total               800             300              300               1,400
Coverage Correction Factor   1.33            1.00             1.50

The adjusted totals are provided by the coverage measurement program at the level of poststrata, which in this fictional example are simply represented as demographic groups. It is realistic to represent the adjusted totals as having no geographic structure relevant to this table because the sample sizes of postenumeration surveys are typically only large enough to provide reliable estimates at high levels of geographic aggregation. The problem that needs to be addressed is that the 300 (1,400 − 1,100) "additional" enumerations that have been estimated to reside in the three areas have to be allocated among the three areas to satisfy various uses of census counts. If one had an adjustment program that supported estimates at the level of the cells in the above table, one could take the adjusted count for each cell and add across demographic groups to arrive at an estimated population count for each area. This operation would be equivalent to estimating the coverage correction factor for each cell and multiplying that by the census count to estimate the count for each cell. However, a coverage correction factor for each cell does not exist.

plans to search nationwide for PES matches will be extremely helpful in arriving at a better understanding of the types and frequencies of components of census coverage error. Lastly, errors in geography, demographics, or other characteristics can also result in the placement of individuals in the wrong poststratum, which can also bias the estimation of net coverage error.
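The synthetic allocation illustrated in Box 2-2 can be reproduced with a few lines; the data structures are ours, and the counts are the box's fictional ones:

```python
# Fictional census counts from Box 2-2, by area and demographic group.
census = {
    "Area I":   {"I": 200, "II": 100, "III": 100},
    "Area II":  {"I": 300, "II": 50,  "III": 50},
    "Area III": {"I": 100, "II": 150, "III": 50},
}
# Adjusted totals at the poststratum (demographic group) level.
adjusted_totals = {"I": 800, "II": 300, "III": 300}

# Coverage correction factor per group: adjusted total over census total.
group_totals = {g: sum(area[g] for area in census.values())
                for g in adjusted_totals}
ccf = {g: adjusted_totals[g] / group_totals[g] for g in adjusted_totals}
# ccf is about {"I": 1.33, "II": 1.00, "III": 1.50}

# Synthetic estimate per area: apply each group's factor to that area's
# census counts and sum across groups.
synthetic = {area: round(sum(counts[g] * ccf[g] for g in ccf))
             for area, counts in census.items()}
# synthetic is {"Area I": 517, "Area II": 525, "Area III": 358}
```

The area estimates reproduce the box's table, adding 117, 125, and 58 people to the three areas, and the grand total is the adjusted 1,400.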
Demographic Analysis

The Census Bureau has made substantial use of demographic analysis for several censuses, going back to 1940 (see, for example,

Underlying synthetic estimation is the assumption that the coverage correction factors are much more variable by demographic group than they are by area, so that using the coverage correction factors at the level of demographic group across areas, rather than disaggregated by area, may be of acceptable quality. This assumption is extremely difficult to verify empirically, though information from administrative records may provide some support in the future. In this fictional case, the assumption is that the factor 1.33 would apply relatively well to each of the three areas, and similarly for the factors 1.00 and 1.50. Applying these coverage correction factors to the counts above results in the following synthetic estimates (adjusted cell counts) and the resulting column of estimates for each area:

Area        Demo Group I    Demo Group II    Demo Group III    Total
Area I      267             100              150               517
Area II     400             50               75                525
Area III    133             150              75                358
Total                                                          1,400

The result is that 117 people are added to area I, 125 people are added to area II, and 58 people are added to area III. Tukey (1983) showed that synthetic estimates provide counts that are preferred to the census counts, regardless of where the additional people actually reside, for many typical loss functions when there is a single demographic group. The National Research Council (1985) showed that this optimality property does not extend to the situation in which one aggregates the estimates over demographic groups.

Price, 1947; Coale, 1955; Coale and Zelnik, 1963; Coale and Rives, 1972; Shryock and Siegel, 1973; Siegel, 1974). We present a short overview here; for a more detailed treatment relevant to the 2000 application, see Robinson (2001).
Basic Approach

Demographic analysis makes use of the following “balancing equation” to estimate the population in a demographic group, where by demographic group we mean, in particular, groups defined by age, sex, and black or nonblack:

P_new = P_old + B – D + I – E,

where:

P_new is the current population for the demographic group;

P_old is the previous population for the demographic group, or can be zero when a population is started from births;

B is the number of births into the group occurring either from an initial date or from the date of a previous census;

D is the number of deaths to the group occurring either from an initial date or from the date of a previous census;

I is the amount of immigration to the group occurring from an initial date or from the date of a previous census; and

E is the amount of emigration from the group occurring from an initial date or from the date of a previous census.

As suggested in the above definitions, this equation can either be applied from a date prior to the birth of any of the current residents in that demographic group, assessing how many members of that group have been born here, remained here, and not died, and how many members of that group have moved here, remained here, and not died; or it can be applied to assess the changes in the size of the demographic group that have occurred since a previous (often the most recent) census.

In addition to the use of data on births, deaths, and immigration, Medicare enrollment data, given their high quality, are now used to estimate the population aged 65 and older without resorting to the above accounting equation.

Unresolved Aspects of Demographic Analysis

The logic of demographic analysis requires that the population estimates constructed from the basic demographic accounting relationship be comparable with the populations measured by the census. As a result of demographic changes in the U.S. population over the past generation, many of the assumptions made by demographic analysis have become more problematic than they were for the censuses of 1940 through 1980. Specifically, immigration and emigration have become much more important sources of population growth.
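As a minimal illustration of the balancing equation, including its use in “reverse time” to backdate a cohort estimate, consider the following sketch (our own notation and fictional figures, not Census Bureau code):

```python
def balance_forward(p_old, births, deaths, immigration, emigration):
    """P_new = P_old + B - D + I - E for one demographic group."""
    return p_old + births - deaths + immigration - emigration

def balance_reverse(p_new, deaths, immigration, emigration):
    """Backdate an estimate by running the equation in reverse time,
    e.g., deriving an earlier population for a cohort from a later
    Medicare-based estimate of the same cohort: add back the period's
    deaths, subtract its immigration, and add its emigration."""
    return p_new + deaths - immigration + emigration

# Fictional cohort (in thousands), already born, so B = 0:
p_later = balance_forward(21_000, 0, 1_200, 400, 100)    # 20,100
p_earlier = balance_reverse(p_later, 1_200, 400, 100)    # back to 21,000
```

The two functions are exact inverses for a cohort with no births, which is what makes the reverse-time use of the equation possible.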
Also, as a result of immigration, intermarriage, and larger societal trends, the current definition and measurement of racial and ethnic groups have become less consistent

The above equation is used in “reverse time” to backdate the Medicare-based population estimates to earlier censuses. For example, the population aged 55 and over in 1990 can be estimated by “reviving” the Medicare-based estimate for people aged 65 and over in 2000, adding deaths occurring in 1990–2000, subtracting 1990–2000 immigration, and adding estimated 1990–2000 emigration.

with historical definitions in the data used to construct the demographic estimates.

Historical data on the numbers of births and deaths are of relatively high quality; data on international migration are more problematic. Estimates of the number of emigrants are subject to considerable variability. In addition, undocumented immigration has become as important numerically as legal immigration, but the available measures are not very exact. Demographic analysis has generally been restricted to national estimates of age, sex, and race groups, since the available measures of subnational migration are not sufficiently reliable to support production of estimates at the state or lower levels of geographic aggregation. Furthermore, because ethnicity has been captured in vital records on a national scale only since the 1980s, demographic analysis has not been used to estimate net undercoverage for Hispanics. Finally, with the introduction of multiple-race responses in the 2000 census, it has become necessary to map the census race categories into historical single-race categories (or vice versa), with the attendant introduction of additional bias into the demographic analysis estimates. The introduction of multiple-race responses is part of the growing complexity of racial classification, which is likely to increase discrepancies between birth certificate reporting and self-reporting of race by adults.

Having pointed out some of the deficiencies of demographic analysis, we must emphasize its continuing value in coverage measurement. Demographic analysis places the census results within the well-defined, consistent, and essentially tautological framework of demographic change. The realities of the balancing equation shown above place severe limits on certain results from other coverage measurement programs.
Thus, for example, with the passage of 1 year, all living people get exactly 1 year older; and for every baby boy born, there will be approximately one baby girl born. If the results from other coverage measurement studies fall outside the bounds implied by such demographic realities, those results need to be explained. Some explanations may be demographic—for example, higher levels of immigration or emigration than included in the demographic estimate. But they may also point to statistical or measurement issues—for example, the persistent correlation bias that affects DSE measures of adult black men.

The current demographic analysis program at the Census Bureau also links the measures from the current census with censuses back to 1940. While DSE stands alone as a measure of the coverage of a particular census, the demographic analysis program grounds its current results in the historical data series, so that it is possible to assess one census relative to

others.10 This linkage can place limits not only on the demographic analysis measures, but also on the plausibility of results from other studies.

Due to the relatively deterministic nature of estimates of net coverage error from demographic analysis, estimates of its error or uncertainty are difficult to justify. However, there have been a few attempts to provide error estimates, in particular Robinson et al. (1993).

It is important to recognize that some results from demographic analysis are of higher quality than the overall results, so that they may be incorporated into a comprehensive coverage measurement program. While immigration has become an increasingly important component of population change that is not well measured, it has very little effect on the youngest age groups. Thus, it would be difficult to support estimates from DSE for children that were not consistent with demographic analysis results. Also, many population ratios, including sex ratios, are much less sensitive than the measured population size to assumptions about such problematic components as undocumented immigration. As a result, it may be possible to incorporate some byproducts of demographic analysis results into overall measures of coverage.

In 2000, demographic analysis proved to be very useful in the coverage measurement program, notwithstanding the noted deficiencies. Demographic analysis provided an early indication that the initial estimates of the total U.S. population from A.C.E. may have been too high. Also, demographic analysis has now been used to provide input to reduce the effects of correlation bias in DSE. Such corrections were used for adult black men in the 2000 revised estimates of net undercoverage, and the current plans are to continue this use in the 2010 coverage measurement program (for more discussion, see Chapter 4).

1950–1990 CENSUSES

1950 Census11

The 1950 census used both a postenumeration survey and demographic analysis for coverage measurement. The strategy that motivated the postenumeration survey coverage evaluation methodology in the 1950 census was that coverage errors were largely due to failures to correctly implement

10 Again, the realities of the balancing equation provide this linkage. Thus, if current research suggests that the demographic analysis coverage measures for a specific age group need to be adjusted (because, for example, the Medicare data show more or fewer enrollees than expected, or the births for a historical time period appear to be too high, or immigration during a decade had to have been higher or lower), the adjustment affects the size of the age group not only in the current census but also in past ones.

11 Much of the material in this section is taken from National Research Council (1985).
1950–1990 CENSUSES 1950 Census11 The 1950 census used both a postenumeration survey and demographic analysis for coverage measurement. The strategy that motivated the post­ enumeration survey coverage evaluation methodology in the 1950 census was that coverage errors were largely due to failures to correctly implement 10  Again, the realities of the balancing equation provide this linkage. Thus, if current r ­ esearch suggests that the demographic analysis coverage measures for a specific age group need to be adjusted (because, for example, the Medicare data show more or fewer enrollees than expected, or the births for a historical time period appear to be too high, or immigration during a decade had to have been higher or lower), the adjustment affects the size of the age group not only in the current census but also in past ones. 11  Much of the material in this section is taken from National Research Council (1985).

census definitions and procedures and to imperfections in the materials and procedures used. Therefore, the idea was to take a sample of areas and in those areas “count again, but better.” The 1950 PES used an area sample to identify whole-household census omissions and a list sample from the census to estimate both within-household and whole-household erroneous inclusions and to estimate within-household omissions. The list sample and the area sample overlapped to a great extent. Interviewers for the area sample canvassed each area, noted housing units not included in the list sample as possibly omitted from the census, and interviewed the occupants to collect census data. Interviewers for the list sample visited each household and determined any erroneous inclusions and any potential omissions, and in the latter case obtained census data from those individuals. All the interviewer records from both the area and the list samples were then matched to the census to directly determine the rates of erroneous inclusion and omission. To ensure high quality, interviewers were very carefully selected and trained. In addition, more detailed questions were asked, in comparison with the census, and interviewers were instructed to obtain responses from each adult rather than allow for proxy responses. The results were an estimated net undercount of 1.4 percent, with 2.2 percent omissions and 0.9 percent erroneous inclusions.

The demographic analysis in 1950 is described in Coale (1955). For people under 15 years of age, Coale used birth registration data to estimate their population size. For those aged 15–64, Coale relied on comparisons with the results of preceding censuses, subtracting estimates of the degree of mortality and adding net immigration for the various age groups in the intercensal periods. For those over 65, Coale used the 1950 postenumeration survey results to estimate net undercoverage.
The results were an estimated national net undercount of 3.5 percent,12 2.5 times the PES estimate. The apparent ineffectiveness of the PES was attributed to the tendency for the PES to miss the same types of people who are missed in the census, i.e., to reflect correlation bias.

1960 Census

The 1960 census again used a PES that was composed of an area sample and a list sample (as in 1950). The area sample contained 2,500 segments comprising 25,000 housing units from the 1959 Survey of Components of Change and Residential Finance. Enumerators were instructed to canvass their areas, reconcile their lists with that of the survey, and interview the occupants of any omitted housing units. The list sample, independent of the area sample, comprised a national sample of 15,000

12 Robinson and West (2000) give an estimate of 4.1 percent.

housing units drawn from the census enumerators’ listing books. Interviewers were given the list of housing units, but not the names of the residents, and were instructed to re-enumerate the housing units. The results from both the area sample and the list sample were matched to the census. The resulting estimate of national net undercount was 1.9 percent.

Demographic analysis was also used to evaluate the coverage of the 1960 census. As described in Coale and Zelnik (1963), the method used was essentially an extension of the method used in 1950, though much more elaborate. Siegel (1974) used Medicare data to improve Coale and Zelnik’s estimated number of elderly. The resulting estimate of national net undercoverage was 2.7 percent,13 1.4 times that of the PES.

In addition, the Census Bureau also carried out a reverse record check study to estimate the net national undercount in 1960, which closely resembled the methods used for many decades by Statistics Canada to assess net undercoverage. See Bureau of the Census (1964) and Gosselin and Theroux (1977) for details of this alternative approach to measuring net undercoverage.

1970 Census

As a result of the problems experienced with the PES in 1950 and 1960, the Census Bureau did not use a PES to evaluate the coverage of the 1970 census, placing primary reliance on demographic analysis. However, there was a CPS-census match study, whose estimates were adjusted for additions to the census count resulting from imputations based on the National Vacancy Check and the Post-Enumeration Post Office Check. (The National Vacancy Check was in some sense the only use of sampling to adjust census counts, by selecting a sample to provide an improved estimate of how many housing units were vacant.) The Census Bureau also carried out two record check studies of specific population groups.
The Medicare Record Check involved a sample of 8,000 persons aged 65 and over that was selected from Medicare records and matched to the 1970 census. The D.C. Driver’s License Study involved a match of driver’s license records with census records for roughly 1,000 males aged 20–29 living in selected tracts in the District of Columbia.

The data used for demographic analysis estimates included birth and death statistics, immigration data, Medicare enrollments, life tables, and data from previous censuses. Siegel (1974) published a range of estimates, with a preferred estimate of national net undercoverage of 2.5 percent.14 From 1950 to 1960 to 1970, there was progressively less reliance on data

13 Robinson and West (2000) give the estimate as 3.1 percent.

14 Robinson and West (2000) give the estimate as 2.7 percent.

from previous censuses and more reliance on birth and death data and, to a much smaller degree, on data on immigration and emigration.

1980 Census

The coverage measurement program in 1980, which used DSE based on a postenumeration survey, was the Post-Enumeration Program (PEP). It was planned with the understanding that it might be used to adjust the 1980 census for net undercoverage. Therefore, the intent was to produce reliable net undercoverage estimates, possibly down to the level of states and large cities. In contrast to the strategy used in the 1950 and 1960 PESs, the approach used in 1980 was “do it again, independently.” The recount was not assumed to be superior to the census count, just independent. If this assumption obtained, DSE, described above, could then be applied to estimate net census undercoverage (as argued by Sekar and Deming, 1949, albeit in the context of birth registration).

The P-sample in 1980 comprised the April and August CPS samples (roughly 70,000 households each and about 185,000 individuals). The two months of the CPS were used to provide estimates of sufficient reliability to support estimates of undercoverage for states and major cities. For the purpose of coverage measurement, the CPS interview was augmented with a sketch map to help locate the residence, and, for the August P-sample, a list of all recent residences of each person was requested. The census questionnaires were then searched (for only the relevant enumeration district) for a person with closely matching information on name, address, sex, age, and race. If no matching census record could be found, or if the information collected was insufficient, a follow-up interview was attempted.

The E-sample was designed to estimate the rate of erroneous enumerations, duplicates, and enumerations in the wrong place. The E-sample was composed of 100,000 census questionnaires, though only 50,000 were searched for duplicates.
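As a reminder of the mechanics, the dual-systems (capture-recapture) estimator described earlier in the chapter can be sketched as follows. The numbers and notation are ours, fictional, and simplified relative to the PEP’s actual estimation:

```python
def dual_system_estimate(correct_enumerations, p_sample_total, matches):
    """Capture-recapture estimate of the true population size:
    N_hat = (CE * P) / M, where CE is the number of census enumerations
    judged correct (from the E-sample), P is the P-sample count, and M is
    the number of P-sample people matched to a census enumeration.
    Independence of the two "captures" is the key assumption."""
    if matches == 0:
        raise ValueError("cannot form a dual-systems estimate with no matches")
    return correct_enumerations * p_sample_total / matches

# Fictional area: 950 correct enumerations, 900 P-sample people,
# 855 of whom match a census record.
n_hat = dual_system_estimate(950, 900, 855)  # 1000.0
```

Correlation bias arises when the independence assumption fails, because people missed by the census also tend to be missed by the P-sample, which depresses neither CE nor P but inflates the apparent match rate.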
A substantial complication of the PEP was that the percentage of cases either not responding to the initial interview or whose match status was unresolved was more than 8 percent nationally and was even larger for some demographic groups. Various treatments of the nonresponse and unresolved match cases, and decisions about which CPS month to use (they were not used jointly to produce estimates), resulted in 12 different sets of undercoverage estimates, which varied considerably. For example, estimates of net national undercoverage ranged from 0.8 to 1.4 percent. Given this uncertainty, the Census Bureau decided against adjusting the 1980 census for differential net undercoverage.

Demographic analysis was also used to estimate the net undercoverage of the 1980 census. Passel et al. (1982) describe the methods used,

which were similar to those used in the 1970 census. However, the rise in the population of undocumented aliens reduced the quality of the estimates from demographic analysis in comparison with those in 1970. Initial demographic analysis-based estimates of the net national undercount were very close to 0 percent. However, there is evidence that the initial demographic analysis estimates failed to account for a substantial increase in the undocumented immigrant population. Subsequent analysis suggests that the undercoverage of the census was as much as 1.5 percent given reasonable assumptions about the size of the uncounted undocumented population.15

1990 Census

Initial plans for the 1990 coverage measurement program were for a PES of 300,000 housing units, which the Census Bureau argued was needed to support net undercoverage estimates (and potentially adjustment) at the level of geographic aggregation consistent with such uses as reapportionment and redistricting. However, this design was rejected by the Secretary of Commerce and was replaced by a PES of 150,000 households, which was only to be used for purposes of coverage measurement and not for census adjustment. That decision precipitated a lawsuit that maintained the size of the PES at 150,000 households (ultimately, 165,000) but reopened the possibility of using the PES to adjust the 1990 census: the decision on adjustment, to be made by the Secretary of Commerce, would benefit from the deliberations of the members of a Special Secretarial Advisory Panel.

The 1990 PES was the first postenumeration survey with a survey instrument specifically designed for coverage measurement. The final design included more than 5,000 block clusters that were independently listed. (The design included people living in many types of group quarters residences.)
The design also included a considerable amount of oversampling of blocks that contained a large fraction of historically hard-to-enumerate people. The P-sample residents were interviewed and matched to the E-sample, the census enumerations in the PES blocks, first using computer matching software, with difficult cases then subject to clerical review. Unmatched cases were followed up in the field. Given the development of a specific survey instrument, and due to better efforts to collect data, the percentage of nonrespondents and cases with unresolved match status was substantially lower than in 1980. Dual-systems estimates were constructed by separately computing net undercoverage estimates for 1,392 poststrata. Due to their high variance, these estimates were

15 Robinson and West (2000) give the estimate as 1.2 percent.

smoothed using empirical Bayes regression methods. Finally, the estimates were carried down to the level of census blocks using synthetic estimation.

The problem of estimating both the number of undocumented aliens resident in the United States on census day and the percentage of those who were enumerated in the census posed more of a challenge to demographic analysis in 1990 than in 1980, given the larger size of the undocumented population in 1990. A “residual” process was developed to address this problem. The basic idea is as follows. An estimate of the number of legal foreign-born residents was developed using reported data on legal immigration. This figure was subtracted, at the national level, from the estimated number of foreign-born residents, either from the census long-form sample or from the CPS (and now, the American Community Survey), to arrive at an initial estimate of the number of illegal immigrants. This estimate was then inflated to account for undercoverage of this population (for details, see Robinson, 2001). Clearly, this required a few assumptions that were unlikely to hold, even approximately. However, there was, and still is, no preferred alternative methodology.

The Census Bureau released preliminary PES results in April 1991, estimating a national net undercount of 2.1 percent, with a difference of 3.1 percentage points between the rates of undercoverage of blacks and nonblacks. An internal Census Bureau group voted seven to two in favor of adjusting the 1990 census to remedy differential undercoverage. The Special Secretarial Advisory Panel split equally on the decision on adjustment. Ultimately, the Secretary of Commerce decided not to adjust the 1990 census.
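The “residual” process described above amounts to simple arithmetic. The following is a hedged sketch with fictional figures; in particular, the inflation factor for undercoverage is an assumption for illustration, not a published value:

```python
# All figures fictional, in millions of people.
foreign_born_survey = 20.0   # foreign-born residents estimated from a survey
legal_foreign_born  = 16.5   # built up from reported legal immigration data
undercoverage_inflation = 1.10  # assumed undercoverage of this population

# Residual: survey-based foreign-born minus the legal foreign-born estimate.
residual = foreign_born_survey - legal_foreign_born       # 3.5
# Inflate for undercoverage of the undocumented population itself.
undocumented_estimate = residual * undercoverage_inflation  # about 3.85
```

Each input carries its own error, and the residual inherits all of them, which is why the text notes that the method rests on assumptions unlikely to hold even approximately.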
2000 CENSUS16

Background

The initial planning for the 2000 census coverage measurement program was based on two assumptions: (1) that coverage improvement programs were unlikely to greatly reduce black–nonblack differential net undercoverage, and therefore the 2000 census was likely to continue an historical pattern of substantial differential net undercoverage of minorities, and (2) that the time necessary to compute PES-based adjusted counts and validate them would make it extremely difficult to deliver adjusted counts in time for apportionment or redistricting of the U.S. House of Representatives unless substantial changes were made to the design of the census. Consequently, the Census Bureau initially decided to use

16 Much of the material in this section is taken from National Research Council (2004b).

for its coverage measurement program in 2000 a strategy referred to as integrated coverage measurement. The idea of integrated coverage measurement was to limit the more extreme efforts at coverage improvement that delayed the PES data collection and to rely on the PES to address some of the coverage problems in the primary enumeration. The resulting PES counts, which would be the official census counts in a so-called “one-number census,” would be of higher quality than the unadjusted census counts. It was anticipated that the execution of the PES would greatly benefit from the experience gained in carrying out the 1980 and 1990 PESs, and from the earlier (and therefore higher quality) data collection made possible by the elimination of late-stage coverage improvement programs. The integrated coverage measurement design called for a 700,000-household survey and accompanying matching operation, which would provide reliable direct estimates for states. (A direct estimate of a state’s net undercoverage is one based on the sample collected from households in that state alone.)

The plan to use integrated coverage measurement was jettisoned after the Supreme Court decision in January 1999 (Department of Commerce v. United States House of Representatives, 525 U.S. 316), which prohibited the use of sampling methods to produce counts for purposes of apportionment. This decision required the Census Bureau to greatly modify the design of the 2000 census as well as the associated coverage measurement program. With respect to the census itself, the Census Bureau was not allowed to use sampling for nonresponse follow-up as planned, since that would result in sample-based census counts.

This change, in turn, affected PES plans for the 2000 census because sampling for nonresponse follow-up ruled out using a particular version of PES, referred to as PES-B.
In a PES-B, one attempts to enumerate in-movers in the P-sample blocks, assuming that their size and characteristics are roughly equivalent to those of the out-mover population. To determine whether those individuals were enumerated in the census, one determines the address at which they resided on census day and then checks the census records at those locations to see if there is a match. However, with a census using sampling for nonresponse follow-up, many of those locations would not be included in the sample follow-up, or they would not have a census enumeration status, greatly complicating the use of PES-B. For that reason, PES-C was created, which again uses the size of the in-mover population to estimate the size of the out-mover population, as in PES-B, but instead uses the match status of the out-mover population. Because of the difficulty of locating and contacting the out-movers, match status is often based on proxy information of dubious quality. For that reason, it is generally believed that PES-C is inferior to PES-B. However, when the design for the 2000 census was revised as a result of the

Supreme Court decision of 1999, the Census Bureau did not have time to move back to a PES-B strategy.

The Accuracy and Coverage Evaluation Program

The modified coverage measurement program, referred to as the Accuracy and Coverage Evaluation (A.C.E.) Program, was scaled back to a 300,000-household PES. Given the early work already carried out in selecting the integrated coverage measurement sample, the reduced A.C.E. sample was selected by subsampling from that sample. The smaller sample size for the A.C.E. could be justified since there was no longer a need to produce direct state estimates to support apportionment, and so estimates from A.C.E. could borrow information across state boundaries. In addition, it is important to note that adjusted counts were now not needed until April 1, 2001, in support of redistricting, which provided additional time for validating the A.C.E. estimates. As noted above, the 2000 A.C.E. sample was twice as large as the PES in 1990, with a P-sample of 300,000 households in 11,000 block clusters. Given the larger sample, there was less need to oversample, and the resulting variances of net coverage estimates were very likely substantially reduced in comparison with those for 1990. A.C.E. also used computer-assisted telephone and personal interviewing to facilitate the collection of P-sample data. A.C.E.’s dual-systems estimates used 448 poststrata (which were later collapsed to 416). This number, and the larger sample size, reduced the need, relative to 1990, for empirical Bayes smoothing.

In light of the possibility of adjustment (before the Supreme Court decision), the idea was that if A.C.E. could be demonstrated to provide valid estimates of net coverage error for poststrata and if the estimated net error differed appreciably by poststrata, then adjusted population counts from A.C.E. should be used for redistricting and for other official purposes.
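The PES-B/PES-C treatment of movers described above can be sketched as follows. This is our own simplification with fictional numbers, not a description of the Census Bureau’s estimation code:

```python
def pes_b_mover_matches(n_inmovers, inmover_match_rate):
    """PES-B: in-movers are traced to their census-day addresses and
    matched there, so both the count and the match rate come from
    in-movers interviewed directly."""
    return n_inmovers * inmover_match_rate

def pes_c_mover_matches(n_inmovers, outmover_match_rate):
    """PES-C: the in-mover count still estimates the size of the mover
    population, but match status is taken from out-movers, whose status
    often rests on lower-quality proxy reports."""
    return n_inmovers * outmover_match_rate

# Fictional block cluster: 120 in-movers; proxy reporting for out-movers
# yields a lower measured match rate than direct tracing of in-movers.
matched_b = pes_b_mover_matches(120, 0.80)
matched_c = pes_c_mover_matches(120, 0.75)  # 90.0
```

The two estimators use the same mover count; the quality difference lies entirely in where the match rate comes from, which is why PES-C is generally considered inferior.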
In practice, however, the Census Bureau’s evaluations of A.C.E. discovered several problems. One problem was that A.C.E. estimated an overall net undercount of 1.2 percent, while the initial demographic analysis estimated that the census had a 0.7 percent net overcount. Later revision of the demographic analysis estimates resulted in an estimate of a 0.3 percent net undercount, but this was still inconsistent with the estimate from A.C.E.

Other problems with A.C.E. concerned balancing error, the uncertain effects of a substantial number of late additions to the census, the level of error from synthetic estimation, the relative lack of duplicates identified by A.C.E., and the validity of whole-household imputations. These problems collectively led to a recommendation by the Census Bureau,

seconded by the Secretary of Commerce on March 6, 2001, not to use A.C.E. counts for redistricting.

Post-2000 Research

A.C.E. Revision I

After the 2000 census, the Census Bureau carried out research to investigate the sources and magnitudes of error in the 2000 census, A.C.E., and demographic analysis. As part of this effort, the Census Bureau carried out two studies to collect additional information relevant to specific concerns regarding A.C.E. The purpose of the work was to determine the extent of person duplication and, on a sample basis, to identify the correct residence for E-sample enumerations and to determine the correct match status for P-sample cases. The studies were the Evaluation Follow-Up Study, which involved reinterviewing a subsample of 70,000 people in E-sample housing units in 20 percent of the A.C.E. block clusters to determine correct residences on Census Day (with additional clerical review of 17,500 people who were unresolved), and the Person Duplication Studies, which involved nationwide computer matching of E-sample records to census enumerations using name and date of birth. This nationwide search permitted the first determination of the extent of remote duplication in the census, that is, cases in which the duplicated individuals did not both reside in the PES block cluster. The Census Bureau also examined the implementation of the targeted extended search (searches outside of the relevant P-sample block cluster for a match in situations in which there was a likely error in identifying the correct block) for matches to P-sample cases; estimation of the match rate and the correct enumeration rate for people who moved during the data collection for A.C.E.; and the effects of census imputations.

As a result of these very detailed investigations, the Census Bureau judged that A.C.E.
counts substantially underestimated the rate of census erroneous enumerations and hence tended to overestimate the true population size.17 The Census Bureau subsequently released revised A.C.E. estimated counts on October 17, 2001, which were referred to as A.C.E. Revision I counts. Given concern stemming from the finding that there were considerably more errors of duplication than were originally estimated by the A.C.E., the Census Bureau again recommended that

17 Fenstermaker and Mule (2002) estimated that there were a total of 5.8 million duplicates in the 2000 census. The problem could have been even larger, but the Census Bureau mounted an ad hoc operation early in 2000 to identify duplicate MAF addresses and associated returns, which removed 3.6 million people from the 2000 census.

these adjusted counts not be used, this time for the allocation of federal funds or other official purposes.

A.C.E. Revision II

Between October 2001 and March 2003, the Census Bureau undertook a further review of all the data collected in the census and the A.C.E. In addition, based on a comparison of the sex ratios from demographic analysis with those from the A.C.E., the Census Bureau decided to revise (increase) the A.C.E. estimates for males so that the resulting sex ratios were consistent with the sex ratios from demographic analysis for blacks 18 years old and older and for all other males more than 30 years old. This revision was based on the Census Bureau’s belief that the A.C.E. counts had been reduced by correlation bias. The adjustment that was implemented is implicitly based on the assumption that the correlation bias for the parallel female groups was close to zero. The estimates based on these revisions are referred to as A.C.E. Revision II. Evaluations of the quality of these final A.C.E. estimates resulted in the announcement, on March 12, 2003, by the Census Bureau that the A.C.E. Revision II counts would not be used as the base for producing intercensal population estimates.

The Panel to Review the 2000 Census (National Research Council, 2004b) generally agreed with the decisions made at each stage of this three-stage process, namely, not to use the A.C.E. counts—either the original, Revision I, or Revision II—for purposes of redistricting, fund allocation, or other official purposes or for intercensal estimation. However, the panel was not in complete agreement with the supporting arguments of the Census Bureau.18

As a by-product of this intensive effort to understand whether adjusted counts were preferable to unadjusted counts for various purposes, the Census Bureau produced comprehensive documentation and evaluation of the A.C.E. processes.
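The logic of the sex-ratio adjustment for correlation bias can be sketched as follows. This is a deliberate simplification with fictional numbers; the actual Revision II procedure was more elaborate:

```python
def adjust_males_for_correlation_bias(dse_females, da_sex_ratio):
    """Replace the male DSE with the female DSE scaled by the
    demographic-analysis sex ratio (males per female), implicitly
    assuming the female estimate is free of correlation bias."""
    return dse_females * da_sex_ratio

# Fictional poststratum: DSE estimate of 1.20 million females,
# demographic-analysis sex ratio of 0.95 males per female.
adjusted_males = adjust_males_for_correlation_bias(1_200_000, 0.95)
# Compare with a (fictional) uncorrected male DSE of, say, 1.10 million:
# correlation bias in the male estimate would be repaired upward.
```

The adjustment borrows the sex ratio, which the text notes is far less sensitive to problematic assumptions than the population level itself, while leaving the female estimate untouched.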
A considerable amount of material is available at the following locations. Evaluations supporting the March 2001 decision can be found at http://www.census.gov/dmd/www/EscapRep.html; evalu­ ations supporting the October 2001 decision can be found at http://www. census.gov/dmd/www/EscapRep2.html; and evaluations supporting the March 2003 decision can be found out at http://www.census.gov/dmd/ www/ace2.html. Collectively, these reports document the A.C.E. proce­ dures in detail, examining what was learned about the quality of A.C.E. and A.C.E. Revisions I and II through the additional information collected. 18  For the panel’s arguments, a more detailed description of A.C.E. and the various evalu­ ation studies, and the material on which this abbreviated history is based, see National Research Council (2004b:Chapters 5–6).
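The sex-ratio adjustment described above can be sketched numerically. The following is a simplified illustration with entirely made-up figures, not the Census Bureau's production method (see Bell, 2001): it assumes, as Revision II did, that correlation bias for the corresponding female group is negligible, so the DSE estimate for women can be combined with the demographic-analysis sex ratio to raise the estimate for men. The ratio is written here as males per female for clarity, although the report describes sex ratios as women to men; the algebra is equivalent.

```python
# Sketch of a Revision II-style sex-ratio adjustment for correlation
# bias. All figures are hypothetical, in thousands of people.

def adjust_males_for_correlation_bias(dse_females: float,
                                      da_males: float,
                                      da_females: float) -> float:
    """Raise the male estimate to match the demographic-analysis (DA)
    sex ratio, assuming correlation bias for females is negligible."""
    return dse_females * (da_males / da_females)

dse_females = 13_000   # DSE estimate for women in the group
da_males = 12_600      # DA estimate for men in the group
da_females = 13_100    # DA estimate for women in the group

adjusted_males = adjust_males_for_correlation_bias(
    dse_females, da_males, da_females)
# Any lower DSE male estimate for the group would be raised to this
# value, which restores the DA sex ratio.
```

Because the female DSE estimate anchors the calculation, the method stands or falls with the assumption that women in the group were well covered by both the census and the P-sample.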

Limitations of A.C.E.

The A.C.E. in the 2000 census was planned from the outset as a method for adjusting census counts for net coverage error, so it did not focus on estimating the number or frequency of the various components of census coverage error. As an important example, the limited geographic search for matches in the A.C.E. (for estimation of net coverage error) relied on the balancing of some erroneous enumerations with omissions, in which the erroneous enumerations were at times valid E-sample enumerations recorded in the wrong location. People can be counted in the wrong location as a result of a geocoding error (placing an address in the wrong census geography) or of being enumerated at a second home. Because such erroneous enumerations and omissions were expected to balance each other on average, they were expected to have little effect on the measurement of net coverage error. Therefore, the A.C.E. did not allocate the additional resources that would have been required to distinguish truly erroneous enumerations from otherwise correct enumerations in the wrong place. Similarly, the A.C.E. did not always distinguish between an erroneous enumeration and a duplicate enumeration at the wrong location.

In addition, the A.C.E. effectively treated all cases with insufficient information for matching as imputations, although it is clear that a substantial fraction of them are correct. The Census Bureau has done research demonstrating that, for a substantial subset of these cases, match status can be reliably assessed. (See the discussion of missing data methods in Chapter 4 for details on this research.)

We now mention several other limitations of the A.C.E. in 2000 for measuring census component coverage error, along with any plans to address each limitation in the designs for the census and census coverage measurement in 2010. However, we must stress that our treatment here of the theoretical underpinnings of the A.C.E. is incomplete; for a complete description, see Hogan (2003).

First, inadequate information was collected in the census on census day residence. In 2000, comprehensive information was not collected from a household in the census interview regarding other residences that members of the household often used or other people who occasionally stayed at the household in question. This limited the Census Bureau's ability to correctly assess residency status for many people. The Census Bureau intends to include more probes to assess residence status in the 2010 census questionnaire, and the coverage follow-up interview will also collect additional information on residence status for those housing units that are likely to have incorrectly represented the number of residents on the census form. More probes about residence will also be included on the 2010 census coverage measurement questionnaires.
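The balancing argument above can be made concrete with a toy ledger. In this hedged sketch (all counts are hypothetical), a geocoding error counts a real person but in the wrong census geography, so it contributes one erroneous enumeration where the person was counted and one omission where the person should have been counted: the two cancel in net coverage error but both inflate the component totals.

```python
# Toy ledger contrasting net coverage error with component coverage
# error. All counts are hypothetical.

omissions = 50                # people missed outright
erroneous_enumerations = 30   # duplicates, fabrications, etc.
geocoding_errors = 20         # correct people counted in the wrong place

# Each geocoding error appears on both sides of the ledger.
total_omissions = omissions + geocoding_errors
total_erroneous = erroneous_enumerations + geocoding_errors

# Net error: the geocoding errors cancel, which is why a limited
# geographic search has little effect on this measure.
net_error = total_omissions - total_erroneous

# Gross (component) total: the geocoding errors inflate both
# components, so a program focused only on net error understates them.
gross_error = total_omissions + total_erroneous

print(net_error)    # unchanged by geocoding errors
print(gross_error)  # inflated by twice the geocoding errors
```

This is why a design optimized for net error could forgo the extra fieldwork needed to classify wrong-place enumerations, and why a component-oriented design cannot.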

In 2010, the duplicate search will be done nationwide, not only for the PES population, to help determine census day residence. In conjunction with this search, as part of the coverage follow-up interview, the Census Bureau plans to incorporate real-time field verification of duplicate enumerations in 2010. (For details on issues in determining correct residence, see U.S. Census Bureau, 2003.)

Second, nonresponse in the E- and P-samples complicated matching of the P-sample to the E-sample and of the E-sample to the census (to identify duplicates). It also complicated estimation because it interfered with assigning a person to the correct poststratum (for details, see Mulry, 2002).

Third, the methodology used for individuals who moved between Census Day and the day of the postenumeration interview (known as PES-C; see above) resulted in a large percentage of proxy enumerations, which in turn resulted in matching errors. The Census Bureau plans to return in 2010 to the use of PES-B (similar to the 1990 methodology), which relies entirely on information from in-movers.

Fourth, the A.C.E. Revision II estimates refined undercoverage estimates for black men over 18 and for "all other men" over 30 using sex ratios from demographic analysis (ratios of the number of women to the number of men in a demographic group) to correct for correlation bias (for details, see Bell, 2001; Shores, 2002). This method assumes that the net coverage error for women in the relevant demographic group is ignorably small. However, for nonblack Hispanics, a refinement (other than that used for all nonblack males) using sex ratios would require a long historical series of Hispanic births and deaths and, more importantly, highly accurate data on the magnitude and sex composition of immigration (both legal and undocumented). Yet the historical birth and death data for Hispanics are available only since the 1980s, and the available measures of immigration are too imprecise for this application. Consequently, this use of demographic analysis to refine A.C.E. estimates was not directly applicable to nonblack Hispanic males in 2000.19 In addition, there is a great deal of uncertainty about the degree to which the various assumptions needed to support this methodology hold for either blacks or nonblack Hispanics. (The decision could also depend on the yardstick in question, i.e., whether counts or shares are the quantities of interest.)

19 For example, it is noteworthy that about 55 percent of working-age (18-64) Hispanics are foreign born, in comparison with less than 5 percent of whites and slightly more than 5 percent of blacks.

Fifth, poststratification is used to reduce correlation bias since it partitions the U.S. population into more homogeneous groups with respect to their enumeration propensities. The number of factors that could be included in

the poststratification used in the A.C.E. was limited because the approach (essentially) fully cross-classified many of the defining factors, with the result that each additional factor greatly reduced the sample size per poststratum; for details of the 2000 poststrata, see U.S. Census Bureau (2003). The 2010 plan is to use logistic regression modeling to reflect the influence of many factors on coverage rates.

Sixth, small-area variations in census coverage error that are not corrected by application of the poststratum adjustment factors to produce estimates for subnational domains (referred to as synthetic estimation) were not reflected in the variance estimates of adjusted census counts. The Census Bureau is examining the use of random effects in its adjustment models to account for the residual variation in small-area coverage rates beyond that which is modeled through synthetic estimation.

Finally, the sample design made the 2000 A.C.E. less informative than it might have been in measuring components of census coverage error by not providing a larger sample targeted on people and housing units that are difficult to enumerate. As noted above, this approach was understandable given the focus of the A.C.E. on producing adjusted census counts. However, given the new priority of measuring the components of census coverage error, a number of design and data collection decisions in the general framework of PES data collection, especially sample design, remain open to modification.

The Role of Demographic Analysis

From 1950 through 1980, demographic analysis served as the primary source of estimates of national net coverage error in the decennial census. However, demographic analysis is fundamentally limited: it can only provide estimates of net coverage error for some national demographic groups. Demographic analysis cannot yet provide estimates for any geographic region below the national level, and it does not provide estimates of net undercoverage for the Hispanic population. Because of these limitations, DSE, based on a postenumeration survey, displaced demographic analysis as the primary coverage measurement instrument, starting with the 1990 census.

However, these limitations do not mean that demographic analysis is no longer useful. As was evident in 2000, demographic analysis still performs two useful functions. First, it provides estimates of population size that can be used in a variety of ways to assess the quality of dual-system estimates, including comparison of dual-system estimates with demographic analysis estimates that are less affected by the quality of the data on external migration, and comparison through sex and age ratios. Demographic analysis played this important role in the 2000 census. Second, sex ratios and other

information from demographic analysis can be useful in improving the estimates from DSE. For example, the modification of the count for adult black men, mentioned above, remains a possibility for 2010. In addition, looking to the future, ongoing research is attempting to address the two primary deficiencies of demographic analysis: the lack of subnational estimates and the lack of estimates for the Hispanic population.

To complete this history of coverage measurement, Table 2-3 shows the national estimates of net coverage error and the estimates disaggregated by black and nonblack for the decennial censuses from 1940 to 2000.

TABLE 2-3  Net Undercoverage for U.S. Censuses, 1940-2000

Group                    1940   1950   1960   1970   1980      1990a   2000b
U.S. Total
  Demographic analysis    5.4    4.1    3.1    2.7    1.2       1.8     0.1
  PES                      —      —     1.9     —     0.8–1.4   1.6    –0.5
Black
  Demographic analysis    8.4    7.5    6.6    6.5    4.5       5.7     2.8
  PES                      —      —      —      —     5.2–6.7   4.6     1.8
Nonblack
  Demographic analysis    5.0    3.8    2.7    2.2    0.8       1.3    –0.3
  PES                      —      —      —      —      —         —       —

aRevised estimates.
bRevised demographic analysis and final A.C.E. estimates.
SOURCE: Data from Anderson (2000) and National Research Council (2004b).
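The dual-system estimator underlying the PES approach, and the synthetic carry-down of poststratum adjustment factors discussed under the sixth limitation, can be sketched as follows. This is the textbook Lincoln-Petersen form with hypothetical numbers; the Census Bureau's production estimator adds corrections for erroneous enumerations, movers, and missing data (see Hogan, 2003).

```python
# Sketch of dual-system estimation (DSE) for one poststratum, plus
# synthetic application of the poststratum adjustment factor to a
# small area. All numbers are hypothetical.

def dual_system_estimate(census_correct: float,
                         p_sample_total: float,
                         matches: float) -> float:
    """Lincoln-Petersen estimate of the true poststratum population."""
    return census_correct * p_sample_total / matches

census_correct = 95_000   # census enumerations judged correct
p_sample_total = 1_000    # weighted independent P-sample count
matches = 940             # P-sample people matched to the census

dse = dual_system_estimate(census_correct, p_sample_total, matches)

# Synthetic estimation: one adjustment factor per poststratum, applied
# uniformly to every small area's count. Local departures from the
# poststratum-average coverage rate are therefore not captured, which
# is the variance issue noted in the text.
adjustment_factor = dse / census_correct
small_area_adjusted = 4_200 * adjustment_factor
```

In this sketch a 94 percent match rate implies roughly a 6 percent upward adjustment, and every small area in the poststratum receives that same factor regardless of its actual local coverage.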

