1
Introduction

The decennial census is an enormously complex endeavor. It requires counting residents in all types of living situations, from the densest urban setting to rural Alaska, in linguistically isolated areas and in gated communities, largely with a very temporary workforce that must be trained in only a few days. Given these circumstances, it is not surprising that the census counts are imperfect. Furthermore, even if an optimized census process could be developed for one census year, the dynamic nature of the United States population could make this process inefficient for the next census. It is therefore very important both to assess the quality of the census count and to learn as much as possible about what did and did not work well to inform future process improvements.


The Census Bureau has a 50-year history of carrying out careful assessments of the quality of its censuses. In particular, it has devoted substantial resources to the measurement of net coverage in the decennial census—that is, estimates of the difference between the census count and the “true” count, for various geographic and demographic groups.


Since 1978, panels of the National Research Council’s (NRC) Committee on National Statistics have advised the Census Bureau on the assessment of census coverage. In particular, the Panel to Review the 2000 Census fully reviewed the operations, statistical methods, and results of the Accuracy and Coverage Evaluation (A.C.E.) program that the Census Bureau used to evaluate the coverage of the 2000 census. The description and evaluation of A.C.E. in that panel’s final report (National Research Council, 2004b) are comprehensive and are referred to often in this report.


The work of the Panel on Coverage Evaluation and Correlation Bias in the 2010 Census is the latest effort of the NRC to assist the Census Bureau as it plans the 2010 census. The panel is studying how to adapt census coverage measurement1 to assess coverage better and to guide improvements in census processes.

1

“Coverage measurement” and “coverage evaluation” are sometimes used synonymously. However, coverage measurement explicitly denotes a quantitative exercise, whereas coverage evaluation has more to do with the broad purpose of the activity, which is to assess the completeness of coverage through a variety of operations and a range of quantitative and qualitative tools.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Research and Plans for Coverage Measurement in the 2010 Census: Interim Assessment: Panel on Coverage Evaluation and Correlation Bias in the 2010 Census

This report is the first installment in the panel’s work. Some of the topics taken up in this interim report will be examined in more depth in the panel’s final report.

COVERAGE MEASUREMENT IN THE DECENNIAL CENSUS

Uses of Coverage Measurement

Broadly speaking, coverage measurement potentially serves three primary uses: (1) assessment of coverage accuracy, (2) guidance for improvement of census processes, and (3) adjustment of reported counts.

Assessment of Coverage Accuracy

Census counts are used for many purposes vital to the nation, including the apportionment of seats in the U.S. House of Representatives (the constitutional mandate of the census); federal, state, and local redistricting; fund allocation to state and local jurisdictions; public planning; and learning about the population. Consequently, it is important for the nation to monitor the quality of population coverage, both overall and for demographic or other groups. The purpose of the Census Bureau’s coverage measurement programs for the 1950, 1960, and 1970 censuses was primarily to inform users about the quality of census coverage.

Census Process Improvement

In addition to providing information to users about census quality, coverage measurement programs were also used to identify components of census processes that, if improved, could potentially reduce net coverage problems in the next census. For example, the relatively high undercoverage rate of black men ages 20 to 54 in the 1970 census motivated the implementation of several coverage improvement programs in the 1980 census. One such program, the nonhousehold sources program, looked for names on certain administrative lists that did not match census records; it aimed to reduce differential coverage, that is, the difference between net coverage for a specific demographic group and net coverage for the nation as a whole. However, the information that these coverage measurement programs provided was not specific enough to identify which components of the census process needed modification to address the measured undercoverage.

Adjustment

Starting with the 1980 census, an additional use was proposed for coverage evaluation programs: to use the information to adjust the census for undercoverage.2 That is, the alternative counts produced through coverage measurement were to be used for some or all of the purposes to which census counts are applied, especially reapportionment and redistricting. This possibility arose from technical improvements in coverage measurement methodology (discussed below) that allowed net coverage error to be estimated at the state level. It also reflected the increased importance given to uses of census data such as general revenue sharing and redistricting. This use of coverage measurement information was also proposed prior to the 1990 and 2000 censuses. However, adjustment has proved to be controversial, and to date adjusted counts have not been used for any official purposes, with one exception: the population controls for the Current Population Survey (and possibly some other surveys) were adjusted for census undercoverage during the 1990s.

Approaches to Coverage Measurement

The approach currently employed as the primary method for coverage measurement was introduced for the 1980 census, when it provided the first subnational geographic information on census undercoverage. A postenumeration survey (PES) collects data on households in a random sample of census block clusters; this sample, referred to as the P-sample, is based on an address listing compiled independently of the census Master Address File (MAF). The survey responses are matched to the census enumerations to assess, for each of hundreds of population groups throughout the United States called poststrata, the rate at which individuals in the P-sample match to the census. The match rate for a poststratum serves as an estimate of the proportion of the true population captured in the census.
In addition, the census enumerations in the P-sample blocks (referred to as the E-sample) are checked to estimate, again by poststratum, the proportion of census enumerations that are correct. The match rate and the correct enumeration rate are then used to estimate the rate of net coverage in each poststratum.

The specifics of the methodology are more complicated than this outline suggests. Complications include correlation bias and unmodeled local heterogeneous effects. (This report makes frequent mention of correlation bias, which is a bias in estimating the number of people missed by both the postenumeration survey and the census. It results from a departure from either the assumption that enumeration propensities are homogeneous in the census and in the postenumeration survey or the assumption that the two enumeration processes are independent.) There is also unit and item nonresponse in both the census and the postenumeration survey, due both to incomplete cooperation and to people moving during census-taking, as well as misresponse for a variety of reasons. Treatment of these issues greatly complicates the estimation of net undercoverage, and innovative approaches to deal with them have had varying degrees of success.

The estimation method that depends on matching two independent attempts to count a population is referred to as dual-systems estimation (DSE), and it has been the cornerstone of the Census Bureau’s coverage measurement efforts since 1980. The postenumeration survey and the DSE method used in the 2000 census are referred to as A.C.E. An excellent description of A.C.E. methodology is given in U.S. Census Bureau (2003).

Even before the introduction of DSE, the Census Bureau began using demographic analysis to estimate net coverage error for demographic groups classified by age, sex, and race (black or nonblack).3 Demographic analysis uses demographic accounting relationships to construct an estimate of the population at the census date for comparison with the census count. Its data sources include vital statistics on births and deaths, administrative data on immigration and the elderly, and analytic estimates developed from previous censuses and various surveys. Demographic analysis can provide a useful alternative to DSE for measuring national net coverage for these demographic groups and for measuring differential undercount for some of them (age groups, sexes, and some race groups). However, because measures of internal migration are limited in accuracy and precision, demographic analysis cannot provide subnational estimates of undercount. Nor can it provide reliable estimates of net undercoverage for Hispanic populations, because of limitations in vital records and immigration data.

2

There were also post hoc proposals to adjust the 1970 census for intercensal purposes using synthetic estimates based on demographic analysis (e.g., Trussell, 1981).

3

The first study of decennial census undercoverage assessed by demographic analysis may have been that of Price (1947) for the 1940 census. Other early applications are Coale (1955) for the 1950 census and Siegel and Zelnik (1966) for the 1960 census.

A.C.E., ADJUSTED CENSUS COUNTS, AND THE 2000 CENSUS

Detailed Investigations of A.C.E.

The understanding of Census Bureau officials leading up to the 2000 census was that if A.C.E. could provide reliable estimates of net coverage error for poststrata, and if the estimated net error differed appreciably by poststratum, then adjusted population counts from A.C.E. would be used for redistricting and other official purposes. The use of adjusted counts for apportionment had already been precluded by the Supreme Court’s 1999 decision prohibiting the use of sampling methods to produce counts for that purpose (Department of Commerce v. United States House of Representatives, 525 U.S. 316). As it turned out, however, several problems led the Census Bureau to recommend, and the secretary of commerce to decide on March 6, 2001, that A.C.E. counts not be used for redistricting. These problems included discrepancies between the initial A.C.E. counts and estimates from demographic analysis, concerns associated with balancing error (discussed below), the uncertain impact of a substantial number of late additions to the census, and the validity of whole-household imputations.
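To make the poststratum-level dual-systems computation described above concrete, the following sketch computes a DSE and a net undercount rate from E-sample and P-sample summaries. It assumes a commonly cited simplified form of the A.C.E. estimator, in which the census count, less records with insufficient information for matching, is multiplied by the correct enumeration rate and divided by the match rate, and it takes the net undercount rate to be (DSE - C) / DSE. It ignores sampling weights, nonresponse, movers, and correlation bias. The class, field, and function names and all example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Poststratum:
    """Hypothetical summary counts for one poststratum (illustrative only)."""
    census_count: float        # C: census enumerations in the poststratum
    insufficient_info: float   # II: census records unusable for matching (assumed term)
    e_sample_total: float      # E: E-sample enumerations
    e_sample_correct: float    # CE: E-sample enumerations judged correct
    p_sample_total: float      # P: P-sample persons
    p_sample_matched: float    # M: P-sample persons matched to the census


def dual_system_estimate(ps: Poststratum) -> float:
    """Simplified DSE: (C - II) * (CE / E) / (M / P)."""
    correct_enumeration_rate = ps.e_sample_correct / ps.e_sample_total
    match_rate = ps.p_sample_matched / ps.p_sample_total
    return (ps.census_count - ps.insufficient_info) * correct_enumeration_rate / match_rate


def net_undercount_rate(ps: Poststratum) -> float:
    """Net undercount as a share of the estimated true population: (DSE - C) / DSE."""
    dse = dual_system_estimate(ps)
    return (dse - ps.census_count) / dse


if __name__ == "__main__":
    example = Poststratum(
        census_count=10_000, insufficient_info=0,
        e_sample_total=1_000, e_sample_correct=950,
        p_sample_total=1_000, p_sample_matched=920,
    )
    print(round(dual_system_estimate(example)), f"{net_undercount_rate(example):.2%}")
```

With these made-up inputs, a 95 percent correct enumeration rate and a 92 percent match rate imply a DSE of about 10,326 and a net undercount of about 3.2 percent; a negative rate would indicate a net overcount.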

The Census Bureau subsequently carried out research to determine the sources and magnitudes of error in the 2000 census, A.C.E., and demographic analysis. It also collected additional information relevant to specific concerns regarding A.C.E. The Census Bureau evaluated the extent of person duplication and, on a sample basis, collected additional information to identify the correct residence for E-sample enumerations and to determine the correct match status for P-sample cases. This research included (1) the Evaluation Follow-Up Study, which involved reinterviewing a subsample of 70,000 people in E-sample housing units in 20 percent of the A.C.E. block clusters to determine the correct residents on Census Day (with additional clerical review of 17,500 people who were unresolved), and (2) the Person Duplication Study, which involved nationwide computer matching of E-sample records to census enumerations using name and date of birth. This nationwide search permitted the first determination of the extent of remote duplication in the census, that is, duplication in which the two enumerations are not both located in the postenumeration survey block cluster. The Census Bureau also examined the implementation of the targeted extended search for matches to P-sample cases (extended search means searching outside the relevant P-sample block cluster for a match in situations in which the correct block was likely misidentified), the match rate and correct enumeration rate for people who moved during A.C.E. data collection, and the impact of census imputations.

As a result of these very detailed investigations, the Census Bureau judged that A.C.E. substantially underestimated the rate of census duplication and hence tended to overestimate the true population size. (Mule, 2003, estimates a total of 9.8 million duplicates in the 2000 census.) The Census Bureau subsequently released revised A.C.E. estimated counts on October 17, 2001, which are referred to as A.C.E. Revision I counts. However, the Census Bureau recommended that these adjusted counts not be used for the allocation of federal funds or other official purposes.

Between October 2001 and March 2003, the Census Bureau undertook further review of all the data collected in the census and the A.C.E., as well as the subsequent matching and checking of enumeration status in the A.C.E. It also increased the A.C.E. estimates of the number of black males, which contained some apparent discrepancies that may have been related to correlation bias, so that they matched sex ratios from demographic analysis. The result of this effort, which also included a more extensive assessment of the error in demographic analysis, is referred to as A.C.E. Revision II. On the basis of this work, the Census Bureau announced on March 12, 2003, that the A.C.E. Revision II counts, the final effort at coverage measurement in 2000, would not be used to produce intercensal population estimates.

In its final report (National Research Council, 2004b), the Panel to Review the 2000 Census generally agreed with the decisions made at each stage of this three-stage process, namely, not to use the A.C.E. counts (whether the original, Revision I, or Revision II counts) for redistricting, fund allocation, or other official purposes or for intercensal estimation. However, the NRC panel was not in complete agreement with the supporting arguments of the Census Bureau. The specific arguments made by that panel, a much more detailed description of A.C.E. and the various evaluation studies, and the material on which this abbreviated history is based can be found in The 2000 Census: Counting Under Adversity (National Research Council, 2004b: Chapters 5-6).

Extensive Documentation of the A.C.E. Process

As a by-product of this intensive effort to understand whether adjusted counts were preferable to unadjusted counts for various purposes, the Census Bureau produced comprehensive documentation and evaluation of the A.C.E. processes, and a considerable amount of material is available to those interested in more information. Evaluations supporting the March 2001 decision can be found at http://www.census.gov/dmd/www/EscapRep.html, evaluations supporting the October 2001 decision at http://www.census.gov/dmd/www/EscapRep2.html, and evaluations supporting the March 2003 decision at http://www.census.gov/dmd/www/Ace2.html. Collectively, these reports document the A.C.E. procedures in detail and examine what the additional information collected revealed about the quality of A.C.E. and A.C.E. Revisions I and II.

PLANNING FOR 2010

Shift in the Purpose of Coverage Measurement

The 2000 census demonstrated the great time and effort required to carefully collect data from the postenumeration survey, follow up the nonmatching cases, compute adjusted counts, and assess their quality in comparison to the census counts.
On the basis of that experience, the Census Bureau concluded that there would not be enough time to carry out coverage measurement for adjusting the redistricting counts by the mandated date of April 1, 2011, one year after Census Day (see, e.g., National Research Council, 2004b: 267). Although that decision did not rule out the possibility of adjustment for other purposes (e.g., intercensal estimates, fund allocation), it dramatically shifted the focus of the coverage measurement program for 2010.

Even so, this shift does not reduce the importance of conducting a high-quality coverage measurement program. Evaluating the accuracy of coverage will remain at least as important in 2010 as it has been in previous censuses (perhaps more so, given such innovations as plans to delete duplicates in real time). In addition, the panel thinks that increased attention should be paid to the use of coverage measurement to inform efforts to improve census processes for the future.

This shift does have implications for the desired output of coverage measurement. Census adjustment requires accurate estimates of net coverage at various levels of geography and for other population divisions, but given those estimates, information about components of error (the numbers of omissions, erroneous enumerations, duplicates, and enumerations in the wrong place) is basically irrelevant. If DSE concludes that there was a 1 percent net undercoverage for some group, the adjustment is the same whether that net undercoverage resulted from 3 percent omissions less 2 percent erroneous enumerations or from 8 percent omissions less 7 percent erroneous enumerations. For that reason, past implementations of DSE have not been designed to estimate the numbers of omissions and erroneous enumerations separately.

In contrast, any evaluation of census quality should take into account information about both net coverage and components of error. The two scenarios just described would lead to very different conclusions about the overall quality of the basic census processes, as well as about the confidence that can be placed in conclusions from coverage measurement. Likewise, information about specific components of error is critical to the use of coverage measurement to inform efforts to improve census processes. In this case, it is important not only to know the frequency of specific types of errors, but also to identify those cases accurately in the coverage measurement sample, so that errors can be linked to specific census processes for subsequent analysis.

As a consequence of this change in the primary purpose of coverage measurement toward supporting census process improvement, the Census Bureau is putting much greater emphasis in 2010 on measuring the components of coverage error. However, the 2010 census coverage measurement (CCM) program will again rely on a postenumeration survey as the primary data collection in support of census coverage evaluation.
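The arithmetic of the two scenarios described earlier in this section can be written out in a few lines. This sketch, with hypothetical names, treats net undercoverage as omissions minus erroneous enumerations and uses their sum as one simple, assumed summary of gross error; rates are expressed in whole percentage points.

```python
from typing import NamedTuple


class CoverageSummary(NamedTuple):
    net_undercount_pct: int   # omissions minus erroneous enumerations
    gross_error_pct: int      # omissions plus erroneous enumerations (one simple summary)


def summarize(omissions_pct: int, erroneous_pct: int) -> CoverageSummary:
    """Combine component error rates given in whole percentage points."""
    return CoverageSummary(omissions_pct - erroneous_pct, omissions_pct + erroneous_pct)


if __name__ == "__main__":
    scenario_a = summarize(omissions_pct=3, erroneous_pct=2)
    scenario_b = summarize(omissions_pct=8, erroneous_pct=7)
    # Identical net figures, so an adjustment would treat both groups the same,
    # even though scenario B involves three times as much gross error.
    print(scenario_a, scenario_b)
```

Both scenarios yield a net undercount of 1 percentage point, but scenario B has a gross error of 15 points versus 5 for scenario A. Integer percentage points keep the comparison exact; with floating-point rates, 0.03 - 0.02 and 0.08 - 0.07 differ in their last binary digits.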
The new focus on estimating rates of census component coverage error and on developing a feedback loop in support of census improvement (keeping in mind that all three uses of coverage measurement described above remain important) raises three questions, which the panel has considered and will address more completely in its final report:

(1) How well can the goal of measuring census component error be met using an approach that was initially developed to measure net coverage?
(2) What modifications to the A.C.E. sample design would yield a CCM sample design that is more effective for this new purpose?
(3) How well can components of census coverage error be linked to the associated census processes?

There is also a real possibility that decennial census management information data, which may not be routinely saved, could provide additional information on the functioning of specific census component processes that have historically been problematic. We also point out that an advantage of a postenumeration survey is its omnibus nature: it provides information on any unanticipated problems in the census. Finally, it is not yet clear which analyses will be most useful in diagnosing census deficiencies, given the data that are available.

In summary, this change in the focus of coverage measurement has implications for CCM sample design, for data collection more broadly, for the production of coverage measurement statistics and databases, and for subsequent data analysis. These issues are given some consideration in this report, and we will provide additional advice in our final report.

Problems with A.C.E. in 2000

In addressing these and other questions, the panel has taken into consideration the previous findings of both the Census Bureau and the Panel to Review the 2000 Census regarding the limitations of the A.C.E. design for addressing both the previous goal of estimating net coverage and the new goal of measuring components of census coverage error.

First and foremost, the information collected as part of the census and the PES was inadequate, which allowed too many mistakes in the A.C.E. final determination of Census Day residence. This problem was demonstrated most vividly for duplicates, but it was learned only well after the PES operation had been completed. Consequently, even when duplicates were identified, there was generally no basis for selecting one location as the place of the correct enumeration. In addition, there was some evidence that A.C.E. underestimated the number of omissions. (For details, see U.S. Census Bureau, 2003.)

Demographic analysis provided evidence of correlation bias, at least for black men. However, it is unclear whether the correlation bias “correction” applied to counts for black men was successful. Furthermore, because of the lack of data on ethnicity in vital statistics, this approach was not available for nonblack Hispanics, a group that might be expected to have similar levels of correlation bias. (For details, see Bell, 2001, and Haines, 2002.)

The approach taken to estimate net census coverage error relied on balancing erroneous enumerations against omissions in cases in which there was insufficient information to match E-sample and P-sample cases. Consequently, A.C.E. was not effective at estimating components of census error. (For details, see Adams and Liu, 2001.)

The poststratification in A.C.E. (which tried to partition the U.S. population into relatively homogeneous subgroups to reduce correlation bias) was constrained to use a very limited number of variables. Because the approach cross-classified many of the factors, each additional factor greatly increased the number of poststrata and correspondingly reduced the sample size per poststratum.4 (For details, see U.S. Census Bureau, 2003.)

The remainder of this report is primarily concerned with the Census Bureau’s plans to address these problems and the panel’s assessment of those plans.

4

In fact, collapsing of poststrata was needed because many of the cross-classified cells had very small sample sizes.
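The growth in the number of poststrata described above is multiplicative: under full cross-classification, the number of cells is the product of the numbers of levels of the factors, so each added factor divides the average sample per cell by its number of levels. The function names, factor levels, and sample size in this sketch are hypothetical and are not those of the A.C.E. design.

```python
from math import prod
from typing import Sequence


def poststratum_count(levels: Sequence[int]) -> int:
    """Number of cells when every factor is fully cross-classified."""
    return prod(levels)


def mean_sample_per_cell(sample_size: int, levels: Sequence[int]) -> float:
    """Average sample size per cell; actual cells vary widely around this mean."""
    return sample_size / poststratum_count(levels)


if __name__ == "__main__":
    base = [7, 2, 6, 4]       # hypothetical numbers of levels for four factors
    extended = base + [3]     # add one more hypothetical three-level factor
    n = 600_000               # hypothetical number of sampled persons
    for levels in (base, extended):
        print(poststratum_count(levels), round(mean_sample_per_cell(n, levels)))
```

Adding a single three-level factor takes this hypothetical design from 336 to 1,008 cells and cuts the average sample per cell by two-thirds. Because real cells are far from equal in size, many fall well below the average, which is why poststrata had to be collapsed.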

How the Plans for the 2010 Census Differ from the 2000 Census

Any consideration of changes to coverage measurement plans for 2010 should account for how plans for the 2010 census differ from those for the 2000 census. We list six major differences here; more details are provided in Chapter 3.

(1) With the American Community Survey, a continuous implementation of the census long form, now in full operation, the 2010 census is expected to use the short form only.
(2) Current plans are for field staff to use handheld computing devices during nonresponse follow-up for data collection, data transmission, real-time editing and error correction, and navigation to assignments.
(3) The Census Bureau is working to improve both its MAF and its geographic referencing system, TIGER (Topologically Integrated Geographic Encoding and Referencing).
(4) The Census Bureau plans to add selected coverage improvement questions to the short form, asking whether there are alternative households in which someone may have been enumerated and whether there are any other people who sometimes live in the household.5
(5) Use of the coverage follow-up (CFU) interview will be greatly expanded compared with 2000. Additional households planned for follow-up in 2010 include households with a possible duplicate enumeration, households with at least one resident who sometimes lives at another address, and households with other people who sometimes live there. Because this additional data collection takes place close to the time of the CCM interviews, it may pose a contamination threat (i.e., the CCM interview may affect the census in the CCM blocks, making the CCM blocks unrepresentative), so the Census Bureau has asked the panel to examine a number of ways of addressing this possible problem. This issue is addressed in Chapter 3.
(6) Using information from the main census returns and the CFUs, the Census Bureau plans to delete from census households those persons identified as duplicates who were counted in the wrong place.

5

The addition of these questions has been cognitively tested, and a report on this testing from the Census Bureau is expected soon.

INITIATIVES FOR IMPROVEMENTS IN THE 2010 COVERAGE MEASUREMENT PROGRAM

In response to the change in the objectives for coverage measurement in 2010, the various limitations of A.C.E. in addressing those objectives, and the changes currently planned for the 2010 census relative to the 2000 census, the coverage measurement staff of the Census Bureau, and decennial staff in general, have undertaken several important initiatives likely to improve the coverage measurement program for 2010. These include the following:

OCR for page 5
Research and Plans for Coverage Measurement in the 2010 Census: Interim Assessment: Panel on Coverage Evaluation and Correlation Bias in the 2010 Census Estimating (and adjusting for) correlation bias. The Census Bureau has made only limited progress to date to directly address this difficult problem. However, the additional data collection mentioned above, other potential improvements in coverage measurement, and the use of logistic regression modeling (discussed below) provide some hope for reducing the size of correlation bias. Whether the Census Bureau will implement the correction based on sex ratios used in 2000 is unclear. Estimating components of census coverage error. The Census Bureau has produced an excellent report on the definition of census coverage component error and its measurement, “Framework for Coverage Error Components” (U.S. Bureau of the Census, 2005). This report greatly clarifies what component errors are to be measured and the assumptions underlying their measurement. In addition, the Census Bureau has several research initiatives to improve measurement of the rates of census component coverage errors. These include reducing matching error through collection of additional information on people’s residence and through attempts to match people with very limited information (discussed below). In addition, the Census Bureau views the estimation of remaining matching error as a large missing data problem, and it may apply multiple imputation techniques to provide better estimates. However, little work on this has been initiated to date. Finally, the Census Bureau would like to incorporate estimates of census omissions that take into account correlation bias when it estimates census component coverage error. This is a particularly challenging problem. Improving net coverage estimation. 
The Census Bureau has been developing an alternative approach to net coverage measurement that replaces poststratum-level estimates of the average match rate and the average correct enumeration rate with logistic regression models of match and correct enumeration probabilities for individual persons. This approach accommodates more predictive factors than poststratification and allows the use of continuous predictors. It also accommodates greater heterogeneity in match rates and correct enumeration rates. Using this approach may provide improvements through (1) reduction of bias through more flexible variable selection, (2) more options for handling missing data, and (3) reduction of unmodeled local heterogeneous effects. (For details, see Griffin, 2005a.)

Designing the CCM sample of block clusters. Although the design of the CCM data collection will, in many respects, approximate that of the A.C.E., the CCM might differ from the A.C.E. in its design for sampling block clusters, to better support the new objectives of CCM in 2010.

Measuring residency status more reliably in the census. Through revisions of the census questionnaire and the CFU interview, the Census Bureau will be collecting more information on possible alternative households in which someone may have been enumerated, as well as more information on possible duplicate enumerations. The anticipated result is more reliable matching and more reliable assessment of both correct enumeration status and duplicate status. This additional data collection puts a large demand on census
field staff, and it may have implications for the timing of various census operations. It is therefore unclear how much of this additional data collection will be feasible in 2010.

PANEL CHARGE AND APPROACH

To recapitulate: in 2010, the Census Bureau is planning to return to a previous, although substantially expanded, objective for CCM, which is to assess the amount of census component coverage error, not only to inform users about the quality of the census counts but, more importantly, to support examination of ways to improve census-taking for the next census. To provide more targeted information for the latter purpose, the ultimate hope is to attribute the various types of census errors to particular census processes and, as a result, to concentrate improvement efforts on the parts of the census that are most in need.

At the Census Bureau’s request, the National Academies established the Panel on Coverage Evaluation and Correlation Bias in the 2010 Census to examine coverage measurement plans for 2010. The panel’s charge reads as follows:

This project involves a study of four issues concerning census coverage estimation with the goal of developing improved methods for use in evaluating coverage of the 2010 census. A panel of experts will conduct the study under the auspices of the Committee on National Statistics of the Division of Behavioral and Social Sciences and Education. The panel is charged to review Census Bureau work on these topics and recommend directions for research. The panel's work may require development of statistical models to extend the DSE approach, and may also include suggestions for the use of auxiliary data sources such as administrative records.
DSE, as applied to the 1990 and 2000 censuses, had several benefits as well as limitations as a means for estimating net census coverage. Some of the limitations were:

The approach was designed for estimating net census coverage errors and did not provide accurate estimates of gross coverage errors, i.e., of gross census omissions separate from gross census erroneous enumerations.

In the DSE approach applied in the 1990 and 2000 censuses, certain census enumerations classified as erroneous were balanced against certain coverage survey cases classified as nonmatches (census omissions) for the purpose of estimating net census coverage. Some of these paired census enumerations and coverage survey cases did not necessarily reflect gross errors.

The application of DSE in A.C.E. Revision II during the 2000 census accounted for duplicates found in the census in a simplistic way due to lack of information as to which member of a duplicate pair was a correct enumeration and which was an erroneous enumeration. This led to
estimation error, as did the simplistic treatment of A.C.E. cases (P-sample) that matched to census enumerations outside the search area.

The post-stratification approach used to apply the DSE had certain limitations. First, the number of factors that could be included in the post-stratification was limited because the approach cross-classified the factors, so that each factor added to the post-stratification greatly split the sample. (Collapsing of post-strata was needed because many of the cross-classified cells had small sample sizes.) Second, the synthetic error that arose from the synthetic application of the post-stratum coverage correction factors to produce estimates for subnational areas and population subgroups was not reflected in their corresponding variance estimates.

Comparisons of aggregate tabulations of DSEs with estimates from demographic analysis (DA), in both 1990 and 2000, suggested underestimation by DSE of persons missed by both the census and the coverage survey (correlation bias). In the 2000 A.C.E. Revision II, sex ratios from DA were used to determine factors to correct adult male estimates for correlation bias, assuming no correlation bias for children and adult females. This approach appeared effective for adult blacks, but there were concerns about the appropriateness of its assumptions for other race/origin groups (particularly Hispanics). Also, DA totals for young children (0-9) exceeded the corresponding aggregated DSEs from A.C.E. Revision II by a sufficient amount to suggest possible correlation bias in estimates for young children.

The Census Bureau is interested in improving the DSE methodology to address the above issues to the extent possible, to develop improved methods for estimating coverage of the 2010 census both in regard to net errors and gross errors.
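The poststratified estimator and its synthetic application, as described in the charge, can be sketched in a few lines. This is a minimal illustration, not the Census Bureau's production code, and it assumes the A.C.E.-style form DSE = (C - II) x (CE/E) / (M/P) for each poststratum: C is the census count, II the enumerations with insufficient information for matching, CE/E the weighted correct enumeration rate from the E-sample, and M/P the weighted match rate from the P-sample. Each poststratum's coverage correction factor, DSE/C, is then applied to a small area's census counts by poststratum; that step assumes uniform coverage within the poststratum, which is the source of the unmeasured synthetic error noted above. The last function shows the form of the sex-ratio adjustment for adult males, again assuming no correlation bias for females. The class, field names, and toy inputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Poststratum:
    """Weighted survey totals for one poststratum (hypothetical field names)."""
    census_count: float       # C: census enumerations in the poststratum
    insufficient_info: float  # II: enumerations with too little information to match
    e_total: float            # E: weighted E-sample enumerations
    e_correct: float          # CE: weighted E-sample correct enumerations
    p_total: float            # P: weighted P-sample persons
    p_matched: float          # M: weighted P-sample persons matched to the census


def dual_system_estimate(s: Poststratum) -> float:
    """DSE = (C - II) * (CE / E) / (M / P)."""
    ce_rate = s.e_correct / s.e_total
    match_rate = s.p_matched / s.p_total
    return (s.census_count - s.insufficient_info) * ce_rate / match_rate


def coverage_correction_factor(s: Poststratum) -> float:
    """Ratio of the poststratum DSE to its census count."""
    return dual_system_estimate(s) / s.census_count


def synthetic_estimate(area_counts: dict[str, float], factors: dict[str, float]) -> float:
    """Apply each poststratum's factor to the area's census count in that poststratum.

    Every area in a poststratum is assumed to share its coverage rate; any
    departure from that assumption is synthetic error.
    """
    return sum(count * factors[h] for h, count in area_counts.items())


def sex_ratio_factor(male_dse: float, female_dse: float, da_sex_ratio: float) -> float:
    """Factor that scales the male DSE so that males/females equals the DA sex ratio.

    Assumes the female DSE carries no correlation bias.
    """
    return da_sex_ratio * female_dse / male_dse


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not census or A.C.E. figures.
    strata = {
        "owner": Poststratum(1000, 20, 200, 190, 210, 189),
        "renter": Poststratum(500, 15, 100, 90, 100, 80),
    }
    factors = {h: coverage_correction_factor(s) for h, s in strata.items()}
    area = {"owner": 120.0, "renter": 80.0}  # one small area's census counts
    print(factors)
    print(synthetic_estimate(area, factors))
    print(sex_ratio_factor(male_dse=480.0, female_dse=520.0, da_sex_ratio=0.95))
```

Because the factor is a single number per poststratum, adding a factor to the cross-classification multiplies the number of cells that need their own CE/E and M/P, which is why sparse cells had to be collapsed.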
We interpret the charge to the panel as follows: to evaluate the Census Bureau’s plans, and to suggest changes and additions to them, for using coverage measurement and related activities to measure the components of census coverage error and thereby to assess how the various census component processes contribute to coverage error. The original charge to the panel had three areas of focus: (1) the treatment of duplicates, (2) the use of alternative approaches to poststratification, especially model-based alternatives, and their impact on the ability to model local heterogeneous effects, and (3) the use of demographic analysis to correct for correlation bias. It was understood from the outset that the panel’s work might involve assistance in the development of statistical models to modify or extend the dual-systems approach, and that it might also include suggestions for the use of auxiliary data sources, such as administrative records, apart from their use in demographic analysis. These areas are still of interest to the Census Bureau and to the panel, but since the panel began its work, the Census Bureau’s needs in the area of coverage evaluation have broadened. As a result, the panel has also been asked to review additional issues related to coverage evaluation that were not explicitly mentioned in the original charge.

Among these additional issues, the panel has been asked to (a) examine the Census Bureau’s draft document providing a framework for the definition of component errors and the estimation of their rates of occurrence, (b) examine the possibility of estimating the match status of cases previously categorized as having insufficient information for matching, in order to reduce the number of cases identified as erroneous enumerations solely because of item nonresponse, and (c) assess various alternatives that reduce or avoid the contamination likely to result from the similarity and simultaneity of the census CFU and the PES interviews in 2010. The Census Bureau has also asked for the panel’s views on a number of other issues, including the design of the CCM postenumeration survey and the form of the census CFU interview and of the CCM initial and follow-up interviews. In addition, the Census Bureau is interested in having the panel examine the other issues listed above as limitations of A.C.E. in addressing the new goals for coverage measurement in 2010, and in having it suggest alternatives that could be implemented in time for 2010. Finally, part of this review is to evaluate the broad research priorities of coverage evaluation at the Census Bureau leading up to the 2010 CCM, and to advise whether those priorities should be altered in light of the broader goals described above.

The general data collection and matching operations of the 2010 CCM are taken as fixed. That is, we take as given that the CCM program will include a sizable postenumeration survey that will be matched to the census to assess match status for a sample of census block clusters. Given this, the panel is examining the alterable aspects of the data collection for the 2010 coverage measurement program, including sample design, to see whether improvements can be recommended.
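The idea behind item (b), and the multiple imputation approach the Census Bureau has mentioned for remaining matching error, can be sketched under strong simplifying assumptions that are labeled here as such. Unresolved cases are assumed to resemble resolved ones (missing at random within a single cell); each imputation first draws a correct enumeration rate from a Beta posterior and then draws a status for every unresolved case; and the completed-data estimates are combined with Rubin's rules, whose total variance is T = W + (1 + 1/m)B. The function names are hypothetical, and a realistic model would condition on person and household characteristics and carry sampling weights. The comparison point is the conventional treatment, which counts every unresolved case as an erroneous enumeration.

```python
import random
import statistics
from typing import Optional, Sequence


def rubin_combine(estimates: Sequence[float],
                  variances: Sequence[float]) -> tuple[float, float]:
    """Combine m completed-data estimates with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # combined point estimate
    within = statistics.fmean(variances)      # W: mean within-imputation variance
    between = statistics.variance(estimates)  # B: between-imputation variance (divisor m - 1)
    return q_bar, within + (1 + 1 / m) * between


def impute_ce_rate(status: Sequence[Optional[int]], m: int = 20,
                   seed: int = 0) -> tuple[float, float]:
    """Correct enumeration rate when None marks a case too incomplete to resolve."""
    rng = random.Random(seed)
    resolved = [s for s in status if s is not None]
    n_correct = sum(resolved)
    n_erroneous = len(resolved) - n_correct
    n = len(status)
    estimates, variances = [], []
    for _ in range(m):
        # Draw the rate from a Beta(1 + correct, 1 + erroneous) posterior so that
        # the imputations also reflect uncertainty about the rate itself.
        p = rng.betavariate(n_correct + 1, n_erroneous + 1)
        completed = [s if s is not None else int(rng.random() < p) for s in status]
        rate = sum(completed) / n
        estimates.append(rate)
        variances.append(rate * (1 - rate) / n)  # simple binomial variance
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    # Toy statuses: 1 = correct, 0 = erroneous, None = insufficient information.
    toy = [1] * 80 + [0] * 10 + [None] * 10
    treated_as_erroneous = sum(s or 0 for s in toy) / len(toy)
    estimate, total_variance = impute_ce_rate(toy)
    print(treated_as_erroneous, estimate, total_variance)
```

Counting the ten unresolved toy cases as erroneous yields a rate of 0.80, while the imputed estimate lies between 0.80 and 0.90 and carries an added between-imputation variance component that reflects uncertainty about the unresolved cases.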
The panel will not address the broader issue of what type of coverage measurement program (that is, what alternative to CCM) would best support improvement of census-taking over time. Furthermore, all the data retained from the 2010 census, not only the postenumeration survey and matching results but also data collected by the various management information and quality assurance systems that monitor census processes, could affect the coverage measurement models that can be developed. Therefore, the panel will also advise on what data should be retained from the 2008 census test and the 2010 census. The panel also asserts that many of the design questions for the 2010 census and its coverage measurement program must be further informed through greater use of the data collected in 2000, and we consider how the Census Bureau can further exploit those data to improve the CCM design.

The possibility remains that there will be a sizable differential undercount in 2010. One such scenario would arise if the 2010 census design is very effective in deleting duplicates in real time but no more effective than the 2000 census in reducing census omissions. The result could then be a substantial differential undercount that one would like to reduce through the use of modified counts. We view a substantial
differential undercount as an unlikely contingency, but what would be done in that event deserves greater consideration by the Census Bureau. Finally, the Census Bureau’s current program for research on coverage measurement is not as comprehensive as might be desired. The panel has therefore slightly expanded its scope in this report by suggesting additional activities that would support the measurement of census component coverage error. By doing this, we hope to encourage the Census Bureau to allocate greater resources to this effort in the years remaining before 2010.

ORGANIZATION OF THE REPORT

Following this introduction, this report consists of three chapters. Chapter 2 defines the components of census error, describes how census errors are measured through the use of DSE and demographic analysis, and then outlines the three purposes of census coverage measurement: the measurement of census quality, census process improvement, and potential census adjustment. Chapter 3 describes and assesses the Census Bureau’s current research program on coverage evaluation. It begins by listing the limitations of the 2000 A.C.E. for measuring component census errors and describing differences between the 2010 and 2000 census plans, as well as plans for the coverage evaluation program in the 2006 test census. Next, it describes the major topics of the current coverage evaluation research program, including measuring components of census error, models for net coverage error, contamination due to the extension of the CFU interview, the sample design for the CCM postenumeration survey, and use of the E-StARS administrative records system in coverage measurement. Chapter 4 describes the value of integrating census process data, and person, household, and area characteristics data, with census component coverage error data.
It further argues that 2000 A.C.E. data can still be used to inform the design of the coverage measurement program in 2010. Finally, the issue of user requirements for documentation and tabulation of census coverage errors in 2010 is raised.