3
ASSESSMENT OF THE CENSUS BUREAU’S CURRENT RESEARCH PROGRAM FOR COVERAGE EVALUATION IN 2010

The Census Bureau is currently engaged in a number of important research initiatives that it expects will improve its coverage evaluation program in the 2010 census. Part of this research effort has focused on the design of the coverage evaluation programs for the 2006 census test and for the 2008 dress rehearsal, the dress rehearsal program representing the last major opportunity to test plans for coverage evaluation prior to the 2010 census. In particular, the Census Bureau has devoted considerable energy to researching new methods that would be effective in the measurement of components of census coverage error in the 2010 census.


In this chapter, we describe and assess both the census test design in 2006 and the other major activities of the coverage evaluation research program. We introduce this by comparing the plans for the 2010 and 2000 censuses and then describing the limitations of Accuracy and Coverage Evaluation (A.C.E.) in measuring census component coverage error. Following the Census Bureau’s terminology, we refer to the 2010 coverage evaluation program as census coverage measurement, or CCM.

HOW THE 2010 CENSUS DIFFERS FROM THE 2000 CENSUS

The 2010 census has an innovative design, resulting in a census that differs from its predecessor as much as any since the incorporation of mailout-mailback data collection in 1970. Furthermore, the design for the 2010 census is dramatically different from the 2000 census in ways that will appreciably affect the 2010 coverage evaluation program. In this section we outline how the 2010 census will differ from the 2000 census and how those changes are likely to affect CCM.


The primary differences between the 2000 and 2010 census designs, as currently planned, are



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Research and Plans for Coverage Measurement in the 2010 Census: Interim Assessment: Panel on Coverage Evaluation and Correlation Bias in the 2010 Census

1. A short-form only census. The Census Bureau has now fielded the American Community Survey (ACS), which is a continuous version of the decennial census long form. Therefore, under current plans there will be no long form in the 2010 census. This reduces respondent burden and will facilitate several aspects of data collection in the census, including data capture, data editing and imputation for nonresponse, the work of follow-up enumerators, and the management of foreign language forms and foreign language assistance. As a result, this change is likely to improve data quality.

2. Use of handheld computing devices for nonresponse follow-up. The enumerators who follow up nonrespondent households will now use a handheld computing device to (1) administer the census questionnaire (computer-assisted personal interviewing), (2) edit the responses in real time, (3) collect, save, and transmit the data to census processing centers, (4) help locate residences through the use of computer-generated maps (and possibly geographic coordinates), and (5) possibly help organize enumerator routes.

3. Improved MAF/TIGER system. The Master Address File (MAF) has been identified as being deficient (see, for example, National Research Council, 2004b: Finding 4.4). There are currently efforts to improve, for 2010, both the Census Bureau’s MAF and its geographic database, the TIGER (Topologically Integrated Geographic Encoding and Referencing) system. The MAF provides a list of household addresses, and TIGER is used to associate each address on the MAF with a physical location.

The MAF/TIGER Enhancement Program includes (1) the realignment of every street and boundary in the TIGER database; (2) development of a new MAF/TIGER processing environment and the integration of the two previously separate resources into a common technical platform; (3) expansion of geographic partnership programs with state, local, and tribal governments, other federal agencies, the U.S. Postal Service, and the private sector; (4) implementation of a program to use ACS enumerators to generate address updates, primarily in rural areas; and (5) use of periodic evaluation activities to provide quality metrics to guide corrective actions (Hawley, 2004). One motivation for this initiative was the recognition by the Census Bureau that many census errors and inefficiencies in 2000 resulted from errors in the Master Address File and in the information on the physical location of addresses.

4. Coverage follow-up interview. The Census Bureau is greatly expanding the percentage of housing units that will be administered a coverage follow-up (CFU) interview in 2010, in comparison with those in 2000 that were administered either the Coverage Edit Follow-up (CEFU) or the Coverage Improvement Follow-up (CIFU) interview. CEFU was used to determine the correct count and characteristics for people in households with more than six residents (since the census form had space for information for only six persons), and the correct count for households with count discrepancies (e.g., differences between the number of separate people listed on the questionnaire and the indicated total number of residents). CIFU was used to determine whether addresses that were initially judged as being vacant were in fact vacant. The expansion of CFU over CEFU and CIFU was motivated by the recognition, partially provided by A.C.E., that confusion with residence rules made an important contribution to census coverage
error. The CFU interview will be greatly expanded in 2010 to include not only those three situations, but also the following: (a) households with a possible duplicate enumeration identified by a computer match of the census returns to themselves, (b) other addresses at which at least one resident sometimes lived (to avoid enumerations in the wrong location), and (c) other people who sometimes lived in the household (to avoid undercoverage). The latter two situations will be detected through the addition of two “coverage probe” questions to the census form. However, due to resource and time constraints, the Census Bureau may be able to administer the CFU to only a subset of the qualifying households in 2010. The Census Bureau thinks that it may be able to follow up only 5 to 10 percent of the nation’s addresses for this purpose, but some preliminary estimates suggest that a larger percentage may satisfy one or more of these contingencies.[1] In that case, the Census Bureau may have to prioritize by selecting the subset of qualifying households most likely to provide information that would reduce error in the count. Implementation of this operation will depend on information collected in the 2006 test census and the 2008 dress rehearsal.

[1] We think that reasonable estimates may already be possible given data from 2000 and the later census tests. For example, the 2004 census test indicates that categories (b) and (c) may sum to 11 percent or so.

5. Removal of duplicate enumerations in real time. As implied in (4) above, the CFU interview will be used to follow up suspected duplicate enumerations that are identified through use of a national computer search for duplicate enumerations, with the objective of determining which address of a pair of duplicates is the correct residence and consequently removing the erroneous duplicate enumeration from the census.

This new census design has some benefits for the coverage measurement program in 2010. Focusing on the collection of short-form data and the use of handheld computing devices might improve the quality of the information collected, thereby improving the quality of the matching of the postenumeration survey (PES) to the census. Having an improved and more complete MAF should reduce the extent of whole-household undercoverage. Finally, the national search for and field verification of duplicate enumerations should reduce the number of duplicates in the census, which may facilitate the estimation of component errors in the census and may also simplify the application of the net coverage error models used in dual-systems estimation (DSE). So the changes to the 2010 census design are also likely to improve the quality of the coverage measurement information provided in 2010.

It is important to emphasize that some of the changes to the 2010 census design were motivated by the results of the 2000 A.C.E. program. Specifically, the large number of erroneous enumerations, especially duplicates, motivated the expansion of the CFU interview, as well as the implementation of the national search for duplicates. Also, although not directly a finding from A.C.E., the recognition that the 2000 census Master Address File had a large number of duplicates and was otherwise of uncertain quality motivated some of the improvements of the MAF/TIGER system.
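The national search for duplicates described above rests on matching census returns against themselves. A minimal sketch of such self-matching, using hypothetical records, fields, and blocking rules (not the Census Bureau's actual algorithm):

```python
from collections import defaultdict

# Toy census returns: (return_id, name, birth_year, address).
# All records are hypothetical.
returns = [
    (1, "JANE DOE", 1970, "12 ELM ST"),
    (2, "JANE DOE", 1970, "99 OAK AVE"),   # same person at a second home
    (3, "JOHN SMITH", 1985, "12 ELM ST"),
]

# Block on (name, birth_year); a block holding returns at two or more
# distinct addresses is a candidate duplicate set for CFU follow-up,
# which would determine the correct Census Day residence.
blocks = defaultdict(list)
for ret in returns:
    blocks[(ret[1], ret[2])].append(ret)

candidate_sets = [recs for recs in blocks.values()
                  if len(recs) > 1 and len({r[3] for r in recs}) > 1]

for recs in candidate_sets:
    print("follow up:", [r[0] for r in recs])  # → follow up: [1, 2]
```

In practice a national search must also tolerate name and date variants; the exact-key blocking here only illustrates how the self-match flags pairs whose correct residence is then resolved in the field.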

LIMITATIONS OF A.C.E. IN MEASURING COMPONENT COVERAGE ERROR

A.C.E. in the 2000 census was planned from the outset as a method for adjusting census counts for net coverage error. Hence, A.C.E. focused on estimating net census coverage error rather than summaries of census component errors. For example, the limited geographic search for matches used in A.C.E. relied on the balancing of some erroneous enumerations and omissions that were actually valid E-sample enumerations but in the wrong location. Such errors could result, for example, from a geocoding error (placing an address in the wrong census geography) or enumeration of someone at a second home. Because such erroneous enumerations and omissions were expected to balance each other, on average, they were expected to have little impact on the measurement of net coverage error. Therefore A.C.E. did not allocate the additional resources that would have been required to distinguish these situations from entirely erroneous enumerations or omissions. Similarly, A.C.E. did not always distinguish between an erroneous enumeration and counting a duplicate enumeration at the wrong location.

The following are limitations of A.C.E. in 2000 for measuring census component coverage error:

Inadequate information collected as part of the census and the PES allowed too many mistakes in the A.C.E. final determination of Census Day residence. In 2000, comprehensive information was not collected from a household, either in the census or in the A.C.E. interview, regarding other residences that residents of the household often used or other individuals who occasionally stayed at the household in question. This limited the Census Bureau’s ability to correctly assess residency status for many individuals. The Census Bureau intends to include more probes to assess residence status in the 2010 census questionnaire, in the census follow-up interview, and on the 2010 CCM questionnaires. Also, in 2010, the duplicate search will be done nationwide, not only for the PES population. In addition, the Census Bureau plans to incorporate real-time field verification of duplicate enumerations in 2010. (For details on issues in determining correct residence, see U.S. Census Bureau, 2003.)

Nonresponse in the E- and P-samples complicated matching of the P-sample to the E-sample (for coverage measurement) and of the E-sample to the census (to identify duplicates). It also complicated estimation because it interfered with assigning a person to the correct poststratum (under the 2000 design) or created missing values for predictor variables (as discussed below, under the proposed use of logistic regression in 2010). (For details, see Mulry, 2002.) Furthermore, the missing data treatments used for individuals with extensive nonresponse failed to fully utilize the available data. Procedures are now being examined that make greater use of the available data, especially on household composition, to determine the match status of these individuals in 2010.
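The logistic regression modeling mentioned here, planned for 2010 in place of poststratification, can be sketched as follows. The predictors and coefficients below are hypothetical, chosen only to show the mechanics: factors enter additively on the logit scale, so adding a factor does not split the sample into ever-smaller cells the way full cross-classification does.

```python
import math

# Hypothetical coefficients for a person-level model of the probability
# of correct enumeration (illustration only; not the Bureau's model).
coef = {"intercept": 2.0, "renter": -0.6, "age_18_29": -0.4, "minority": -0.3}

def coverage_prob(renter, age_18_29, minority):
    """Inverse logit of a linear predictor built from 0/1 characteristics."""
    z = (coef["intercept"]
         + coef["renter"] * renter
         + coef["age_18_29"] * age_18_29
         + coef["minority"] * minority)
    return 1.0 / (1.0 + math.exp(-z))

print(round(coverage_prob(0, 0, 0), 3))  # owner, age 30+, nonminority → 0.881
print(round(coverage_prob(1, 1, 1), 3))  # renter, age 18-29, minority → 0.668
```

A fitted model of this kind yields a coverage probability for every person, from which adjustment factors for any domain can be aggregated, whereas a fully cross-classified poststratification with the same three factors would already require eight separate cells.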

Also, the methodology used for individuals who moved between Census Day and the day of the postenumeration interview (known as PES-C) resulted in a large percentage of proxy enumerations, which in turn resulted in matching error.[2] (PES-C was implemented in 2000 due to early plans, later cancelled, to use sampling for nonresponse follow-up in the 2000 census.) The Census Bureau will probably return in 2010 to the use of PES-B (similar to the 1990 methodology), which relies completely on information from the inmover.

[2] PES-C collected information about whether a PES outmover household matched to the census through use of information about the outmover household (often using proxy information), but resulting matches were applied to the size of the inmover household rather than the size of the outmover household because the information on the number of inmovers was considered to be of greater reliability.

The A.C.E. Revision II estimates modified undercoverage estimates for adult black men using sex ratios from demographic analysis (ratios of the number of women to the number of men for a demographic group) to correct for correlation bias (for details, see Bell, 2001; Haines, 2002). This method assumes that the estimated adult sex ratios from demographic analysis are more accurate and precise than those from the A.C.E. For nonblack Hispanics, estimation of adult sex ratios requires a long historical series of Hispanic births and deaths and, more importantly, highly accurate data on the magnitude and sex composition of immigration (both legal and undocumented). The historical birth and death data for Hispanics are available only since the 1980s, and the available measures of immigration are too imprecise for this application. Consequently, this use of demographic analysis to modify A.C.E. estimates was not directly applicable to nonblack Hispanic males in 2000.[3]

[3] In support of this argument, it is useful to note that a majority of working-age (18-64) Hispanics are foreign-born (about 55 percent), whereas less than 5 percent of whites and slightly more than 5 percent of blacks are.

The approach taken to estimate net census coverage error relied on balancing erroneous enumerations against omissions in cases in which there was insufficient information for matching E- and P-sample cases. Consequently, A.C.E. was not effective at estimating components of census error.

Poststratification is used to reduce correlation bias (see description in Chapter 2), since it partitions the U.S. population into relatively homogeneous groups. The number of factors that could be included in the poststratification used in A.C.E. was limited because the approach fully cross-classified many of the defining factors, with the result that each additional factor greatly reduced the sample size per poststratum. (For details of the 2000 poststrata, see U.S. Census Bureau, 2003.) The 2010 plan uses logistic regression modeling to reflect the influence of many factors on coverage rates without having to define a large number of poststrata.

Also, small-area variations in census coverage error that are not corrected by application of the poststratum adjustment factors to produce estimates for subnational domains (referred to as synthetic estimation) were not reflected in the variance estimates of adjusted census counts. The Census Bureau is examining the use of random effects in its adjustment models to account for the residual
variation in small-area coverage rates beyond what is modeled through synthetic estimation.

In addition to these issues, other features of A.C.E., including aspects of data collection and sample design, made the 2000 A.C.E. less informative than it might have been in measuring census component coverage errors. As stated above, this was only to be expected given the focus of A.C.E. on producing adjusted census counts, well justified by the desire to remedy long-standing patterns of differential undercoverage of minorities in the census. However, with the new priority of measuring census component coverage error, a number of design and data collection decisions, within the general framework of PES data collection, remain open to modification. Furthermore, as we argue below, estimation of net census error also remains important for assessment of census component coverage error, specifically census omissions.

PLANS FOR COVERAGE EVALUATION IN THE 2006 CENSUS TEST

The goals for the 2006 test census relevant to coverage evaluation were as follows (U.S. Census Bureau, 2004):

- To examine how the Census Bureau can improve the determination of Census Day residence in the CCM process through modification of the census questionnaire, the initial PES questionnaire, and the PES follow-up interview. This may be the most important problem facing coverage evaluation and the greatest opportunity for improvement, because the A.C.E. underestimated erroneous enumerations by 4.7 million people in 2000 and overestimated the P-sample population by 2.2 million, much of which was probably due to errors in enumerating people in their proper census location (see National Research Council, 2004b: 218 and 253, for details). A request for information on alternative addresses or additional part-time residents was not included in the 2000 census, which limited attempts to ascertain correct Census Day residence.

- To test procedures for determining more accurately the location of a person’s Census Day residence outside the P-sample blocks for P-sample inmovers and for people with multiple residences.

- To determine how the more extensive matching for duplicates and people with multiple residences (following up on information collected in the CFU interview) can be implemented within the anticipated resource and time constraints in 2010.

- To identify additional data to be collected on census processes in support of the measurement and analysis of census component coverage error.

- To measure possible contamination of the CFU interview by the (possibly simultaneous) collection of coverage measurement information and to assess the implications for CCM data collection and estimation.

The coverage evaluation program of the 2006 test census began with a PES, conducted after the census data collection was complete, in which computer-assisted personal interviews
were administered to an independent sample of approximately 5,000 housing units (drawn from the same address list as the census) to determine Census Day residence. This was followed by an automated and then a clerical match to the census enumerations, with field follow-up of those with unresolved match status. A person follow-up interview was conducted simultaneously with the PES to collect additional data to resolve residence status for various situations.

Once matching was completed, the CCM program used DSE based on the usual E- and P-samples, except that the addresses for the P-sample were identical to those of the test census. This exception prevented the measurement of whole-household omissions in the test census. Movers between the time of the census and the collection of postenumeration survey (CCM) data were handled with PES-B methodology, which counts the number of people resident in the CCM blocks at the time of the postenumeration survey rather than the number of people resident on Census Day. Information on the other locations at which a person might also have been counted was collected in a follow-up interview for households that indicated other residences on the census questionnaire. This was to assist in the assessment of correct residence and to better define omissions, erroneous enumerations, and duplications.

The CCM person interviewing used a laptop for the initial interview, and unresolved matches were followed up with personal visits. For census returns that provided a phone number, the CCM interviews were carried out by telephone, as in 2000. CCM personal visits did not begin until nonresponse follow-up was concluded. However, CCM interviewing was simultaneous with the CFU follow-up interview.

There was an automated and computer-assisted clerical search for P-sample matches and duplicate census enumerations at the Census Day residence location, as well as at other locations where the person may have been counted. There was also an automated search across all census enumerations in the test site, both for P-sample matches and for duplicate census enumerations. There was an attempted match to census enumerations that had a missing or deficient name or were otherwise difficult to match due to limited information, to better estimate components of census coverage error (see the discussion of KEs below). No weighting or imputation was carried out for missing data, and coverage estimates were not produced. Finally, the Census Bureau will explore various estimation methodologies to generate estimates of components of census coverage error and net coverage error, conditional on the limitations of the census test, to examine whether sufficient and consistent data are being collected.

Unlike the decennial coverage measurement programs, no attempt was made to collect data to assess whole-household undercoverage. Also, no attempt was made to assess the undercoverage of individuals living in group quarters. (CCM is also planned to exclude group quarters, about 2.7 percent of the U.S. population, from coverage measurement in 2010. This is unfortunate, given the difficulty in counting the institutional population.) Data needed to estimate coverage error (both net coverage error and components of coverage error) for persons living in housing units will be assembled by census operation to support the linkage of census component coverage error with specific census operations.
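The tiered resolution of match status described above (automated matching, then computer-assisted clerical review, then field follow-up of unresolved cases) might be sketched like this; the records, comparison fields, and similarity threshold are hypothetical, not the Bureau's actual matching rules:

```python
import difflib

# Toy census enumerations: (name, birth_year). Hypothetical records.
census = {("JANE DOE", 1970), ("JOHN SMITH", 1985)}

def match_status(pes_record):
    """Resolve a P-sample record: automated exact match first, then a
    clerical-style fuzzy comparison, else refer to field follow-up."""
    if pes_record in census:
        return "matched (automated)"
    name, year = pes_record
    for cname, cyear in census:
        similar = difflib.SequenceMatcher(None, name, cname).ratio() > 0.85
        if similar and abs(year - cyear) <= 1:
            return "matched (clerical)"
    return "unresolved (field follow-up)"

print(match_status(("JANE DOE", 1970)))    # → matched (automated)
print(match_status(("JON SMITH", 1985)))   # → matched (clerical)
print(match_status(("MARY JONES", 1950)))  # → unresolved (field follow-up)
```

Only the residual "unresolved" cases generate the costly field visits, which is why the quality of the automated and clerical tiers matters so much for the overall workload.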

MAJOR ACTIVITIES OF THE CENSUS COVERAGE MEASUREMENT RESEARCH PROGRAM

The CCM research program involves several activities grouped into the following categories: (1) research on measuring components of census error, which includes development of a framework for coverage measurement, matching of cases with minimal information, and identification of census duplicates in real time; (2) research on models for net error, including alternatives to poststratification and synthetic estimation; and (3) research on contamination due to the CFU interview. We also examined preliminary ideas of the Census Bureau regarding the design of the CCM postenumeration survey and the current application of E-StARS to coverage measurement; E-StARS is the Census Bureau research program examining possible applications of merged, unduplicated lists of administrative records.

All of these research efforts support the objective in 2010 of measuring census component coverage errors. Matching cases with minimal information reduces the need to rely on imputation of match status and therefore more clearly determines whether those cases are errors and, if so, what type. The identification of duplicates clearly facilitates their estimation and reduces the estimated number of erroneous enumerations. Improved estimation of net error improves the estimation of the number of omissions. Finally, contamination of the CFU by the CCM interview could result in an unrepresentative census in the P-sample block groups and therefore bias the estimates produced by DSE. We now describe and comment on each of these areas of research in turn.

RESEARCH ON MEASURING COMPONENTS OF CENSUS ERROR

The Census Bureau’s Framework Paper

In considering the measurement of erroneous enumerations, omissions, duplications, and enumerations in the wrong place, it became apparent that the definitions of these coverage errors needed clarification (see National Research Council, 2004b: 252). The Census Bureau therefore decided to develop a framework of precise definitions of census errors, as well as the assumptions supporting their estimation, to better guide development of its coverage measurement plan for 2010. The resulting draft document, “Framework for Coverage Error Components” (U.S. Census Bureau, 2005), is an excellent attempt to provide this foundation.

This document defines erroneous enumerations as (1) duplicate enumerations, (2) people born after Census Day, (3) people who died before Census Day, and (4) people who are not residents of a housing unit in the United States. Omissions are people who should have been enumerated in the census but were not. By contrast, in A.C.E., which focused on net error, persons had to be enumerated in a housing unit within the
search area of the residence (generally the relevant E-sample block cluster) to be considered correctly enumerated. In this new framework, the starting position is that persons need only be enumerated in a housing unit somewhere in the United States to be considered a correct enumeration. This definition of a correct enumeration used in the framework document is not Census Bureau policy; it is instead a useful starting point in developing a comprehensive and clear understanding of the measurement of census coverage error, with the expectation that the geographic dimension will be addressed in later expansions of the framework.

The varying amount of information available for census enumerations complicates the classification of census errors. Data-defined enumerations are those with at least two recorded characteristics; others are non-data-defined enumerations. Among the former, some enumerations have sufficient information for matching and follow-up (a complete name and two additional characteristics), and others have insufficient information. The non-data-defined and insufficient-information cases could be either correct or erroneous enumerations, since the data are often insufficient to make any further determination.

Finally, information to determine enumeration status is collected from the PES. The Census Bureau refers to the list of people that would be enumerated if the P-sample were applied nationally as the notional P-census. Thus, conceptually, every potential enumeration falls into one of four cells: (1) those in both the P-census and the census, (2) those in the P-census but not in the census (census omissions), (3) those in the census but not in the P-census (erroneous enumerations and P-census omissions), and (4) those missed by both the P-census and the census.

Potential E-sample cases include correct enumerations and erroneous enumerations but not non-data-defined people or census omissions. The A.C.E. definition of E-sample erroneous enumerations also includes (a) correct enumerations in the wrong location and (b) enumerations with insufficient information for matching. Measurement of census component coverage errors requires separate estimates of the number of enumerations that are in the wrong location and the number of enumerations with insufficient information that are actually erroneous. To assess the number of omissions, A.C.E. used the P-sample nonmatches, which under the new definitions could be omissions, people enumerated in the wrong location, or non-data-defined people. The challenge here in moving toward a focus on error components is to determine how many of those people were actually missed in the census.

To provide high-quality estimates of census component coverage errors in 2010, the Census Bureau needs to make progress on two fronts. First, it must reduce the inflated estimate of erroneous enumerations. Enumerations with insufficient information need to be examined further, enumerations in the wrong place need to be identified as such, and the remaining unresolved cases need to be treated as nonrandomly missing data. Second, a better method is needed to estimate the number of people missed by both
the P-sample and the census. The current approach assumes independence of correct enumeration status and match status within poststrata, and failure of that assumption results in correlation bias.

Since net error is defined as omissions minus erroneous enumerations, one can estimate omissions by summing reliable estimates of net error and the number of erroneous enumerations. Since net error can be estimated by DSE minus the census, omissions may be estimated as the dual-systems estimate minus the census count plus the number of erroneous enumerations. However, this estimation strategy needs to be improved through additional data collection to help distinguish enumerations in the wrong location and to better handle cases with insufficient information, as well as through better estimation of the number of people missed by both the PES and the census.

The framework document also addresses how to estimate these various error components and the assumptions on which those estimates rest. Additional information will be collected in 2010 regarding other residences at which someone might have been counted, to determine more accurately whether a nonmatched P-sample enumeration is actually an omission and which of a set of duplicates is the correct enumeration. Furthermore, greater efforts will be made to match cases with “insufficient” information. Finally, missing data models will be developed to treat cases that are not data-defined.

The panel supports the general approach described in this draft framework, which is consistent with recommendations in National Research Council (2004b). This is an important first step toward developing a feedback loop linking the measurement of census component coverage error to deficiencies in specific components of census processes.
The panel has some concerns about the proposed treatment of imputations in the draft framework. A focus on the correctness of an imputation as an enumeration is misplaced, as are concerns about the correctness of imputations of characteristics. Imputations are simply the means to an end, which is improved census estimation, and it is the quality of the estimates collectively that should be assessed. For example, if a characteristic of a known person is imputed, the question of whether that is the person’s correct value is of no interest. The critical question concerns whether census estimates that involve that characteristic are collectively improved by the imputation, which will tend to be the case if the imputation model is sensible. The same principle applies to whole-person imputations. (This approach is compatible with a focus on components of error, since the measures used are for aggregates rather than individuals.) Finally, different errors may be important for different uses of the census numbers, so the framework should be sufficiently flexible to allow for aggregating component errors in more than one way. For example, for estimation of broad demographic distributions (to predict future Medicare enrollment), an error in age might be important, but misplacing a person geographically would be of little consequence. Conversely, for redistricting purposes, a person’s exact age is unimportant but

geographical accuracy is critical. The panel hopes to examine this issue further in its final report.

Matching Cases with Minimal Information

In the 2000 census, for an enumeration to have sufficient information for matching and follow-up, it needed to include the person's complete name and two other nonimputed characteristics. In the 2000 A.C.E., there were 4.8 million (sample survey weighted) data-defined enumerations with insufficient information for matching and follow-up, meaning that they lacked a complete name or two other nonimputed characteristics. These cases were coded as "KE" cases in A.C.E. processing, and we retain that terminology. A.C.E. estimation treated KEs as erroneous enumerations, and they were removed from the census enumerations prior to dual-systems computations. (If KEs are similar in all important respects to census enumerations with sufficient information for matching, removal from dual-systems computations increases the variance of the resulting estimates, but it does not greatly affect the estimates themselves.) Removal of KEs helped to avoid counting a person twice, because matches for these cases are difficult to ascertain. It was also difficult to follow up these E-sample cases to determine their match status if they were not initially matched to the P-sample, because of the lack of information with which to identify the person to interview. However, some unknown and possibly large fraction of these cases were correct enumerations. Removing them from the matching therefore inflated the estimate of erroneous enumerations, and it also inflated the estimate of the number of census omissions by about the same amount, since roughly the same number that are correct enumerations would have matched to P-sample enumerations.
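The symmetry of this inflation can be made concrete with a small calculation. The 4.8 million weighted KE total comes from the text; the fraction of KE cases that are actually correct enumerations is a hypothetical placeholder, since its true value was unknown in A.C.E.

```python
ke_total = 4.8e6          # weighted KE enumerations (from the text)
fraction_correct = 0.40   # hypothetical share that are correct enumerations

# Treating every KE as erroneous overstates erroneous enumerations by the
# number of KEs that are in fact correct...
extra_erroneous = ke_total * fraction_correct

# ...and, because those same people would have matched P-sample records,
# it overstates the omission count by roughly the same amount.
extra_omissions = extra_erroneous

print(round(extra_erroneous))  # prints 1920000
```

Under this placeholder assumption, both component error estimates would be inflated by nearly two million people each, which is why the treatment of KEs matters so much more for component estimates than for net error.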
Given that the emphasis in 2000 was on the estimation of net census error, this inflation of the estimates of the rates of erroneous enumeration and omission was of only minor concern. However, with the new focus in 2010 on estimates of components of census error, there is a greater need to find alternative methods for treating KE enumerations. One possibility that the Census Bureau is currently exploring is whether many of these cases can be matched to the P-sample data using information on other household members. To examine this, the Census Bureau carried out an analysis using 2000 census data on 13,360 unweighted data-defined census records that were found to have insufficient information for matching, to determine whether some of them could be reliably matched. (For details, see Auer, 2004, and Shoemaker, 2005.) This clerical operation used name, date of birth, household composition, address, and other characteristics to match these cases to the P-sample. For the 2000 A.C.E. data, 44 percent of the KE cases examined were determined to match to a person who lived at the same address on Census Day and was not otherwise counted, with either “high confidence” or “medium confidence” (which are reasonable and objectively defined categories of credibility). For the 2000 census, this would have reclassified more than 2 million census enumerations from erroneous to correct enumerations, as well as a like number from P-sample omissions to matches, thereby greatly reducing the estimated number of census component coverage errors. For the remaining unresolved cases, the

However, the panel would like the Census Bureau to explore a wider range of options in determining which model forms and predictors work best in a predictive environment. Also, the Census Bureau's comparison of the logistic regression approach with poststratification, in which the logistic regression predictors are restricted to those used in the poststratification, ignores a primary benefit of logistic regression: its ability to accommodate a larger number of predictors. A more appropriate comparison is between the 2000 poststratification and logistic regression models with additional variables determined to provide additional predictive power. Furthermore, a variety of studies, outlined in Chapter 4, especially ethnographic work, provide information as to why certain housing units are missed in the census and why people with various characteristics are missed in otherwise enumerated housing units. This information is moderately consistent with the variables currently included in the logistic regression models being examined by the Census Bureau, but the linkage between the research findings and the predictors in these models is not as direct as one would like. We think that the logistic regression models need to represent what is known about the sources of census coverage error, to the extent that this information is represented on the short form and in available contextual information. There also seems to be an unnecessary rush to pursue a model that can be used in a production environment while there is still time to operate in a more exploratory manner. The panel therefore thinks that the Census Bureau has been too cautious in its examination of potential sets of predictors.
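The kind of comparison suggested here can be sketched in a few lines: fit a logistic regression for match status with a restricted predictor set and again with one additional predictor, and compare holdout log-loss. This is a pure-Python illustration, not the Census Bureau's models; the data, the predictors, and the effect sizes are all synthetic placeholders.

```python
import math
import random

def fit_logistic(xs, ys, lr=2.0, steps=800):
    """Fit logistic regression by full-batch gradient descent."""
    k = len(xs[0])
    w = [0.0] * (k + 1)  # intercept followed by slopes
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(xs, ys):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def log_loss(w, xs, ys):
    """Average negative log-likelihood on a holdout set."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

random.seed(1)
# Synthetic people: x1 plays the role of a poststratification variable,
# x2 an extra contextual predictor that also drives match status.
data = []
for _ in range(800):
    x1, x2 = random.random(), random.random()
    p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x1 + 2.5 * x2)))
    data.append(((x1, x2), 1 if random.random() < p else 0))
train, test = data[:500], data[500:]
xs_tr, ys_tr = [d[0] for d in train], [d[1] for d in train]
xs_te, ys_te = [d[0] for d in test], [d[1] for d in test]

w_small = fit_logistic([(x[0],) for x in xs_tr], ys_tr)
w_full = fit_logistic(xs_tr, ys_tr)
loss_small = log_loss(w_small, [(x[0],) for x in xs_te], ys_te)
loss_full = log_loss(w_full, xs_te, ys_te)
print(loss_small, loss_full)
```

Because the synthetic data are built so that x2 genuinely predicts match status, the expanded model should show lower holdout loss; the same cross-validation logic applies to comparing real candidate predictor sets.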
The six models that have garnered the majority of attention to date are too similar to one another to reveal enough about which model forms and collections of predictors will work well. The Census Bureau should therefore expand the important research carried out by Schindler (2006) and apply it to the logistic regression models, attempting to identify unanticipated correlations between the match rate or correct enumeration rate and the available predictors, and using cross-validation to evaluate the resulting logistic regression models. With respect to model form, the Census Bureau has also carried out some preliminary work on a very different use of two logistic regression models to model census net coverage error (see Griffin, 2005b). The first logistic regression model estimates the probability that a housing unit will be missed in the Census Bureau's Master Address File. The second, conditional on the first, estimates the probability that, given that the housing unit is included in the Master Address File, an individual with certain characteristics will be missed. A number of details of this approach remain unclear, including how to handle erroneous enumerations and duplications. However, the panel strongly endorses further work on this and other modeling ideas that, even if not used in a production environment, will add to the Census Bureau's understanding of census coverage error. Finally, the switch from poststrata to logistic regression modeling has important implications for communicating summaries of net coverage error to census data users. First, logistic regression modeling is likely to be more statistically efficient in its use of data than poststratification and, if so, may support estimates at lower levels of geographic and demographic aggregation. Therefore, the Census Bureau should

examine what the reliability will be for estimates at various levels of aggregation and consider releasing estimates at a more detailed level than the A.C.E. poststrata, should the estimates support that. Second, for ease of comparison, although there are likely to be no poststrata in 2010 given the use of logistic regression modeling, the Census Bureau should consider releasing estimates of net coverage error for the 2010 census for comparable aggregates, to support the comparison of net coverage error from census to census.

Recommendation 1: The Census Bureau should evaluate, for use in the 2010 census coverage measurement program, a broader range of models, most importantly logistic regression models, for net coverage error that include variables in addition to those used to define the A.C.E. poststratification. These should include a wider range of predictors (e.g., geographic, contextual, family, and housing variables and census operational variables), alternative model forms (e.g., classification trees), and the use of random effects to model small-area variation.

The panel hopes to provide more guidance on which covariates to include in these logistic regression models in its final report. In the meantime, the Census Bureau should continue to investigate the full range of predictors while the panel and the Census Bureau continue to consider for which applications models with various predictors are appropriate.

RESEARCH ON CONTAMINATION DUE TO THE COVERAGE FOLLOW-UP INTERVIEW

Previous PESs initiated their data collection after the conclusion of the census data collection, with the minor exception of telephone A.C.E. interviewing in 2000. In 2000, this meant starting the A.C.E. nontelephone field interviewing after the conclusion of nonresponse follow-up and the CEFU and CIFU interviewing.
This was done for two reasons: (1) to avoid the possibility that the A.C.E. interview might affect the still incomplete census operations, thereby causing the PES blocks to be unrepresentative of the census, and (2) to ensure that the evaluation that A.C.E. provided was of the complete census. However, waiting to begin the A.C.E. interviews increased the number of movers in the period between the census and the A.C.E., which reduced the quality of the data collected for A.C.E. Any impact of the PES (CCM) interview (or other PES operations) on census processes in the PES blocks is a type of contamination. One way in which contamination might operate is if the census follow-up interview were affected by confusion with the already completed CCM interview. One possible impact is a refusal to participate in the census follow-up interview, but one can also posit other, more subtle impacts of CCM operations on the census follow-up interview. The impact of contamination on the entire census is essentially negligible, since the PES blocks represent a very small percentage of the country (less than 1 percent in

2000). However, given that contamination could result in a census in the CCM blocks that was not representative of the census in the remainder of the country, it might lead to substantial bias in the dual-systems estimates of net undercoverage. In previous censuses, waiting for the various follow-up interviews and other coverage improvement programs to be completed prior to the collection of PES data was of less concern, since these were, generally speaking, relatively limited operations that could be concluded fairly expeditiously. However, in 2010, as noted above, the CFU interview could potentially involve a large fraction of the households in the United States and take a substantial time to complete. This would push back the CCM interviewing to September or later, resulting in a substantial increase in the number of movers and generally reducing the quality of the data collected in the CCM. Furthermore, there is a substantial similarity between the CFU and the CCM questionnaires, which might increase the possibility of contamination. This issue has two aspects. The first is assessing the degree to which having the CCM interview precede the CFU interview affects the responses collected in the CFU interview. Attempts to measure contamination in the 1990 and 2000 censuses found no appreciable contamination (see, e.g., Bench, 2002), but, as argued above, the threat of contamination in the 2010 CFU seems more serious. If this contamination is ignorable, the Census Bureau could let the two interviews coexist in the field in 2010, in which case one would also like to assess the impact of the CFU interview on the quality of the CCM interview. There were two attempts to measure contamination in the 2006 census test.
The first attempt compared interview rates and rates of erroneous enumerations and omissions in the two populations defined by the order of the interviews. This analysis was stratified by the various situations that result in a CFU interview, listed above. However, the measurement of contamination was indirect, and the modest sample size reduced the statistical power of the analysis. In addition, there was a matched-pair design in which a second sample, using the same sample design as the CCM, was selected from blocks geographically proximate to the CCM sampled blocks. The population estimates for the two samples were then compared. Again, the small sample size for this study was a concern. Although it is too late for the 2006 test, the panel was interested in more direct observation of the impact of several proximate interviews used to determine the residents of a household. A household may have a form mailed to it, with nonresponse resulting in several follow-up attempts by a field enumerator. If one of the various situations generating a CFU interview occurs, there will be attempts to carry out a CFU interview; if the household is then selected into the CCM sample, it will be interviewed again; and finally, if there is difficulty in matching to the E-sample, the household could be field interviewed a fourth time. To better understand the impact on respondents of several interviews occurring close in time and with similar content, the Census Bureau could carry out a limited test during 2007 or during the 2008 dress

rehearsal. The panel is concerned that after the second interview, the chances either of a refusal or of the collection of poor-quality data could increase. The second aspect of the contamination issue is what to do in 2010 if appreciable contamination is either observed or cannot be ruled out. One might address this problem in several ways (see Kostanich and Whitford, 2005, for a discussion of some of these approaches).

1. Combine the CFU and the CCM interviews into one multipurpose interview. The panel has some sympathy for this position, given the similarity of the interviews. However, the CCM interview must be an independent count of a housing unit to satisfy the assumptions underlying DSE, whereas the CFU interview is dependent on information received in the initial census questionnaire. It is therefore difficult to combine these interviewing instruments.

2. Have the CFU interview occur either before or after the CCM interview, but apply the CCM coverage measurement program to the census before the application of the CFU interview. This is referred to as evaluating a truncated census, since the definition of the census for purposes of coverage evaluation is the census that existed prior to the taking of the CFU interview. Any enumerations added by carrying out CFU interviews after the CCM interviews were completed could be treated as "late additions" were treated in 2000, that is, removed from the census for purposes of coverage measurement. A problem with this approach is that if the CFU adds an appreciable number of people, or corrects the enumerations of an appreciable number of people, one is evaluating a truncated census that is substantially different from the actual census.
Also, if these additions or corrections have considerably different coverage error characteristics from the remainder of the population, that would add a bias to the dual-systems estimates. One could include the CFU interviews that occurred prior to the CCM interviews in the truncated census, in which case the net coverage error models could condition on whether a CFU interview was carried out prior to the CCM interview; this would remove any bias if the P-sample inclusion probabilities depended on the occurrence of the CFU interview but not on its outcome (for details, see Bell, 2005). Information on what the CFU interview added from outside the CCM blocks could also be used in these models. There are some operational complexities to this idea, including the need to duplicate the formation of relatively large processing files. Finally, as mentioned previously, one is not evaluating the complete census, and therefore, to assess components of census coverage error resulting from the application of the CFU, one would need to carry out a separate evaluation study outside the CCM blocks, which is a serious disadvantage.

3. Do not use the CFU in the CCM blocks. This avoids any contamination, but then the CCM evaluates an incomplete census, with essentially the same problems listed in (2).

4. Let the CFU and CCM interviews occur in whatever order they do, and treat contamination as a constant effect times an indicator variable for which of the two interviews comes first, for households that have both CFU and CCM interviews. The difficulty with this approach is that it is not clear what the impact will be on whichever interview comes second, so it is not clear that contamination can be effectively modeled through use of a constant effect. For example, contamination might be a function of various characteristics of the household and therefore be subject to various interactions.

5. Delay the CCM interviews until the CFU interviews are complete. This does solve the contamination problem. However, coverage evaluation interviews that occurred in August 1980 were less useful than those conducted in April because of the large number of moves during the intervening four months. Delay could therefore have a substantial negative impact on the quality of the CCM data collected in 2010, depending on how long one has to wait.

The panel has not yet come to a consensus on this question. The panel was interested in further examination of the implications of a truncated census (option 2) or of combining the two instruments (option 1). The Census Bureau believes that the best approach is to delay the CCM interviews until after all CFU interviews are completed (option 5). The basis for this decision was that, in this way, the Census Bureau does not plan to have a substandard census in any area (which would certainly be the case under option 3), and combining the interviews might harm both interviews under option 1. Furthermore, the effectiveness of option 4 is unknown and difficult to test prior to the 2010 census. (For more details on the Census Bureau's views on contamination, see Kostanich and Whitford, 2005.)
However, the panel did not find compelling the argument about the difficulty of duplicating census processing files for option 2, given the current availability of inexpensive computer memory. The panel does have concerns about not starting the CCM interviews until September 2010, given the increased number of movers between Census Day and the CCM interview that this would create. It is hoped that by expediting certain operations, an August start for the CCM might still be possible. For this reason, it is important to collect good data in 2006 and 2008 on the impact of delays of various lengths on the number of movers. In this and several other respects, the results from the 2006 census test will inform the Census Bureau's position on this issue.

SAMPLE DESIGN FOR THE CCM POSTENUMERATION SURVEY

An important question concerning the CCM program is what modifications should be made to the design of A.C.E. in looking toward the CCM in 2010, given the change in coverage measurement objectives between the 2000 and 2010 censuses. That is, to what extent can the new goal of process improvement be incorporated into the design of the CCM PES?

The proposed design for the CCM PES in 2010 is as follows (for details, see Fenstermaker, 2005). The Census Bureau is assuming that the CCM PES will draw a sample of 300,000 housing units, with primary sampling units comprising block clusters. The panel supports Recommendation 6.1 of National Research Council (2004b), which calls for a PES that would produce net coverage estimates of the same precision as those of the 2000 A.C.E. These block clusters are meant to contain around 30 housing units, and the plan is to subsample them in the event that they contain substantially more. The block clusters will be stratified into three categories: (1) medium and large clusters, with some subsampling within large block clusters; (2) American Indian Reservation block clusters; and (3) small block clusters, for which a two-phase design will be used to sample block clusters under a certain size while retaining all small block clusters above that size. In allocating the sample of 300,000 housing units to states, the general approach will be to sample proportional to the total population of each state. However, each state's sample will contain a minimum of 60 block clusters, and Hawaii will be allocated 150 block clusters. In addition, there will be a separate American Indian Reservation sample drawn proportionally to the 2000 census count of American Indian and Alaska Native populations living on American Indian reservations. The rationale behind the state allocations for the 2010 CCM PES is that this is intended to be a general-purpose sample, so any oversampling in comparison with proportional allocation needs to be strongly justified. In addition, the Census Bureau was very satisfied with the 2000 A.C.E. design, which this design roughly duplicates.
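The state allocation rule described above can be sketched in a few lines: allocate block clusters proportional to state population, raise any state below the 60-cluster floor, and override Hawaii's allocation. The state names, populations, and cluster total below are hypothetical placeholders, not the actual 2010 figures.

```python
def allocate(populations, total_clusters, floor=60, overrides=None):
    """Proportional allocation of block clusters with a per-state floor."""
    overrides = overrides or {}
    total_pop = sum(populations.values())
    alloc = {state: max(floor, round(total_clusters * pop / total_pop))
             for state, pop in populations.items()}
    alloc.update(overrides)  # e.g., Hawaii's fixed allocation
    return alloc

# Hypothetical state populations and cluster budget:
states = {"A": 30_000_000, "B": 5_000_000, "C": 500_000, "Hawaii": 1_300_000}
plan = allocate(states, total_clusters=3_000, overrides={"Hawaii": 150})
print(plan)
```

Note that the floor and the override push the total slightly above the nominal cluster budget, so a production allocation would rescale the remaining states; the sketch omits that step.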
The Census Bureau has no specific variance requirements for the 2010 CCM estimates, since production of adjusted counts is not anticipated. The Census Bureau did examine some alternative specifications for the design of the CCM PES, using simulation studies of the quality of the resulting net coverage error estimates and assessment of components of census coverage error, especially estimation of the number of omissions and erroneous enumerations at the national level and for 64 poststrata (see Fenstermaker, 2005). The designs were (1) the design described above, with allocations proportional to total state population, but with a minimum of 60 block clusters per state, and with Hawaii allotted 150 block clusters; (2) similar to (1) except Hawaii is allocated only 60 block clusters; (3) a design in which allocations are made to the four census regions to minimize the variance of estimates of erroneous enumerations, but within regions, allocations are made proportional to state size; and (4) a design in which half of the sample is allocated proportional to the number of housing units within update/leave areas, and half of the sample is allocated proportional to each state’s number of housing units. Through use of simulations, for each design and PES sample, national estimates were computed of the rate of erroneous enumerations (and the rate of erroneous enumerations with mail returns, with nonresponse follow-up, and with CFU), the nonmatch rate, the omission rate, and the net error rate. Finally, national estimates of the population were computed, along with their standard errors. The same analysis was done at the poststrata level. One hundred replications were used for the simulation study. The

results supported retention of the design that closely approximates the 2000 A.C.E. design, described above. The panel has not yet come to a consensus on whether to recommend modifications to this design for the CCM PES in 2010. There is some concern that the allocation of a minimum of 60 block clusters to each state is too closely tied to the need to provide adjusted counts for states and not targeted enough toward measurement of the rates of the four types of census component coverage errors. If the households that are more problematic to count can be linked to relatively focused geographic regions, it would be interesting to evaluate a design that oversampled those areas to see the impact on the reliability of measurement of census component error rates. This is similar to design alternative (3) above, but what we are suggesting is more targeted than that. Furthermore, we also think that the Census Bureau needs to give more consideration to its within-state allocations of block groups. For example, the possibility of oversampling block groups in predominantly minority areas with, say, large percentages of renters is an alternative that deserves further consideration. It is also not clear to the panel why the Census Bureau is not making greater use of its planning database, which provides an indication of the difficulty of enumerating block groups. Settling on a sample design for the CCM in 2010 is a difficult task. There are two general objectives of the coverage measurement program for 2010. First, there is the primary objective put forward by the Census Bureau, which is the measurement of census component coverage errors at some unspecified level of geographic and demographic aggregation.
Second, there remains the need to measure net coverage error at the level of the poststrata used in 2000 in order to facilitate comparison with the 2000 census. To address the first goal, one would like to target problematic domains. However, one has to guard against unanticipated problems that might appear in previously easy-to-count areas. To do that, and to provide estimates of net coverage error across the United States, a less targeted design is needed. These demands individually argue for very different designs, and accommodating them all, to the extent possible, is challenging. The panel anticipates providing much more direction on this question in its final report.

RESEARCH ON THE USE OF ADMINISTRATIVE RECORDS IN SUPPORT OF COVERAGE IMPROVEMENT AND COVERAGE MEASUREMENT IN 2010

The Census Bureau's research program has explored decennial uses of administrative records, that is, data collected as a by-product of administering a governmental program, since the 1980s. Possible uses include (1) a purely administrative records census; (2) improving census nonresponse follow-up, either by using enumerator follow-up only when administrative records do not contain the required information or, alternatively, by using administrative records to complete information for households that do not respond after several attempts by field enumerators; (3) improving the Master Address File using addresses in administrative records; (4) assisting in coverage measurement, for example, through use of triple-systems estimation (a generalization of

DSE in which the third system is a merged list of individuals from administrative records); and (5) assisting in coverage improvement, for example, by identifying census blocks for which the census count is likely to be of poor quality. We emphasize that the use of administrative records could be the most promising idea for assisting in the measurement of omissions of hard-to-enumerate groups. However, until recently, neither these nor other applications of administrative records had been implemented during a decennial census, partly because of the limited quality of the available administrative records (including the currency of the information, especially for addresses), the computational burden, and concerns about public perceptions. (An approach to the problem of address currency can be found in Stuart and Zaslavsky, 2002.) As a result, until 2000, there was no comprehensive census test of the use of administrative records for any purpose, although there were earlier assessments of the coverage of merged administrative lists.9 Now, however, several of these concerns have been ameliorated. The quality and availability of national administrative records are improving, computing power has increased dramatically, and the very active research group on administrative records at the Census Bureau has achieved some impressive results. The primary program and database, referred to as E-StARS, now has an extract of a validated, merged, unduplicated residential address list with 150 million entries, 80 percent of which are geocoded to census blocks, and another extract of a validated, merged, unduplicated list of residents with demographic characteristics. These lists are approaching the completeness of coverage that might be achieved by a decennial census.
Seven national files are merged to create E-StARS, with the Social Security Number Transaction File providing demographic data.

(Footnote 9: The Census Bureau operates under the constraint that information obtained from administrative records under confidentiality restrictions cannot be sent out to the field to assist enumerators, which rules out some applications of administrative records.)

The panel strongly supports this research program, and we think there is a real possibility that administrative records could and should be used in the 2010 census, whether for coverage improvement, for nonresponse follow-up, or for coverage measurement. Potentially feasible uses in the 2010 census include the following.

- To improve or evaluate the quality of either the Master Address File or the address list of the postenumeration blocks. The quality of the Master Address File is key to a successful mailout of the census questionnaires and to nonresponse follow-up, and the quality of the independent list created in the PES blocks is key to a successful coverage measurement program. E-StARS provides a list of addresses that could be used in at least two ways. First, the total number of E-StARS addresses for small areas could be checked against the corresponding Master Address File totals or PES totals to identify areas with large discrepancies that could be relisted. Second, more directly, address lists could be matched to identify specific addresses that are missed in either the Master Address File or the

OCR for page 32
Research and Plans for Coverage Measurement in the 2010 Census: Interim Assessment: Panel on Coverage Evaluation and Correlation Bias in the 2010 Census PES address listing, with discrepancies followed up in the field for resolution. Note that while administrative records could be used to improve the address list for either the census or the PES, to maintain independence they should not be used for both. To assist in late-stage nonresponse follow-up. The Census Bureau makes several attempts to collect information from mail nonrespondents to the census form. When these attempts fail to collect information, attempts are made to locate a proxy respondent and, when that fails, hot-deck imputation is used to fill in whatever information is needed, including the residence’s vacancy status and the household’s number of residents. If the quality of E-StARS information is found to be at least as good as that from hot-deck imputation or even proxy interviews, it might be effective to attempt to match nonrespondents to E-StARS before either pursuing a proxy interview or using hot-deck imputation. Especially with a short-form-only census, E-StARS might be sufficiently complete and accurate for this purpose. (It may ultimately be discovered, possibly during an experiment in 2010, that fewer attempts at collecting nonresponse data are needed by making use of E-StARS information after, for example, only one or two attempts at nonresponse follow-up, thereby shortening and reducing the costs of nonresponse follow-up.) For item imputation. The Census Bureau often uses item imputation to fill in modest amounts of item nonresponse. Item nonresponse could affect the ability to match a P-sample individual to the E-sample, and missing demographic and other information may result in an individual being placed in the wrong poststratum. Item imputation based on information from E-StARS may be preferable to hot-deck imputation. 
The use of E-StARS to provide item imputation is currently being tested as part of the 2006 census test.

•  To improve targeting of the coverage improvement follow-up interviews. The coverage improvement interview in 2010, as currently planned, will follow up households with any of the following six conditions: (1) uncertain vacancy status, (2) characteristics needed for additional people in large households, (3) count discrepancies needing resolution, (4) duplicates needing resolution, (5) persons who may have been enumerated at residences other than the one in question, and (6) nonresidents who sometimes stayed at the housing unit in question. The workload for this operation might well exceed the Census Bureau's capacity to carry out the necessary fieldwork, given limited time and resources. Administrative records could possibly be used to help identify situations in which field resolution is not needed, for example, by indicating which of a set of duplicates is at the proper residence. (Uses of E-StARS like this are being attempted in the 2006 census test.)

•  To help determine the status of a nonmatch prior to follow-up of nonmatches in the PES. It is quite possible that nonmatches of the P-sample to the census could be resolved through the use of administrative records, for example, by indicating that there was a geocoding error or a misspelled name, thereby saving the expense and time of additional CCM fieldwork.

•  To evaluate the census coverage measurement program. Many of the steps leading to the production of dual-systems estimates might be checked using administrative records. For example, administrative records information might be used to assess the quality of the address list in the P-sample blocks, the quality of the matching operation, or the quality of the small-area estimation of population counts. (However, any operation that makes use of administrative records cannot also use the same administrative records for purposes of evaluation.)

The administrative records group at the Census Bureau has already had a number of successful applications of E-StARS. First, an administrative records census was conducted in five counties during the 2000 census, and its quality was judged to be comparable to that of the census in those counties. Second, E-StARS was used to explain 85 percent of the discrepancies between the Maryland Food Stamp Registry recipients and estimates from the Census Supplementary Survey in 2001 (the pilot American Community Survey). The panel considers this important and promising research that should play a key role in censuses beginning in 2020, given the potential for cost savings and quality improvement. With respect to use in 2010, since the various suggestions depend crucially on the quality of the merged and unduplicated lists of addresses and people in E-StARS, using E-StARS for any of the above purposes in 2010 will require further examination of the quality of those lists, as well as evaluation of each specific application in comparison with the current census method. Until there are rigorous operational tests of both feasibility and effectiveness, it would not be reasonable to move toward implementation in 2010.
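The first address-list use above, screening small-area totals for large discrepancies, can be sketched as a simple comparison of counts. The block identifiers, counts, and the 20 percent threshold below are hypothetical; any operational rule would need careful calibration against field results.

```python
# Sketch: flag census blocks where the E-StARS address count and the
# Master Address File (MAF) count diverge enough to warrant relisting.
# Block IDs, counts, and the 20% threshold are hypothetical.

def flag_discrepant_blocks(estars_counts, maf_counts, rel_tol=0.20):
    """Return blocks whose two counts differ by more than rel_tol,
    measured relative to the larger of the two counts."""
    flagged = []
    for block in set(estars_counts) | set(maf_counts):
        e = estars_counts.get(block, 0)
        m = maf_counts.get(block, 0)
        larger = max(e, m)
        if larger > 0 and abs(e - m) / larger > rel_tol:
            flagged.append(block)
    return sorted(flagged)

estars = {"B001": 120, "B002": 80, "B003": 50}
maf    = {"B001": 118, "B002": 55, "B004": 30}
print(flag_discrepant_blocks(estars, maf))  # ['B002', 'B003', 'B004']
```

Blocks present in only one list are flagged automatically, which mirrors the second, more direct use described above: matching the lists address by address to find specific missed entries.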
Given where we are in the decade, it is unlikely that more than one of the above six bulleted applications could receive sufficient resources to support incorporation in the 2008 dress rehearsal, which is a necessity for implementation in 2010. There is therefore a need to focus immediately on one very specific proposal. The panel recommends that one of the above applications be developed sufficiently to support a rigorous test in the 2008 dress rehearsal, with the goal of implementation in 2010 should the subsequent evaluation support its use. Furthermore, the Census Bureau should begin now to design rigorous tests of all the above suggestions for the use of administrative records, quite possibly during the 2010 census itself, as a first step toward decennial census application of administrative records in 2020. We think that administrative records have great promise for assisting in understanding census omissions and therefore should be used either for evaluation of the CCM or as a part of the CCM program.

Recommendation 2: The Census Bureau should choose one or more of the proposed uses of administrative records (e.g., tax record data or state unemployment compensation data) for coverage improvement, nonresponse follow-up, or coverage measurement and comprehensively test those applications during the 2008 census dress rehearsal. If a process using administrative records improves on the processes used in 2000, that process should be implemented in the 2010 census.
We add that evaluations of the use of administrative records are often viewed as requiring extensive, resource-intensive fieldwork. While some fieldwork will be needed, much of the evaluation of administrative records can be accomplished if the Census Bureau structures the various databases collected from test censuses in a way that facilitates matching. Furthermore, if data from E-StARS are used successfully in 2010, the Census Bureau should consider more ambitious uses of administrative data in the 2020 census. Specifically, the Census Bureau might use administrative data to replace the nonresponse follow-up interview for many housing units, not just for late-stage nonresponse. Under this proposal, the Census Bureau would use data from administrative records to determine the occupancy status of some nonresponding housing units and the number and characteristics of their residents. To do so, the Census Bureau would have to develop criteria for the adequacy of the information in the administrative records to establish the existence and membership of a household for this purpose. For example, agreement of several records of acceptable currency and quality might be considered sufficient to use the information as a substitute for a census enumeration, which would reduce the burden of field follow-up. This would represent a substantial change in what constitutes a census enumeration, of at least the same conceptual magnitude as the change from in-person to mail enumeration as the primary census methodology.
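A criterion of adequacy of the kind just described, accepting administrative information only when several sufficiently current, high-quality records agree, might look schematically like this. The field names, quality scores, and thresholds are all hypothetical illustrations, not an actual Census Bureau rule.

```python
# Sketch of a decision rule: are administrative records adequate to
# substitute for a census enumeration at a nonresponding housing unit?
# Field names, quality scores, and thresholds are hypothetical.

def adequate_for_enumeration(records, min_agreeing=2,
                             max_age_years=1.0, min_quality=0.8):
    """Accept only if at least min_agreeing records that are recent
    enough and reliable enough agree on the household's size."""
    usable = [r for r in records
              if r["age_years"] <= max_age_years
              and r["quality"] >= min_quality]
    counts = {}
    for r in usable:
        counts[r["household_size"]] = counts.get(r["household_size"], 0) + 1
    return any(n >= min_agreeing for n in counts.values())

records = [
    {"source": "tax",      "age_years": 0.5, "quality": 0.90, "household_size": 3},
    {"source": "benefits", "age_years": 0.8, "quality": 0.85, "household_size": 3},
    {"source": "old_file", "age_years": 4.0, "quality": 0.70, "household_size": 2},
]
print(adequate_for_enumeration(records))  # True: two recent, reliable records agree
```

The point of such a rule is that stale or low-quality records are excluded before agreement is assessed, so a single outdated file cannot tip the decision.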
However, the completeness of administrative systems and the capabilities for matching and processing administrative records have been growing, while public cooperation with field operations has declined. These contrasting trends make it increasingly likely that administrative records can soon provide enumerations of quality at least as good as field follow-up for some housing units. Furthermore, unlike purely statistical adjustment methods, every such census enumeration would correspond to a specific person for whom there is direct evidence of residence and characteristics. The long-run potential for such broader contributions from administrative records is a reason to give high priority to their application in the 2010 census, in addition to their direct benefits in that census. Two possible objections might be raised to this approach. First, this use of administrative records might be ruled inconsistent with interpretations of what the Constitution requires of a census. Second, public acknowledgment that this method is being used might have a negative effect on the level of cooperation with census-taking. These two issues would need to be resolved before the Census Bureau could go forward. This proposal is also clearly dependent on the success of the more modest efforts suggested for possible use in 2010.