Coverage Measurement in the 2010 Census

3 Plans for the 2010 Census

The 2010 census has an innovative design, resulting in a census that will differ from its predecessor to a very substantial degree. Though plans for the 2010 census remain tentative, it is useful for the panel's analysis to be able to compare the timetables for the main activities in the 2000 and 2010 censuses. Box 3-1 contains a cross-walk of the timetables for the 2000 and 2010 censuses.

MAJOR DESIGN CHANGES

Four significant differences in design will considerably affect how the 2010 census coverage measurement (CCM) program needs to differ from the 2000 coverage measurement program: a short-form-only census; an improved system for the Master Address File and the Topologically Integrated Geographic Encoding and Referencing database (MAF/TIGER); coverage follow-up interviews; and removal of duplicate enumerations during the census.[1]

A Short-Form-Only Census

Since 2005 the Census Bureau has been fielding the American Community Survey (ACS), a continuous version of the decennial census long form. Therefore, under current plans there will be no long form in the 2010 census. This change will facilitate several

[1] The discussion in this chapter is based on the Census Bureau's plans for the 2010 census as of spring 2008.
BOX 3-1 Cross-Walk of Schedules

                                     2000 Census          2010 Census
LUCA(a)
  LUCA 98                            05/98–09/99
  LUCA 99                            01/99–11/99
  Ship materials                                          11/06/07–03/18/08
  Updates                                                 09/25/07–10/08/08
  [Note: Some materials were sent earlier than 11/06/07]
MAF Block Canvass                    01/99–05/99          04/06/09–07/10/09
Questionnaire Mailout                03/13/00–03/15/00    03/15/10–03/17/10
NRFU(b) Begins/Ends                  04/00–07/00          05/01/10–07/10/10
CEFU(c)/CFU(d)                       05/00–07/00          04/26/10–08/13/10
CIFU(e)                              07/00–08/00
Coverage Measurement
  Personal Interviews                05/00–08/00          08/14/10–10/02/10

(a) LUCA: Local Update of Census Addresses
(b) NRFU: Nonresponse Follow-Up
(c) CEFU: Coverage Edit Follow-Up
(d) CFU: Coverage Follow-Up
(e) CIFU: Coverage Improvement Follow-Up

SOURCES: Census 2000 Operational Plan, December 2000, U.S. Department of Commerce, Economics and Statistics Administration, U.S. Census Bureau; 2010 Census Key Operational Milestone Schedule.

aspects of data collection in the census, including data capture, the work of follow-up enumerators, the management of foreign-language forms and foreign-language assistance, and data editing and imputation for nonresponse.

The ACS has been mentioned as a possible survey vehicle for coverage measurement. We agree that there may be some potential for use of the ACS to help assess the quality of dual-systems estimation (DSE) or to help more broadly in coverage evaluation. However, some problems would need to be overcome in applying the ACS in this way. First, the address files for the ACS and the census are very closely related, so at present the ACS could not be used to estimate whole-household omissions. In addition, the ACS questionnaire is not focused on coverage measurement, as is that for the CCM. Finally, the ACS has a different definition of residence than the census, which would cause some additional, albeit minor, complications.

Improved MAF/TIGER System

In outline, the MAF begins with the final address list developed in concert with the taking of the previous census. This list is updated on a fairly continuous basis with additions and deletions from the U.S. Postal Service's Delivery Sequence File. For the 2000 census, local areas were provided the opportunity in 1998 and 1999 to make additions and deletions based on local information, in what was referred to as the Local Update of Census Addresses (LUCA) Program. A block canvass was carried out a year prior to the census to determine the accuracy of the address list. In addition to these procedures, there were more than a dozen other ways in which an address could be added to the MAF. There were numerous questions about the completeness and the accuracy of the MAF listings for the 2000 census (see, e.g., National Research Council, 2004b:Finding 4.4), and efforts are now under way to improve both MAF and TIGER for 2010. The MAF and TIGER databases have been redesigned into a single MAF/TIGER database: MAF provides a list of household addresses, and TIGER is used to associate each address on the MAF with a physical location. The MAF/TIGER Enhancement Program includes: (1) the realignment of every street and boundary in the TIGER database; (2) development of a new MAF/TIGER processing environment and the integration of the two previously separate resources into a common technical platform; (3) collection of global positioning system coordinates for structures on the MAF; (4) expansion of geographic partnership programs with state, local, and tribal governments, other federal agencies, the U.S.
Postal Service, and the private sector; (5) implementation of a program to use ACS enumerators to generate address updates, primarily in rural areas; and (6) the use of periodic evaluation activities to provide quality metrics to guide corrective actions (for details, see Hawley, 2004). One motivation for this initiative was the Census Bureau's recognition that many census errors and inefficiencies in 2000 resulted from errors in the MAF and in the information on the physical location of addresses.

Coverage Follow-Up Interviews

The Census Bureau is greatly expanding the percentage of housing units that will receive a coverage follow-up interview in 2010, in comparison with the percentage that received a coverage edit follow-up in 2000. The 2000 coverage edit follow-up was used to determine the correct count and characteristics in two situations: households with more than six residents (since the census form had space for information for only six persons)
and households with count discrepancies (e.g., differences between the number of separate people listed on the questionnaire and the indicated total number of residents). The planned expansion in 2010 is motivated by the recognition that confusion about residence rules was a substantial source of census coverage error. The expanded coverage follow-up interviews planned for 2010 will cover four situations in addition to the two covered in 2000: (a) households with a possible duplicate enumeration identified by a computer match of the census returns to themselves; (b) households that respond positively to a coverage probe on the census questionnaire concerned with census omissions; (c) households that respond positively to a coverage probe on the census questionnaire concerned with census erroneous enumeration and duplication; and (d) households whose counts differ from those of a census-developed population register based on merged administrative records, known as StARS. Of these four situations, (a) is intended to identify both households containing duplicated individuals and fully duplicated households, (b) is intended to identify potential omissions in the census, (c) is intended primarily to identify duplicated individuals, and (d) is intended to identify all types of coverage error; see Box 3-2.

BOX 3-2 Situations Potentially Generating a Coverage Follow-Up Interview

Coverage follow-up interviews could result from any of the following six situations:

1. Count discrepancies, in which the indicated total number of residents does not equal the number of individuals for whom information is provided on the census questionnaire.
2. Large households, in which the number of residents is larger than six, the maximum number of individuals for whom there is space for characteristics information on the census questionnaire.
3. A positive result from the national duplicate search, i.e., a case in which individuals in one household unit match the data for individuals in another household.
4. A positive response to the coverage probe for census omissions, namely: "Were there any additional people staying here Census Day that you did not include in Question 1?"
5. A positive response to the coverage probe for census overcounts, namely: "Does person P sometimes live or stay somewhere else?"
6. A count discrepancy between the census count for a housing unit and the count from a roster produced from merged administrative records.
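The six situations above amount to a screening rule applied to each census return. As a rough sketch only, the screen might look like the following; all field names are hypothetical, and the Census Bureau's actual production rules are more elaborate.

```python
# Illustrative sketch of the six Box 3-2 screening situations for a
# coverage follow-up (CFU) interview. All field names are hypothetical;
# the Census Bureau's actual production rules are more elaborate.

def cfu_reasons(hh):
    """Return the list of situations a household return triggers (empty if none)."""
    reasons = []
    if hh["reported_count"] != len(hh["persons"]):
        reasons.append("count discrepancy")
    if hh["reported_count"] > 6:                      # form has space for six
        reasons.append("large household")
    if hh["national_duplicate_hit"]:
        reasons.append("duplicate search match")
    if hh["omission_probe_positive"]:
        reasons.append("omission probe")
    if any(p["sometimes_lives_elsewhere"] for p in hh["persons"]):
        reasons.append("overcount probe")
    if hh["admin_roster_count"] is not None and \
       hh["admin_roster_count"] != hh["reported_count"]:
        reasons.append("administrative records discrepancy")
    return reasons

household = {
    "reported_count": 3,
    "persons": [
        {"sometimes_lives_elsewhere": False},
        {"sometimes_lives_elsewhere": True},    # triggers the overcount probe
        {"sometimes_lives_elsewhere": False},
    ],
    "national_duplicate_hit": False,
    "omission_probe_positive": False,
    "admin_roster_count": 4,                    # StARS-style roster disagrees
}
print(cfu_reasons(household))   # ['overcount probe', 'administrative records discrepancy']
```

Whether a flagged household can actually be followed up then depends on the telephone-number and prioritization constraints discussed in the text.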
Due to resource and time constraints, the Census Bureau will follow up only those households that provide a telephone number. In addition, the Census Bureau may only be able to administer the coverage follow-up interview to a "most promising" subset of the qualifying households in 2010. In other words, the Census Bureau may have to set priorities by selecting the subset of qualifying households that are more likely to provide information that would result in coverage improvements.

Removal of Duplicate Enumerations During the Census

As noted above, the coverage follow-up interviews will be used to collect more information on suspected duplicate enumerations identified through a national computer search, with the objective of determining whether they are in fact duplicates and, if so, which of the addresses (if either) is the correct residence. If the correct residence is identified, the enumerations at the incorrect residence would be removed from the census. This new census design has some benefits for the coverage measurement program in 2010. Focusing on the collection of short-form data will likely improve the quality of the information collected, thereby reducing the frequency of errors made in matching the postenumeration survey (PES) to the census. Also, implementing a national search for, and field verification of, duplicate enumerations should reduce the number of duplicates in the census, which may in turn facilitate the estimation of components of coverage error in the census and may also simplify the application of the net coverage error models used in DSE in 2010.

TREATMENT OF DUPLICATES

Census Duplications

There are many different causes of duplication in a census.
As noted above, the census process may enumerate people who move shortly before, on, or shortly after Census Day at both their previous and their current residences; the census may enumerate families with second homes at both residences; the census may enumerate college students both at their college residence and at their parents' homes; and the census may enumerate "snowbirds" at both their primary residences and their winter homes. These are all examples of confusion over where someone's correct census residence is. Another cause of duplication in the census is the representation of an address in more than one way on the MAF, or the return of two forms for the same unit, which can happen in multiple ways. To address duplication in the census (in addition to attempts to measure its frequency in the
coverage measurement program), efforts have been made to adjust various census processes to reduce the frequency of duplication. An example is the primary selection algorithm, used in both the 1990 and 2000 censuses and planned for use in 2010, which removes duplicate responses from the same housing unit by identifying the unique people enumerated across all responses keyed to that housing unit. Also, the census questionnaire has been adjusted in attempts to reduce misunderstandings of census residence rules, in particular through the addition of the two coverage probes, and various efforts have been made to reduce duplication in the MAF. Yet preventing census duplication before it occurs is still a nontrivial task, and it was a serious problem in 2000 (for details, see National Research Council, 2004b). As noted above, the Census Bureau in 2010 will attempt to identify and delete duplicate persons and housing units during the census. Specifically, after the primary enumeration process and nonresponse follow-up are complete, a nationwide computer search of census enumerations for matching individuals will be carried out, using name, date of birth, gender, and phone number (when available). On the basis of the results of that search, the Census Bureau will identify likely duplicates in the 2010 census. Depending on the geographic proximity of the two residences in question and the duplicate status of the other residents of a household, this process may also be used to identify suspected duplicate housing units. Once this list of potential individual and whole-household duplicates is generated, the plan is to collect more information by telephone through coverage follow-up interviews at both residences. The interviews will attempt to ascertain which (if either) of the two enumerations is correct and which is a duplicate.
(See Box 3-2 for additional details on circumstances that can generate a coverage follow-up interview.) Because Title 13 (of the U.S. Code) privacy protections prohibit using information from one housing unit in querying another, extensive probes will be used to handle the wide variety of complex living situations that may be associated with the potential duplication in question (e.g., part-time residents, students away at college, movers, children in joint custody, and elderly people in nursing homes). In particular, the interviewers cannot be told which people in a house are likely duplicates to help guide the interviews. To reduce costs, as noted above, the coverage follow-up interviews will be carried out only by telephone, and so will not be carried out for households that did not provide a telephone number on their census questionnaires. This approach will prevent a modest, but not insubstantial, percentage of the existing duplicates from being identified and therefore removed from the census. Though the specific algorithm and the accompanying threshold for designating matching individuals have not been chosen, the Census
Bureau intends to set a strict threshold before the records for two individuals will be identified as a possible match and therefore trigger a coverage follow-up interview. In addition to the strict threshold for designating potential matches, one of the enumerations of a potential duplicate pair will not be deleted from the census unless the evidence collected from the coverage follow-up interview is clear both that the individuals are duplicates and as to which of the residences is correct given the census residence rules. The panel is unclear precisely how the information collected in the coverage follow-up interview will be used to discriminate between a duplicate enumeration and a nonduplicate enumeration and to determine the correct census residence. Moreover, the process will treat coverage errors asymmetrically, in that the error of removing a valid enumeration will be judged more serious than the error of retaining a census duplicate. The panel acknowledges that this asymmetry can be partly supported given the nature of decennial census counts: the political environment in which the census operates reacts differently to these two types of error. However, both errors need to be measured and the trade-off evaluated to determine whether it is reasonable or needs to be reconsidered.

Joint Custody Scenario. As an example of what might happen in 2010, consider the following situation involving a child in joint custody, when both parents consider the child's primary residence to be his or her home. In this situation, the coverage follow-up interviews might well collect information that supports the same two residences as the census reported. The Census Bureau will strongly suspect that this is a duplicate pair but will be unable to delete either enumeration given the lack of a way to identify the correct residence.
Given the political sensitivity of the deletion of a census enumeration, a coverage follow-up interview is required for deletion of one of a duplicate pair, even if the duplicate status is essentially unambiguous given the above matching characteristics, and even if the residence rules are clear as to which residence is correct. In the case of potential whole housing unit duplicates, field inspection will be used to determine if two housing units are duplicates. Potential duplicate housing units (or households) may result from: (1) duplicate addresses for the same housing unit, (2) delivery mix-ups in apartments, (3) movers, and (4) person duplication of all members of a housing unit. When field follow-up is used to verify duplicate housing status, there will be no associated telephone coverage follow-up interviews for the individual residents. If duplicate addresses are discovered for the same physical unit, one of the two enumerations will be deleted. In the case of a delivery mix-up, the duplicate is retained as a field imputation.
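The nationwide duplicate search described above can be pictured as pairwise scoring with a deliberately strict cutoff. The following is a minimal sketch, with invented weights and threshold; at the time of writing the Bureau had not chosen its algorithm or cutoff.

```python
# Illustrative sketch of the nationwide duplicate search: score a pair of
# person records on name, date of birth, gender, and phone number (when
# available), and flag the pair only above a strict threshold. Weights
# and the threshold are invented for illustration.

def match_score(a, b):
    score = 0.0
    if a["name"].lower() == b["name"].lower():
        score += 0.5
    if a["dob"] is not None and a["dob"] == b["dob"]:
        score += 0.3
    if a["sex"] == b["sex"]:
        score += 0.1
    if a["phone"] is not None and a["phone"] == b["phone"]:
        score += 0.1
    return score

STRICT_THRESHOLD = 0.8   # a high bar: flag only near-certain pairs

rec1 = {"name": "Ana Ortiz", "dob": "1991-04-01", "sex": "F", "phone": "5550123"}
rec2 = {"name": "Ana Ortiz", "dob": "1991-04-01", "sex": "F", "phone": None}
rec3 = {"name": "Ana Ortiz", "dob": None,         "sex": "F", "phone": None}

print(match_score(rec1, rec2) >= STRICT_THRESHOLD)   # True: triggers follow-up
print(match_score(rec1, rec3) >= STRICT_THRESHOLD)   # False: below the strict bar
```

The point of a strict cutoff is to keep false positives rare; the trade-off, discussed in the text, is that true duplicates with sparse or common characteristics (such as the missing date of birth above) never trigger an interview.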
The Census Bureau's proposed use of the coverage follow-up interviews and the field validation of potentially duplicated whole housing units raises at least two major questions:

1. Given that the coverage follow-up interviews will be by telephone only, what are the anticipated effects on duplicate resolution and on the other enumeration problems that are addressed by the use of the interview?
2. What are the rules for deleting whole housing units that are identified as potential duplicates?

The coverage follow-up interviews and the national search for duplicates could provide substantial benefits over previous censuses in identifying and removing many census duplicates during the field enumeration and in reducing the occurrence of other census coverage errors. However, there are many potential complications in implementation that might limit the benefits from the introduction of these processes in 2010. Since these activities are inherently national in scope, they could not be comprehensively tested in the relatively limited environment of a census test or the 2008 census dress rehearsal. In particular, such environments are unlikely to provide very good estimates of the extent to which these new activities will stress the census infrastructure (e.g., through the number of coverage follow-up interviews that will be required). However, they could provide information as to how to set the threshold for determining when potential duplicates have characteristics that are close enough to warrant a coverage follow-up interview. The panel sees three concerns for the planned coverage follow-up interviews. First, will there be sufficient resources to support the interviews for all the situations that have been identified as potentially requiring such follow-up?
Second, although the questions on the coverage follow-up interviews are more detailed than those on the census, will the similarity of the questions result in the relatively infrequent collection of information that would support changes in census enumeration status? Note that, in some sense, a respondent in the follow-up interview needs to admit that the information previously provided on the census form was incorrect. Third, given that follow-up interviews will be done only for households that provide telephone numbers, what will be the effects of not following up households that did not provide them? In addition, as noted above, the panel is concerned that because the threshold for matching will be set relatively high, some duplicates will not appear to have characteristics that match and therefore will not trigger a coverage follow-up interview. This is particularly noteworthy because there may be demographic groups or geographic areas with a concentration of duplicates.
Careful evaluation of both the coverage follow-up interviews and the national search for duplicates is extremely important, both so that the functioning of these processes in 2010 is fully understood and to guide any needed improvements prior to their use in 2020 (if, as we would anticipate, they are included in the 2020 census design). To support a careful evaluation, there is a need, at least for a sample of cases, to retain information on precisely what happened to the cases selected for coverage follow-up interviews and as a result of the interviews. Therefore, the Census Bureau should save, at least for a sample of enumerations, the responses to the coverage follow-up interview questions and the final decisions made regarding the assessment of enumeration status. In addition to retaining information on the functioning of the follow-up process, it will also be important to know the extent to which the follow-up process moved census counts closer to the truth. The CCM provides a unique resource for determining the situations in which the coverage follow-up process worked well and those in which it worked poorly. So, for enumerations in the E-sample (the census enumerations in the block clusters of the P-sample, the postenumeration survey), it would be very useful to retain a comprehensive log of their status prior to and after the coverage follow-up interviews. With this information, the CCM can provide a formal way of measuring the probabilities of proper and improper duplicate removal and of proper and improper duplicate retention, and it can therefore provide an assessment of the decision process used to determine which cases were deleted as duplicates and which were retained.
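With such a log in hand, the four probabilities reduce to rates in a two-by-two cross-tabulation of the follow-up decision against CCM-determined duplicate status. A minimal sketch, with fabricated records:

```python
# Sketch of using retained pre-/post-interview status logs to assess the
# coverage follow-up deletion decisions against CCM-determined truth for
# E-sample cases. The six records are fabricated; in practice each would
# come from the comprehensive status log described in the text.

cases = [
    # (deleted_by_cfu, truly_duplicate_per_ccm)
    (True,  True),   # proper removal
    (True,  True),   # proper removal
    (True,  False),  # improper removal: a valid enumeration deleted
    (False, True),   # improper retention: a duplicate kept in the census
    (False, False),  # proper retention
    (False, False),  # proper retention
]

n = len(cases)
proper_removal     = sum(1 for d, t in cases if d and t) / n
improper_removal   = sum(1 for d, t in cases if d and not t) / n
proper_retention   = sum(1 for d, t in cases if not d and not t) / n
improper_retention = sum(1 for d, t in cases if not d and t) / n

print(round(proper_removal, 3), round(improper_removal, 3),
      round(proper_retention, 3), round(improper_retention, 3))
# 0.333 0.167 0.333 0.167
```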
In addition to using the CCM for this purpose, it may also be valuable for the Census Bureau to return to the field to examine a subsample of cases selected for coverage follow-up interviews to see whether the interviews actually provided new information with any appreciable frequency and whether that new information led to correct decisions. Such a study should be designed to include a large fraction of census duplicates. Regarding the national search for duplicates, it would be useful to learn more about those cases that were near but still below the threshold and therefore were not selected for coverage follow-up to determine whether other thresholds would have provided better results. One could sample from the cases near but below the threshold and follow them up to assess whether any were duplicates and whether a field interview likely would have determined that. Such data collection would inform a cost-benefit analysis of the tradeoff of identifying more true duplicate enumerations against the cost of the additional field work (and the erroneous identification of more false duplicates).
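One way to frame the suggested study of near-threshold cases is as a draw from the band of scores just below the cutoff. In the following sketch the scores, the threshold, and the width of the "near" band are all hypothetical:

```python
import random

# Sketch of the evaluation suggested above: sample potential-duplicate
# pairs that scored near but below the matching threshold and send them
# for follow-up. The scores, threshold, and "near" band are hypothetical.

THRESHOLD = 0.80   # strict cutoff that actually triggers a follow-up interview
NEAR_BAND = 0.10   # study pairs scoring in [0.70, 0.80)

rng = random.Random(2010)
scores = [rng.uniform(0.4, 1.0) for _ in range(10_000)]   # simulated pair scores

near_misses = [s for s in scores if THRESHOLD - NEAR_BAND <= s < THRESHOLD]
evaluation_sample = rng.sample(near_misses, k=min(500, len(near_misses)))

# Field results for these cases would estimate how many true duplicates a
# lower threshold would have caught, and at what field-work cost.
print(len(near_misses), len(evaluation_sample))
```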
This discussion has focused on the use of the coverage follow-up interviews to determine duplicate status. However, the interviews will also be generated by a coverage probe for census omissions (i.e., "Were there any additional people staying here Census Day that you did not include in Question 1?"). This probe is likely to have the same problem as that encountered in the search for duplicates, namely, that it will often elicit the same information as the census. Therefore, it also will be important to evaluate the resolution of households whose selection for follow-up interviews was generated by the coverage probe for census omissions. These evaluations will help measure the degree to which the panel's concerns (noted above) are a problem. Looking ahead, to further improve the use of this new process, the panel also believes it is important for the Census Bureau to undertake research during the postcensal period in the following areas:

- the potential for StARS to help target the cases that are included in the set of coverage follow-up interviews;
- the potential for StARS to help resolve potential duplicate enumerations;
- the potential use of StARS to augment the CCM personal interviews for resolving duplicate status;
- how to optimally set the "bar" for inclusion in the coverage follow-up interviews;
- how best to discriminate between person and whole-household duplication; and
- in general, how to evaluate census unduplication procedures.

With respect to the first three issues above, see the further discussion below. Given the late date, it may be difficult to comprehensively evaluate these three suggestions, but it should be feasible to make some progress on each.

CCM Duplications

Duplications will occur not only in the census but also in the CCM survey data collection. There are some differences in the treatment of a possible duplicate enumeration in the census and in the CCM.
Some of these differences in approach are due to the dramatically different sizes of the two activities: coverage follow-up interviews could be used for between 10 and 30 million households; in contrast, the CCM will cover about 300,000 households. We note several consequences of these differences. First, since one of the components of census coverage error is omissions, and since, as pointed out below, the estimation of net coverage error
is needed to estimate the number of census omissions, the Census Bureau should and will use an unbiased approach to its assessment of duplicate status in the CCM, in the sense of avoiding any differential bias in assessing the number of census omissions in comparison with the number of census overcounts. Second, because the CCM does not have to make a final determination of enumeration status, as the census does, the CCM can assign probabilities of being a duplicate to cases with unresolved duplicate status, and similarly to cases with unresolved correct enumeration status in the E-sample or unresolved residency status in the P-sample. The in-person follow-up interviews for initially nonmatching CCM cases should often prove useful in reducing the number of duplicates in the CCM P-sample. It would also be useful to use information from the coverage follow-up interviews to reduce duplicates in the P-sample. However, due to concerns about statistical independence between the census and the postenumeration survey data collections, the Census Bureau does not currently plan to use information from the census coverage follow-up interviews to help ascertain duplicate status in the CCM. The panel does not agree with this decision: it does not understand why census information should not be used to assist in such determinations, since the goal is the proper estimation of the frequency of P-sample matches and E-sample correct enumerations.

College Student Scenario. To help make these issues clearer, consider the following example involving a 19-year-old college student. Assume that the student is counted in the census at his or her parents' house and also at his or her university in a different city. Also assume that the response to the coverage probe dealing with overcounts on the census form for the parents' home does not indicate that the student may sometimes live at an alternative address (the university).
If the student’s name is relatively common and some of the other characteristics do not match (possibly due to nonresponse), a coverage follow-up interview may not take place, since the degree of agreement may not reach the high threshold for a coverage follow-up interview. In this case, the duplicate enumeration would remain in the census. However, if the student’s name is relatively uncommon, the degree of agreement may result in a follow-up interview. In that case, the parents, when interviewed, could still assert that the student lives with them, in which case the student’s duplicate enumeration would still remain in the census. However, the CCM threshold for identifying potential duplicates is almost certainly going to be lower than that for the coverage follow-up interview, which may therefore trigger an in-person follow-up interview, which might resolve the case. Assuming that the parents’ home is in the CCM survey, the parents are also likely to incorrectly respond to both the CCM interviewers and the
A second approach would be to have the coverage follow-up interviews occur either before or after the CCM interviews but to apply the CCM to the census as it existed before the coverage follow-up interviews. This approach is referred to as evaluating a truncated census, since the definition of the census for purposes of coverage evaluation is the census that existed prior to the follow-up interviews. Any enumerations added by coverage follow-up interviews carried out after the CCM interviews were completed could be treated as "late additions" were treated in 2000: that is, removed from the census for purposes of coverage measurement. A problem with this approach is that if the coverage follow-up interview adds an appreciable number of people or corrects the enumerations of an appreciable number of people, one is evaluating a truncated census that is substantially different from the actual census. Also, if these additions or corrections differ considerably in coverage error characteristics from the remainder of the population, this would add a bias to the dual-systems estimates. As defined, one could include the coverage follow-up interviews that occurred prior to the CCM interviews in the truncated census, in which case the net coverage error models could condition on whether a follow-up interview was carried out prior to the CCM interviews: this would remove any bias if the P-sample inclusion probabilities depended on the occurrence of the coverage follow-up interviews (but not on their outcomes; for details, see Bell, 2005). Information on what the interviews added from outside the CCM blocks also could be used in these models. There are some operational complexities to this idea, including the need to duplicate the formation of relatively large processing files.
Finally, as mentioned above, one is not evaluating the complete census: consequently, to assess components of census coverage error resulting from the application of the later changes from the coverage follow-up interviews, one would need to carry out a separate evaluation study outside the CCM blocks, which is a serious disadvantage. A third approach is not to use coverage follow-up interviews in the CCM blocks. This approach avoids any contamination, but then the CCM evaluates an incomplete census, with essentially the same problems listed for the second approach, although it is worse because no results from coverage follow-up interviews could be used. A fourth approach is to let the coverage follow-up and CCM interviews occur in whatever order they do and treat contamination in net coverage models as a constant effect times an indicator variable for which of the two interviews comes first. The difficulty with this approach is that the effect of whichever interview comes second is not well understood, so it is not clear that contamination can be effectively modeled as a constant effect. For example, contamination might be subject to various interaction effects.
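As a toy illustration of why a constant-effect adjustment may be inadequate, consider match rates for two subgroups under the two interview orders. All rates here are invented; the point is only that a single pooled shift cannot reproduce group-specific contamination effects.

```python
# Toy illustration of the fourth approach: treating contamination as a
# constant effect times an indicator for which interview came first.
# All match rates are invented. If contamination interacts with a
# subgroup, a single pooled shift mis-corrects both groups.

match_rate = {
    ("owners",  "cfu_first"): 0.96, ("owners",  "ccm_first"): 0.94,
    ("renters", "cfu_first"): 0.90, ("renters", "ccm_first"): 0.82,
}

owners_shift  = match_rate[("owners",  "cfu_first")] - match_rate[("owners",  "ccm_first")]
renters_shift = match_rate[("renters", "cfu_first")] - match_rate[("renters", "ccm_first")]

# The constant-effect model estimates one pooled shift for interview order...
pooled_effect = (owners_shift + renters_shift) / 2
print(round(pooled_effect, 3))                           # 0.05

# ...but the group-specific shifts differ, an interaction that a single
# constant cannot represent.
print(round(owners_shift, 2), round(renters_shift, 2))   # 0.02 0.08
```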
A fifth approach is to delay the CCM interviews until the coverage follow-up interviews are complete. Such a delay solves the contamination problem, but it introduces other problems. For example, coverage evaluation interviews that occurred in August 1980 were less useful than those in April because of the large number of moves during the four-month interval. Thus, this approach could have a substantial, negative impact on the quality of the CCM data collected in 2010, depending on the length of time between the census and the CCM. After considering these approaches, the Census Bureau decided on the last one: to delay the CCM interviews until after all coverage follow-up interviews are completed. Several arguments were given in support of this decision:

- The Census Bureau would not have to plan on having a substandard census in any area, which would certainly be true of the third approach.
- Combining the interviews (the first approach) might harm both interviews.
- The fourth approach (letting the two interviews occur whenever they fell) is speculative and would be difficult to assess prior to the 2010 census.
- The second and third approaches (excluding some of the interviews) would require some assumptions about the nature of the late coverage follow-up interviews and would also require a large, parallel census database.

(For details on the Census Bureau's views on contamination, see Kostanich and Whitford, 2005.) It may be that further work would have demonstrated the advantages of either a truncated census (the second approach) or of combining the two interviews (the first approach). Also, the panel finds some of the Bureau's arguments in relation to the second and third approaches, in particular the claimed difficulty of duplicating census processing files given the availability of inexpensive computer memory, not fully convincing.
However, arguments for or against various alternatives are now moot, given the Census Bureau's decision. The CCM interviews may not begin until late August or September 2010, which means there will be a relatively larger number of movers between Census Day and the CCM interviews than there were in 2000. Data from movers in this context are known to be of poor quality, partly because a large fraction of the data collected is from proxy respondents. In addition, there may be recall problems, since people are being queried about where they lived several months
ago. This reduction in data quality will probably result in estimating fewer matches than there actually are. An early August start for the CCM in 2010 might be possible by expediting certain operations. To determine whether this is feasible, it will be important to collect good data during the dress rehearsal, where relevant, on the possibilities for expediting the initiation of the CCM interviews and to develop a good understanding of how various delays affect the number of movers. Given its concern over a late start to CCM interviewing, the panel would like to raise the possibility of initiating the 2010 CCM data collection prior to the completion of the coverage follow-up interviews, without any accounting for the overlap of the two data collection efforts in the estimation of net coverage error. Decisions on whether to allow these data collections to overlap, and if so, by how much, are difficult because they involve comparing two biases whose magnitudes are difficult to gauge. One bias stems from the data collected from movers; the second results from potential contamination—that is, the census data collected in the CCM block clusters may differ from the remaining census data. Both biases are potentially sizable and, if so, could substantially reduce the utility of the estimates from the CCM. The magnitudes of these biases involve a direct tradeoff: as one moves the date for the initial capture of CCM data from mid-June toward early September, the contamination bias decreases to zero while the mover bias increases substantially. The available research does not clarify the size of these two biases as a function of various factors, especially the date that CCM data collection begins. The uncertainty about the magnitudes of these two biases precludes the panel from recommending how the Census Bureau should proceed.
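The tradeoff just described can be made concrete with a toy calculation. Everything below (the linear forms, the slopes, and the day counts) is an illustrative assumption, not a Census Bureau figure; the sketch only shows how a total-bias curve could be traced out as the CCM start date shifts.

```python
# Toy model of the contamination-versus-mover-bias tradeoff discussed above.
# All functional forms and constants are illustrative assumptions, not
# Census Bureau figures. Days are counted from Census Day (April 1).

def contamination_bias(start_day, cfu_end_day=135, peak=1.0):
    """Assumed to shrink linearly to zero as the CCM start date approaches
    the (hypothetical) end of coverage follow-up interviewing."""
    if start_day >= cfu_end_day:
        return 0.0
    return peak * (cfu_end_day - start_day) / cfu_end_day

def mover_bias(start_day, rate_per_day=0.012):
    """Assumed to grow linearly with elapsed time, since the number of
    movers (and hence proxy-response and recall error) accumulates."""
    return rate_per_day * start_day

def total_bias(start_day):
    """Total bias if the two components simply added; how they actually
    combine is unknown, which is the panel's point."""
    return contamination_bias(start_day) + mover_bias(start_day)

# Candidate start dates: mid-June ~ day 75, mid-July ~ day 105,
# early September ~ day 155.
for day, label in [(75, "mid-June"), (105, "mid-July"), (155, "early Sept")]:
    print(f"{label:>12}: contamination={contamination_bias(day):.3f}  "
          f"mover={mover_bias(day):.3f}  total={total_bias(day):.3f}")
```

Under these invented parameters the curve shows exactly the qualitative shape the text describes: contamination bias falls to zero by early September while mover bias keeps climbing, so the optimum start date depends entirely on the unknown relative slopes.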
However, the panel’s relatively subjective assessment of the situation is that the mover bias at its maximum (from no overlap) is likely to be substantially greater than the contamination bias at its maximum (starting CCM data collection in, say, late June). Therefore, the panel suggests that the Census Bureau reconsider starting the CCM data collection no later than mid-July, thereby allowing for some modest overlap between the coverage follow-up and the CCM data collections. Whether or not the Census Bureau reconsiders the start date for the CCM, it should endeavor to begin CCM interviewing as soon as possible after the completion of the great majority of the census data collection, which one hopes would be before late July. Consistent with this, to the extent that it is feasible, the management of the coverage follow-up and the CCM data collections should be organized to limit the potential for contamination by selectively starting the CCM data collection in those areas in which the coverage follow-up interviewing has been completed,
monitoring this on as small a geographic basis as possible. Furthermore, there are potential advantages to census designs in which there is modest overlap between the coverage follow-up and the CCM, and the Census Bureau should consider use of such designs in 2010.

Recommendation 4: The Census Bureau should organize census and coverage follow-up data collection so that data collection for the census coverage measurement (CCM) program is initiated as soon as possible after the completion of the census. In particular, the postenumeration survey in a particular area should start as soon as possible after the completion of the great majority of the census data collection—hopefully before late July. The Census Bureau should also consider census designs for 2010 in which there is some modest overlap between coverage follow-up and CCM data collections.

ADMINISTRATIVE RECORDS

The Census Bureau has explored the potential for using administrative records (data collected as a by-product of administering governmental programs) in the decennial censuses since the 1970s. Possible uses include: (1) supporting a purely administrative records census; (2) improving census nonresponse follow-up, either by using enumerator follow-up only when administrative records do not contain the required information or by completing information for households that do not respond to initial attempts by field enumerators; (3) improving the Master Address File with addresses found in administrative records;3 (4) assisting in coverage measurement, for example, through use of triple-systems estimation;4 and (5) assisting in coverage improvement, for example, by identifying census blocks that may not have been well enumerated or households for which the census count is likely to be in error.
One important advantage that administrative records have is that they provide a source of information for hard-to-enumerate groups that is operationally independent of the census processes. Underlying the use of a postenumeration survey is the assumption that reinterviewing people, albeit with a much more intensive interview with more highly trained

3 The Census Bureau has already used administrative records for this purpose. The MAF is already updated using the delivery sequence file from the U.S. Postal Service, which is a type of administrative record, and the MAF is also updated using files from local jurisdictions, which are often based on local administrative sources.

4 Triple-systems estimation is a generalization of dual-systems estimation: in this case the third system would be a merged list of individuals from administrative records (for details, see Zaslavsky and Wolfgang, 1990).
interviewers, will either generate a response when there was previously no response or will provide different information than the respondents provided earlier and thereby correct an incorrect response. One can argue that for a substantial fraction of the cases with coverage error, neither assumption may hold, owing to the similarity of the two requests for information, especially for people who are actively seeking not to be counted in the census. For many such cases, administrative records may provide the only real current chance at enumeration. Until recently, the available administrative records have suffered from several limitations, including: insufficient coverage of the population represented on administrative records; lack of current information (particularly for addresses); lack of information on race and ethnicity; difficulty in unduplicating administrative lists with very few errors; computational burden; and concerns about public perceptions.5 Consequently, none of the potential applications of administrative records has been implemented during a census. Until 2000, there was no comprehensive field test of the benefits of using administrative records for any census application, although the coverage of merged administrative lists had been assessed in studying the feasibility of an administrative records census. Administrative records did, however, support at least two major coverage improvement programs—the non-household sources check in 1980 and the parolees and probationers check in 1990. Now several of the limitations just noted have been addressed: the quality and availability of national administrative records are improving, computing power has increased dramatically, and the research group on administrative records at the Census Bureau has achieved some impressive results.
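The dual-systems estimation mentioned throughout this chapter, and its triple-systems generalization noted in footnote 4, rest on capture-recapture logic: two operationally independent lists are matched, and the match rate is used to estimate the people missed by both. A minimal sketch of the standard (Lincoln-Petersen) dual-systems estimator follows; the counts in the example are invented for illustration.

```python
def dual_systems_estimate(n_census, n_pes, n_matched):
    """Lincoln-Petersen capture-recapture estimator underlying DSE:
    N_hat = (n1 * n2) / m, where n1 is the census (E-sample) count,
    n2 is the postenumeration survey (P-sample) count, and m is the
    number of persons matched between the two lists. The estimator
    assumes the two systems are independent, which is precisely why
    contamination of the CCM blocks is a concern."""
    if n_matched <= 0:
        raise ValueError("no matches: estimator is undefined")
    return (n_census * n_pes) / n_matched

# Invented example: 900 census enumerations, 800 P-sample persons,
# 720 matched between the two lists.
print(dual_systems_estimate(900, 800, 720))  # 1000.0
```

Triple-systems estimation extends this by adding a third list, such as a merged administrative records file, and modeling the three-way match pattern rather than a single cross-classification.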
The primary program and database, referred to as StARS (the Statistical Administrative Records System), now has an extract of a validated, merged, unduplicated residential address list with 150 million entries, 80 percent of which are geocoded to census blocks, and another extract of a validated, merged, unduplicated list of residents with demographic characteristics. These lists are approaching the completeness of coverage that might be achieved by a decennial census (Obenski and Farber, 2005). Seven national files are merged to create StARS, with the Social Security Number Transaction File providing demographic data. As a result of this progress, an administrative records comparison will be one of six circumstances generating a coverage follow-up interview in 2010, which may be the first direct application of administrative records to assist in census enumeration.

5 An approach to the problem of current address can be found in Stuart and Zaslavsky (2002).
However, the progress to date is not a compelling argument for widespread use. The quality of administrative records, and of StARS, in support of census field enumeration is still untested, and many of the deficiencies regarding undercoverage, race and ethnicity information, and current address are still worrisome. AREX 2000 provided the only major test to date of the use of administrative records (primarily as an alternative method for taking a census). While the population coverage (for the more thorough of the schemes tested) was between 96 and 102 percent relative to the 2000 census counts for the five test site counties, AREX 2000 and the census counted the same number of people at the housing unit level only 51.1 percent of the time and counted within one person of the census count for only 79.4 percent of the households. So although the potential of administrative records is obvious, these ideas need further development and evaluation. To make sure that an important opportunity is not missed, but also to verify that administrative records can provide real benefits, the Census Bureau would need to support a wide-ranging and systematic research program on decennial census applications of administrative records that is amply funded and staffed. Such a program would have the specific goal of deciding which of the potential uses of administrative records are and are not feasible for use in 2020. Such decisions would have to be made by 2015 so that there would be sufficient time before the census for final testing and for integrating these activities into the 2020 census design. Administrative records can still be used in a limited way in the 2010 census, in addition to the role they are playing in generating the coverage follow-up interviews. In particular, administrative records might be considered either for coverage improvement or for coverage measurement in 2010.
The panel believes that it no longer makes sense to view the use of administrative records as an interesting possibility for some unspecified census in the future. We believe it is crucial to comprehensively assess their potential now for use in the 2020 census. We propose six potentially feasible uses of administrative records in a census: to improve the MAF or other address lists, in late-stage nonresponse follow-up, for item imputation, to improve targeting of coverage follow-up interviews, for assistance on the status of nonmatches, and to evaluate a census coverage measurement program.

Improvement or Evaluation of the Quality of the MAF or the Address List of the Postenumeration Blocks

The quality of the MAF is key to a successful mailout of the census questionnaires and nonresponse follow-up, and the quality of the independent list that is created in the CCM blocks in 2010 will be key to a successful coverage measurement program. StARS provides a list of addresses that could be used in at least
two ways. First, the total number of StARS addresses for small areas could be checked against the corresponding MAF or PES totals to identify areas with large discrepancies that could be relisted. Second, more directly, address lists could be matched to identify specific addresses that are missed in either the MAF or the PES address listings, with discrepancies followed up in the field for resolution. Note that although administrative records could be used to improve the address list for either the census or the PES, to maintain independence they should not be used for both.

Assistance in Late-Stage Nonresponse Follow-Up

The Census Bureau makes several attempts using field enumerators to collect information from mail nonrespondents to the census. When these attempts fail to collect information, attempts are made to locate a proxy respondent and, when that fails, hot-deck imputation (filling in for the nonresponse with the data for a randomly selected, geographically proximate household) is used to supply whatever information is needed, including the residence's vacancy status and the household's number of residents. If the quality of StARS information is found to be at least as good as that from hot-deck imputation or even proxy interviews, it might be effective to attempt to match nonrespondents to StARS before either pursuing a proxy interview or using hot-deck imputation. Especially with a short-form-only census, StARS might be sufficiently complete and accurate for this purpose. Further, one might profitably make fewer attempts at collecting nonresponse data by making use of StARS information, for example, after only one or two attempts at nonresponse follow-up, thereby substantially expediting and reducing the costs of nonresponse follow-up.

Item Imputation

The Census Bureau often uses item imputation to fill in modest amounts of item nonresponse.
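Both of the preceding uses invoke hot-deck imputation as the baseline against which StARS would be compared. The procedure can be sketched in a few lines; this is a deliberately minimal illustration with invented field names, and production hot decks order donors by processing sequence and match on many more characteristics.

```python
import random

def hot_deck_impute(households, seed=0):
    """Minimal hot-deck sketch: each nonresponding household (size is None)
    receives the values of a randomly chosen responding household in the
    same block, approximating the 'geographically proximate' donor
    selection described in the text. Field names are invented."""
    rng = random.Random(seed)
    donors = {}
    for h in households:
        if h["size"] is not None:
            donors.setdefault(h["block"], []).append(h)
    result = []
    for h in households:
        if h["size"] is None and donors.get(h["block"]):
            donor = rng.choice(donors[h["block"]])
            h = dict(h, size=donor["size"], vacant=donor["vacant"])
        result.append(h)
    return result

# Invented example: one responding and one nonresponding household in block A.
hh = [
    {"block": "A", "size": 3, "vacant": False},
    {"block": "A", "size": None, "vacant": None},
]
print(hot_deck_impute(hh)[1])  # missing size and vacancy filled from the donor
```

Matching a nonrespondent to StARS would, in effect, replace the random donor with the household's own administrative record, which is why the comparison of quality matters.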
Item nonresponse could affect the ability to match a P-sample individual to the E-sample, and missing demographic and other information may result in an individual being placed in the wrong poststratum or in the use of the wrong covariate information in a logistic regression. Item imputation based on information from StARS may be preferable to hot-deck imputation. The use of StARS to provide item imputation was tested as part of the 2006 census test, but the results were not available in time for this report.

Targeting the Coverage Follow-Up Interviews

The coverage follow-up interview in 2010, as currently planned, will follow up households with any of the following six conditions: (1) characteristics for the additional people in large households who did not fit on the census questionnaire, (2) count discrepancies between the indicated number of residents and the number of persons for whom information is
provided, (3) potential duplicates identified by a national match of census enumerations to themselves, (4) persons who, given their responses to coverage probes, may have been enumerated at other residences in addition to the one in question (potential duplicates), (5) persons who, given their responses to coverage probes, sometimes stayed at the housing unit in question and who may have been omitted from the census, and (6) people in households with counts that differ from those in a list generated from administrative records. The workload for this operation might well exceed the Census Bureau's capacity to carry out the necessary field work given limited time and resources. It might be possible to use administrative records to help identify situations in which field resolution is not needed, for example, by indicating which of a set of duplicates is at the proper residence. (Uses of StARS in similar ways were tested in the 2006 census test, but the results were not available in time for this report.)

Determination of the Status of Nonmatches

Administrative records might be used to determine the status of a nonmatch prior to follow-up of nonmatches in the postenumeration survey. Some nonmatches of the P-sample to the census may be resolved, for example, by indicating that there was a geocoding error or a misspelled name, thereby saving the expense and time of additional CCM field work.

Evaluation of the Census Coverage Measurement Program

The quality of many of the steps leading to the production of dual-systems estimates might be checked using administrative records. For example, administrative records information might be used to assess the quality of the address list in the P-sample blocks, to assess the quality of the matching operation, or to assess the quality of the small-area estimation of population counts.
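The nonmatch-resolution idea above can be sketched as a simple record comparison against an administrative list. The field names, similarity threshold, and decision rules below are hypothetical; real record linkage uses tuned name comparators (e.g., Jaro-Winkler) and blocking strategies rather than a single generic string similarity.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for the tuned
    name comparators used in production record linkage."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def resolve_nonmatch(person, admin_records, threshold=0.85):
    """Classify a P-sample nonmatch using an administrative list:
    a near-identical name at the same address suggests a misspelling
    in one source, and an identical record assigned to a different
    block suggests a geocoding error. All rules, fields, and the
    threshold here are illustrative assumptions."""
    for rec in admin_records:
        sim = name_similarity(person["name"], rec["name"])
        if sim >= threshold and rec["address"] == person["address"]:
            if rec["block"] != person["block"]:
                return "possible geocoding error"
            if sim < 1.0:
                return "probable misspelled name"
            return "confirmed by administrative records"
    return "unresolved: field follow-up needed"

# Invented example: a P-sample person whose census record was keyed "John".
p1 = {"name": "Jon Smith", "address": "12 Elm St", "block": "0101"}
a1 = [{"name": "John Smith", "address": "12 Elm St", "block": "0101"}]
print(resolve_nonmatch(p1, a1))  # probable misspelled name
```

Cases the rules cannot classify fall through to the default, preserving the field follow-up path for genuinely unresolved nonmatches.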
We note, however, that any operation that makes use of administrative records cannot also use the same administrative records for purposes of evaluation. The administrative records group at the Census Bureau has already had a number of successful applications of StARS. First, an administrative records census was conducted in five counties during the 2000 census, and its quality was judged to be comparable to that of the census in those counties. (This assessment is somewhat surprising given that, as pointed out above, the agreement between StARS and the census counts was only slightly above 50 percent.) Second, StARS was used to explain 85 percent of the discrepancies between the Maryland Food Stamp Registry list of recipients and estimates from the Census Supplementary Survey in 2001 (the pilot American Community Survey).
Since the panel's suggested uses of administrative records depend crucially on the quality of the merged and unduplicated lists of addresses and people in StARS, prior to the implementation of StARS for any of the above purposes in 2010 (except arguably for coverage measurement), it would be necessary to evaluate the use of administrative records in comparison with the current method used in the census. Alternatively, the use could be an additional process added to the census, in which case it would be necessary to assess the likely effects on the quality of the census enumerations along with the likely costs. If there are no opportunities for a careful test of the feasibility and effectiveness of applications of administrative records in 2009, additional uses of administrative records in 2010 will not be feasible. Thus, it is likely that additional uses of administrative records, besides their current role in coverage follow-up interviews, will have to wait until 2020. However, the 2010 census provides an important opportunity for testing the above ideas. Therefore, the panel suggests that the more promising of the above applications be developed sufficiently to support a rigorous test in 2010, with additional refinement during the intercensal period, with the goal of implementation in 2020 should the subsequent evaluation support their use. (This idea is consistent with a recommendation of the Panel on the Design of the 2010 Census Program of Evaluations and Experiments; see National Research Council, 2007.) If tests during 2010 are not feasible, the panel believes the highest priority should be given to testing during 2012–2015, as a first step toward the possible substantial use of administrative records in the 2020 census.
In particular, given the promise of administrative records in relation to the census’ greatest challenge, reducing omissions, we strongly advocate the testing of the use of administrative records for coverage improvement and as part of the coverage measurement program or to assess the effectiveness of the coverage measurement program in measuring the number of census omissions in 2020. If data from StARS are used successfully in the coverage follow-up interviews in 2010 or if early tests of administrative records in the next decade strongly indicate their applicability and value for various census applications, the Census Bureau could consider even more ambitious uses of administrative data in the 2020 census. Specifically, for many housing units, the Census Bureau might use administrative data not just to replace late-stage follow-up, but as a replacement for the entire nonresponse follow-up interview. This use would seem to be especially valuable in situations in which the enumerators had determined that the nonresponding household was occupied. Under this approach, the Census Bureau would use data from administrative records to determine the occupancy status of some nonresponding housing units and the number and characteristics of
its residents. To do so, the Census Bureau would have to develop criteria for the adequacy of the information in the administrative records to establish the existence and count of the household for this purpose. For example, agreement of several records of acceptable currency and quality might be considered sufficient to use the information as a substitute for a census enumeration, which would reduce the burden of field follow-up. This use of administrative records would represent a substantial change in what constitutes a census enumeration, of at least the same conceptual magnitude as the change from in-person to mail enumerations as the primary census methodology. However, given that the completeness of administrative records systems and the capabilities for matching and processing administrative records have been growing, and given that public cooperation with survey field operations appears to be declining (though the mail response rate in the 2000 census was slightly better than that in 1990, reversing a trend over the past few censuses), it seems increasingly likely that administrative records will soon provide enumerations of quality at least as good as field follow-up for some housing units. Furthermore, unlike purely statistical adjustment methods, every census enumeration would correspond to a specific person for whom there is direct evidence of his or her residence and characteristics. The long-run potential for such broader contributions from administrative records is a reason to give high priority to their testing in the 2010 census. Three possible objections might be raised in opposition to this approach. First, this use of administrative records may be ruled to be inconsistent with interpretations of what an enumeration means in the Constitution. Second, public perception that the government will otherwise obtain the information might reduce response to the census mailout questionnaire.
Third, like any use of administrative records for other than their intended purpose, this may raise public concerns about a loss of confidentiality. These three issues are not compelling arguments against moving forward, but they would need to be addressed before the Census Bureau could implement their use in 2020. In summary, if the Census Bureau is to position itself to be able to make an informed decision about the value of administrative records to fulfill a variety of possible functions in the 2020 census, it needs to make use of the various testing opportunities in both the 2010 census and in the early part of the 2010–2020 intercensal period to assess which of the applications listed here are feasible and effective. Otherwise, important benefits may be missed since one cannot implement the ideas absent a careful evaluation. Even with a successful test, there will be a number of implementation complexities that will have to be dealt with, and waiting to test such ideas in 2016 or later will likely not leave enough time.
Recommendation 5: The Census Bureau should use the various testing opportunities in both the 2010 census and in the early part of the 2010–2020 intercensal period to assess how administrative records can be used in the 2020 census.