Appendix A
A Framework for Components of Census Coverage Error

This appendix summarizes Mulry and Kostanich (2006). They begin by hypothesizing a P-census, which is the P-sample if the entire United States were included in a postenumeration survey (PES). The P-census is also idealized in that no errors are assumed to be made in its data collection or matching, though the P-census can miss, at random, some correct enumerations in the census.


The authors then categorize people on the basis of the quality of their data, that is, whether their census questionnaire has errors or non-response, as follows:

  1. those correctly enumerated in the census, CE,

  2. those enumerated in the census but in the wrong location, WL,

  3. those erroneously enumerated in the census, EE,

  4. those with insufficient information for matching to the P-census, II,

  5. those that are not data defined in the census, NDD, and

  6. those omitted in the census, OM.

The authors also divide the population into four subsets by crossing the following two dichotomies: whether or not a census enumeration has sufficient information for matching and whether or not a census enumeration is in the P-census. The subscript ij indicates subset membership: the first index is equal to 1 for those with sufficient information for matching and 0 otherwise; the second index is equal to 1 with inclusion in the



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



P-census and 0 otherwise. See Figure A-1 for a depiction of the various subsets of the total population using this taxonomy.

                                                        P-census
  Census enumeration status                             In             Not in
  In census, eligible for matching                      CE11, WL11     CE10, WL10, EE10
  In census, insufficient information for matching      II01           II00, EEII00
  In census, not data defined                           NDD01          NDD00, EENDD00
  Not in census                                         OM01           OM00

FIGURE A-1 Elements of dual-systems estimation.
SOURCE: Adapted from Mulry and Kostanich (2006).

The result is 13 separate cells, defined as follows:

  CE11: correct enumeration in the census and in the P-census

  CE10: correct enumeration in the census and missed in the P-census

  EE10: erroneous enumeration in the census and missed in the P-census (which would include both erroneous enumerations as defined in this report and duplicate enumerations in the census)

  EEII00: erroneous enumeration in the census with insufficient information for matching and missed in the P-census

  EENDD00: erroneous enumeration in the census and not data defined and missed in the P-census

  WL11: enumerated in the wrong location in the census and in the P-census

  WL10: enumerated in the wrong location in the census and missed in the P-census

  II01: insufficient information for matching in the census and counted in the P-census

  II00: insufficient information for matching in the census and missed in the P-census

  NDD01: not data defined in the census and in the P-census

  NDD00: not data defined in the census and missed in the P-census

  OM01: missed in the census and in the P-census

  OM00: missed in the census and missed in the P-census

The following additional relationships are used below:

\[
\mathrm{CE} = \mathrm{CE}_{11} + \mathrm{CE}_{10}
\]
\[
\mathrm{WL} = \mathrm{WL}_{11} + \mathrm{WL}_{10}
\]

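\[
\mathrm{II} = \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{EEII}_{00}
\]
\[
\mathrm{NDD} = \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{EENDD}_{00}
\]
\[
\mathrm{OM} = \mathrm{OM}_{01} + \mathrm{OM}_{00}
\]

Thus:

\[
\text{Census} = \mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{EE}_{10} + \mathrm{EEII}_{00} + \mathrm{EENDD}_{00};
\]
\[
\text{True Population} = \mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{OM}_{01} + \mathrm{OM}_{00}; \tag{1}
\]
\[
\text{Net Census Error} = \text{True Population} - \text{Census} = \mathrm{OM}_{01} + \mathrm{OM}_{00} - \mathrm{EE}_{10} - \mathrm{EEII}_{00} - \mathrm{EENDD}_{00};
\]
\[
\text{P-Census} = \mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}.
\]

Given that the number of correct enumerations, CE, is equal to CE11 + CE10; that the number of enumerations in the P-census, P, is equal to CE11 + WL11 + II01 + NDD01 + OM01; and that the number of P-census matches to correct census enumerations in the matching universe in the correct location, M, is equal to CE11, one can re-express the dual-systems estimator,

\[
\mathrm{DSE} = \mathrm{CE} \, \frac{P}{M},
\]

in terms of the cell counts as

\[
\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10}) \, \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11}}. \tag{2}
\]

To justify this formula, the authors express three assumptions that are used in practical implementation of dual-systems estimation as a function of the entire set of 13 quantities:

Assumption 1: The basic assumption underlying dual-systems estimation is that the proportion of the true population correctly enumerated in the census equals the proportion of the P-census enumerated in the census. This can be expressed as

\[
\frac{\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}}{\mathrm{DSE}} = \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}.
\]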

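Turning this around:

\[
\mathrm{DSE} = (\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}) \left( \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}} \right). \tag{3}
\]

Assumption 2: It is assumed that correct enumerations in the matching universe are included in the P-census at the same rate as all correct enumerations. That is, it is assumed that cases insufficient for matching can be treated as missing completely at random. This is expressible as

\[
\frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}{\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}} = \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11}}{\mathrm{CE} + \mathrm{WL}}. \tag{4}
\]

Assumption 3: Given that the search for a match is geographically limited, it is assumed that the proportion of people that should be enumerated but are called erroneous because they are in the wrong location equals the proportion of matches that are not found because they are in the wrong location. This assumption is the so-called balancing of erroneous enumerations and nonmatches and is equivalent to the statement that the proportion of correct enumerations found because they are in the correct location equals the proportion of matches found because they are in the correct location. This can be expressed as

\[
\frac{\mathrm{CE}_{11} + \mathrm{CE}_{10}}{\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10}} = \frac{\mathrm{CE}_{11}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}},
\]

which can be re-expressed as

\[
\frac{\mathrm{CE}_{11} + \mathrm{CE}_{10}}{\mathrm{CE}_{11}} = \frac{\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}} = \frac{\mathrm{CE} + \mathrm{WL}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}}. \tag{5}
\]

Substituting expressions (4) and (5) into (3), we have:

\[
\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10}) \, \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11}}, \tag{6}
\]

which is identical to (2), therefore justifying dual-systems estimation when the above three assumptions obtain.

The dual-systems estimation expression can be rewritten as

\[
\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) + \frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}} (\mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}),
\]

which is equal to the true population if the last term is equal to the missing elements in expression (1): that is, if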

\[
\frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}} (\mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) = \mathrm{WL}_{10} + \mathrm{II}_{00} + \mathrm{NDD}_{00} + \mathrm{OM}_{00}. \tag{7}
\]

The quantity on the right-hand side of (7) is referred to as the fourth cell: the people who are missed by both the census and the P-census. If one assumes that the property of being correctly included in the census at the correct location is statistically independent of being in the P-census, then

\[
\frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}} (\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) = \mathrm{CE}_{10} + \mathrm{WL}_{10} + \mathrm{II}_{00} + \mathrm{NDD}_{00} + \mathrm{OM}_{00},
\]

which is equivalent to (7).

Mulry and Kostanich also discuss what information is available from the field as to which of the sample of census enumerations, and which of the P-sample enumerations (many of which are the same individuals), fall into the 13 types of enumerations listed above. Recall that the P-sample enumerations are only matched to matchable census enumerations in a search area. Also, for persons who have moved into the P-sample block clusters after census day, the P-sample is matched to their residence address on census day. Matches therefore provide an estimate of the number of correct enumerations in the correct location that were included in the P-sample. The P-sample is composed of matches and nonmatches: the matches, again ignoring sampling variation, are equal to CE11, and the nonmatches are equal to II01 + WL11 + NDD01 + OM01. These various types of nonmatches are not distinguishable without further data collection.

The number of census enumerations is the sum of the correct enumerations and erroneous enumerations (as defined by the Census Bureau), or E = CE + EE, where CE = CE11 + CE10. In the expression CE = CE11 + CE10, the components are distinguishable for nonmovers because in matching the P-sample to the E-sample, it is determined which census enumerations were included and which were missed in the P-sample. However, the two components of correct enumerations are not distinguishable for movers.
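The cell-count algebra above can be checked numerically. The following Python sketch uses hypothetical cell counts (invented for illustration, not taken from Mulry and Kostanich), chosen so that each cell missed by the P-census among the CE, WL, II, NDD, and OM categories is 0.25 times its counterpart included in the P-census. Under that assumption condition (7) holds, and the dual-systems estimator in (2) reproduces the true population in (1).

```python
# Hypothetical counts for the 13 cells (illustrative assumption only).
# For CE, WL, II, NDD, and OM, the "missed by P-census" cell is 0.25 times
# the "in P-census" cell, which mimics the independence assumption.
cells = {
    "CE11": 800, "CE10": 200,
    "WL11": 40, "WL10": 10,
    "II01": 20, "II00": 5,
    "NDD01": 8, "NDD00": 2,
    "OM01": 32, "OM00": 8,
    "EE10": 30, "EEII00": 4, "EENDD00": 1,
}


def total(names):
    """Sum the counts of the named cells."""
    return sum(cells[name] for name in names)


# Census count: every cell except the omissions.
census = total(["CE11", "CE10", "WL11", "WL10", "II01", "II00",
                "NDD01", "NDD00", "EE10", "EEII00", "EENDD00"])

# Expression (1): every cell except the erroneous enumerations.
true_population = total(["CE11", "CE10", "WL11", "WL10", "II01", "II00",
                         "NDD01", "NDD00", "OM01", "OM00"])

p_census = total(["CE11", "WL11", "II01", "NDD01", "OM01"])  # P
correct_enumerations = total(["CE11", "CE10"])               # CE
matches = cells["CE11"]                                      # M

# Expression (2): DSE = CE * P / M.
dse = correct_enumerations * p_census / matches

# Left-hand and right-hand sides of condition (7).
lhs_7 = cells["CE10"] / cells["CE11"] * total(["WL11", "II01", "NDD01", "OM01"])
fourth_cell = total(["WL10", "II00", "NDD00", "OM00"])

print("census:", census)
print("true population:", true_population)
print("DSE:", dse)
print("condition (7) holds:", lhs_7 == fourth_cell)
```

With these counts the census total is 1,120 while both the true population and the DSE equal 1,125. Changing any one of the "missed by P-census" counts breaks condition (7) and makes the DSE differ from the true population.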
Mulry and Kostanich further address the measurement of components of census coverage error. If one wants to decompose the various summary estimates, more information would be needed than that used to support dual-systems estimation. When the objective is the estimation of net coverage error, a very strict definition of correct enumeration is used, involving a small restricted search area within the relevant P-sample block cluster (and possibly a small area surrounding that area). But when the objective is to measure components of census coverage error, one can define a correct enumeration in a variety of ways to conform to a given tabulation of interest.

For instance, a correct enumeration can be in the correct county, state, or simply included correctly in the United States, the latter being the approach taken to simplify the argument given.

Mulry and Kostanich state their goal is partly to obtain estimates of the number of erroneous enumerations, EE10 + EEII00 + EENDD00, and the number of census omissions, OM01 + OM00. (In this report, the panel states there is also interest in estimating the number of enumerations in the wrong place and the number of duplicate enumerations.)

Unfortunately, because of enumerations in the wrong location and enumerations with either insufficient information for matching or not data defined, subtracting CE from the census count gives an inflated estimate of the number of erroneous enumerations, EE10 + EEII00 + EENDD00. Specifically, Census – CE11 – CE10 = WL11 + WL10 + II01 + II00 + NDD01 + NDD00 + EE10 + EEII00 + EENDD00, so Census – CE is the sum of erroneous census enumerations (which includes duplicates) plus census enumerations in the wrong location plus correct census enumerations with insufficient information for matching.

For the same reason as for erroneous enumerations, subtracting the matching enumerations from the P-census does not provide an unbiased estimate of the number of omitted people in the census, OM01 + OM00. In fact, P – M = II01 + WL11 + NDD01 + OM01. To obtain an estimate of the number of omissions, note that DSE – Census = Net Census Error = OM01 + OM00 – EENDD00 – EEII00 – EE10, and, therefore, OM01 + OM00 = Net Census Error + EENDD00 + EEII00 + EE10. So, to estimate the number of omissions, one can take an estimate of the net census error and add to it the number of erroneous enumerations (including the number of duplicates).
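The bookkeeping in this passage can be illustrated with a short Python sketch. The cell counts are hypothetical values invented for illustration, and the net census error is computed directly from the known cells rather than estimated, which would not be possible in practice. The point is only that Census – CE and P – M overstate erroneous enumerations and omissions, respectively, whereas net census error plus erroneous enumerations recovers the omissions.

```python
# Hypothetical counts for the 13 cells (illustrative assumption only).
cells = {
    "CE11": 800, "CE10": 200,
    "WL11": 40, "WL10": 10,
    "II01": 20, "II00": 5,
    "NDD01": 8, "NDD00": 2,
    "OM01": 32, "OM00": 8,
    "EE10": 30, "EEII00": 4, "EENDD00": 1,
}

# Quantities of interest: erroneous enumerations and census omissions.
erroneous = cells["EE10"] + cells["EEII00"] + cells["EENDD00"]
omissions = cells["OM01"] + cells["OM00"]

# Census count (all cells except omissions) and true population
# (all cells except erroneous enumerations).
census = sum(cells.values()) - omissions
true_population = sum(cells.values()) - erroneous

correct_enumerations = cells["CE11"] + cells["CE10"]  # CE
p_census = (cells["CE11"] + cells["WL11"] + cells["II01"]
            + cells["NDD01"] + cells["OM01"])          # P
matches = cells["CE11"]                                # M

# Naive differences: each includes WL, II, and NDD cells as well.
census_minus_ce = census - correct_enumerations
p_minus_m = p_census - matches

# Omissions recovered as net census error plus erroneous enumerations.
net_census_error = true_population - census
omissions_from_net_error = net_census_error + erroneous

print("Census - CE:", census_minus_ce, "vs. erroneous enumerations:", erroneous)
print("P - M:", p_minus_m, "vs. omissions:", omissions)
print("Net census error + erroneous:", omissions_from_net_error)
```

In this example Census – CE is 120 against 35 erroneous enumerations, and P – M is 100 against 40 omissions, while the net census error of 5 plus the 35 erroneous enumerations gives the 40 omissions exactly.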
The Census Bureau plans to use two definitions of a correct enumeration in 2010, one to provide a quality estimate of net census error, which among other things will help to estimate the number of omissions, and one to estimate the remaining components of coverage error. To estimate the number of erroneous enumerations, the Census Bureau will need:

  • to collect additional data to determine where enumerations should be included if the search area is not the correct location;

  • to match the E-sample enumerations against the full set of census enumerations for duplicates, with field validation if necessary to establish proper census residence; and

  • for enumerations in the E-sample but not in the matching universe, to strive to match to the P-sample (when possible) to identify those KEs (responses that are census data-defined but have insufficient information for matching as defined in 2000) that are correct enumerations.

This appendix omits the remaining details: Mulry and Kostanich discuss how one could separate out those enumerated in the wrong location from those that are erroneous, other complications raised by cases with insufficient information for matching, movers, and duplicates, and when to use imputation methods. Finally, the estimates of the components are generally represented as sample weighted averages, mainly of 0-1 indicator variables, but also of imputed probabilities.
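As a rough illustration of that last point, the Python sketch below forms a weighted total and a weighted average from a handful of hypothetical sample records. The weights, the indicator values, and the single imputed probability are all invented for illustration; they are not drawn from Mulry and Kostanich or from Census Bureau data, and the actual estimators involve further detail that this appendix omits.

```python
# Each record is (sampling weight, value). The value is a 0-1 indicator
# (for example, 1 if the enumeration is erroneous) or, for an unresolved
# case, an imputed probability between 0 and 1. All values are hypothetical.
records = [
    (100.0, 1.0),
    (150.0, 0.0),
    (120.0, 1.0),
    (80.0, 0.25),  # unresolved case carried as an imputed probability
]

# Weighted total: estimated number of units with the characteristic.
weighted_total = sum(weight * value for weight, value in records)

# Weighted average: estimated proportion with the characteristic.
sum_of_weights = sum(weight for weight, _ in records)
weighted_average = weighted_total / sum_of_weights

print("weighted total:", weighted_total)
print("weighted average:", round(weighted_average, 4))
```

Carrying an unresolved case as a probability rather than forcing it to 0 or 1 lets it contribute fractionally to the total, which is one way to read the reference to imputed probabilities.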
