Appendix A
A Framework for Components of Census Coverage Error
This appendix summarizes Mulry and Kostanich (2006). They begin by hypothesizing a P-census, which is the P-sample that would result if the entire United States were included in a postenumeration survey (PES). The P-census is also idealized in that no errors are assumed to be made in its data collection or matching, though the P-census can miss, at random, some correct enumerations in the census.
The authors then categorize people on the basis of the quality of their data, that is, whether their census questionnaire has errors or nonresponse, as follows:
1. those correctly enumerated in the census, CE,
2. those enumerated in the census but in the wrong location, WL,
3. those erroneously enumerated in the census, EE,
4. those with insufficient information for matching to the P-census, II,
5. those that are not data defined in the census, NDD, and
6. those omitted in the census, OM.
The authors also divide the population into four subsets by crossing the following two dichotomies: whether or not a census enumeration has sufficient information for matching and whether or not a census enumeration is in the P-census. The subscript ij indicates subset membership: the first index is equal to 1 for those with sufficient information for matching and 0 otherwise; the second index is equal to 1 with inclusion in the P-census and 0 otherwise. See Figure A-1 for a depiction of the various subsets of the total population using this taxonomy.

Census | Eligible for E-sample | Matching universe | In P-census | Not in P-census | Erroneous, not in P-census
In | In | In | CE11 | CE10 | EE10
 | | | WL11 | WL10 |
In | In | Not in | II01 | II00 | EEII00
In | Not in | | NDD01 | NDD00 | EENDD00
Not in | | | OM01 | OM00 |

FIGURE A-1 Elements of dual-systems estimation.
SOURCE: Adapted from Mulry and Kostanich (2006).
The result is 13 separate cells, defined as follows:
CE11: correct enumeration in the census and in the P-census
CE10: correct enumeration in the census and missed in the P-census
EE10: erroneous enumeration in the census and missed in the P-census (which would include both erroneous enumerations as defined in this report and duplicate enumerations in the census)
EEII00: erroneous enumeration in the census with insufficient information for matching and missed in the P-census
EENDD00: erroneous enumeration in the census that is not data defined and missed in the P-census
WL11: enumerated in the wrong location in the census and in the P-census
WL10: enumerated in the wrong location in the census and missed in the P-census
II01: insufficient information for matching in the census and counted in the P-census
II00: insufficient information for matching in the census and missed in the P-census
NDD01: not data defined in the census and in the P-census
NDD00: not data defined in the census and missed in the P-census
OM01: missed in the census and in the P-census
OM00: missed in the census and missed in the P-census
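The following additional relationships are used below:

$$\mathrm{CE} = \mathrm{CE}_{11} + \mathrm{CE}_{10}, \qquad \mathrm{WL} = \mathrm{WL}_{11} + \mathrm{WL}_{10},$$

$$\mathrm{II} = \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{EEII}_{00}, \qquad \mathrm{NDD} = \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{EENDD}_{00}, \qquad \mathrm{OM} = \mathrm{OM}_{01} + \mathrm{OM}_{00}.$$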
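Thus:

$$\text{Census} = \mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{EE}_{10} + \mathrm{EEII}_{00} + \mathrm{EENDD}_{00};$$

$$\text{True Population} = \mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{OM}_{01} + \mathrm{OM}_{00}; \tag{1}$$

$$\text{Net Census Error} = \text{True Population} - \text{Census} = \mathrm{OM}_{01} + \mathrm{OM}_{00} - \mathrm{EE}_{10} - \mathrm{EEII}_{00} - \mathrm{EENDD}_{00};$$

$$\text{P-Census} = \mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}.$$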
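These accounting identities can be checked mechanically. The following Python sketch stores the 13 cells in a dictionary and computes the census count, the true population, the net census error and the P-census size. The cell counts are invented purely for illustration (they are not estimates from any census), and the function names are hypothetical.

```python
# Hypothetical cell counts, invented for illustration (not census data).
CELLS = {
    "CE11": 800, "CE10": 100, "WL11": 16, "WL10": 4,
    "II01": 24, "II00": 6, "NDD01": 8, "NDD00": 2,
    "OM01": 40, "OM00": 10,
    "EE10": 30, "EEII00": 5, "EENDD00": 3,
}


def census(c):
    """All census enumerations: every cell except the omissions (OM...)."""
    return sum(v for k, v in c.items() if not k.startswith("OM"))


def true_population(c):
    """Everyone who belongs in the count: every cell except the erroneous ones (EE...)."""
    return sum(v for k, v in c.items() if not k.startswith("EE"))


def net_census_error(c):
    """Omissions minus erroneous enumerations."""
    return c["OM01"] + c["OM00"] - c["EE10"] - c["EEII00"] - c["EENDD00"]


def p_census(c):
    """Cells whose second index is 1, i.e., people included in the P-census."""
    return sum(v for k, v in c.items() if k.endswith("1"))


assert net_census_error(CELLS) == true_population(CELLS) - census(CELLS)
print(census(CELLS), true_population(CELLS), net_census_error(CELLS), p_census(CELLS))
# prints: 998 1010 12 888
```

Selecting cells by name prefix and suffix keeps each total tied to the subscript convention: OM cells are outside the census, EE cells are outside the true population, and a trailing 1 marks inclusion in the P-census.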
Given that the number of correct enumerations, CE, is equal to CE11 + CE10; that the number of enumerations in the P-census, P, is equal to CE11 + WL11 + II01 + NDD01 + OM01; and that the number of P-census matches to correct census enumerations in the matching universe in the correct location, M, is equal to CE11, one can re-express the dual-systems estimator,

$$\mathrm{DSE} = \mathrm{CE}\,\frac{P}{M},$$

in terms of the cell counts as

$$\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10})\,\frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11}}. \tag{2}$$
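A minimal Python sketch of estimator (2), assuming made-up cell counts: it computes the estimate once from the summary quantities CE, P and M and once directly from the cells, and checks that the two agree. The helper name `dse` is hypothetical, and exact fractions are used so the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical cell counts (illustration only); only the cells entering (2) are needed.
cells = {"CE11": 800, "CE10": 100, "WL11": 16, "II01": 24, "NDD01": 8, "OM01": 40}


def dse(ce, p, m):
    """Dual-systems estimate CE * P / M, computed exactly with Fraction."""
    return Fraction(ce) * p / m


CE = cells["CE11"] + cells["CE10"]                                    # correct enumerations
P = sum(cells[k] for k in ("CE11", "WL11", "II01", "NDD01", "OM01"))  # P-census size
M = cells["CE11"]                                                     # matches, correct location

# Expression (2), written directly in terms of the cells.
from_cells = Fraction(cells["CE11"] + cells["CE10"]) * (
    cells["CE11"] + cells["WL11"] + cells["II01"] + cells["NDD01"] + cells["OM01"]
) / cells["CE11"]

assert dse(CE, P, M) == from_cells
print(dse(CE, P, M))  # prints: 999
```

With these invented counts, CE = 900, P = 888 and M = 800, so the estimate is 900 × 888 / 800 = 999.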
To justify this formula, the authors express three assumptions used in the practical implementation of dual-systems estimation as functions of the entire set of 13 quantities:
Assumption 1: The basic assumption underlying dual-systems estimation is
that the proportion of the true population correctly enumerated in the census
equals the proportion of the P-census enumerated in the census.
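This can be expressed as

$$\frac{\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}}{\mathrm{DSE}} = \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}.$$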
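Turning this around:

$$\mathrm{DSE} = (\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00})\,\frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}. \tag{3}$$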
Assumption 2: It is assumed that correct enumerations in the matching universe are included in the P-census at the same rate as all correct enumerations.
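That is, it is assumed that cases with insufficient information for matching can be treated as missing completely at random. This is expressible as

$$\frac{\mathrm{CE}_{11} + \mathrm{WL}_{11}}{\mathrm{CE} + \mathrm{WL}} = \frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01}}{\mathrm{CE} + \mathrm{WL} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00}}. \tag{4}$$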
Assumption 3: Given that the search for a match is geographically limited, it is assumed that the proportion of people who should be enumerated but are called erroneous because they are in the wrong location equals the proportion of matches that are not found because they are in the wrong location.
This assumption is the so-called balancing of erroneous enumerations and nonmatches and is equivalent to the statement that the proportion of correct enumerations found because they are in the correct location equals the proportion of matches found because they are in the correct location.
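This can be expressed as

$$\frac{\mathrm{CE}_{11} + \mathrm{CE}_{10}}{\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10}} = \frac{\mathrm{CE}_{11}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}},$$

which can be re-expressed as

$$\frac{\mathrm{CE}_{11} + \mathrm{CE}_{10}}{\mathrm{CE}_{11}} = \frac{\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{WL}_{10}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}} = \frac{\mathrm{CE} + \mathrm{WL}}{\mathrm{CE}_{11} + \mathrm{WL}_{11}}. \tag{5}$$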
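Expression (4) implies that (CE + WL + II01 + II00 + NDD01 + NDD00)/(CE11 + WL11 + II01 + NDD01) = (CE + WL)/(CE11 + WL11), and by (5) this ratio equals (CE11 + CE10)/CE11. Substituting these into (3), we have

$$\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10})\,\frac{\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}}{\mathrm{CE}_{11}}, \tag{6}$$

which is identical to (2), thereby justifying dual-systems estimation when the above three assumptions hold.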
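To see the substitution step concretely, the Python sketch below builds hypothetical cells in which correct enumerations, wrong-location enumerations and the pooled II/NDD cases are all included in the P-census at one common rate, so that assumptions (4) and (5) hold by construction, while omissions are caught at a lower rate. The counts, the rate and the `split` helper are assumptions made for illustration only.

```python
from fractions import Fraction


def split(total, rate):
    """Split a group total into (included in P-census, missed by P-census)."""
    included = total * rate
    return included, total - included


# Hypothetical counts: CE, WL, II and NDD share one inclusion rate, so (4) and (5)
# hold; omissions are caught at a lower rate (40 of 60).
rate = Fraction(9, 10)
CE11, CE10 = split(900, rate)
WL11, WL10 = split(20, rate)
II01, II00 = split(30, rate)
NDD01, NDD00 = split(10, rate)
OM01, OM00 = 40, 20

CE, WL = CE11 + CE10, WL11 + WL10
P = CE11 + WL11 + II01 + NDD01 + OM01

# Assumption (4) and assumption (5), checked exactly.
assert (CE11 + WL11) / (CE + WL) == (CE11 + WL11 + II01 + NDD01) / (
    CE + WL + II01 + II00 + NDD01 + NDD00
)
assert (CE11 + CE10) / CE11 == (CE + WL) / (CE11 + WL11)

eq3 = (CE + WL + II01 + II00 + NDD01 + NDD00) * P / (CE11 + WL11 + II01 + NDD01)
eq2 = (CE11 + CE10) * P / CE11
assert eq3 == eq2
print(eq2)  # prints: 9040/9
```

Expressions (3) and (2) agree even though the omissions are captured at a different rate. With these counts the true population is 1,020 while the estimate is 9040/9, about 1,004.4: the algebra holds, but assumption 1 does not, because omissions are less likely to be in the P-census.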
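The dual-systems estimator can be rewritten as

$$\mathrm{DSE} = (\mathrm{CE}_{11} + \mathrm{CE}_{10} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) + \frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}}\,(\mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}),$$

which is equal to the true population if the last term is equal to the missing elements in expression (1), that is, if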
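$$\frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}}\,(\mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) = \mathrm{WL}_{10} + \mathrm{II}_{00} + \mathrm{NDD}_{00} + \mathrm{OM}_{00}. \tag{7}$$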
The quantity on the right-hand side of (7) is referred to as the fourth cell: the people who are missed by both the census and the P-census. If one assumes that the property of being correctly included in the census at the correct location is statistically independent of being in the P-census, then

$$\frac{\mathrm{CE}_{10}}{\mathrm{CE}_{11}}\,(\mathrm{CE}_{11} + \mathrm{WL}_{11} + \mathrm{II}_{01} + \mathrm{NDD}_{01} + \mathrm{OM}_{01}) = \mathrm{CE}_{10} + \mathrm{WL}_{10} + \mathrm{II}_{00} + \mathrm{NDD}_{00} + \mathrm{OM}_{00},$$

which, after subtracting CE10 from both sides, is equivalent to (7).
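The Python sketch below, again on invented counts, splits the rewritten DSE into its first term and the estimated fourth cell (the left-hand side of (7)) and compares that estimate with the actual fourth cell. Giving every group the same P-census inclusion rate is used here as one simple, strong way to impose the independence condition; the group totals, rates and helper names are hypothetical, and the toy has no erroneous (EE) cells.

```python
from fractions import Fraction

# Hypothetical group totals, with the names of each group's in-P-census and
# missed-by-P-census cells. There are no erroneous (EE) cells in this toy.
GROUPS = {
    "CE": (900, "CE11", "CE10"),
    "WL": (20, "WL11", "WL10"),
    "II": (30, "II01", "II00"),
    "NDD": (10, "NDD01", "NDD00"),
    "OM": (50, "OM01", "OM00"),
}


def make_cells(rates):
    """Split each group total by its (hypothetical) P-census inclusion rate."""
    cells = {}
    for group, (total, in_p, missed) in GROUPS.items():
        cells[in_p] = total * rates[group]
        cells[missed] = total - cells[in_p]
    return cells


def decompose(c):
    """Return (first term, estimated fourth cell, actual fourth cell, DSE)."""
    other_in_p = c["WL11"] + c["II01"] + c["NDD01"] + c["OM01"]
    first_term = c["CE11"] + c["CE10"] + other_in_p
    fourth_est = c["CE10"] / c["CE11"] * other_in_p               # left-hand side of (7)
    fourth_true = c["WL10"] + c["II00"] + c["NDD00"] + c["OM00"]  # right-hand side of (7)
    return first_term, fourth_est, fourth_true, first_term + fourth_est


# One common inclusion rate: a simple special case of independence.
same = {group: Fraction(9, 10) for group in GROUPS}
cells = make_cells(same)
first, est, truth, dse = decompose(cells)
assert est == truth and dse == sum(cells.values())  # DSE equals the true population

# Omissions caught less often than everyone else: (7) fails and the DSE falls short.
hard_to_count = dict(same, OM=Fraction(3, 5))
first, est, truth, dse = decompose(make_cells(hard_to_count))
print(est, truth, dse)  # prints: 28/3 26 2980/3
```

In the second case the estimated fourth cell is about 9.3 people against an actual 26, so the DSE (about 993.3) understates the true population of 1,010.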
Mulry and Kostanich also discuss what information is available from the field as to which of the sample of census enumerations, and which of the P-sample enumerations (many of which are the same individuals), fall into the 13 types of enumerations listed above. Recall that the P-sample enumerations are matched only to matchable census enumerations in a search area. Also, for persons who have moved into the P-sample block clusters after Census Day, the P-sample is matched to their residence address on Census Day. Matches therefore provide an estimate of the number of correct enumerations in the correct location that were included in the P-sample. The P-sample is composed of matches and nonmatches: the matches, again ignoring sampling variation, are equal to CE11, and the nonmatches are equal to II01 + WL11 + NDD01 + OM01. These various types of nonmatches are not distinguishable without further data collection.
The number of census enumerations is the sum of the correct enumerations and erroneous enumerations (as defined by the Census Bureau), or E = CE + EE, where CE = CE11 + CE10. In the expression CE = CE11 + CE10, the components are distinguishable for nonmovers because matching the P-sample to the E-sample determines which census enumerations were included in, and which were missed by, the P-sample. However, the two components of correct enumerations are not distinguishable for movers.
Mulry and Kostanich further address the measurement of components of census coverage error. Decomposing the various summary estimates requires more information than is used to support dual-systems estimation.
When the objective is the estimation of net coverage error, a very strict definition of correct enumeration is used, involving a small, restricted search area within the relevant P-sample block cluster (and possibly a small area surrounding it). But when the objective is to measure components of census coverage error, one can define a correct enumeration in a variety of ways to conform to a given tabulation of interest. For instance, a correct enumeration can be in the correct county or state, or simply included correctly in the United States; the latter is the approach taken here to simplify the argument.
Mulry and Kostanich state that their goal is partly to obtain estimates of the number of erroneous enumerations, EE10 + EEII00 + EENDD00, and the number of census omissions, OM01 + OM00. (In this report, the panel states that there is also interest in estimating the number of enumerations in the wrong place and the number of duplicate enumerations.)
Unfortunately, because of enumerations in the wrong location and enumerations that either have insufficient information for matching or are not data defined, subtracting CE from the census count gives an inflated estimate of the number of erroneous enumerations, EE10 + EEII00 + EENDD00. Specifically,

$$\text{Census} - \mathrm{CE}_{11} - \mathrm{CE}_{10} = \mathrm{WL}_{11} + \mathrm{WL}_{10} + \mathrm{II}_{01} + \mathrm{II}_{00} + \mathrm{NDD}_{01} + \mathrm{NDD}_{00} + \mathrm{EE}_{10} + \mathrm{EEII}_{00} + \mathrm{EENDD}_{00},$$

so Census − CE is the sum of erroneous census enumerations (which include duplicates), plus census enumerations in the wrong location, plus correct census enumerations that either have insufficient information for matching or are not data defined.
For the same reason as for erroneous enumerations, subtracting the matched enumerations from the P-census does not provide an unbiased estimate of the number of people omitted from the census, OM01 + OM00. In fact, P − M = II01 + WL11 + NDD01 + OM01. To obtain an estimate of the number of omissions, note that the DSE estimates the true population, so DSE − Census estimates

$$\text{Net Census Error} = \mathrm{OM}_{01} + \mathrm{OM}_{00} - \mathrm{EE}_{10} - \mathrm{EEII}_{00} - \mathrm{EENDD}_{00},$$

and, therefore,

$$\mathrm{OM}_{01} + \mathrm{OM}_{00} = \text{Net Census Error} + \mathrm{EE}_{10} + \mathrm{EEII}_{00} + \mathrm{EENDD}_{00}.$$

So, to estimate the number of omissions, one can take an estimate of the net census error and add to it the number of erroneous enumerations (including the number of duplicates).
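A short Python sketch of these identities, using hypothetical cell counts: Census − CE overstates the erroneous enumerations by the WL, II and NDD cells, P − M mixes the omissions OM01 with WL11, II01 and NDD01, and adding the erroneous enumerations back to the net error recovers the total omissions. In practice the net error would itself be estimated; here the true cell values are used so that each identity can be checked exactly.

```python
# Hypothetical cell counts, invented for illustration (not census data).
c = {
    "CE11": 800, "CE10": 100, "WL11": 16, "WL10": 4,
    "II01": 24, "II00": 6, "NDD01": 8, "NDD00": 2,
    "OM01": 40, "OM00": 10,
    "EE10": 30, "EEII00": 5, "EENDD00": 3,
}

census = sum(v for k, v in c.items() if not k.startswith("OM"))
ce = c["CE11"] + c["CE10"]
ee = c["EE10"] + c["EEII00"] + c["EENDD00"]
p = c["CE11"] + c["WL11"] + c["II01"] + c["NDD01"] + c["OM01"]
m = c["CE11"]
net_error = c["OM01"] + c["OM00"] - ee  # true cell values, so this is exact

# Census - CE overstates EE by every WL, II and NDD enumeration.
assert census - ce == ee + sum(c[k] for k in ("WL11", "WL10", "II01", "II00", "NDD01", "NDD00"))
# P - M counts WL11, II01 and NDD01 cases along with the omissions OM01.
assert p - m == c["OM01"] + c["WL11"] + c["II01"] + c["NDD01"]
# Adding the erroneous enumerations back to the net error recovers all omissions.
omissions = net_error + ee
assert omissions == c["OM01"] + c["OM00"]
print(census - ce, ee, p - m, omissions)  # prints: 98 38 88 50
```

With these counts, Census − CE is 98 against 38 erroneous enumerations, and P − M is 88 against 40 omissions found by the P-census.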
The Census Bureau plans to use two definitions of a correct enumeration in 2010: one to provide a high-quality estimate of net census error, which among other things will help to estimate the number of omissions, and one to estimate the remaining components of coverage error.
To estimate the number of erroneous enumerations, the Census
Bureau will need:
• to collect additional data to determine where enumerations should
be included if the search area is not the correct location;
• to match the E-sample enumerations against the full set of census enumerations for duplicates, with field validation if necessary to establish proper census residence; and
• for enumerations in the E-sample but not in the matching universe, to strive to match to the P-sample (when possible) to identify those KEs (responses that are census data-defined but have insufficient information for matching as defined in 2000) that are correct enumerations.
This appendix omits the remaining details: Mulry and Kostanich discuss how one could separate those enumerated in the wrong location from those that are erroneous; other complications raised by cases with insufficient information for matching, movers, and duplicates; and when to use imputation methods. Finally, the estimates of the components are generally represented as sample-weighted averages, mainly of 0-1 indicator variables but also of imputed probabilities.
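As a rough illustration of the last point, here is a minimal Python sketch of a sample-weighted average over E-sample cases. Each made-up record carries a sampling weight and either a resolved 0-1 erroneous-enumeration indicator or, for an unresolved case, an imputed probability. The record layout, weights and values are hypothetical, and this shows only the basic weighted-average form, not the Census Bureau's production estimator.

```python
from dataclasses import dataclass


@dataclass
class ESampleCase:
    """One hypothetical E-sample record."""
    weight: float     # sampling weight: census enumerations this case represents
    erroneous: float  # 1.0 or 0.0 if resolved, else an imputed probability


def weighted_total(cases):
    """Sample-weighted estimate of the number of erroneous enumerations."""
    return sum(case.weight * case.erroneous for case in cases)


def weighted_rate(cases):
    """Sample-weighted average of the erroneous-enumeration indicator."""
    return weighted_total(cases) / sum(case.weight for case in cases)


cases = [
    ESampleCase(weight=100.0, erroneous=0.0),
    ESampleCase(weight=100.0, erroneous=1.0),
    ESampleCase(weight=200.0, erroneous=0.0),
    ESampleCase(weight=100.0, erroneous=0.25),  # unresolved: imputed probability
]
print(weighted_total(cases), weighted_rate(cases))  # prints: 125.0 0.25
```

Treating a resolved case and an imputed case the same way, as a number between 0 and 1 multiplied by its weight, is what lets both kinds of records enter a single weighted average.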