Evaluation of Some Common Arguments Against ICM-Based Adjustment of the Census
This chapter addresses in detail a number of issues in the statistical research literature that raise important concerns as to whether integrated coverage measurement (ICM) should be used in the 2000 census. This substantial statistical literature (especially portions of Survey Methodology for June 1992, the Journal of the American Statistical Association for September 1993, and Statistical Science for November 1994) specifically addresses the question of whether to use adjusted counts for the 1980 and 1990 censuses. It contains analyses that both support and oppose the panel's position, that is, that use of integrated coverage measurement in the 2000 census will in all likelihood result in counts that are preferable to the "unadjusted" counts for key uses of census data.1 In this chapter the panel discusses the results and arguments presented in this literature, providing a detailed argument supporting the panel's position on the likely effectiveness of integrated coverage measurement in the 2000 census. We stress that this chapter is not concerned with issues that might arise in the field operations necessary to support integrated coverage measurement, except that we do make the assumption that field operations supporting matching in 2000 will be at least as successful as they were in 1990.
The substantial statistical literature highlights three specific concerns related to the use of census adjustment: (1) matching error and the bias from imputation of match status for unresolved cases, (2) unmodeled heterogeneity in census undercoverage for lower levels of geographic aggregation (violation of the so-called synthetic assumption), and (3) correlation bias, and the heterogeneity of probabilities of enumeration of individuals in the census and the integrated coverage measurement survey.2 These concerns are related to potential failures of the statistical assumptions that underlie the integrated coverage measurement estimators. Such assumptions are used only as approximations to the truth (which can never be known), and so the relevant point is not whether these assumptions obtain exactly but the extent to which they do and do not apply and the resulting effects on the quality of the adjusted census counts in comparison with the quality of the unadjusted counts.
Before proceeding, we must consider how one might assess whether adjusted counts are preferred to unadjusted counts, or, more generally, how any one set of estimated counts is preferred to another. In general, the "closer" a set of counts is to the true counts, the better. There are a variety of measures of disparity, known as loss functions, that measure how "close" a set of estimated counts is to the true counts. These loss functions are defined so that smaller values indicate more accurate counts. The many possible loss functions represent different uses of the data and varying notions of the costs of disparities. For example, the apportionment formula for the U.S. House of Representatives can be interpreted as minimizing a certain loss for the discrepancy between the fraction of the population in each state and the fraction of the representatives they are granted (see Balinski and Young, 1982). Since the census is used at different levels of aggregation, loss functions are also applied at different levels of aggregation, such as states, counties, or school districts. Two examples of loss functions are the weighted sum of squared deviations of estimated county counts from true county counts, where the weights might be the inverse of population size, or the sum of absolute deviations of estimated state shares from true state shares.
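As an illustration of the two example loss functions just described, the following is a minimal sketch in Python. The function names and the three-area counts are hypothetical and chosen only for illustration; the weights are taken to be the inverse of the true count, which is one reading of "the inverse of population size."

```python
def weighted_squared_count_loss(estimates, truth):
    """Sum of squared deviations of estimated from true counts,
    each weighted by the inverse of the true count (an assumed weighting)."""
    return sum((e - t) ** 2 / t for e, t in zip(estimates, truth))


def absolute_share_loss(estimates, truth):
    """Sum of absolute deviations of estimated shares from true shares."""
    est_total = sum(estimates)
    true_total = sum(truth)
    return sum(abs(e / est_total - t / true_total)
               for e, t in zip(estimates, truth))


# Hypothetical counts for three areas (illustrative only, not census data).
true_counts = [100.0, 200.0, 700.0]
census_counts = [90.0, 195.0, 695.0]
adjusted_counts = [98.0, 203.0, 702.0]

census_count_loss = weighted_squared_count_loss(census_counts, true_counts)
adjusted_count_loss = weighted_squared_count_loss(adjusted_counts, true_counts)
census_share_loss = absolute_share_loss(census_counts, true_counts)
adjusted_share_loss = absolute_share_loss(adjusted_counts, true_counts)
```

In this made-up case both loss functions favor the adjusted counts, but nothing guarantees that two loss functions will agree in general, which is why the choice of loss function matters.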
Given the many uses of census counts, it is unlikely that in comparison of two reasonable sets of estimates, all of the relevant loss functions would find one set of counts superior to the other. However, some uses of the census counts, such as reapportionment, are generally considered of particularly great importance, and it makes sense to consider the loss functions associated with those uses. Most of the key uses of census counts are to allocate a "fixed pie," and therefore loss functions that measure how close the estimated shares are to the true shares at some level of geographic aggregation are more important than loss functions that measure how close the estimated counts are to the true counts.
No one can directly measure loss since a set of true counts does not exist. Therefore, to assess whether loss is greater for one set of counts or shares than another, indirect means are needed.
Matching Error and the Bias From Imputation of Match Status for Unresolved Cases
The panel recognizes that bias was present from a variety of sources in both the adjusted and the unadjusted census counts in 1990 and that bias will be present again in the counts from the 2000 census, whether or not ICM information is used to "adjust" the 2000 census. The panel emphasizes that the statistical term "bias" refers to the fact that an estimator is, on average, higher (or lower) than the quantity it is intended to estimate. As used here, "bias" carries no connotation of prejudice or manipulation. The key inputs to a decision of whether to use integrated coverage measurement should be estimates of the size of the biases and variances to which the two competing sets of counts are subject. The most useful framework in which to compare adjusted and unadjusted counts is the total error model, used by Mulry and Spencer (1991, 1993; see also Zaslavsky, 1993), in which all biases and variances of the competing counts can be accounted for by measuring their effect on a selected loss function.
All evidence from previous censuses suggests that the unadjusted census contains substantial biases. Adjusted counts will retain some bias while adding variance through the use of PES sample-based information. Even without use of the total error model, there is evidence that supports the panel's position, that is, that the remaining bias is likely to have a limited effect on the estimated undercount. One source of potential bias that has received particular attention is matching error, including the bias from imputation of match status for unresolved matches.
Matching plays a key role in dual-system estimation. Errors from matching must be minimized, since bias from matching error of the order of just a few percentage points would be of the same order as differential undercount and therefore make it difficult to support the use of integrated coverage measurement. The amount of matching error and bias and variance from the imputation of unresolved match status (actually match probability) must be factored into any decision on the use of integrated coverage measurement. To support integrated coverage measurement, one must be convinced that the amount of matching bias and variance is small enough that the adjusted counts are still more accurate than the unadjusted counts. (Assessment of the effect of matching error in combination with other sources of bias is then conducted using a total error model.)
A post-enumeration survey has two main components—the P-sample and the E-sample. The P-sample consists of households found by the post-enumeration survey in PES blocks; the E-sample consists of census enumerations for PES blocks. P-sample matches (matches of P-sample households to the census) are key to estimating the rate of census gross undercoverage. P-sample matching error arises from the incorrect determination of which persons in the ICM survey can be matched to persons in the census enumeration. E-sample matches are key to estimating the rate of erroneous enumeration. E-sample matching error is due to the incorrect determination of which forms collected in the census enumeration are erroneous enumerations. Both P- and E-sample matching have three stages: (1) a computer match of a large fraction of the sample (expected to be at least 80 percent for both samples for 2000); (2) a clerical match of most of the remainder, with field follow-up of unresolved matches to collect more information to reduce the number of unresolved cases; and (3) imputation of match status for unresolved matches. (The clerical match is often assisted by potential matches suggested by the computer matching algorithm.) A great many instances of unresolved match status are due to inability to collect adequate information on a household (including the address) or on the people at an address.
In the 1990 P-sample, interviews were not obtained from 1.2 percent of the households, and 2.1 percent of the individuals in interviewed households had unresolved match status because of incomplete information (Belin and Diffendal, 1991; Belin et al., 1993). Only 0.9 percent of the E-sample had unresolved match status (Ericksen et al., 1991). Logistic regression models were used in 1990 to impute match probability for unresolved cases for both the P- and the E-samples. (Some of the theory underlying these models and assessments of the variability they add to adjusted counts can be found in Belin et al., 1993.) Although simpler imputation methods are planned for the 2000 census to substitute for the use of logistic regression, the argument that this source of error will remain limited in 2000 is similar given the sensitivity analysis work cited below.
Given that at least 3 percent of P-sample cases in 1990 had unresolved match status, an inadequate imputation model would make it difficult to use integrated coverage measurement. (It is also important to check that there were no subgroups for which the percentage with unresolved match status was substantially larger than the overall rate; otherwise, the estimates for such a subgroup could be poor.) However, the available evidence indicates that the imputation models worked well. Belin et al. (1993) describe the Census Bureau's effort to validate the P-sample imputation model. The Bureau carried out an evaluation follow-up interview study in which 11,000 households in a sample of evaluation blocks were reinterviewed to collect all data on address errors, non-interviews, and so forth, to resolve their enumeration status. (Given the distance in time from census day, it was successful in resolving enumeration status for only slightly more than 40 percent of households.) In this study, 31.6 percent of households were determined to have been enumerated in the 1990 census. The mean probability of enumeration imputed for these cases using the logistic regression model was 32.2 percent, which compares extremely well with the survey results. This is solid support for the use of the imputation model for P-sample match status. Given the extent of nonresponse in the evaluation follow-up interviews, it is not conclusive evidence, but it is strong evidence against concern that the imputation model was seriously wrong. Furthermore, the work of Mulry and Spencer (1991, 1993), based on the research of Mack (1991), demonstrates that the use of reasonable alternative match status (probability) imputation routines would not have appreciably changed the adjusted census counts in 1990. Therefore, the contribution to loss from misspecification of the logistic regression model is likely small.
The remaining concern involving matching is the frequency of individuals who were assigned matches that were not true matches, or vice versa. The concern that matching error could be the source of an appreciable bias is reasonable because (1) clerical matching involves an element of judgment, even though it is carried out following standardized procedures, and (2) there are more opportunities to mistakenly declare a matching case to be a nonmatch than to mistakenly declare a nonmatch to be a match, possibly resulting in too high an estimate of the number of nonmatches. (However, given the liberal use of unresolved status, this may be less asymmetric than it appears.) This would give the estimate of undercount a positive bias.3
There were two primary sources of information on matching error from 1990. The first source was the Matching Error Study (Davis et al., 1991, 1992), which involved a dependent rematch of a subsample of 919 block clusters (71,000 P-sample cases), where the rematch was conducted by more highly skilled personnel with more time than those who worked in the census. The term "dependent rematch" indicates that the decisions previously made by clerks during the 1990 census were known to the rematch staff. The results of these studies indicate that the estimated bias in the P-sample match rate for 10 or 13 evaluation poststrata (depending on the study) was only significantly different from zero for one or two poststrata. (We have argued elsewhere for the use of loss functions to make these types of assessments.) The potential effect of this bias on the dual-system estimation counts in one of these two studies was to overestimate the population in these two evaluation poststrata by 1.3 and 0.7 percent, respectively, which was substantially below the amount of net undercount (6.8 percent and 4.0 percent, respectively) for the associated groups in the census (see Mulry and Spencer, 1993).
Breiman (1994) focuses attention on the disagreement rates from this study: he points out that for P-sample cases the average disagreement rate across enumeration strata between the original match status and that of the rematch staff for those cases originally classified as unresolved matches, weighted to the total population, was 23.8 percent.
Although the Breiman result is certainly higher than one would like, it is not particularly disturbing. First, the unresolved matches are a relatively small fraction of the total population. This is clear from the fact that the overall disagreement rate estimated as a fraction of the total population is 1.8 percent. Also, the number of P-sample matches in the rematching study, weighted to the total population, differs from the number in the census production matching by only 0.18 percent. The difference appears because disagreement rates do not allow for offsetting errors. That is, for a poststratum, one erroneous match and one erroneous nonmatch cancel each other out, so disagreement rates do not translate directly into bias estimates. Finally, some of the individual disagreements would occur when the rematch produced either a match or a nonmatch, when one would surmise that the imputation routine for an unresolved case often produced a probability of match that was either, respectively, very high or very low, which would be essentially an agreement.
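The distinction between a disagreement rate and a net effect on the number of matches can be made concrete with a small sketch. The ten cases below are entirely hypothetical and unweighted; they are not drawn from the Matching Error Study.

```python
# Hypothetical match statuses for ten P-sample cases in one poststratum.
production = ["match", "match", "nonmatch", "nonmatch", "match",
              "nonmatch", "match", "match", "match", "nonmatch"]
rematch = ["match", "nonmatch", "match", "nonmatch", "match",
           "nonmatch", "match", "match", "match", "nonmatch"]

# Share of cases on which the two codings differ.
disagreement_rate = sum(p != r for p, r in zip(production, rematch)) / len(production)

# Net change in the number of matches, which is what drives bias
# in the estimated match rate.
net_match_change = rematch.count("match") - production.count("match")
```

Here two of ten cases disagree, yet the number of matches is unchanged because one erroneous match offsets one erroneous nonmatch.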
The second source of information on the quality of the matches in 1990 is from Ringwelski (1991). After some clerical matching in 1990 was completed, the more difficult matching was processed by two different teams, designated SMG1 and SMG2, which worked independently of each other. Though all cases of disagreement proceeded to an oversight match group, the disagreement rate between SMG1 and SMG2 was about 10 percent, which indicates less reliability than one would desire in the clerical match of difficult cases. The specific disagreement rates, discussed by Breiman (1994), were 10.7 percent for matches, 6.6 percent for nonmatches, and 31.2 percent for unresolved cases. However, this again is presumably an overestimate of the extent of the problem since a large fraction of cases involved one match group designating a case as a match and the other group designating the same case as unresolved, where the unresolved case (possibly frequently) could have been given an imputed match status probability of close to 1.0, thereby contributing little to differences in estimated undercount. Furthermore, it must be understood that this result involves only 10 percent of less than 25 percent, or less than 2.5 percent of the cases; it does not directly measure matching error; and it does not allow for offsetting errors.
The 2000 census could be subject to increases in matching problems, since large increases in the percent of individuals that use "Be Counted" forms in 2000 over the percent that used "Were You Counted" forms in previous censuses would be problematic, and there could be substantially increased difficulties in matching due to use of PES-C or PES-A rather than PES-B.4 It might be sensible to behave as if "Be Counted" would continue to be a relatively small number of additions, based on the experience of the test censuses, but the problems from use of PES-A or PES-C are harder to assess a priori (see discussion in Chapter 3).
In summary, assuming that matching methods in the 2000 census are even modestly improved over those used in 1990, and assuming that changes in census procedures since 1990 do not add substantial new challenges to matching, matching is unlikely to have a substantial effect on the resulting adjusted population counts. To measure the effect of matching error on adjusted counts more directly, matching error studies for 2000 should try to estimate loss directly, rather than use hypothesis tests, on both a count and a share basis, at the state level and substate levels of interest.
4 PES-A, PES-B, and PES-C are various methods for treating the matching of movers as part of a post-enumeration survey and dual-system estimation; see the discussion in Chapter 3.
Unmodeled Heterogeneity in Census Undercoverage for Lower Levels of Geographic Aggregation
Direct estimates of census undercoverage will exist in 2000 (roughly) at the level of the poststrata, which represent relatively large levels of geographic and demographic aggregation, likely on the order of hundreds of thousands of individuals. In 1990, 1,392 poststrata were initially used for a small PES sample, and 357 poststrata were later used for purposes of examining adjustment for intercensal estimation. The precise number of poststrata for the 2000 census has not been determined, but the need to produce direct state estimates (so all poststrata are defined within state boundaries) will cause, everything else being equal, the number of poststrata to increase relative to 1990. However, the more poststrata, the more variable are the poststrata estimates of undercoverage. These two considerations will likely result in 500 to 1,000 poststrata. Once these direct estimates are made, synthetic estimation and iterative proportional fitting are planned to be used in 2000 to produce estimates at the lowest levels of aggregation (i.e., blocks) consistent with (summing to) the higher level direct estimates.
Clearly, these undercoverage estimates at very low levels of aggregation must derive from direct estimates of much larger aggregates. As a result, a second critical argument often put forth is that while the adjusted census counts are likely better at the original levels of geographic and demographic aggregation (i.e., at the level of poststrata), the adjusted counts are inferior to the census counts at much lower levels of aggregation. The panel argues above that the performance of estimated counts at very detailed levels of geographic aggregation (say, blocks and block groups) is not critical since the key uses of decennial census counts are for purposes such as apportionment, redistricting, fund allocation, and public and private planning, which typically make use of census counts at higher levels of aggregation. The estimates at lower levels of aggregation are used primarily as "building blocks." However, there are some uses of census counts at lower levels of aggregation than the poststrata, so it is important to determine whether adjusted counts are at least as good as unadjusted counts at lower levels of aggregation. The panel finds two arguments that support this point. First, as Tukey (1983) demonstrates, assuming that it has been established that adjusted counts are preferred at a higher level of geographic aggregation in a single poststratum, adjusted counts (not aggregated over demographic groups) produced by synthetic estimation will also be preferred for all lower levels of geographic aggregation. Here, the term "preferred" reflects that they have lower loss, determined through use of a specific loss function for population counts.5
As an example, assume that there are three areas, A, B, and C, where the aggregate census count is 180 and the adjusted count is 192. Assume that the census counts of areas A, B, and C are 30, 60, and 90, respectively. The adjusted counts at this level of aggregation using synthetic estimation would be 32, 64, and 96, allocating the 12 additional people in proportion to the census counts. Even if the entire undercounted population happened to reside in area A, it is still the case that the adjusted counts would have less loss and therefore be preferred to the unadjusted counts. (In this case the contribution to loss from this poststratum for adjusted counts is 3.05; for the census counts it is 3.43.) Since we have only represented the case for a single poststratum, we cannot demonstrate the contributions to a share loss function. This is discussed below.
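The arithmetic of this example can be checked with a short sketch. The loss function used here, squared error divided by the true count and summed over areas, is an assumption on our part: it is not defined in this section, but it reproduces the stated figures of 3.05 and 3.43. The variable names are hypothetical.

```python
def count_loss(estimates, truth):
    """Assumed loss: squared error weighted by the inverse of the true count."""
    return sum((e - t) ** 2 / t for e, t in zip(estimates, truth))


census = [30, 60, 90]     # census counts for areas A, B, C
adjusted_total = 192      # adjusted count for the aggregate

# Synthetic estimation: spread the aggregate adjustment in proportion
# to the census counts, giving 32, 64, and 96.
synthetic = [c * adjusted_total / sum(census) for c in census]

# Worst case discussed in the text: all 12 missed people live in area A.
truth = [42, 60, 90]

adjusted_loss = count_loss(synthetic, truth)
census_loss = count_loss(census, truth)
```

Under this assumed loss, the adjusted counts give about 3.05 and the census counts about 3.43, so the adjusted counts are preferred even though areas B and C are each overstated.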
The advantages of synthetic estimation have also been examined through use of simulations at the state level by Schirm and Preston (1987, 1992). Using an empirical approach, they demonstrated that counts produced using synthetic estimation were preferred to unadjusted census counts in a wide variety of simulated circumstances. The benefits are acknowledged to be relatively modest—which is only to be expected since no new information is being provided at that level of aggregation—but the preference for adjusted counts to unadjusted counts occurs with relatively high probability. Using results based on the 1990 census, Hartigan (1992) also found that synthetic adjustment is likely to help.
In two respects these analyses do not settle the issue. First, synthetic estimates used for adjustment are aggregated over poststrata to produce estimated counts for small areas. As pointed out by National Research Council (1985), when the results of synthetic estimation are aggregated over demographic poststrata to produce small-area estimates, the optimality theorem demonstrated by Tukey no longer holds, and examples can be created in which unadjusted counts are preferred to the adjusted ones. However, such counterexamples are difficult to construct and are probably relatively rare. It would require, for example, that the undercount for an undercounted group is less in areas where the group is more concentrated (see Schirm and Preston, 1992), which is contrary to anecdotal evidence that extreme undercounts occur in areas with the most concentrated problems.
Second, the simulation results from Schirm and Preston and Hartigan, as well as Tukey's results, assume that the adjusted estimates for the poststrata have less error than the corresponding unadjusted estimates. A more complete and realistic simulation would assume that estimates for various poststrata are subject to error of various magnitudes probabilistically, and then see whether synthetic estimation does result in counts with reduced loss as measured by typical loss functions.
The research by Wolter and Causey (1991), which partially addresses this point, provides the second and more compelling defense of adjustment at low levels of aggregation. They investigated the problem of when adjustment would be preferred at various levels of aggregation (state, county, and enumeration district), assuming that adjusted counts are unbiased for the truth. Given some assumptions about the distribution of the errors, Wolter and Causey (1991:284) found:
For future censuses, we believe that the following may be a good rule of thumb: Census correction is worthwhile within a stratum if the actual CV [coefficient of variation] of the external estimator of total population is less than the true census undercount rate. For perspective, we note that the Census Bureau's 1990 post-enumeration survey will include about 150,000 housing units, achieving a sampling CV of about 1.4 percent in each of about 100 sampling strata. Total CV's for the 1990 post-enumeration survey will include the 1.4 percent, plus various additions because of nonsampling error . . . and minus various deductions as a result of fitting hierarchical regression models. . . . Thus the level and distribution of original population counts within a stratum will be moved closer to their true values by the correction methods studied here, provided that the actual CV of the post-enumeration survey (designed to be the net of the 1.4 percent plus additions minus deductions) is less than the true undercount rate.
Expanding on this last point, given that the post-enumeration survey for the 2000 census is planned to include 750,000 housing units and assuming 500 poststrata, and assuming that the net of nonsampling error and smoothing is zero additional error, the coefficient of variation in each poststratum would be expected again to be about 1.4 percent. This number is likely to satisfy Wolter and Causey's rule of thumb with respect to expected undercount rates for the 2000 census in poststrata with substantial undercoverage and will otherwise not substantially alter the counts from the census enumeration. If there are 1,000 poststrata, the coefficient of variation would rise to about 2.0 percent.
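The figures of 1.4 and 2.0 percent follow from a simple scaling argument, sketched below. The sketch assumes that the sampling coefficient of variation is inversely proportional to the square root of the number of housing units per stratum, with the sample divided equally among strata; it ignores design effects, nonsampling error, and smoothing. The function name is hypothetical.

```python
import math


def scaled_cv(ref_cv, ref_units_per_stratum, units_per_stratum):
    """Scale a reference CV, assuming CV is proportional to 1/sqrt(sample size)."""
    return ref_cv * math.sqrt(ref_units_per_stratum / units_per_stratum)


# Reference point from the 1990 design quoted above:
# about 150,000 housing units in about 100 strata, CV of about 1.4 percent.
ref_cv = 1.4
ref_units = 150_000 / 100

# Planned 2000 sample of 750,000 housing units under two poststrata counts.
cv_500 = scaled_cv(ref_cv, ref_units, 750_000 / 500)
cv_1000 = scaled_cv(ref_cv, ref_units, 750_000 / 1000)
```

With 500 poststrata the per-stratum sample size equals that of 1990, so the CV is unchanged; with 1,000 poststrata the per-stratum sample is halved and the CV grows by a factor of the square root of 2, to roughly 2.0 percent.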
As pointed out by Schafer (1993), Census Bureau staff are well aware that the "synthetic assumption"—namely, that small areas are homogeneous with respect to their undercoverage properties—is clearly false. Also, Ericksen and Kadane (1991) point out that given the PES sample size and the limited set of variables that are collected on the census short form, the Census Bureau is limited in the number of poststrata that can be formed. So some heterogeneity will exist. However, the question is instead whether the counts resulting from the use of this assumption are inferior to the unadjusted counts with respect to sensible loss functions. The above argument indicates that adjusted counts could very well be preferred at even low levels of aggregation.
Unfortunately, Wolter and Causey's work is based on two assumptions that may limit the applicability of their results. First, the assumption of the unbiasedness of the adjusted counts is clearly not true. Sensitivity analyses should be carried out to examine the effects of the relaxation of this assumption on their results. Second, geographic effects on the degree of census undercoverage that are not captured by the poststratification could result in higher errors at low levels of geographic aggregation than Wolter and Causey's analysis represents. Again, some postulated amount of low-level geographic heterogeneity that is not accounted for by poststratification should be incorporated into their analysis to see what the effects might be on adjusted loss in comparison to census loss.
One might ask what the total error model of Mulry and Spencer (1991, 1993) indicates about the error through use of the synthetic assumption. Freedman and Wachter (1994) were concerned that it was misguided since it ignored the effects of the failure of the synthetic assumption on comparisons between adjusted and unadjusted counts. To measure the extent to which this might be true, Freedman and Wachter analyzed proxy variables (variables that are assumed to be related to the variable of interest, e.g., the percentage of people who failed to mail back their census questionnaire and the percentage of people whose entire census records were imputed) for which there is no (or essentially no) sampling variability and for which the extent of the failure of the synthetic assumption could be measured directly. Their analysis found that the Mulry and Spencer analysis was biased against adjustment for six of the eight proxy variables, essentially unchanged for one variable, and biased in favor of adjustment for the remaining variable. Of course, analyses using proxy variables are somewhat dependent on the similarity of the relevant characteristics of the distributions (i.e., the patterns of heterogeneity) of the proxy variables to that of the undercount. However, Freedman and Wachter's analysis suggests that the Mulry and Spencer analysis was not biased in favor of adjustment.
Two additional points are worth noting. First, a hypothesis test used in the CAPE report (Committee on Adjustment of Postcensal Estimates, 1992) tested whether adjusted counts had significantly less loss than unadjusted counts. This test demonstrated that adjusted counts were preferred to unadjusted counts at more aggregate geographic levels, but the test did not demonstrate this preference for lower levels of aggregation, i.e., adjusted counts were not shown to be clearly preferred to unadjusted counts. Use of hypothesis testing in this way is not fully informative as a method for comparing adjusted and unadjusted counts since it treats the two sets of counts very asymmetrically. The converse probably was also true; that is, unadjusted counts would likely not have been shown to have significantly lower loss than adjusted counts. One can argue that a minor advantage of adjusted counts should be ignored because of the many administrative costs and political complications raised by the use of adjusted counts for official purposes. However, a direct comparison of expected loss, with some acknowledgment of the above additional costs, would be preferable to formal hypothesis testing.
Second, the analyses conducted by Tukey, Schirm and Preston, Hartigan, and Wolter and Causey (cited above), considered both loss functions for population counts and loss functions for population shares. However, the question of improvement for shares or counts does complicate the analysis of the benefits of synthetic estimation.
Correlation Bias and Heterogeneity of the Probabilities of Inclusion in Dual-System Estimation
Two kinds of departures from the standard assumptions used in dual-system estimation can cause the resulting estimates to be biased: lack of independence between the event of being enumerated in the census and the event of being enumerated in the post-enumeration survey and correlated heterogeneity (across enumeration systems) in the individual probabilities of being enumerated. While these are conceptually distinct, they both produce the same result: biased estimates.6 We concentrate here on correlated heterogeneity, which causes the bias referred to as correlation bias.7 The independence assumption, as mentioned above, is supported by the Census Bureau's considerable efforts to ensure that the post-enumeration survey is operationally independent of the census. Dependence cannot be measured at the individual level, and at the aggregate level its effect is fully confounded with correlated heterogeneity of enumeration probabilities.
Some effort has been made to model heterogeneity in enumeration probabilities at the level of individuals (see, e.g., Alho et al., 1993), but these efforts are limited by the information that is collected on census forms. It is generally believed that people do have different probabilities of being enumerated and that these probabilities are a function of various individual characteristics. Furthermore, given the similarities of the census and the post-enumeration survey, it is likely that these characteristics would have a similar effect on census and PES enumeration probabilities, which engenders correlated heterogeneity and results in correlation bias. Some of this bias is reduced through use of poststrata that have people with similar characteristics, who thus have similar probabilities of enumeration. The extent to which correlation bias, widely accepted as the largest source of bias in dual-system estimation when used in the decennial census, remains after poststratification, and the effect of any remaining correlation bias on the relative preference of adjusted to unadjusted census counts and shares is the main topic of this section.
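The mechanism can be sketched with expected counts for a single area containing two groups. All numbers and names below are hypothetical. The sketch assumes that, for each person, enumeration in the census and in the post-enumeration survey are independent, so that any bias arises only from pooling groups whose enumeration probabilities differ in a correlated way.

```python
def dual_system_estimate(census_total, pes_total, matched):
    """Standard dual-system (capture-recapture) estimate of population size."""
    return census_total * pes_total / matched


def expected_totals(groups):
    """Expected census, PES, and matched totals for (size, p_census, p_pes) groups."""
    census = sum(n * p for n, p, q in groups)
    pes = sum(n * q for n, p, q in groups)
    matched = sum(n * p * q for n, p, q in groups)
    return census, pes, matched


# Hypothetical groups: (true size, census probability, PES probability).
# The hard-to-count group is hard to count in both systems.
easy = (800, 0.95, 0.95)
hard = (200, 0.60, 0.60)

# Pooled estimate: both groups treated as one homogeneous population.
pooled = dual_system_estimate(*expected_totals([easy, hard]))

# Poststratified estimate: one dual-system estimate per group, then summed.
stratified = (dual_system_estimate(*expected_totals([easy]))
              + dual_system_estimate(*expected_totals([hard])))
```

In this constructed case the pooled estimate falls short of the true total of 1,000 by about 25 people, while estimating within homogeneous groups recovers the total. In practice poststrata are never perfectly homogeneous, so some of the shortfall remains.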
In this section, we often refer to the cells of the 2-by-2 contingency table used in dual-system estimation. The set-up is as follows:
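A sketch of the set-up, using the standard dual-system layout and the cell numbering relied on later in this section (the 3rd cell holds people missed by the census but found by the post-enumeration survey; the 4th cell holds people missed by both and is never observed). The symbols $N_{11}$, $N_{12}$, $N_{21}$, and $N_{22}$ are introduced here for illustration and are an assumption, not the panel's notation. The estimators shown are the usual ones under the independence assumption.

```latex
\[
\begin{array}{l|cc}
 & \text{In PES} & \text{Not in PES} \\ \hline
\text{In census}     & N_{11}\ (\text{1st cell}) & N_{12}\ (\text{2nd cell}) \\
\text{Not in census} & N_{21}\ (\text{3rd cell}) & N_{22}\ (\text{4th cell, unobserved})
\end{array}
\]
% Under independence of census and PES enumeration:
\[
\hat{N}_{22} = \frac{N_{12}\,N_{21}}{N_{11}}, \qquad
\hat{N} = N_{11} + N_{12} + N_{21} + \hat{N}_{22}
        = \frac{(N_{11}+N_{12})\,(N_{11}+N_{21})}{N_{11}}.
\]
```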
The heterogeneity of enumeration probability in the census and the post-enumeration survey is recognized by the Census Bureau, which decided in 1990 to use 1,392 poststrata to minimize heterogeneity in the 1990 census.8 (Sekar and Deming (1949) advocated use of poststrata for the same purpose.) Since the plans to adjust the 1990 census were required to be prespecified and the pattern of heterogeneity could not be examined a priori, the Census Bureau decided to create a relatively large number of poststrata to accommodate whatever heterogeneity patterns might be discovered. Later analysis identified some patterns of similarity among poststrata with respect to undercoverage. Collapsing of poststrata resulted in the final use of only 357 poststrata for purposes of intercensal estimation. The tables in Hogan (1993) indicate that the enumeration probabilities do differ substantially across these poststrata, so the poststrata do account for some heterogeneity. Unfortunately, it is very difficult to measure how much of the total heterogeneity was removed using either the original 1,392 or the later 357 poststrata, but it is safe to conjecture that other variables that were unavailable to the Census Bureau would have further reduced the heterogeneity. Therefore, the panel agrees with Schafer (1993) and Ericksen and Kadane (1991) that the use of poststrata very likely did not eliminate heterogeneity.
The only direct evidence on the size and effects of correlation bias is at the national level and is acquired through demographic analysis. (Even at the national level, demographic analysis is subject to error. Attempts are currently being made to quantify this error; see Robinson et al., 1993.) Bell (1993), using demographic analysis to estimate the degree of correlation bias, determined that the total of 4th cells for black males aged 20 to 44 in the 1990 census should have been estimated to be around three times larger than the count estimated by assuming homogeneity in the enumeration probabilities. Other demographic groups experienced different degrees of estimated correlation bias.
Even though little is known, at the level of the poststrata, about the effect of heterogeneity on adjusted counts, correlation bias does not negate the superiority of adjusted to unadjusted counts. As shown by Kadane et al. (1999), if the probabilities of enumeration in both the post-enumeration survey and the census are positively correlated within poststrata, the adjustment would be biased, but in the right direction. This assumed positive correlation is reasonable since census procedures are similar to PES procedures. Therefore, correlation bias due to heterogeneity of enumeration probabilities is likely to result in dual-system-based estimates that are imperfect but better than no adjustment.
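The direction of this bias can be illustrated with a small expected-value calculation. The sketch is not taken from Kadane et al. (1999). It assumes a single poststratum made up of two hypothetical latent groups (the group sizes and capture probabilities are invented for illustration), with census and PES enumeration independent within each group but both probabilities lower in the second group, and it ignores erroneous enumerations and matching error.

```python
# Hypothetical poststratum with two latent groups: (size, P(in census), P(in PES)).
# All numbers are invented for illustration; they are not census figures.
groups = [
    (800, 0.95, 0.95),  # easy-to-count group
    (200, 0.60, 0.60),  # hard-to-count group (low probability in both systems)
]


def expected_dual_system(groups):
    """Return (true total, expected census count, expected dual-system estimate).

    Assumes census and PES capture are independent within each group.
    """
    true_total = sum(n for n, _, _ in groups)
    in_both = sum(n * p * q for n, p, q in groups)   # 1st cell
    in_census = sum(n * p for n, p, _ in groups)     # 1st + 2nd cells
    in_pes = sum(n * q for n, _, q in groups)        # 1st + 3rd cells
    dse = in_census * in_pes / in_both
    return true_total, in_census, dse


true_total, census_count, dse = expected_dual_system(groups)
print(f"true={true_total}, census={census_count:.0f}, dual-system={dse:.1f}")
```

With these invented inputs the dual-system estimate is about 975 against an expected census count of 880 and a true total of 1,000: it still falls short of the truth, but it lies above the census count.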
This argument is at least somewhat dependent on the use of a loss function based on population counts, rather than population shares. For "small" adjustments, Taylor series arguments can be made to show that similar benefits would transfer to share loss functions. Also, an estimated undercount that had similar bias across poststrata would clearly be beneficial for share loss functions. However, the first point is not compelling for larger adjustments, and it is unlikely that dual-system estimates have similar bias across poststrata. Further research is needed on the effects of correlation bias on loss functions for shares. (Although more theory would be desirable, this question may be more of an empirical than a theoretical one.) A greater understanding of the magnitude of correlation bias in the various poststrata would help to inform a decision as to whether adjusted counts are preferred for share loss functions.
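The distinction between count and share loss functions can be made concrete with a toy calculation. The sketch assumes squared-error loss (the text does not specify a functional form) and uses two hypothetical areas with invented populations. Scenario A has similar residual bias in both areas after adjustment; scenario B has uneven residual bias.

```python
def count_loss(estimate, truth):
    """Sum of squared errors in the counts themselves."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


def share_loss(estimate, truth):
    """Sum of squared errors in each area's share of the total."""
    est_total, true_total = sum(estimate), sum(truth)
    return sum((e / est_total - t / true_total) ** 2
               for e, t in zip(estimate, truth))


truth = [1000, 1000]  # hypothetical true populations of two areas

# Scenario A: residual bias after adjustment is similar in both areas.
census_a, adjusted_a = [980, 900], [990, 985]
# Scenario B: adjustment removes all error in one area but little in the other.
census_b, adjusted_b = [950, 950], [1000, 960]

for label, census, adjusted in [("A", census_a, adjusted_a),
                                ("B", census_b, adjusted_b)]:
    print(label,
          "count loss:", count_loss(census, truth), "->", count_loss(adjusted, truth),
          "| share loss:", share_loss(census, truth), "->", share_loss(adjusted, truth))
```

In scenario A both losses fall after adjustment. In scenario B the count loss falls while the share loss rises, which is why the magnitude of correlation bias across poststrata matters for share loss functions.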
The above argument implicitly assumes that everyone has a non-zero probability of being enumerated in the census and the post-enumeration survey. Some have argued that there is so-called hard undercoverage, that is, individuals whose enumeration probability is equal to zero. Darga (1998) suggests that possibly a great majority of individuals have either a probability of one or zero of being enumerated in the census and the post-enumeration survey. It is possible that at least a close approximation to this problem exists: for example, portions of the homeless population certainly have enumeration probabilities that are very small, if not zero. But this is not a concept that can be rigorously defined. For example, what is often ignored is that some people who refuse to cooperate are still enumerated as a last resort through information provided by neighbors, landlords, and postal workers.
We show first that most census undercoverage is likely 3rd cell undercoverage, that is, people directly measured as missing through use of the post-enumeration survey, as opposed to 4th cell undercoverage, which is estimated through use of the assumption of no correlation bias. Then we demonstrate further why the existence of hard undercoverage is also not a compelling argument against use of adjusted counts.
First, for most poststrata, the size of the estimated 4th cell of the 2-by-2 contingency table is small compared with the size of the estimated 3rd cell (individuals missed by the census enumeration but included in the post-enumeration survey). The ratio of the 3rd cell to the estimated 4th cell should be roughly equal to the probability of being enumerated in the post-enumeration survey divided by the probability of being missed in the post-enumeration survey. For 1990, nationally, the sum of the 3rd cells was 18.8 million, while the sum of the 4th cells was 1.5 million (Thompson, 1992). This estimate of 18.8 million 3rd cell enumerations is biased high as an estimate of the number of gross census omissions, since, e.g., census enumerations with insufficient information cannot be matched to the post-enumeration survey, and as a result those people who should have been 1st (or 2nd) cell enumerations become included as 3rd cell enumerations. A relatively unbiased estimate of the number of gross census omissions, as estimated by dual-system estimation, in the 1990 census is 9 million (GAO, 1992).
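As a back-of-the-envelope check on the ratio described above, the national 1990 figures can be converted into the PES enumeration probability they would imply. This is only a sketch: it treats the national sums as if they came from a single homogeneous table, and the 18.8 million figure is acknowledged to be biased high, so the implied probability should be read as a rough upper-end indication. The variable names are illustrative.

```python
# Figures quoted for 1990 (Thompson, 1992), in millions of people.
third_cells = 18.8   # in the PES, missed by the census
fourth_cells = 1.5   # estimated as missed by both

ratio = third_cells / fourth_cells
# If ratio = p / (1 - p), then p = ratio / (1 + ratio).
implied_pes_capture = ratio / (1 + ratio)
print(f"ratio = {ratio:.1f}, implied PES capture probability = {implied_pes_capture:.3f}")
```

A ratio of about 12.5 corresponds to an implied capture probability of roughly 0.93 in the post-enumeration survey.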
Clearly, the majority of those added by dual-system estimation are "3rd cell adds," those for whom, in a straightforward sample-based inference, there is direct evidence of their being missed in the census and counted in the post-enumeration survey. (There is also information on individual and household characteristics for these missed people.) Many if not most of these additions must be due to deficiencies in census operations, since the methods used by the post-enumeration survey and the census are relatively similar. This result strongly suggests that deficiencies in census operations are associated with much measured census undercoverage.
Furthermore, rough correspondences of counts for historically undercounted groups from dual-system estimation in 1990 with those from demographic analysis suggest the possibility that a majority of all census undercoverage, even that not accounted for by dual-system estimation, is due to 3rd-cell enumerations. For example, in 1990, the PES-estimated undercoverage for blacks was 4.6 percent (Hogan, 1993) while that from demographic analysis was 5.7 percent (Robinson et al., 1993). The difference (undercoverage not measured by the PES) is only about one-quarter of what the post-enumeration survey measured. (This argument can be extended nationally to suggest that hard undercoverage is less than 2 million, much less than the estimated 9 million gross census omissions. Estimates derived using this method also could be overestimates since some of the difference between post-enumeration survey and demographic analysis is due to correlation bias.) Therefore, the post-enumeration survey accomplishes what it was (partially) designed to do: measure the extent to which census operations are not perfect. The remaining undercoverage is likely substantially less than the part represented by the 3rd cell. (Of course, improved estimation of the population in the 4th cell is still important.)
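The one-quarter figure follows directly from the two undercoverage rates quoted above. The variable names are illustrative, and the calculation does nothing more than restate the arithmetic.

```python
pes_undercount_pct = 4.6  # PES-estimated undercoverage for blacks, 1990 (Hogan, 1993)
da_undercount_pct = 5.7   # demographic analysis estimate (Robinson et al., 1993)

unmeasured_pct = da_undercount_pct - pes_undercount_pct
fraction_of_pes = unmeasured_pct / pes_undercount_pct
print(f"unmeasured = {unmeasured_pct:.1f} points, {fraction_of_pes:.2f} of the PES estimate")
```

The gap of 1.1 percentage points is about 0.24 of the 4.6 percent measured by the post-enumeration survey.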
Second, what would be the effect on adjusted counts from hard undercoverage? Like the argument with respect to the impact of heterogeneity of enumeration probability, the hard undercoverage problem results in a situation at aggregate levels in which the adjusted counts, while not a perfect solution, are still preferred to the unadjusted census counts using loss functions for population counts.9 However, it is more difficult to assert the same for a loss function for population shares, which relates to the key uses of census data for apportionment, most fund allocation, etc.
The worry is that the hard undercoverage population could be distributed in such a way that shares based on adjusted counts would be inferior to shares based on the census counts. The argument against this reasoning is that there is no characteristic that is known to cause census undercoverage that is also known to be distributed in a strongly nonuniform manner across poststrata. The ethnographic studies that took place in 1988 and 1990 (see Brownrigg and de la Puente, 1993) suggest that the following characteristics are associated with census undercoverage: mobility, language problems, concealment, irregular relationship to head of household, and resistance to government interaction. Some of these characteristics are more prevalent in areas in which the estimated undercount is large. There are no data available to support the hypothesis that hard undercoverage exists and is largest in areas in which estimated undercount is small.
The panel's conclusion, stated broadly, is that one should measure what one can and that, for what cannot be directly measured, it is appropriate to act consistently with the assumption that the part that cannot be directly measured is, at worst, uncorrelated with the part that can be measured. In addition, it seems unreasonable to ignore information about the distribution of a major part of the undercount because there is a hypothetical, unmeasurable, but very likely smaller component that, only if it had a particular (empirically unsupported) distribution, would cause adjusted shares to have greater loss than unadjusted shares.
The more that can be understood about the distribution of the undercounted population, the better informed will be decisions about adjusting the census, especially with respect to loss functions for shares. Efforts to describe census undercoverage in more detail include the research of Hengartner and Speed (1993), who show that the amount of (unexpected) block-level geographic heterogeneity in estimates of census undercoverage is comparable to the amount of demographic (poststrata-explained) heterogeneity. (Of course, census undercoverage at the block level is very indirectly measured, which complicates the interpretation of their findings.) This finding suggests that there could be geographically based clustering of the undercounted population that might reduce the effectiveness of adjustment for share loss functions. (One possibility is that this geographic pattern in undercoverage is due to enumerator effects.) The work of Hengartner and Speed cannot be used to directly compare adjusted and unadjusted counts with respect to a share loss function. Research that directly addresses this issue would be useful.10
The discussion of all three issues in this chapter shows that the criticisms of the use of integrated coverage measurement in the 2000 census involve matters on which more research would undoubtedly be useful and areas in which technical or operational improvement (e.g., with respect to matching) would make the decision to use adjustment clearer. The panel discusses this literature to further support its endorsement of integrated coverage measurement. The panel concludes that these three issues are not sufficiently compelling to shift its position supporting the use of integrated coverage measurement as a reliable method for reducing census differential undercoverage and, more broadly, for improving the quality of census counts for the key purposes for which they are used.