4 Technical Issues
Pages 81-118

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 81...
... , use of logistic regression models, missing data in new coverage error models, matching cases with minimal information, and demographic analysis. On several of these topics the panel offers recommendations for the Census Bureau.
From page 82...
... The Census Bureau examined some alternative specifications for the design of the CCM PES to see if they might have advantages, using simulation studies of both the quality of the resulting net coverage error estimates and the quality of estimates of the number of omissions and erroneous enumerations at the national level and for 64 poststrata (for details, see Fenstermaker, 2005, 2006)
From page 83...
... Aside from sample size, the selection of a sample design for the CCM in 2010 will involve addressing related but somewhat competing goals, given that there are two overall objectives of the coverage measurement program for 2010. First, there is the primary objective -- the measurement and analytic study of components of census coverage errors.
From page 84...
... So if one had a list of potentially worrisome places where census processes are likely to enumerate certain kinds of housing units with a high frequency of coverage error, those places should be oversampled in the 2010 CCM design. But this should be done while maintaining the ability to produce reliable estimates of net coverage error at some level of geographic and demographic detail.
From page 85...
... and allocates the remaining sample to anticipated problematic regions or block clusters. Such a change would potentially provide a much greater number of census coverage errors to support models examining which factors relate to coverage error.
From page 86...
... Simulation studies of the design alternatives mentioned above, using these metrics, may identify designs that are nearly as effective as the Census Bureau's current design at estimating net coverage error at the level of states and major demographic groups while increasing the number
From page 87...
... If the suggested study is carried out, then analyses in 2010 to identify which factors are and are not associated with various components of coverage error can be used to refine the models used for incorporating components of coverage error, and so to better plan the coverage measurement data collection in 2020. Finally, a very serious complication in carrying out this research plan is that many of the most important predictive factors in statistical models of components of census coverage error will have to be indicator variables for the various census processes used in association with the enumeration of each housing unit or individual.
From page 88...
... Of course, over time, new problems will crop up, and old ones will be addressed, and so the process of census improvement will be a dynamic one. Given that the design of the 2010 CCM PES needs to target block groups that have a higher frequency of housing units that are vulnerable to census coverage error, the Census Bureau should give serious consideration to alternative designs that, without sacrificing much efficiency in estimating net coverage error, could provide a larger number of (anticipated to be)
From page 89...
... Logistic Regression Models
In the last few years the Census Bureau has devoted a considerable amount of its resources on coverage measurement research to improving the estimation of net coverage error in 2010, with a primary focus on developing two logistic regression models to replace poststratification to address correlation bias. Any small-area estimates of net coverage error will likely be based on these same logistic regression models, replacing the use of (so-called)
From page 90...
... Finally, not only is logistic regression likely to be better than poststratification in estimating net coverage error for these reasons, but it is also much better suited for the analytic purposes of providing a better understanding of which factors are and are not related to net coverage error than poststratification. Poststratification is mentioned in the earliest literature advocating the use of dual-systems estimation (DSE)
From page 91...
... In theory, for the same reasons that logistic regression may be preferred to poststratification at the aggregate level at which that analysis is carried out, small-area estimates that are based on the probabilities of match and correct enumeration status estimated using logistic regression could improve on those provided through synthetic estimation by effectively averaging over more of the data. In the following, a number of issues relevant to the use of logistic regression are raised and discussed, and a variety of suggestions are
From page 92...
... They proposed two separate logistic regressions to model match status (using P-sample data) and correct enumeration status (using the E-sample data)
From page 93...
... Another competing estimator replaces the correct enumeration probability, $\hat{p}_{CEj}$, in these two alternatives by an indicator function for those individuals in the domain that had correct enumeration status, reducing the modeling to only the logistic regression model of match status. The problem with these two alternatives is that they are too sensitive to sampling variation.
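The contrast between a model-based correct enumeration probability and an indicator of correct enumeration status can be illustrated with a minimal sketch. The excerpt does not give the estimators' exact form, so the code below assumes a simplified dual-systems ratio in which each census record in a domain contributes its correct enumeration term divided by its estimated match probability. The function names and input values are hypothetical.

```python
def dse_with_probabilities(p_ce, p_match):
    """Simplified dual-systems estimate (an assumption, not the Bureau's exact form):
    sum over records of estimated correct enumeration probability / match probability."""
    if len(p_ce) != len(p_match):
        raise ValueError("p_ce and p_match must have the same length")
    return sum(ce / m for ce, m in zip(p_ce, p_match))


def dse_with_indicator(ce_status, p_match):
    """Variant that swaps the modeled probability for a 0/1 indicator of
    correct enumeration status, so only the match model is needed."""
    if len(ce_status) != len(p_match):
        raise ValueError("ce_status and p_match must have the same length")
    return sum((1.0 if status else 0.0) / m for status, m in zip(ce_status, p_match))


# Hypothetical records: two census enumerations in one domain.
model_based = dse_with_probabilities([0.9, 0.8], [0.9, 0.8])
indicator_based = dse_with_indicator([True, False], [0.5, 0.5])
```

Because the indicator variant uses each record's observed 0/1 status rather than a smoothed probability, its total moves more from sample to sample, which is consistent with the sensitivity to sampling variation noted above.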
From page 94...
... The Census Bureau has focused much of its efforts to date regarding developing the logistic regression approach on the performance of six models for both the P-sample matches and the E-sample correct enumerations. These logistic regression models all use explanatory variables that are indicator variables of various combinations of the levels of six factors used to define the 416 poststrata used in the March 2001 net undercoverage estimates: race/origin (seven groups)
From page 95...
... and the information criteria AIC and BIC also provide useful penalties for comparing regression models in a predictive situation. The situation for comparing nonnested models is less straightforward but important to address since the Census Bureau may need to make such comparisons.
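As a reminder of how these penalties work, the sketch below computes AIC and BIC from a fitted model's maximized log-likelihood using the standard definitions (AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L). The log-likelihood values and parameter counts are hypothetical, and the sketch ignores complications such as survey weights.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical comparison: a larger model must improve the log-likelihood
# enough to offset its extra parameters.
small_model = {"aic": aic(-100.0, 5), "bic": bic(-100.0, 5, 1000)}
large_model = {"aic": aic(-98.0, 12), "bic": bic(-98.0, 12, 1000)}
```

Both criteria can be computed for any pair of models fitted to the same data, whether or not one is nested in the other, although interpreting the difference between nonnested models calls for more care.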
From page 96...
... survey weights. The results of the Census Bureau's cross-validation comparison of the five alternative logistic regression models to the 2000 A.C.E.
From page 97...
... This substitution would not have been available using poststratification. Initial indications are that this substitution provides only modest benefits for the overall fit of the logistic regression models, but there may be substantial advantages for estimation of specific demographic groups, particularly
From page 98...
... However, as mentioned in the discussion concerning use of logistic regression models to substitute for synthetic estimation, any predictors used in these logistic regression models must be available from the census to support estimation of net census error for any domain (at least in the form currently proposed for use by the Census Bureau)
From page 99...
... This information is moderately consistent with the variables currently included in the logistic regression models being examined by the Census Bureau, but the linkage between the research findings and the predictors in these models is not as direct as one would like. The logistic regression models should reflect what is known about the sources of census coverage error, to the extent that this information is represented on the short form and in available contextual information.
From page 100...
... A related issue can arise in the application of logistic regression models of both the match rate and the correct enumeration rate, but it is substantially more difficult to assess. If the variables differ for these two logistic regression models, coverage rate estimates for some combinations of these variables might be biased, although it is not known whether this would cause bias for the domains (defined by geography, age, race/ethnicity, etc.)
From page 101...
... Finally, it should be stressed that this balancing problem is only relevant to the estimation of net coverage error -- it does not arise in modeling the frequency of components of census coverage error. The panel supports further work in developing logistic regression models, given their promise, particularly in looking for the benefits of additional covariates (again, including transformations and interactions)
From page 102...
... In such a research effort, the predictors that are clearly effective in the logistic regression models for match status and correct enumeration status may not be the same predictors that are effective in modeling the components of census coverage error. We describe the various types of predictors that should be considered for use in these models in Chapter 5.
From page 103...
... These missing data problems are currently addressed by some form of imputation. Four general principles of imputation are worth bearing in mind when assessing and refining current approaches (see Little and Rubin, 2002:Chapters 4–5)
From page 104...
... Currently, the Census Bureau first imputes missing characteristics for the P-sample interviews. Next, using those imputed values along with the collected P-sample values and a before-follow-up match code, a logistic regression model is used to impute match status.
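The final step of this sequence, turning a fitted logistic regression into a match probability for a case with unresolved match status, can be sketched as follows. The covariates, coefficients, and intercept are hypothetical placeholders. The excerpt does not give the Bureau's actual model specification, so this only illustrates the logistic link itself.

```python
import math


def match_probability(covariates, coefficients, intercept=0.0):
    """Predicted match probability from a logistic regression:
    1 / (1 + exp(-(intercept + sum of coefficient * covariate)))."""
    if len(covariates) != len(coefficients):
        raise ValueError("covariates and coefficients must have the same length")
    linear = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical case: two indicator covariates (observed or previously imputed)
# and made-up coefficients.
p_match = match_probability([1.0, 0.0], [1.2, -0.4], intercept=0.5)
```

The imputed value here is a probability rather than a hard match/nonmatch assignment. Whether to keep the probability or draw a status from it is a design choice the excerpt does not settle.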
From page 105...
... The Census Bureau tends to "compartmentalize" these missing data problems -- first solving the problem of missing Xs and Zs, and then addressing the problem of missing Ms -- in estimating the parameters for the logistic regression model for match status. However, it is better to conceptualize the problem as multivariate missing data, since fully effective imputation for missing data needs to preserve the relationships between various sets of missing and nonmissing values for all the variables that have missing data.
From page 106...
... To demonstrate the advantages of alternative 2 with the intermediate degree of conditioning, one can imagine a number of situations involving name, which is typically used to determine whether records match but is not typically suggested for use in the logistic regression model to impute match status. Situations range from those in which the E-file and P-file records have the same name, to those with spelling inconsistencies in which the names nonetheless appear to match, to those without a name on the E-file, the P-file, or both.
From page 107...
... The Census Bureau appears to be assessing match status, in which there is missing information for various characteristics, based on both P-file and E-file information; however, the Bureau's logistic regression model for the imputation of match status uses P-file information and a limited amount of E-file information through use of the before-follow-up match codes. This approach could result in too many imputed nonmatches.
From page 108...
... After estimating the logistic regression of M on X, imputations for missing census characteristics are needed to provide the predictors for input to the logistic regression models to estimate a match probability for these cases, through: ( E )
From page 109...
... 103. The Census Bureau should identify missing data methods that are consistent with the philosophy that is articulated above and implement those methods in support of statistical models of Census Coverage Measurement data in 2010.
From page 110...
... Given that the emphasis in 2000 was on the estimation of net census error, this inflation of the estimates of the rates of erroneous enumeration and omission was of only minor concern. However, with the new focus in 2010 on estimates of components of census coverage error, there is a greater need to find alternative methods for treating KE enumerations.
From page 111...
... residents and addresses, is also a likely source of information on the number of housing units and residents at small levels of geographic aggregation that could also be used to improve demographic analysis estimates. 11  For the remaining unresolved cases, the Census Bureau currently plans to treat them in a separate category as "enumerations unable to evaluate."
From page 112...
... The error in net undercoverage estimates from demographic analysis then stems from error in the various components, error in the census counts, and any lack of alignment of the demographic categories. Given these concerns, the most reliable outputs from demographic analysis are any national counts by age and sex, and functions of such counts, in particular sex ratios by age; birth and death estimates; and historical patterns of various kinds.
From page 113...
... In addition, the demographic analysis program will produce sex ratios by age and race/ethnic origin, possibly for use in reducing the effects of correlation bias on estimates of net undercoverage from the census coverage measurement program. Even without any major advances from 2000, demographic analysis will still likely play an important role in evaluation of the 2010 census.
From page 114...
... information on sex ratios for more detailed ethnic and racial groups. How should each of these information sources be best used to improve demographic analysis, and what evaluations should be used to support decisions of implementation?
From page 115...
... Because the most obvious source of correlation bias (heterogeneity of enumeration probabilities) would not have resulted in a negative bias for dual systems estimates, the most conservative step, in terms of additional counts, is to leave estimates for the female population unchanged and to increase the male population enough so that the resulting sex ratios for the adjusted counts agree with those from demographic analysis.
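The adjustment described here amounts to simple arithmetic, sketched below under stated assumptions: the sex ratio is taken as males per female, the counts and ratio are hypothetical, and the male count is never lowered (the excerpt speaks only of increasing it).

```python
def adjust_male_count(female_count, male_count, da_sex_ratio):
    """Leave the female count unchanged and raise the male count so that
    male / female equals the demographic analysis sex ratio.
    Assumption: if the male count already meets the target, it is kept as is."""
    if female_count <= 0 or da_sex_ratio <= 0:
        raise ValueError("female_count and da_sex_ratio must be positive")
    target_male = female_count * da_sex_ratio
    return max(male_count, target_male)


# Hypothetical cell: dual-systems estimates of 1,000 females and 900 males,
# with a demographic analysis sex ratio of 0.95 males per female.
adjusted_male = adjust_male_count(1000.0, 900.0, 0.95)
added_males = adjusted_male - 900.0
```

In this example the male estimate rises to about 950, adding about 50 persons, while the female estimate stays at 1,000.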
From page 116...
... Estimation of Uncertainty of Demographic Analysis
The Census Bureau (see Robinson et al., 1993) conducted initial research on developing uncertainty intervals for population forecasts, but to date these have not been fully developed.
From page 117...
... Those improvements include improving the measurement of undocumented and documented immigration, development of subnational geographic estimates, development of estimates of uncertainty, and further refining methods for combining demographic analysis and coverage measurement survey information. Recommendation 8: The Census Bureau should give priority to research on improving demographic analysis in the four areas: (1)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.