National Academies Press: OpenBook

Coverage Measurement in the 2010 Census (2009)

Chapter: Appendix B: Logistic Regression for Modeling Match and Correct Enumeration Rates

Suggested Citation:"Appendix B: Logistic Regression for Modeling Match and Correct Enumeration Rates." National Research Council. 2009. Coverage Measurement in the 2010 Census. Washington, DC: The National Academies Press. doi: 10.17226/12524.


Appendix B

Logistic Regression for Modeling Match and Correct Enumeration Rates

It is reasonable to suspect that match rates and correct enumeration rates, in addition to being a function of the variables used to define the accuracy and coverage evaluation (A.C.E.) poststrata in 2000, may also vary across the local census offices used to manage the workload in the census. The local office identifiers are on the A.C.E. research database, but they were not included in the six logistic regression models described above or in the study by Schindler (2006).

Local census office indicator variables might be predictive of match and correct enumeration rates because factors that are particular to small areas could affect ease of enumeration. For example, local economic conditions and the expertise and capabilities of local census office administrators could vary. Because of the large number of local census offices (more than 500) and the limited amount of data for each, these effects are more naturally represented as random effects. By including these random effects in the logistic regression models, the Census Bureau could estimate the effects of individual offices on match and correct enumeration rates and obtain valid estimates of the contribution of variability across offices to uncertainty about coverage rates in each area.

Malec and Maples (2005) explored this approach by adding local area random effects into a synthetic estimation model and then measured the variance component of these random effects for local census offices. The ultimate objective of this approach is a small-area estimation methodology that would provide a compromise between synthetic estimation and a design-based estimator for each local office area.
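The idea of a compromise between a synthetic estimate and a direct, office-level estimate can be illustrated with a small simulation. The sketch below is not the Malec and Maples (2005) model: it assumes a single fixed effect, normally distributed office effects on the logit scale, simple random sampling within offices, and a textbook variance-ratio shrinkage weight computed on the probability scale. All function names and parameter values are hypothetical.

```python
import math
import random


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_offices(beta, sigma, n_offices, n_cases, seed=0):
    """Simulate (true rate, observed rate) for each office.

    Each office gets a random effect drawn from N(0, sigma^2), added to the
    fixed effect beta on the logit scale (illustrative values only).
    """
    rng = random.Random(seed)
    offices = []
    for _ in range(n_offices):
        p_k = inv_logit(beta + rng.gauss(0.0, sigma))
        matches = sum(1 for _ in range(n_cases) if rng.random() < p_k)
        offices.append((p_k, matches / n_cases))
    return offices


def composite_estimate(direct, synthetic, n_cases, between_var):
    """Shrink a direct office rate toward the synthetic (pooled) rate."""
    sampling_var = synthetic * (1.0 - synthetic) / n_cases
    total = between_var + sampling_var
    weight = between_var / total if total > 0 else 0.0
    return weight * direct + (1.0 - weight) * synthetic


def compare_estimators(offices, n_cases):
    """Return mean squared errors of (direct, synthetic, composite)."""
    direct = [obs for _, obs in offices]
    synthetic = sum(direct) / len(direct)
    # Method-of-moments between-office variance, floored at zero.
    spread = sum((d - synthetic) ** 2 for d in direct) / (len(direct) - 1)
    between_var = max(0.0, spread - synthetic * (1.0 - synthetic) / n_cases)
    errors = [0.0, 0.0, 0.0]
    for true_p, obs in offices:
        comp = composite_estimate(obs, synthetic, n_cases, between_var)
        for j, est in enumerate((obs, synthetic, comp)):
            errors[j] += (est - true_p) ** 2 / len(offices)
    return tuple(errors)


if __name__ == "__main__":
    offices = simulate_offices(beta=2.0, sigma=0.5, n_offices=500, n_cases=40)
    print(compare_estimators(offices, n_cases=40))
```

The weight moves toward the direct estimate when offices differ a great deal relative to sampling noise, and toward the synthetic estimate when the per-office sample is small. A production method would also have to reflect the complex sample design, which this sketch ignores.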

Because of the complex design of A.C.E.'s postenumeration survey (weighted cases within samples of block clusters), many of the empirical correct enumeration rates and match rates used in Malec and Maples's model are more variable than the nominal sample sizes would indicate. To account for the extra variability, Malec and Maples (2005) used a pseudo-likelihood approach with effective sample sizes estimated by the bootstrap. In this approach, both logistic regression models (for match rate and correct enumeration rate) have the following generic form:

\[
\log\left(\frac{p_{i,k}}{1 - p_{i,k}}\right) = \beta_i + \mu_k + \alpha_{i,k},
\]

where $\beta_i$ is the fixed effect for $i$th poststratum membership, $\mu_k$ is a random effect for the $k$th local census office, and $\alpha_{i,k}$ is model error. Furthermore,

\[
\mu_k \sim N(0, \Sigma) \quad \text{and} \quad \alpha_{i,k} \sim N\left(0, \gamma^2_{ce(i)}\right),
\]

where $ce(i)$ is an index representing the collapsing of the poststrata into 11 or 8 cells, depending on whether the model is applied to the E-sample or the P-sample. Malec and Maples (2005) were able to estimate the large number of parameters in these models using Bayesian simulation.

This research suggests that inclusion of small-area effects could substantially improve coverage estimates. Several questions remain: how best to treat the complex sample design, how many random effects can be included and at what level of aggregation, the best way to estimate the model parameters, and how the model fit should be assessed. The panel is impressed with this high-caliber research that addresses an important issue in coverage modeling; further work in this area would be very valuable.

Mulry et al. (2005) examined the following anomalous results in A.C.E. More than 5 percent of incorporated places in 2000 had an estimated net overcount of greater than 5 percent, and 0.5 percent had a net overcount of greater than 10 percent. This result runs counter to findings from the 1980 and 1990 coverage measurement programs of the potential net overcoverage due to true erroneous enumerations and duplications. In contrast with 2000, only 0.1 percent of places had an estimated net undercount of greater than 5 percent, and nationally, the degrees of overcoverage and undercoverage were of essentially the same magnitude. There is a concern that the lack of balance of designated erroneous enumerations and designated omissions may be due to the use of proxy status and the type of census return as poststratification variables for the E-sample but not for P-sample computations.

See http://www.census.gov/dmd/www/ACEREVII_PLACES.txt for a list.

To examine this further, Mulry et al. (2005) demonstrated that by using proxy status in the E-sample poststratification, there were 91 places with a net overcount of more than 10 percent; however, if it is assumed that there was no error for proxy enumerations, there were only 16 places with net overcounts of more than 10 percent. Furthermore, if one assumes that there were no errors for proxy enumerations and no errors for late nonmail returns, there were only four places with a net overcount of more than 5 percent. Given this and given that 27 percent of proxy enumerations had insufficient information for matching and follow-up, it is clear that proxy enumerations could contribute to substantial balancing error. The Census Bureau concluded that proxy enumerations contributed to these anomalous findings, but that they were not the only cause.

Related research carried out by Spencer (2005) examined the quality of synthetic estimates for block clusters based on A.C.E. Revision II estimates, either using 938 E-sample poststrata and 648 P-sample poststrata or using the same 648 poststrata for the E- and P-samples. His findings, in which the standard of comparison was either (a) the direct dual-systems estimate or (b) the census count plus people found in the P-sample who were omitted in the census for each block cluster, suggested that coarser but consistent poststrata may have provided more accurate estimates of net coverage error than finer poststratifications based on different E- and P-sample stratifications. However, for large blocks with proxy rates greater than 10 percent, the finer and inconsistent poststrata performed better.

The specific model form for logistic regression is

\[
\log\left(\frac{p}{1 - p}\right) = X\beta.
\]

As described in the literature on generalized linear models, this represents a specific relationship between the mean of a random variable and a linear combination of predictors, called the link function,

\[
\log\left(\frac{y}{1 - y}\right).
\]

Research on the best link function is continuing at the Census Bureau, with possibilities that include logit, probit, loglog, and robit. An incorrect link function would result in poor extrapolations to situations that do not occur in the P- or E-sample data, unnecessary interaction terms in the model, and other typical results of lack of fit. The panel suggests that the Hosmer-Lemeshow goodness-of-fit test may help the Census Bureau choose an appropriate link function: the test can indicate whether an alternative link function would provide a better fit to the data.

Several complications would remain to be addressed.

Software for Alternate Link Functions. If it is discovered that an alternate link function is preferred, it might require a modest amount of software development to implement. However, this should be relatively straightforward in either SAS or R, which are two standard statistical software systems that the Census Bureau uses.

Loss Function or Objective Functions for Assessing Fit of Models. Another complication is that the current loss function underlying the fitting of the coefficients of these logistic regression models is implicit in the separate likelihood equations for the two models and is therefore somewhat disconnected from the ultimate goal, which is to predict the population size or, what amounts to the same thing, net coverage error. It may be that the ultimate goal can be better represented by weighting the likelihood equations to take this modified objective function into account. The Census Bureau has done some work in this direction, and we support this research and its implementation if it is found to provide preferred estimates.

Measurement Error. Census data are subject to measurement error, and these errors will have deleterious effects on the application of logistic regression models. If the measurement error is unrelated to the outcome (match status or correct enumeration status), the effect on the data is the attenuation of relationships. In other words, the predictors will not be as effective as they would be without the measurement error. But if the measurement error is related to the outcomes, the effect could be much more complicated, including the introduction of severe biases.
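The attenuation described above can be seen in a small simulation. The sketch below assumes a single continuous predictor observed with independent normal noise that is unrelated to the outcome; the function names, sample size, and coefficient values are all hypothetical. It fits a two-parameter logistic regression by Newton-Raphson, once with the true predictor and once with the noisy one.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Fit logit P(y = 1) = a + b * x by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def attenuation_demo(n=4000, slope=1.5, noise_sd=1.0, seed=1):
    """Return (slope fitted on true x, slope fitted on noisy x)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-(-0.5 + slope * x))) else 0
        for x in x_true
    ]
    # Measurement error that does not depend on the outcome y.
    x_obs = [x + rng.gauss(0.0, noise_sd) for x in x_true]
    _, b_clean = fit_logistic(x_true, y)
    _, b_noisy = fit_logistic(x_obs, y)
    return b_clean, b_noisy


if __name__ == "__main__":
    b_clean, b_noisy = attenuation_demo()
    print("slope with true predictor: ", round(b_clean, 3))
    print("slope with noisy predictor:", round(b_noisy, 3))
```

Under these assumptions the slope fitted to the noisy predictor is pulled toward zero. The sketch does not cover error that is related to the outcome, where the direction of the bias is not predictable in this simple way.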


The census coverage measurement programs have historically addressed three primary objectives: (1) to inform users about the quality of the census counts; (2) to help identify sources of error to improve census taking; and (3) to provide alternative counts based on information from the coverage measurement program.

In planning the 1990 and 2000 censuses, the main objective was to produce alternative counts based on the measurement of net coverage error. For the 2010 census coverage measurement program, the Census Bureau will deemphasize that goal, and is instead planning to focus on the second goal of improving census processes.

This book, which details the findings of the National Research Council's Panel on Coverage Evaluation and Correlation Bias, strongly supports the Census Bureau's change in goal. However, the panel finds that the current plans for data collection, data analysis, and data products are still too oriented towards measurement of net coverage error to fully exploit this new focus. Although the Census Bureau has taken several important steps to revise data collection and analysis procedures and data products, this book recommends further steps to enhance the value of coverage measurement for the improvement of future census processes.
