
Assessing the 2020 Census: Final Report (2023)


– D –

Extensions of Census Coverage Evaluations

D.1 ESTIMATES OF CENSUS ERROR FROM THE POST-ENUMERATION SURVEY

The primary source of detailed information beyond Demographic Analysis on census enumeration errors is the Post-Enumeration Survey (PES). Briefly (see Chapter 4 for more detail and references), the PES is an independent survey of households in a sample of blocks (the P-sample), which is matched to census enumerations (referred to as the E-sample) in the same blocks. The U.S. Census Bureau, using capture-recapture or dual-system estimation, estimates net undercoverage in the census for various domains (e.g., age, sex, race, ethnicity, housing tenure). The Census Bureau also uses the PES to estimate the contributions of various operations to errors—finding, for example, that proxy enumerations generate more erroneous enumerations and whole-person imputations than other modes. A further analysis of census errors by component (e.g., duplications, omissions) would partition the population in the PES sample into two groups—those addresses, households, or individuals for which attempts at enumeration resulted in a census component enumeration error and those that did not. This analysis, in turn, could help assess the quality of each census process in enumerating the population and provide a basis for considering specific processes for improvement in the future. In this appendix, we outline such an approach.
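
As a rough illustration of the dual-system logic, the sketch below applies a Lincoln-Petersen style estimator to invented figures. The Census Bureau's production estimator incorporates additional adjustments (for erroneous enumerations, data-defined persons, and missing data) that are omitted here, so this is a minimal sketch, not the Bureau's method.

```python
# Minimal sketch of dual-system (capture-recapture) estimation for a single
# estimation domain.  All figures are hypothetical.

def dual_system_estimate(census_correct: float, p_sample_total: float,
                         matched: float) -> float:
    """Lincoln-Petersen style estimate of the true population size.

    census_correct : correct census enumerations in the domain
    p_sample_total : weighted total of the independent P-sample
    matched        : weighted P-sample persons matched to a census record
    """
    return census_correct * p_sample_total / matched

n_hat = dual_system_estimate(950_000, 60_000, 57_000)  # -> 1,000,000
census_count = 960_000                       # published count, hypothetical
net_error = census_count - n_hat             # negative value = net undercount
print(round(n_hat), round(net_error))        # 1000000 -40000
```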

In making use of the PES in this way, one would consider a large number of households or individuals with the binary attribute (proper or improper enumeration).1 Then, one would utilize various indicator variables identifying the type of living situation, the demography and geography of the household or individual, and the census processes used for that household or individual as predictors, to try to discriminate or predict proper or improper enumerations. The hope is that some variables shown to have predictive power will identify procedures that could be modified for 2030.
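
A minimal sketch of what such a case-level model might look like is given below, assuming a simple logistic regression on a handful of invented indicator variables. The variable names, data, and effect sizes are hypothetical, and a real analysis would also have to account for the PES sample design and weights.

```python
# Illustrative only: regress a binary "improper enumeration" outcome on
# indicator variables for living situation, tenure, and census process.
# The indicators and the data-generating model are invented.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "proxy_response":  rng.integers(0, 2, n),  # enumerated via a proxy
    "non_id_response": rng.integers(0, 2, n),  # response without a census ID
    "small_multiunit": rng.integers(0, 2, n),  # small multiunit building
    "renter":          rng.integers(0, 2, n),  # housing tenure indicator
})
# Synthetic truth: proxy response and renter status raise the error rate.
logit = -3.0 + 1.2 * df["proxy_response"] + 0.6 * df["renter"]
df["improper"] = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression().fit(df.drop(columns="improper"), df["improper"])
for name, coef in zip(df.columns[:-1], model.coef_[0]):
    print(f"{name:16s} {coef:+.2f}")  # sign and size hint at predictive power
```

The fitted coefficients only illustrate the mechanics; the selection issues discussed next are what make their interpretation difficult.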

Considerable care is needed in interpreting these associations because selection effects challenge causal validity. For a basic example, consider households in an Internet First domain versus an Internet Choice domain that, in both settings, elected to respond via paper questionnaire. In both cases the collection mode is “form,” but the quality may well differ because one group rejected the internet while the other affirmatively chose the paper form. How to extract valid, actionable conclusions from this basic example, and from more complicated ones, needs to be addressed.

Keeping the foregoing in mind, estimating associations is very worthwhile. Statistical models relevant in this regard include discriminant analysis and classification trees. Analyses can be broadened from the individual to, for example, geographic aggregates with the relative frequency of enumeration error as the dependent variable. For such aggregate models, groups of census tracts offer a great advantage over other aggregates such as states and counties, because tracts are more homogeneous with respect to census processes and living situations. Furthermore, using such a unit of aggregation allows one to draw on information from the American Community Survey (ACS), which can offer additional predictors, such as the degree of access to broadband internet connections, which could help identify reasons for the frequency of Internet Self-Response.
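
A sketch of such an aggregate model appears below, assuming a binomial regression of tract-group error counts on invented ACS covariates (broadband access and renter share). The unit counts, variable names, and effect sizes are all hypothetical.

```python
# Hypothetical aggregate-level model: tract groups as units, the share of
# PES cases with an enumeration error as the response, and ACS-derived
# covariates as predictors.  Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_groups = 500
broadband = rng.uniform(0.4, 1.0, n_groups)     # ACS broadband share
renters = rng.uniform(0.1, 0.9, n_groups)       # ACS renter share
cases = rng.integers(300, 700, n_groups)        # PES cases per tract group

# Simulated error rates that fall as broadband access rises.
p = 1.0 / (1.0 + np.exp(-(-2.5 - 1.0 * broadband + 0.8 * renters)))
errors = rng.binomial(cases, p)

X = sm.add_constant(np.column_stack([broadband, renters]))
y = np.column_stack([errors, cases - errors])   # (successes, failures)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.summary())
```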

National Research Council (2009:Ch. 5) provides useful suggestions for the proposed analysis, with a focus on answering this general question: “Which census processes are associated with a substantially increased rate of erroneous enumerations, duplications, omissions, or enumerations in the wrong location?” Specific questions follow, including types of housing units that were missed or duplicated more often than others and types of people who were missed, erroneously enumerated, duplicated, or counted in the wrong locations more often than others. Statistical modeling could help answer such questions, with the expectation that the findings would identify pathways for census improvement in 2030.

The modeling approach outlined here differs somewhat from that described by the National Research Council (2009). That report suggested conducting the analysis at the level of census domains. The proposed approach also includes analysis at the level of individuals or households.

___________________

1 The outcome could be expanded to a more refined metric.


D.1.1 Candidate Attributes

Candidates for dependent variables include indicators of four main types of error:2

  • Whether a P-sample enumeration is a census omission—refinements would include indicator variables as to whether the omission is: (a) a within-household omission, for which others in the same housing unit were enumerated; (b) a whole-household omission in a building included in the Master Address File (MAF); or (c) an entirely missed housing unit.
  • Whether an E-sample enumeration is a census duplicate—refinements would include indicator variables that denote whether the situation is: (a) a whole-household duplicate; or (b) a duplication of an individual in a household in which others were counted only once. It could also be useful to distinguish among duplications of individuals in non-group quarters residences with people in various types of group quarters (GQ) residences (college/university student housing, nursing homes, correctional facilities, military barracks), duplications of individuals in standard residences with someone at a seasonal residence, and duplications of nonmovers with recent movers.
  • Whether an E-sample enumeration is erroneous—refinements would include whether the erroneous enumeration is: (a) a fictitious person; (b) someone who was born after Census Day; (c) someone who died before Census Day; or (d) a visitor.
  • Whether an E-sample enumeration is in the wrong location—refinements would include whether the wrong location is due to: (a) a geocoding error; (b) a recent move; or (c) a person counted at a Census Day residence that is not their usual residence.

Other dependent variables could include missing item responses, item responses in error, population-count-only households, and others.
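
One hypothetical way to organize these outcomes for modeling is a coarse binary error flag paired with a refined component code mirroring the refinements listed above; the category names below are illustrative and are not Census Bureau codes.

```python
# Illustrative encoding of the candidate dependent variables: a refined
# component code per PES case plus a derived coarse binary outcome.
from dataclasses import dataclass
from enum import Enum, auto

class ErrorComponent(Enum):
    NONE = auto()
    OMISSION_WITHIN_HOUSEHOLD = auto()
    OMISSION_WHOLE_HOUSEHOLD = auto()
    OMISSION_MISSED_UNIT = auto()
    DUPLICATE_WHOLE_HOUSEHOLD = auto()
    DUPLICATE_PERSON = auto()
    ERRONEOUS_FICTITIOUS = auto()
    ERRONEOUS_BORN_AFTER_CENSUS_DAY = auto()
    ERRONEOUS_DIED_BEFORE_CENSUS_DAY = auto()
    ERRONEOUS_VISITOR = auto()
    WRONG_LOCATION_GEOCODING = auto()
    WRONG_LOCATION_RECENT_MOVE = auto()
    WRONG_LOCATION_NOT_USUAL_RESIDENCE = auto()

@dataclass
class PesCaseOutcome:
    case_id: str
    component: ErrorComponent

    @property
    def improper(self) -> bool:
        """Coarse dichotomous outcome for the regression-style models."""
        return self.component is not ErrorComponent.NONE

example = PesCaseOutcome("case-0001", ErrorComponent.OMISSION_WHOLE_HOUSEHOLD)
print(example.improper)  # True
```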

Candidates for predictor variables are identified by the National Research Council (2009:123):

To understand which subset of individuals and housing units are more frequently subject to coverage errors of the four indicated types, and to understand what census processes contributed to those errors, it is necessary to focus on predictors that distinguish between individuals and housing units that are likely to have different interactions with census enumeration processes, as well as predictors that indicate the census processes that were used to attempt to enumerate those individuals and housing units [emphasis added].

___________________

2 Some of these analyses might be improved through use of a non-dichotomous loss structure reflecting errors of differing gravity. The most natural example is counting people in the wrong location: a count placed at a location that becomes correct only at some higher level of geographic aggregation is a more serious error than one that becomes correct at some lower level of aggregation.

We caution that the set of census component processes that one might want to represent is much richer than the discussion here would suggest. To get some sense of the complexity, there will be dozens of different questionnaires used in the 2010 Census (to account for different forms of delivery, foreign languages, and other factors). Given the size of the postenumeration surveys (PES) used to date, the sample sizes of the coverage measurement program are unlikely to support analysis of the rarer component processes used in conjunction with the more detailed subsets of the population. Therefore, some compromise between full representation and parsimony in modeling must be struck.

We recognize that some of the covariates suggested for use in statistical modeling of the frequency of various components of coverage error may not be routinely available given the planned data collection in the P-sample blocks. For instance, we discuss below the possibility of determining whether someone in a household telephoned for questionnaire assistance and whether that was associated with a higher or lower rate of census coverage error of some type. Such information has not been collected in previous coverage measurement programs and would therefore not be available to modelers. . . . If this information will not be available, we hope that this discussion motivates a revision of plans to make more predictors available to support the models.

The types of predictors that would be useful for 2030 are indicators of: (1) the area of interest; (2) census processes; (3) the degree of enumeration error; and (4) context.

Indicators of the area of interest could include:

  • Type of enumeration area and other features that identify the local geography and the types of housing units in the area—candidate variables include the quality of the mailing list and whether the housing units have unique identifiers; the frequency of small multiunit residences; the rates of new construction and recent demolition; and the degree of mixing of residential and business establishments.
  • Housing unit variables—candidate variables include whether the housing unit itself is newly constructed or is part of a small multiunit building and whether it is part of a GQ and, if so, what type.
  • People’s demographic characteristics—candidate variables include indicators for demographic groups that are historically subject to varying degrees of net census undercoverage (e.g., ages 18–22 are associated with increased chances of duplication and ages 0–4 with increased chances of omission).
  • Relationships of residents—candidate variables include whether the household includes one or more unrelated people or more than one nuclear family.

Indicators of census processes could include:

  • Results of the MAF building process—four types of coverage error can stem from mistakes in MAF building: (a) a nonresidential unit is included in the MAF; (b) a housing unit is included twice (at different street addresses); (c) a housing unit is omitted (which can happen in multiple ways, since the building containing a residence could be included but the residence itself omitted, or the entire building could be missed); and (d) an address is geocoded in the wrong location. While none of these errors necessarily results in a census coverage error (since field work can remedy such mistakes), such errors are likely associated with an increased frequency of coverage error. Therefore, indicator variables for these types of error are obvious candidates for inclusion in such models. Further, it might be useful to use such indicator variables themselves as dependent variables in models that attempt to explain why such errors occurred.
  • Variables associated with self-response—indicator variables could include that a foreign language questionnaire was requested or some other contact was made with telephone questionnaire assistance, some degree of item nonresponse occurred on the returned questionnaire, some of the response needed to be keyed, or the response did not use the provided census identification number (Non-ID response).
  • Nonresponse Followup (NRFU)-associated variables—indicator variables could include the number of nonresponse attempts needed and whether the ultimate enumeration was through a proxy respondent or through whole-household imputation.
  • Variables associated with other modes of enumeration—indicator variables could include seasonal residence in an Update Leave or Update Enumerate area.

Indicators of degree of enumeration error could include:

  • Variables associated with enumerator training and turnover rates.
  • Variables from the quality-control checks of enumerators’ work.

Indicators of contextual factors could include:

  • Variables associated with neighborhood characteristics, such as the percentage of people in an area that own their own residences, the local mail return rate, the local crime rate, and the health of the local economy.
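
Case-level process indicators and contextual variables live at different levels of geography. A minimal sketch of combining the predictor families listed above, assuming each PES case carries a tract identifier and that ACS context is summarized by tract, is:

```python
# Hypothetical assembly of a design matrix: merge case-level indicators
# with tract-level ACS context on a tract ID.  All values are invented.
import pandas as pd

cases = pd.DataFrame({           # one row per P-/E-sample case
    "case_id":  [1, 2, 3],
    "tract_id": ["T001", "T001", "T002"],
    "proxy":    [0, 1, 0],       # NRFU proxy-response indicator
    "non_id":   [1, 0, 0],       # Non-ID response indicator
})
acs_context = pd.DataFrame({     # one row per tract, ACS-derived
    "tract_id":        ["T001", "T002"],
    "broadband_share": [0.62, 0.91],
    "renter_share":    [0.55, 0.20],
})
design = cases.merge(acs_context, on="tract_id", how="left")
print(design)
```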

D.1.2 Implementation Issues

As noted in National Research Council (2009:126):

It would be premature at this point to suggest the precise form of the statistical models that could be used for this application, but since most of the dependent variables are dichotomous, logistic regression, discriminant analysis, and classification trees (including random forests) are obvious models to consider. Moreover, it is very likely that given the complexity of the underlying phenomena being modeled, focusing on models with predictors restricted to the poststratification variables from 2000 will be unsatisfactory. Instead, what is needed is a representation of the complexity of the situation, involving characteristics of households, housing units, census processes, enumerator's performance, and interactions among these variables. In addition, the Census Bureau should also examine the possibility of using separate regression models for separate geographic domains.
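
To make the tree-based option and the suggestion of separate models by geographic domain concrete, the sketch below fits one random forest per invented domain; the domains, indicators, and data are synthetic placeholders rather than a proposed specification.

```python
# Illustrative only: one classification model per geographic domain,
# using a random forest on placeholder indicators and a placeholder outcome.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 6_000
df = pd.DataFrame({
    "domain":   rng.choice(["Domain A", "Domain B", "Domain C"], n),
    "proxy":    rng.integers(0, 2, n),
    "renter":   rng.integers(0, 2, n),
    "improper": rng.integers(0, 2, n),   # placeholder outcome
})
for domain, sub in df.groupby("domain"):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(sub[["proxy", "renter"]], sub["improper"])
    importances = dict(zip(["proxy", "renter"],
                           clf.feature_importances_.round(2)))
    print(domain, importances)
```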

D.1.3 PES Redesign

While the usual structure of the PES could be continued in 2030 for both the traditional net and gross coverage error estimation and the proposed discriminant analysis, other designs could provide greater utility in this regard. One way to make the PES sample useful for statistical modeling in addition to the traditional estimation would be to oversample areas that were likely to experience problems with census enumeration. For instance, three-quarters of the usual PES design could be structured as it was in the past, with one-quarter of the sample devoted to oversampling areas in which problems were anticipated. The hope would be to get more examples of improper enumerations for the logistic regression or classification tree in addition to the many instances of no enumeration problems. For example, additional sample could be added in immigrant communities, which have a higher rate of omissions; seasonal housing communities, which have a higher rate of duplications; and the like. This design would not add a substantial amount to the estimated error of the coverage estimates for large domains. However, given that the causal structure of census enumeration errors is not fully understood, deciding what types of areas to oversample is not completely obvious.
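
A back-of-envelope sketch of the allocation arithmetic implied by such a split design follows; the block counts and the rule for flagging "high-risk" blocks are entirely hypothetical.

```python
# Hypothetical split allocation: 3/4 of PES blocks allocated as in a
# traditional design, 1/4 reserved for oversampling flagged blocks.
total_sample_blocks = 10_000
traditional = int(0.75 * total_sample_blocks)    # 7,500 blocks
oversample = total_sample_blocks - traditional   # 2,500 blocks

high_risk_blocks = 60_000     # e.g., immigrant or seasonal-housing areas
other_blocks = 540_000
universe = high_risk_blocks + other_blocks

# Inclusion probabilities: high-risk blocks get their proportional share of
# the traditional allocation plus the entire oversample.
p_high = (traditional * high_risk_blocks / universe + oversample) / high_risk_blocks
p_other = (traditional * other_blocks / universe) / other_blocks
print(f"high-risk: inclusion prob {p_high:.4f}, design weight {1 / p_high:.0f}")
print(f"other:     inclusion prob {p_other:.4f}, design weight {1 / p_other:.0f}")
```

The unequal inclusion probabilities translate into unequal design weights, which is the source of the variance cost that, as noted above, is expected to be modest for large domains.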

D.2 RESEARCH TO GENERATE TESTABLE IDEAS TO REDUCE THE DIFFERENTIAL NET UNDERCOUNT IN 2030

D.2.1 The Problem

Differential net undercount is a continuing problem in the U.S. decennial census. Groups that are historically undercounted include Black people, Hispanic people, American Indian or Alaska Native people, renters, young children, young men from various demographic groups, areas with low income and low levels of education, areas with large numbers of recent immigrants, and remote rural areas. Other groups are historically overcounted, including older people, homeowners, and White people (also Asian people in 2020). There are many reasons for undercount, and the Census Bureau has successfully reduced net undercount rates for the total population and population groups since 1950. However, the differential net undercount (e.g., between Black people and all others) remains stubbornly high at 3–4 percentage points. Moreover, the 2020 Census experienced higher net undercount rates for disadvantaged groups than were seen in 2010, and a wider differential. These differential undercounts are consequential for critical uses of census data that involve allocating resources from a fixed pie, including redistricting and federal fund allocation, and they affect other important uses as well.

Identifying strategies that could reduce differential net undercount has suffered from a lack of granular data for analysis. Demographic Analysis does not provide subnational estimates or estimates for race groups other than Black and All Other Races. The PES provides race and ethnicity detail at the national level and total state estimates, but not substate estimates. Administrative records vary in their completeness of population coverage and in their race and ethnicity detail (their use for imputing race and ethnicity, as well as other characteristics such as age, should be studied).

D.2.2 Proposal for a Data Linkage Study Focused on Undercounts

The proposed study would link ACS individual records for 2019, 2020, and 2021 to the 2020 Census and to as many kinds of administrative records as possible and not only those used nationwide in the 2020 Census. The goal would be to tease out patterns and characteristics that suggest strategies that could improve the census count for some groups in some areas.3 The study would need to be carried out by Census Bureau staff or, perhaps better, by a consortium of researchers working with census staff and people with local and group knowledge at a Federal Statistical Research Data Center. It would be ideal if the study could be completed within 18–24 months and presented to the Census Bureau and stakeholder leadership with recommendations for actionable ideas to pursue (such as obtaining authorization for a particular type of administrative record or illustrating the extent of positive effects on self-response of universal affordable broadband).
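
A minimal sketch of the linkage step is shown below, assuming person records can be joined on a protected linkage key (a stand-in field called "pik" is used) and ignoring the probabilistic matching, weighting, and disclosure protections a real study in a secure environment would require.

```python
# Hypothetical linkage of person-level records from the census, the ACS,
# and one administrative source, summarized as a capture history per person.
# The key name "pik" and all records are invented for illustration.
import pandas as pd

census = pd.DataFrame({"pik": [1, 2, 4], "in_census": 1})
acs    = pd.DataFrame({"pik": [1, 3, 4], "in_acs": 1})
snap   = pd.DataFrame({"pik": [2, 3, 4, 5], "in_snap": 1})

linked = (census.merge(acs, on="pik", how="outer")
                .merge(snap, on="pik", how="outer")
                .fillna(0)
                .astype(int))
linked["capture_history"] = (linked["in_census"].astype(str)
                             + linked["in_acs"].astype(str)
                             + linked["in_snap"].astype(str))
print(linked)  # e.g., "011" = missed by the census, present in ACS and SNAP
```

Tabulating capture histories like these by neighborhood and demographic characteristics is the kind of evidence the proposed study would look for.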

The ACS has response and coverage problems itself; however, the hypothesis is that, just as in the PES, matching the ACS with the census and administrative records would find some addresses missed in the ACS that are found in the census and vice versa. Also, some households would match but be missing one or more household members in the ACS or the census. Similarly, various administrative records would include or omit people found in the ACS and/or census and vice versa. The hope would be to obtain more complete representation of households with information about which sources and combinations of sources are better for including households in the census. The extensive demographic detail available in the ACS for individual respondents and for neighborhoods would enable exploration of a number of scenarios and possibilities.

As an example, consider a household of a legal resident adult, an undocumented adult, and three citizen children. The household might show up as depicted in Table D.1 in various sources, suggesting ways to count people who would otherwise be missing in the census.

Table D.1 Hypothetical Example of How a Specific 5-Person Household Would Appear in Various Data Sources

Data Source                    Adult:           Adult:         Citizen Child:  Citizen Child:  Citizen Child:
                               Legal Resident   Undocumented   Age 12          Age 9           Age 3
2020 Census                    Yes              No             Yes             Yes             No
2019 ACS                       Yes              No             Yes             No              No
Birth Records                  No               No             Yes             Yes             Yes
Earnings Records (SSA, LEHD)   Yes              Yes            No              No              No
SNAP Records                   Yes              No             Yes             Yes             Yes
SLDS (School) Records          No               No             Yes             Yes             Yes

NOTES: ACS, American Community Survey; SSA, Social Security Administration; LEHD, Longitudinal Employer-Household Dynamics; SLDS, State Longitudinal Data System; SNAP, Supplemental Nutrition Assistance Program.

SOURCE: Panel generated.

___________________

3 To the argument that there should not be separate methods for specific areas or groups in the census, a response is that the census has always had a range of methods—for example, Update Enumerate in remote areas. Moreover, the goal is equity in the count, which given long-lasting differential net undercounts, necessarily means additional and/or different methods directed in a targeted way to try to reduce the differential.

D.2.3 Feasibility

Is such a study feasible in terms of numbers of available cases? It appears so. Each year, the ACS has 3.5 million addresses in sample, which typically yield about 2 million completed interviews. For 2019, 2020, and 2021, the ACS had 9.9 million addresses in sample and 5.4 million completed interviews. The average tract has about 1,650 addresses, of which about 120 would be in sample. For analysis purposes, a possible plan could be to identify the lowest one-quarter of census tracts in Self-Response—about 21,000 tracts—in states for which the Census Bureau has Supplemental Nutrition Assistance Program records (setting aside tracts in remote areas, on American Indian or Alaska Native reservations, and the like for separate attention). To generate sufficient cases for analysis, the 21,000 tracts could be combined into groups of, say, five, which would generate 4,200 analysis units with 600 addresses in each unit. The groupings would include tracts with similar characteristics, such as race/ethnicity, broadband availability, etc. Records would be matched (in a protected setting) on an individual basis. There would be error in matching, as there is in the PES, but matching at a microlevel on the scale proposed (about 2.5 million addresses) and with as many record sources as possible could identify a number of avenues for testing in the lead-up to the 2030 Census. While there is no guarantee of success, without such a study it seems unlikely that new evidence-based ideas would emerge that would prove fruitful in testing.
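
The feasibility arithmetic can be made explicit in a few lines; the figures below restate the report's approximations, with roughly 84,000 census tracts nationwide assumed (consistent with the lowest-quartile count of about 21,000).

```python
# Worked version of the feasibility arithmetic; all inputs are approximations.
acs_addresses_per_year = 3_500_000
years = 3
tracts_nationwide = 84_000           # assumed; implies ~21,000 in the lowest quartile
sampled_per_tract = years * acs_addresses_per_year // tracts_nationwide  # ~125 (text: ~120)

low_response_tracts = 21_000         # lowest quartile on self-response
tracts_per_group = 5
analysis_units = low_response_tracts // tracts_per_group   # 4,200
addresses_per_unit = 120 * tracts_per_group                # 600
total_addresses_matched = low_response_tracts * 120        # ~2.5 million
print(sampled_per_tract, analysis_units, addresses_per_unit,
      f"{total_addresses_matched:,}")
```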
