National Academies Press: OpenBook

Assessing the 2020 Census: Final Report (2023)

Chapter: 8 Use of Administrative Records for Enumeration in the 2020 Census

Suggested Citation:"8 Use of Administrative Records for Enumeration in the 2020 Census." National Academies of Sciences, Engineering, and Medicine. 2023. Assessing the 2020 Census: Final Report. Washington, DC: The National Academies Press. doi: 10.17226/27150.

– 8 –

Use of Administrative Records for Enumeration in the 2020 Census

Administrative records data are typically collected as a necessary step in the administration of governmental programs. Key U.S. examples include personal income tax data and Medicare data. The U.S. Census Bureau has used administrative records for decades in its census and survey programs. It produces annual population estimates by updating a prior census with birth, death, and other records (see Chapter 10). In turn, it uses the population estimates to reweight surveys to address differential undercoverage by age, sex, race, and ethnicity. Demographic Analysis, one of the major methods of census coverage evaluation, uses the same kinds of records as the population estimates to produce independent estimates of the population (see Chapter 4). The Census Bureau uses tax records for small businesses to eliminate the need to survey these businesses for the economic census,1 and it has long used records from the U.S. Postal Service (USPS) to update its Master Address File (MAF) (see Chapter 5).

With rising costs of field data collection, statistical agencies in the United States and abroad have sought new and expanded ways to use administrative records for censuses and surveys. These ideas include using administrative records: (1) to help in the development of a survey or census frame; (2) for editing and imputation of missing survey and census responses; (3) for survey or census evaluation; (4) to improve the operational efficiency of survey or census data collection; or (5) as a replacement for survey or census data collection for all or a subset of the units (see relevant discussions in National Academies of Sciences, Engineering, and Medicine, 2017, and National Research Council, 1994).

___________________

1 See the technical documentation for the economic census at https://www.census.gov/programs-surveys/economic-census/year/2022/technical-documentation.html.

The idea of using information from administrative records to enumerate the U.S. population may have first been raised over 40 years ago by Alvey and Scheuren (1982). To date, the notion of conducting a census of the United States by using only the data collected in administrative records, as is done in several countries, has not garnered support. Reasons include records’ unequal coverage of all population groups and their frequent lack of such characteristics as race and ethnicity. Nonetheless, because of the cost advantages and, for some purposes, quality advantages (e.g., for estimating various types of income in surveys), the Census Bureau has invested considerable effort over the past two decades to expand, evaluate, and conduct research on applications of available administrative records. Research leading up to the 2020 Census focused on using administrative records information to replace some of the field enumeration conducted during Nonresponse Followup (NRFU). Specifically, the idea was to use information available from administrative records as a form of census enumeration in a limited supplementary role, in cases for which an initial contact did not elicit a response and the Census Bureau felt it had sufficiently high-quality administrative records for the household that could be used. The Census Bureau also planned research to compare a compilation of administrative records with actual 2020 Census results.

In this chapter, we focus on the first-ever use of administrative records data in a U.S. census for enumeration of selected households that did not self-respond and for which one enumerator visit in NRFU also failed to produce a return. The hope was that NRFU data-collection costs would be reduced, while the information quality would be equal or better using administrative records instead of further NRFU field work. As it turned out, 3.8% of occupied units were enumerated with administrative records, either as planned after the first attempt at a NRFU interview or in NRFU closeout (see Chapter 7 for discussion of NRFU). In addition, 14.4% of vacant units were enumerated with administrative records, as were 3.4% of addresses that turned out to be nonresidential or nonexistent. In total, 4.6% of all addresses were resolved using administrative records (calculated from U.S. Census Bureau, 2021a). At the end of the chapter, we briefly consider the possible role of administrative records enumerations in the 2030 Census. For the role that administrative records played in enumeration of various group quarters populations in 2020, see Chapter 9.


8.1 ADMINISTRATIVE RECORDS ENUMERATIONS IN THE 2020 CENSUS

Using administrative records for enumeration after a failed NRFU visit may seem straightforward in concept, but there are a number of complexities that make implementation a challenge. First, the information contained in administrative records data can pertain to occupancy of a residence by different groups of people at different time periods close to Census Day. Second, the residence rules of the decennial census may differ from residence rules employed by various governmental programs whose administrative records might be considered for enumeration. Third, the Census Bureau’s use of a composite of national administrative records databases requires reconciling differences in information for the same address/household across individual records sources. Fourth, the definition of “household members” can vary depending on the governmental program. For instance, the residents of a household on personal income tax forms, according to the Internal Revenue Service (IRS), can be a subset of the residents of a household as defined by the Census Bureau, since the IRS does not include nondependents who are not members of the nuclear family. Also, not all children qualify as dependents, and children and other relatives do not have to be members of the taxpayer’s household all year to be claimed as dependents.2 Finally, the quality of some administrative records can be poor, especially with respect to the characteristics information that the decennial census collects. Poor quality involves not only missing information on race and ethnicity in many instances, but also some amount of missing or erroneous entries for other characteristics.

Before discussing the statistical models used for administrative records enumeration in select circumstances, it is important to point out the effects of the COVID-19 pandemic on the operation (see Appendix C.7.5). COVID-19 delayed the collection and delivery of some of the administrative records files that various federal agencies had agreed to provide. In particular, it delayed receipt of IRS data because of postponed deadlines for the filing of personal income tax forms. To err on the conservative side, the Census Bureau also scaled back its plans to use administrative records to determine that addresses were vacant or nonresidential. On the other hand, the pandemic motivated the Census Bureau to make an unplanned use of administrative records for the closeout phase of enumeration, allowing enumeration based on one administrative records source rather than multiple sources.

___________________

2 From Mulry et al. (2023b): “The IRS has rules about whom a taxpayer can claim as a dependent on a 1040 tax return. There are two basic sets of rules for when a taxpayer may claim a person as a dependent, one set applies to children and the other applies to other relatives. There are situations when a child does not qualify as a dependent child but does qualify as a dependent relative. Usually, a dependent must reside with the household of the taxpayer all year, but there are situations where a close relative does not have to reside with the household. However, the taxpayer does have to provide support for the dependent for at least half of the year. . . . To meet the Qualifying Child Test, the child must be . . . either younger than 19 years old or be a student younger than 24 years old as of the end of the calendar year.”

8.2 MODELS DEVELOPED FOR THIS PURPOSE

Deciding to use administrative records information to enumerate an address in the NRFU workload involved three decision points, each supported by probabilistic models. The first decision was to estimate the probabilities that the address was not a housing unit (either because it was nonresidential or nonexistent), that it was a vacant housing unit, and that it was an occupied housing unit. The second decision was to estimate the probability that, if occupied, the number of residents was equal to various possible counts. The third decision was to judge whether the quality of the information used for the first two decisions was sufficiently high to rely on it for census enumeration, or instead to return the enumeration of the housing unit to NRFU.

The Census Bureau used information from multiple administrative records databases as input to these models, each set of records having advantages over the others, especially for specific subpopulations. The primary databases used in 2020 for the decision models were: IRS 1040 forms filed for 2019, IRS 1099 forms filed for 2019, Medicare records, the Indian Health Service Patient Database, and the Household Composition Key File at the U.S. Census Bureau.3 In addition, a number of databases could be used solely for corroboration of the information found in the above databases: USPS Change of Address File, 2000 and 2010 Census Unedited File, Medicare files, U.S. Department of Housing and Urban Development (HUD) Public and Indian Housing Information Center, HUD Tenant Rental Assistance files, HUD Computerized Home Underwriting Management System, HUD Public and Indian Rental Certification System, Selective Service Registration file, Targus Wireless, Targus Federal Consumer, Targus Address file, Veteran Service Group of Illinois Name and Address Resource file, Veteran Service Group of Illinois Tracker Plus, and Data, Analytics, and Research Partners.

Linking the information from these disparate sources was a key step. To enable such linkage, the Census Bureau assigned an anonymized identifier to each person record in each administrative records system, using the Census Bureau’s Person Identification Validation System (PVS). The PVS determines a protected identification key (PIK) via a probabilistic matching algorithm between the administrative record source data and a series of reference files. The algorithm used for determining administrative records enumerations assumed that the PIK assignment was correct, though it is important to acknowledge the possibility of linkage error associated with the PVS methodology. (See Layne et al., 2014, and Wagner and Layne, 2014, for details on PVS methodology.) The Census Bureau then used these linkable administrative records files to create household rosters for all housing units in the decennial census universe. These rosters are the collection of PIKs found in any of the administrative records files at a given address. Characteristics of the housing unit and of its occupants are provided by sources such as the Social Security Numerical Identification (Numident) File and the most recent decennial census. It is important to note that the algorithm could not proceed unless the housing unit existed on the MAF and therefore had a MAFID. Children were linked to rosters for each of the addresses where either the mother or father appeared.

___________________

3 The Household Composition Key File is a database created by Census Bureau staff using applications for Social Security numbers (SSNs) from the Social Security Numerical Identification (Numident) File, which contains names and SSNs that are used in assigning protected identification keys (PIKs) to records, enabling linkage of child-to-parent records. See the Spring 2017 Administrative Records Modeling Update for the Census Scientific Advisory Committee for further information, at https://www2.census.gov/cac/sac/meetings/2017-03/admin-records-modeling.pdf.
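The roster construction described above can be illustrated with a minimal sketch. It assumes each administrative record has already been assigned a PIK and a MAFID; the function name, the tuple layout, and all identifiers below are hypothetical and are not taken from Census Bureau systems.

```python
from collections import defaultdict


def build_rosters(records):
    """Group person records from several sources into one roster per address.

    Each record is a (source, mafid, pik) tuple. All values are hypothetical.
    """
    rosters = defaultdict(set)
    for source, mafid, pik in records:
        # An address that is not on the MAF has no MAFID and cannot be processed
        if mafid is None:
            continue
        # A set keeps each PIK once, even if several sources report it
        rosters[mafid].add(pik)
    return dict(rosters)


records = [
    ("IRS1040", "M001", "P1"),
    ("Medicare", "M001", "P1"),   # same person seen in a second source
    ("IRS1099", "M001", "P2"),
    ("IRS1040", None, "P3"),      # no MAFID, so the record is skipped
    ("IHS", "M002", "P4"),
]
rosters = build_rosters(records)
```

In this toy input, address M001 ends up with a roster of two distinct PIKs even though three records point to it.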

In thinking of these models, it is important to stress that they are premised on the existence of undercoverage and overcoverage between census returns and administrative records data, and on understanding the properties of those groups. That is, there should be substantial overlap between households/persons in the census and households/persons in the composite of administrative records sources formed through linkage. But there should be important differences as well, with some households/persons appearing in administrative records but not in the census, and vice versa. Finally, it is important to note that the models draw predictor variables from a variety of sources as well: the USPS Delivery Sequence File, the American Community Survey (ACS), the MAF, census operational information, and USPS Undeliverable as Addressed (UAA) reason codes.

8.2.1 Housing Unit Status Model

To discriminate between nonexistent or nonresidential (deleted), vacant, and occupied housing units, the Census Bureau developed a multinomial logit model using housing unit-level administrative records data as regressors to predict these three possibilities for housing unit status. The indicator variables denoting housing unit status are:

$$
y_h^{\text{unocc}} =
\begin{cases}
1, & \text{if occupied} \\
2, & \text{if vacant} \\
3, & \text{if deleted}
\end{cases}
$$

The “unocc” superscript denotes the model used for administrative records-based determination of housing status. The subscript $h$ indexes the individual housing units. The multinomial logit model is used to estimate the probability of each status type for each housing unit. The predicted probabilities denoted by:

$$
\hat{p}_{h,\text{occ}}^{\text{unocc}} = P\left(y_h^{\text{unocc}} = 1\right), \quad
\hat{p}_{h,\text{vac}}^{\text{unocc}} = P\left(y_h^{\text{unocc}} = 2\right), \quad
\hat{p}_{h,\text{del}}^{\text{unocc}} = P\left(y_h^{\text{unocc}} = 3\right)
$$

are passed to the distance function, described below, to determine which cases are identified as being of each type (or are left undecided). The covariates for this model include the following ACS variables (all as percentages at the block-group level):

  • Persons in block group between 25 and 44;
  • Persons in block group greater than 64;
  • Persons in block group identifying as Black;
  • Persons in block group identifying as Hispanic;
  • Occupied housing units in block group with at least two related household members;
  • Persons over age 4 in block group speaking language other than English at home;
  • Housing units in block group considered as mobile homes;
  • Housing units in block group where householder/spouse are members of household;
  • Occupied housing units in block group that are not owner occupied;
  • Housing units in block group occupied at time of interview; and
  • Persons in block group living below the poverty level.

The covariates for this model also include the following individual housing unit characteristics:

  • Number of neighbors in NRFU;
  • USPS UAA reason; and
  • Housing unit type (e.g., multifamily).

Housing unit variables collected from administrative records for inclusion in the model include flags indicating whether more than one person in the household falls into any of the following categories: Black; Hispanic; missing ethnicity; and less than 2, 2–10, 10–17, 18–24, 25–44, and over 64 in age. Finally, there are indicator variables for which of the various administrative records information sources are available.
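As an illustration of how a multinomial logit model turns covariates into the three status probabilities, the following minimal sketch applies the standard softmax formula with "occupied" as the reference category. The coefficient values, the two covariates, and the choice of reference category are assumptions made for the example; they are not the Census Bureau's fitted model.

```python
import math


def status_probabilities(x, coefs):
    """Return multinomial logit probabilities for each housing unit status.

    x: covariate values for one housing unit (leading 1.0 is the intercept).
    coefs: status -> coefficient list; the reference status has all zeros.
    """
    scores = {s: sum(b * v for b, v in zip(beta, x)) for s, beta in coefs.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {s: math.exp(z - top) for s, z in scores.items()}
    total = sum(exp_scores.values())
    return {s: e / total for s, e in exp_scores.items()}


# Hypothetical coefficients: intercept, UAA flag, multifamily flag
coefs = {
    "occ": [0.0, 0.0, 0.0],    # reference category (assumption)
    "vac": [-1.0, 2.5, 0.3],
    "del": [-2.0, 1.0, -0.5],
}
x = [1.0, 1.0, 0.0]  # a unit with an undeliverable mailing, not multifamily
probs = status_probabilities(x, coefs)
```

With these made-up coefficients, the undeliverable-mail flag pushes the vacant probability above the occupied probability, and the three probabilities sum to 1.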

8.2.2 Models to Estimate the Number of Occupants

Once the probability of housing unit status was modeled, the Census Bureau also needed to model the probability that various numbers of occupants were resident in the occupied housing units. The Census Bureau defined seven valid household compositions for estimation purposes, six defined as the combination of {1 adult, 2 adults, or 3 adults} with {0 children or at least 1 child} and the seventh being {at least 4 adults}. With this base, the Census Bureau developed two competing models to estimate the number of occupants, referring to them as the person-place model and the household composition model.

8.2.3 Person-Place Model

The person-place model estimated the probability that a person found in the union of the administrative records files at an address would have been enumerated at that same address in the 2020 Census had the NRFU operation been completed using field work. The Census Bureau fit a logistic regression at the person level using 2010 Census data to predict the following outcome:

$$
y_{ih}^{\text{occ1}} =
\begin{cases}
1, & \text{if person } i \text{ is found in administrative records and 2020 Census at same address} \\
0, & \text{otherwise}
\end{cases}
$$

where the occ1 superscript denotes that the housing unit status model predicted that the housing unit was occupied, and that, given this, the person-place model was used to provide the probability that a given individual was resident, with $h$ denoting the address in question and $i$ denoting the administrative records person. For all person-place pairs in the union of the administrative records files, this model provides a predicted probability that the 2020 Census and the administrative records roster data place the person at the same address, and is denoted:

$$
\hat{p}_{ih}^{\text{occ1}} = P\left(y_{ih}^{\text{occ1}} = 1\right)
$$

The 2020 Census person records are assigned PIKs with the methodology summarized above, which are used to identify duplicates across administrative records sources. The predictor variables used to estimate the above probability include: indicators of the presence of the administrative records person in each source at the address; indicators of the presence of the administrative records person at a different address within the same administrative records source; address-level administrative records information (e.g., the number of administrative records people associated with an address); field operations information (e.g., USPS mailing information, number of NRFU neighbors); and information from other survey sources (e.g., characteristics of the local geography, such as poverty rate, renter rate, and vacancy rate).

While the person-place model is fit at the person level, the decision on the housing unit is made at the housing unit level. Therefore, the person-level predicted probabilities must be combined to provide the probability that the household is occupied with the same number of people as the census would have counted, as follows:

$$
\hat{p}_h^{\text{occ1}} = \min\left(\hat{p}_{1h}^{\text{occ1}}, \ldots, \hat{p}_{n_h h}^{\text{occ1}}\right).
$$

This use of the minimum estimated probability across persons $i = 1, \ldots, n_h$ (where $n_h$ is the number of persons the administrative records have “shown” to be resident at the address in question) is acknowledged by the Census Bureau as a conservative estimate of the probability. The Census Bureau justified this conservative approach given that this type of enumeration with administrative records had not been attempted previously.
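The aggregation step can be shown with a short sketch. The person-level probabilities below are invented for illustration, and the function name is hypothetical.

```python
def count_match_probability(person_probs):
    """Combine person-level probabilities into one housing unit-level value.

    Taking the minimum means a single weakly supported person lowers the
    score for the whole roster, which is the conservative choice described.
    """
    if not person_probs:
        raise ValueError("roster must contain at least one person")
    return min(person_probs)


# Hypothetical person-place probabilities for a three-person roster
p_occ1 = count_match_probability([0.95, 0.90, 0.62])
```

Here the roster scores 0.62, driven entirely by its least certain member, even though the other two persons are well supported.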

8.2.4 Household Composition Model

The household composition model is used to estimate the probability that the address in question has the same household composition (number of adults and children) determined through searches through the administrative records databases, as would have been determined by NRFU fieldwork (had it been conducted for that address). The Census Bureau fit a multinomial logit model on 2010 housing unit-level administrative records data and 2010 Census data and used those estimated coefficients to predict the outcome from the 2020 Census. The dependent variable had the following form:

$$
y_h^{\text{occ2}} =
\begin{cases}
0, & \text{if unit } h \text{ is vacant in the 2020 Census} \\
1, & \text{if unit } h \text{ has 1 adult and 0 children in the 2020 Census} \\
2, & \text{if unit } h \text{ has 1 adult and at least 1 child in the 2020 Census} \\
3, & \text{if unit } h \text{ has 2 adults and 0 children in the 2020 Census} \\
4, & \text{if unit } h \text{ has 2 adults and at least 1 child in the 2020 Census} \\
5, & \text{if unit } h \text{ has 3 adults and 0 children in the 2020 Census} \\
6, & \text{if unit } h \text{ has 3 adults and at least 1 child in the 2020 Census} \\
7, & \text{if unit } h \text{ has at least 4 adults in the 2020 Census}
\end{cases}
$$

The occ2 superscript denotes that the household was predicted to be occupied by the housing unit status model and that the household composition model was being used. The $h$ subscript denotes the individual housing unit. For every address in the NRFU universe, this model provides the predicted probability of each of the eight household composition types: $\hat{p}_{h,k}^{\text{occ2}} = P\left(y_h^{\text{occ2}} = k\right)$, for $k = 0, 1, \ldots, 7$ (see above). The covariates for this multinomial logit model include those from housing unit-level administrative records information (i.e., the count of all administrative person records associated with the address), person-level administrative records information (e.g., indicators of whether any administrative records person attached to the address in question was found at a different address within the same administrative records source), and housing unit-level information from other survey sources (e.g., flags indicating that young children were associated with the household). The Census Bureau then conditioned on the household composition that was determined by searches through the relevant administrative records databases to provide a $k^*$, the best administrative records composition. Then, $\hat{p}_h^{\text{occ2}} = \hat{p}_{h,k^*}^{\text{occ2}}$ is the probability of the household composition sent forward from the household composition model. For example, $\hat{p}_h^{\text{occ2}} = \hat{p}_{h,3}^{\text{occ2}}$ would be the estimated probability for a housing unit with an administrative records household composition type of two adults and no children. These predicted probabilities are input into the distance function (discussed below) to determine which cases are identified as “AR Occupied.”
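The selection of $k^*$ can be sketched as follows. The mapping from adult and child counts to the codes 1 through 7 follows the definition of the dependent variable; the function names and the predicted probabilities are hypothetical, and returning `None` for a roster without adults is an assumption made for the example.

```python
def composition_type(adults, children):
    """Map an administrative records roster to a composition code k (1 to 7).

    Returns None when the roster has no adults, since that is not one of
    the seven valid compositions (an assumption of this sketch).
    """
    if adults < 1:
        return None
    if adults >= 4:
        return 7
    # 1 to 3 adults: odd codes mean no children, even codes at least one child
    return 2 * adults - (0 if children > 0 else 1)


def composition_match_probability(predicted, adults, children):
    """Pick the predicted probability for the roster's own composition, k*."""
    k_star = composition_type(adults, children)
    if k_star is None:
        return None
    return predicted[k_star]


# Hypothetical predicted probabilities for k = 0, 1, ..., 7
predicted = [0.05, 0.10, 0.05, 0.55, 0.15, 0.05, 0.03, 0.02]
p_occ2 = composition_match_probability(predicted, adults=2, children=0)
```

A roster of two adults and no children maps to $k^* = 3$, so the value sent forward is the fourth entry of the predicted vector.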

8.3 JOINING THE THREE MODELS—DISTANCE FUNCTIONS

The initial idea considered by the Census Bureau was to combine information from the housing unit status, person-place, and household composition models using linear programming (LP) techniques to determine the occupancy status of the housing unit. The possible status codes for the housing units were nonresidential or nonexistent, vacant, or occupied. For those housing units that were occupied, the number of people that were residents was represented in the LP models. For example, to identify administrative records-estimated vacant units, this method set linear constraints that the average vacant predicted probability must exceed a prespecified threshold. In addition, the sum of the predicted probabilities for occupied units could not exceed a certain percentage of the estimate of occupied housing units from the ACS. For identifying administrative records-based estimated number of residents for occupied units, the constraints were that the average person-place predicted probability must exceed a prespecified threshold, and the average household composition-predicted probability must exceed a different prespecified threshold. Without getting into specifics, this approach ran into problems setting the various thresholds due to: (1) the fact that the averaging performed allowed the estimated probabilities for other households to contribute to the identification of a household as being vacant; and (2) the geography over which the averaging was done mattered a great deal.

The above approach was replaced by an alternative approach using a distance function, which overcame or reduced these concerns. Each housing unit was evaluated on its own, and the thresholds used were more transparent and interpretable. This distance function method was also easier to implement in real time. Starting with the predicted probabilities for vacant and occupied, $\hat{p}_{h,\text{vac}}^{\text{unocc}}$ and $\hat{p}_{h,\text{occ}}^{\text{unocc}}$, as estimated by the housing unit status model, these two probabilities can be thought of as the coordinates of a point on a two-dimensional plane of estimated probabilities. The most likely vacant cases would be those that have the shortest distance to the point at which the estimated occupied probability equals 0 and the estimated vacant probability equals 1. The vacant Euclidean distance from this “perfect” point of estimated probabilities is then $d_h^{\text{vac}}$, where for each unit $h$,

$$
d_h^{\text{vac}} = \sqrt{\left(1 - \hat{p}_{h,\text{vac}}^{\text{unocc}}\right)^2 + \left(\hat{p}_{h,\text{occ}}^{\text{unocc}}\right)^2}.
$$

With regard to enumerating occupied housing units using administrative records, the Census Bureau started with the predicted probabilities of number occupied, p ^ h occ1 and p ^ h occ2 from the two models. These can be thought of as two measures of quality (probability of count match and probability of household composition match, respectively), such that the housing units with higher-quality administrative records are associated with higher estimated probabilities. Even though the predictions from these two models are clearly correlated, research showed that important additional information was gained by using the estimated probabilities from both models. Here, the ideal point in the two-dimensional plane of estimated probabilities is (1, 1), indicating an estimated probability of 1 that the administrative records count will equal the decennial census count, and likewise an estimated probability of 1 that the administrative records household composition will equal that of the decennial census composition. The corresponding distance function here is:

$$
d_h^{\text{occ}} = \sqrt{\left(1 - \hat{p}_h^{\text{occ1}}\right)^2 + \left(1 - \hat{p}_h^{\text{occ2}}\right)^2}.
$$

When implemented, two cut-off thresholds are selected, one for $d_h^{\text{vac}}$ and one for $d_h^{\text{occ}}$, to determine housing unit status and number of residents. When a unit's distance satisfies the relevant threshold, the administrative records determination becomes the enumeration. When the thresholds are not satisfied, the unit is returned to NRFU because the administrative records information has been judged as being of insufficient quality.
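The decision rule can be sketched in a few lines. The sketch measures the vacant distance from the ideal point described in the text (vacant probability 1, occupied probability 0) and the occupied distance from (1, 1). The cut-off values, the order in which the two checks are applied, the "AR vacant" and "return to NRFU" labels, and the omission of the deleted status are all assumptions made for illustration; the thresholds actually used are not given here.

```python
import math

VAC_CUTOFF = 0.10  # hypothetical threshold for the vacant distance
OCC_CUTOFF = 0.20  # hypothetical threshold for the occupied distance


def vacant_distance(p_vac, p_occ):
    """Euclidean distance from the point (vacant = 1, occupied = 0)."""
    return math.hypot(1.0 - p_vac, p_occ)


def occupied_distance(p_occ1, p_occ2):
    """Euclidean distance from the point (1, 1) for the two occupancy models."""
    return math.hypot(1.0 - p_occ1, 1.0 - p_occ2)


def resolve(p_vac, p_occ, p_occ1, p_occ2):
    """Classify one housing unit, or send it back to field work."""
    if vacant_distance(p_vac, p_occ) <= VAC_CUTOFF:
        return "AR vacant"
    if occupied_distance(p_occ1, p_occ2) <= OCC_CUTOFF:
        return "AR occupied"
    return "return to NRFU"


likely_vacant = resolve(0.97, 0.02, 0.0, 0.0)
likely_occupied = resolve(0.05, 0.90, 0.95, 0.90)
unresolved = resolve(0.40, 0.50, 0.50, 0.60)
```

Because each unit is scored on its own probabilities, no neighboring household influences the outcome, which is the property that distinguished this approach from the LP formulation.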

There are three remaining issues regarding implementation. The first issue is the mechanism for providing characteristics information for the people enumerated in this manner. Administrative records information is generally available for sex and age. Information for race and ethnicity is somewhat less available, though for many cases, one can collect information from the Numident file or from the previous census.

The second issue is how to estimate the coefficients of these three models. One needs information on how households were enumerated by the decennial census data collection in the field and the relevant information for each household from the existing administrative records databases. In other words, a training set is needed. In 2020, data were used from the 2010 Census, with the assumption that the estimated coefficients remained relevant over the decade. If that assumption seemed too strong, one could use data from a large census test that included a NRFU component.


Third, one specific problem that will arise in using this approach in 2030 is that no field data collection was conducted for those housing units that were enumerated using administrative records and therefore were not returned to NRFU in 2020. In other words, there will not be a training set relevant to the entire population. For that reason, it might be beneficial to follow through with NRFU for at least a sample, or maybe for all cases in a large census test to find out whether things have changed since 2020.

8.4 EVALUATING THE APPROACH USED IN 2020

In the 2020 Census, the Census Bureau enumerated about 4.6% of the verified MAF addresses using administrative records in this first-of-its-kind application. It is useful to point out that a number of groups, including previous National Academies of Sciences, Engineering, and Medicine Committee on National Statistics panels, had recommended that the Census Bureau research the use of administrative records for this purpose for at least two censuses, and the fact that this method was used to enumerate a nontrivial fraction of the country should be regarded as a success.

However, this operation was conducted in conjunction with the promise that, in doing so, a considerable amount of public funds would be saved (possibly to be used in other aspects of the taking of the decennial census) and that the quality of the resulting counts would be at least as good as if NRFU had been allowed to proceed for its complete workload. Given that 4.6% of all the enumerations represents about one-seventh of the NRFU universe, this suggests that hundreds of millions of dollars may have been saved. However, it should be cautioned that the savings from this use of administrative records enumeration may have been less than anticipated because the temporary field staff levels were affected by shifting schedules and court orders. To our knowledge, no data on actual or potential cost savings are available.

Given the Census Bureau’s general perception that the quality of the resulting enumerations was relatively high, an expanded version of this application will likely be used again in the 2030 Census. Therefore, it is important to examine what is currently known about the quality of these enumerations and how more can be learned prior to 2030. At a minimum, the Census Bureau needs to evaluate the general quality of the resulting counts and the success of the three models used for this purpose (as well as the distance functions)—both to understand their strengths and weaknesses and to examine the possibility of replacing them with superior alternatives, if any.

The Census Bureau has conducted some evaluations of this recent use of administrative records, both in census tests before the 2020 Census and in the 2020 Census itself. Focusing on the procedure used in 2020, Mulry et al. (2023a,b) are the primary reports available that assess the 2020 administrative records enumeration. The main analysis tool in these evaluations was a comparison of rosters for self-responding households with the corresponding administrative records rosters for the same housing units, together with a similar comparison of rosters for NRFU respondents (excluding proxy respondents and units requiring whole-household imputations) with the corresponding administrative records rosters. The aim was to develop a better understanding of the degree of agreement between rosters developed from administrative records and rosters collected in the census.

Given this approach, these evaluations were not intended as a direct assessment of the use of administrative records in 2020 for enumeration of some NRFU households. This is evident in a variety of ways. First, the population for which the Census Bureau attempted an administrative records enumeration during the 2020 Census consisted of those households that did not respond to the first attempt at a NRFU enumeration. Therefore, while both the evaluation study and what transpired during the 2020 Census examined those households personally interviewed as part of NRFU for the second through sixth interview attempts, the evaluation also included both self-respondents and those who were successfully interviewed for the first NRFU interview. The 2020 Census implementation of this type of enumeration also included cases that would have been proxy cases and whole household imputations had they not been removed from NRFU. Second, these evaluations do not address other implementation issues, such as the use of administrative records for enumerating NRFU closeout cases through use of specific thresholds. In that application, the thresholds were lowered to relax the various quality standards in place for noncloseout NRFU cases, and corroborating administrative records were not required. The evaluation reports calculated that, of those enumerated using administrative records, 92% of households were of the typical situation—only one failed NRFU visit, with corroborative administrative records available—rather than close-out cases. In addition to checking competing rosters for their degree of agreement, these evaluations examined the impact of both the mode of response (Self-Response, NRFU) and the timing of the enumeration on the degree of agreement, as well as the impact of the demographic makeup of the household.

Summarizing the findings in Mulry et al. (2023a,b), nationally, of the 152 million occupied housing units, 66 million (43%) had a high-quality administrative records-based roster for use in enumeration. Of the NRFU population’s 16.5 million housing units (which includes those who were interviewed on a first visit and therefore were excluded from consideration for this type of enumeration), only 3.8 million (23%) had high-quality administrative records-based rosters for use in enumeration. Therefore, one might be concerned that, unless some substantial changes are made in the quality of administrative records by 2029, this method could be difficult to implement for a majority of the NRFU population in 2030.

For Self-Response households having both a census roster and a high-quality administrative records roster, about 79.5% had population counts that were in agreement; the difference between census and administrative records counts for these Self-Response households was +1 for 8.6% and −1 for 8.1%. Hence, the percentage of Self-Response households with rosters in which the counts were at worst one person apart was 96%. However, for NRFU cases, which are more relevant to the application of enumeration in the 2020 Census, the percentage of exact agreement dropped to 58.7%. An additional 15.8% and 12.7% of NRFU households differed by +1 and −1, respectively, so the percentage of NRFU households with counts at most one person apart was 87%. The distributions for both Self-Response and NRFU cases were relatively symmetric.
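The within-one-person shares quoted above are the sum of the exact-agreement share and the +1 and −1 shares. The following minimal Python sketch reproduces that arithmetic from the percentages reported in this section; the dictionary layout and the function name are illustrative assumptions, not Census Bureau terminology.

```python
# Shares (percent of households) by census-minus-administrative-records
# count difference, as quoted in the text from Mulry et al. (2023a,b).
# The data layout and names below are illustrative assumptions.
AGREEMENT = {
    "Self-Response": {0: 79.5, 1: 8.6, -1: 8.1},
    "NRFU": {0: 58.7, 1: 15.8, -1: 12.7},
}


def within_one(shares):
    """Percent of households whose two counts differ by at most one person."""
    return shares[0] + shares[1] + shares[-1]


for mode, shares in AGREEMENT.items():
    print(f"{mode}: exact {shares[0]:.1f}%, within one {within_one(shares):.1f}%")
```

Keeping the three components separate, rather than storing only the total, also makes it easy to check the symmetry of the +1 and −1 shares for each response mode.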

Other findings reported by Mulry et al. (2023a,b) include: (1) the earlier the data were acquired, the higher the quality and therefore the greater the agreement between census and administrative records; (2) closeout NRFU cases were of lower quality than noncloseout NRFU cases; (3) the percentage of cases in which the census roster had a higher count than the administrative records roster varied as a function of the race or ethnicity of the householder; (4) the extra census person not found on the administrative records roster for NRFU households was spread among householders, spouses, relatives, children, unmarried partners, and nonrelatives; (5) there was a clear spike at age 17 for the age of the additional census person; (6) for one-visit NRFU households with multiple-source administrative records rosters in which the administrative records roster had one additional person in comparison with the census roster, the extra person was resident at the same address 98% of the time, though over 75% of the additional people were also found at other addresses, suggesting a census undercount, movers, or people with attachments to multiple housing units; and (7) when the difference between the census count and the administrative records roster is 1, the census roster is larger than the administrative records roster between 55% and 95% of the time, depending on the race/ethnicity of the head of household, the race/ethnicity of each resident, whether a spouse, unmarried partner, or children are present, and the ages of the residents.4

___________________

4 Analyses of who the extra person was when administrative records rosters have an additional resident in comparison to the census roster are understandably much more limited given the lack of information on administrative records forms.

When examining the nature of the additional person on either the administrative records roster or the census roster, the impact of the COVID-19 pandemic on increasing the percent of movers must be kept in mind. The pandemic also increased the number of people leaving colleges and universities earlier in the year, increased the number of evictions because of job losses and subsequent inability to make rent payments, and increased the number of people leaving large cities for less-populated areas, to avoid exposure to the virus.

8.5 WHAT THE EVALUATION DOES NOT ANSWER

Given the stated aims of this particular use of administrative records data in the 2020 Census—to reduce the costs of NRFU field work while not detracting from the quality of the counts generated—it is unfortunate that data are not available on cost reductions and that the nature of any resulting changes to the quality of the census counts remains unclear. It is unknown how the quality of the administrative records enumerations would have compared with the quality of the enumerations if those cases had been retained in NRFU. As Fay (2021) states: “the available evidence from the published record does not support a definitive answer to the question of whether use of AR [administrative records] enumerations reduced or improved the quality of the 2020 Census.”

Limited information on the quality of administrative records enumerations is available from the 2020 Post-Enumeration Survey (PES) and from documentation of item nonresponse rates. This information is suggestive but just scratches the surface of what needs to be learned.

Based on the PES, in terms of correctness of enumeration, administrative records enumerations in 2020 were on a par with Self-Response and also with NRFU enumerations obtained from the household heads. The 2020 PES estimated that 97% of self-responses with an ID were correct, compared with 95% for administrative records enumerations, and 94% for NRFU enumerations with the household head. In contrast, only 87% of NRFU proxy enumerations and only 77% of NRFU enumerations with a household member other than the head were correct (see Table 3.4). Moreover, by definition, administrative records enumerations did not produce whole-household imputations, which occurred for every other enumeration type, even Self-Response.

In terms of item nonresponse, the picture was not as favorable. While only 1% of self-responses were missing age or date of birth and only 2–3% were missing race and ethnicity, 3% of administrative records enumerations were missing age or date of birth, 18% were missing race, and 28% were missing ethnicity. By comparison, NRFU enumerations with a household member had a higher percentage than administrative records enumerations missing age or date of birth (16%) but substantially lower percentages missing race (9%) and ethnicity (6%). Proxy NRFU enumerations were the worst—61% were missing age or date of birth, 41% were missing race, and 38% were missing ethnicity.5

___________________

5 From 2020 Census Operational Quality Metrics Release 3, Table 2 (U.S. Census Bureau, 2021c). Item nonresponse rates refer to missing responses; item imputation rates are usually higher because they include cases in which responses had to be blanked and imputed. Denominators for percentages are people in occupied units providing the designated type of response (e.g., Self-Response).

The evaluation work done to date (Mulry et al., 2023a,b), while helpful, intriguing, and possibly the only type of assessment one could do regarding this component of the 2020 Census without making fairly heroic assumptions, does not greatly advance understanding of the causal nature of the differences between these two types of enumeration. Looking, for example, at administrative records and census rosters that differed by one (or more) people, it is unknown how many differing rosters were due to movers, students away at college, or people that had attachments to multiple households. It is also unknown how many differing rosters were due to the way administrative records rosters were formed, particularly those associated with personal income tax records. The main reason for this lack of understanding is that there is not a great deal of useful characteristics information attached to many of the administrative records files. NRFU was not pursued for even a sample of these administrative records-enumerated households during the 2020 Census, which further limits the possibility of obtaining a better understanding of the counterfactuals involved.

Questions in need of further investigation include the following:

  1. How often did the administrative records model for housing unit status correctly identify nonexistent, vacant, and occupied dwellings? Even this question is not easy to answer since there are no true values for housing unit status (though the independent listing done for the PES could be used as a substitute for true values). Also, the quality of USPS UAA codes is unknown.
  2. How does the quality of administrative records enumerations compare with the quality of self-responses? This does not bear immediately on how the use of administrative records enumerations affected the quality of the counts in 2020, but it could provide useful information about when to cut off Self-Response in the future.
  3. How does the quality of administrative records enumerations compare to the enumerations collected during NRFU interview attempts 1–6, using several thresholds for the quality of administrative records?
  4. How does the quality of administrative records enumerations compare to NRFU proxy enumerations? This could be answered for several thresholds for various demographic groups and types of housing (e.g., detached, multiunit, seasonal). The impact of needing corroboration could also be examined. Further, the possibility of using proxy information in conjunction with administrative records information for purposes of enumeration could be investigated. Finally, the possibility of using differing thresholds and differing degrees of corroboration depending on the urgency of closeout could be examined.
  5. How does the quality of administrative records enumerations compare to NRFU-generated whole-household imputations?
  6. What are answers to questions 1–5 if conditioned on areas that are majority Hispanic people or majority non-Hispanic Black people? Fay (2021) mentioned that an examination of vacancy rates for block groups classified in this way led to a Census Bureau concern about the overestimation of administrative records-determined vacant housing units in those block groups. Fay (2021) noted, “Whether the quality of AR [administrative records] enumeration will vary in other respects for members of previously undercounted groups remains an important question.” These questions could also be considered as a function of the threshold used to assess the quality of the administrative records used.
  7. The training set for the occupancy models was 10 years old when applied, so are there alternative possibilities for a more recent training set?
  8. How could the Census Bureau address the fact that administrative records enumerations in the 2020 Census were not completed in NRFU for interview attempts 2–6, so no training set based on 2020 data could provide information from comparisons of NRFU with administrative responses?

In addition, both the person-place model and the household composition model need to be validated using standard diagnostics, such as cross validation, to assess their degree of fit.
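As an illustration of the kind of diagnostic meant here, the sketch below runs a k-fold cross-validation of a deliberately simple classifier. Everything in it is hypothetical: the "match score," the synthetic labels, and the single-cutoff rule stand in for a real model and real training data, and none of it describes the Census Bureau's person-place or household composition models.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(scores, labels):
    """Pick the cutoff on the score that maximizes training accuracy."""
    best_cut, best_acc = 0.0, -1.0
    for cut in sorted(set(scores)):
        acc = sum((s >= cut) == y for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def cross_validated_accuracy(scores, labels, k=5):
    """Average held-out accuracy of the cutoff rule over k folds."""
    accs = []
    for held_out in k_fold_indices(len(scores), k):
        held = set(held_out)
        train = [i for i in range(len(scores)) if i not in held]
        cut = fit_threshold([scores[i] for i in train], [labels[i] for i in train])
        correct = sum((scores[i] >= cut) == labels[i] for i in held_out)
        accs.append(correct / len(held_out))
    return sum(accs) / k


# Synthetic data (assumption): a hypothetical match score, and whether the
# person truly lived at the address on the reference date.
rng = random.Random(42)
labels = [rng.random() < 0.7 for _ in range(500)]
scores = [rng.gauss(0.7 if y else 0.4, 0.15) for y in labels]
cv_acc = cross_validated_accuracy(scores, labels, k=5)
print(f"5-fold cross-validated accuracy: {cv_acc:.3f}")
```

The point of the held-out folds is that the cutoff is never scored on the records used to choose it, so the reported accuracy is less optimistic than an in-sample fit. The same loop could be run separately by demographic group or geography to look for differential fit.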

There is no question that the Census Bureau is to be applauded for its research program and first application of administrative records for enumeration, which likely resulted in improved census counts. However, questions do remain—the most important being the degree of variation in the quality of enumeration for race and ethnicity groups.

8.6 LOOKING TO 2030

8.6.1 Administrative Records Enumeration as Part of Nonresponse Followup

The use of administrative records for census enumerations in NRFU, while kept to a relatively modest 4.6% of total enumerations in 2020 (including vacant and nonresidential housing units), is potentially a key change for future census taking. There are good reasons to believe that this method would become more effective through increased use, especially given that the quality of administrative records may improve over time. The remaining questions are: (1) what share of enumerations in the 2030 Census could be completed using administrative records while still keeping with census data-quality standards; (2) what are the research studies needed to answer this question; and (3) what data inequities are a consequence of the use of administrative records, if any, and how might they be reduced?

The complication in conducting such research is that there are no true values for the population counts for domains of interest. The ideal would be to compare the counts from a census using administrative records for enumeration to the same census not using this methodology, and then to compare both to the truth. But, without truth, one must be aware that both the administrative records entries and the census counts are subject to undercoverage, duplication, and erroneous enumeration, which complicates comparisons. This problem could be addressed by applying the results of the 2020 PES to produce a set of basic collection units (similar to block groups) for which there is reason to believe the counts are reasonably close to being correct. However, the time lag of the 2020 PES could potentially be a concern for this purpose.

The use of artificial (simulated) populations with a variety of properties—for example, various relations of administrative values to true values—is a valuable component of a comprehensive assessment of the performance of administrative records and should be pursued. The idea is to simply make a simulated census data set based on conjectured degrees of Self-Response, NRFU of various stages, MAF errors of various types, enumeration errors of various types, etc., and then test the performance of various models as the simulated dataset is adjusted for various assumed properties of the U.S. population and NRFU workload. The disadvantage is all the assumptions one must make, but the benefit is that one knows what the true values are for the simulation. If some robustness is evident across assumption sets, it could provide a degree of confidence that the findings are generalizable.
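A very small version of such a simulation can be sketched as follows. The household size mix, the two error mechanisms (a true resident missing from the records, and a nonresident wrongly attached to the address), and all rates are invented for illustration; a realistic study would also simulate self-response, NRFU stages, and MAF errors.

```python
import random


def simulate(n_households, ar_miss_rate, ar_extra_rate, seed=0):
    """Return (truth, ar_counts) for a simulated set of households.

    All distributions and error rates here are illustrative assumptions.
    """
    rng = random.Random(seed)
    truth, ar = [], []
    for _ in range(n_households):
        size = rng.choice([1, 1, 2, 2, 2, 3, 3, 4, 5])  # assumed size mix
        # Each true resident is absent from the records with prob ar_miss_rate.
        found = sum(rng.random() >= ar_miss_rate for _ in range(size))
        # A nonresident (e.g., a mover) is attached with prob ar_extra_rate.
        extra = 1 if rng.random() < ar_extra_rate else 0
        truth.append(size)
        ar.append(found + extra)
    return truth, ar


def summarize(truth, ar):
    """Exact-agreement rate and net coverage error of records-based counts."""
    exact = sum(t == a for t, a in zip(truth, ar)) / len(truth)
    net_error = (sum(ar) - sum(truth)) / sum(truth)
    return exact, net_error


# Vary the assumed error rates to see how stable the conclusions are.
for miss, extra in [(0.02, 0.02), (0.05, 0.05), (0.10, 0.03)]:
    truth, ar = simulate(10_000, miss, extra, seed=1)
    exact, net = summarize(truth, ar)
    print(f"miss={miss:.2f} extra={extra:.2f} exact={exact:.3f} net_error={net:+.3%}")
```

Because the true household sizes are generated by the simulation itself, both the agreement rate and the net coverage error can be computed exactly, which is not possible with real census data.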

Given all the above, it seems that the best way of answering these questions is through a census test, in which a sample of cases that are enumerated using administrative records are carried through to the end of NRFU enumeration for purposes of comparison with a PES providing “true” values. Absent such a research effort, there will not be an evidence base for the Census Bureau to determine the extent to which this methodology should be used for the 2030 Census.

In terms of error assessments for such research, it is reasonable to treat the use of administrative records for enumeration as a type of whole-household imputation. In that case, to make a global comparison of the two techniques (as opposed to answering specific research questions as to why these two systems provide different counts), it is relatively unimportant to compare the number of occupants of individual households using census field processes to the number arrived at through estimation using administrative records—it makes more sense to compare the counts for small areas.
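The distinction between household-level and small-area comparisons can be illustrated with a toy example. The records below are made up, and the area labels and field order are assumptions for the sketch only.

```python
from collections import defaultdict

# Hypothetical household records: (area_id, census_count, ar_count).
HOUSEHOLDS = [
    ("A", 2, 3), ("A", 4, 3), ("A", 1, 1),
    ("B", 3, 2), ("B", 2, 2), ("B", 5, 4),
]


def household_agreement(rows):
    """Share of households whose two counts match exactly."""
    return sum(c == a for _, c, a in rows) / len(rows)


def area_relative_differences(rows):
    """Relative difference (records minus census, over census) of area totals."""
    census, ar = defaultdict(int), defaultdict(int)
    for area, c, a in rows:
        census[area] += c
        ar[area] += a
    return {area: (ar[area] - census[area]) / census[area] for area in census}


print(household_agreement(HOUSEHOLDS))
print(area_relative_differences(HOUSEHOLDS))
```

In this toy data only one-third of households agree exactly, yet the totals for area A coincide because its household-level differences offset one another, while area B shows a net shortfall. That is the sense in which small-area totals, rather than individual household counts, are the more relevant unit for a global comparison.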

If one is interested in testing artificial populations, the application to this problem would be straightforward. Given the concern about the quality of administrative records-based enumerations for some demographic groups, domains defined using such demographics could also be employed, and the resulting impact on errors examined.

While progress was made on the problem of incorporating administrative data into the enumeration of the 2020 Census for particular housing units for which this approach was supportable, there remain reasons to assume that the best approach for 2030 may differ in important respects from the approach taken in 2020. This is because there is no reason to believe that any of the current models (the housing unit status model, person-place model, and household composition model) or the distance function is optimal. For example, for the person-place model, one might replace the minimum with something less conservative. In the case of the distance function, some alternatives to consider are weighted Euclidean metrics, weighted absolute deviations, or probability of error with cutoffs. (See Efron, 1978, for additional ideas.) Also, it is unclear whether the optimal thresholds were applied. There are also questions about the appropriate census processes used in conjunction with administrative records data. For example: (1) Should every NRFU case receive a single NRFU visit prior to the use of administrative records data? (2) Should every initially determined vacant household be sent an additional postcard? (3) Should the Census Bureau separate various administrative records files into primary and corroborative groups? Given these and other questions, a research program is needed that assumes nothing about the best way to proceed. All these questions and other aspects of the use of administrative records in 2030 NRFU need to be examined and alternatives evaluated so the best version of the method can be applied.
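Two of the alternative distance functions named above can be written down in a few lines. The sketch assumes, purely for illustration, that the distance is measured between a vector of model scores for a housing unit and an ideal point; the function names, the weights, and the example scores are hypothetical and are not taken from the Census Bureau's implementation.

```python
import math


def weighted_euclidean(scores, ideal, weights):
    """Square root of the weighted sum of squared deviations from the ideal."""
    return math.sqrt(sum(w * (s - t) ** 2 for s, t, w in zip(scores, ideal, weights)))


def weighted_absolute(scores, ideal, weights):
    """Weighted sum of absolute deviations from the ideal."""
    return sum(w * abs(s - t) for s, t, w in zip(scores, ideal, weights))


# Hypothetical example: two model scores for one housing unit, an ideal of
# (1, 1), and weights that count the first score twice as heavily.
scores, ideal, weights = (0.9, 0.6), (1.0, 1.0), (2.0, 1.0)
print(weighted_euclidean(scores, ideal, weights))
print(weighted_absolute(scores, ideal, weights))
```

The two metrics can rank housing units differently: the Euclidean form penalizes one large deviation more heavily than several small ones, whereas the absolute form treats them in proportion. A comparison of candidate metrics would therefore need to be paired with a re-examination of the acceptance thresholds.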

Assuming continued research and evaluation can be done to establish the quality of administrative records-based enumerations of portions of the NRFU workload and to address other methodological and operational issues, it could well make sense for the Census Bureau to plan to expand the percentage of NRFU cases enumerated in this manner. A doubling from 5 to 10% of administrative records-based enumerations of addresses (which would be about 25% of the NRFU workload based on 2020 self-response rates) could be a challenging but feasible target. Of course, other applications of administrative records in the census, such as assistance in identifying new addresses, editing and imputation, improving the operational efficiency of data collection, and assistance in evaluation are important, not only to continue, but also to expand and improve to the extent possible. In addition, use of administrative records (e.g., birth records) to correct the persistently large net undercount of children ages 0–4 in the census (see Chapter 4) is an area to pursue.

8.6.2 Using Administrative Records on a Large Scale for Census Enumerations—Examples from Peer Countries

Peer countries that do not have well-developed population registers are conducting research on using administrative records to augment the traditional census (e.g., for imputation of missing items) and to enumerate a large portion—perhaps a majority—of the population. Surveys (self-responses) would be used for the remainder of the population in a “combined approach.” Examples include:

  • The United Kingdom (Henry et al., 2023) used administrative records for such purposes as imputation for missing items in its 2021 Census and is working toward a fuller use of administrative records for its annual population estimates for local areas with about 1,000–3,000 people.
  • Canada (Bowlby and Beaulieu, 2023) is working to increase the percentage of administrative records enumerations from less than 0.1% of occupied housing units in 2021 to a significant percentage of the NRFU workload in 2026, and then to a census that combines self-response with administrative records in 2031. Alternatives being considered for 2031 include eliminating NRFU altogether or dividing the country into zones enumerated exclusively with administrative records and zones enumerated with a traditional census.
  • Australia (Holmberg and Watmuff, 2023) used integrated person-level administrative records in its 2021 Census for such auxiliary purposes as improving imputation. It also had a backup plan (as did Canada) to use administrative records in areas that experienced wildfires or similar natural disasters at the time of the census. Australia is working toward the use of administrative records for enumeration on a small scale in 2026 and on a much larger scale for 2031.
  • New Zealand (Bycroft and Matheson-Dunning, 2023) is working toward an “administrative records first” census with surveys for some information in 2028, driven by natural disasters (an earthquake delayed the 2011 Census to 2013 and cyclones have delayed the 2023 Census), unexpectedly low response rates in the 2018 Census, and high-cost field work in the 2023 Census. Its new Data and Statistics Act of 2022 gives authority to Statistics New Zealand to obtain records from public-sector agencies and drops language prescribing how census data were to be collected for both population and dwellings. New Zealand used administrative records in 2018 to create its dwelling frame, for item nonresponse, and for quality assessment. Because of low response, New Zealand ended up using administrative records for about 3% of the population in specific dwellings and another 8% in small geographic areas. By design, New Zealand is expanding the use of administrative records-based enumerations in 2023 and engaging in public discussions about its preference to use a combined administrative-records-first-with-surveys model for 2028.

Bowlby and Beaulieu (2023) report the following from Statistics Canada’s evaluations of its administrative records and models to date:

  • Statistics Canada has access to good-quality administrative data but not for all areas or population groups;
  • Administrative data population counts appear to overcount the population somewhat but less so than the traditional census undercounts the population;
  • The current “in-scope” model accurately predicts if an individual is in or out of scope for the census 88% of the time;
  • The current “household model” places individuals at correct addresses 80% of the time; and
  • A majority of census households can be exactly recreated by administrative data: 61% of derived households match exactly to the census.

These assessments by Statistics Canada are encouraging but also indicate how far Canada still has to go to use administrative records enumerations with confidence in the quality of the resulting data. The United States would need similar fine-grained assessments for geographic areas and population groups.

8.6.3 Determining the Appropriateness and Feasibility of a Significantly Increased Role for Administrative Records Enumerations in the 2030 Census

Congruent with peer countries, the Census Bureau and groups contracted to consider new methods for the 2030 Census have signaled interest in the potential of using administrative records enumerations for a substantial portion of the population with follow-up surveys to fill in gaps (e.g., Macagnone, 2022; JASON, 2016 and associated news accounts such as Mervis, 2017; and Keller and Shipp, 2021). We do not have knowledge of the Census Bureau’s specific research plans for possibly increasing the use of administrative records enumerations in a substantial way in 2030 and thus cannot comment on them, nor do we take a position on the general principle of original enumeration via administrative records. That said, as we discuss again in Chapter 12, we believe that such a major shift is almost certainly premature for the 2030 Census. In this section, we note the following kinds of technical, operational, political, legal, and ethical issues that need to be considered—issues that statistical offices in the UK, Canada, Australia, and New Zealand have identified as critical in their own research on administrative records for enumeration purposes, particularly the need for public and stakeholder outreach and engagement.

Data Quality

A key issue for assessing the potential for administrative records-based enumerations is the quality of currently and potentially available records, assessed in terms of such attributes as: timeliness (referencing Census Day and available to the Census Bureau in the needed timeframe); accuracy of identifying a living person at the person’s usual residence according to census residence rules; and completeness and accuracy of characteristics. For equity in allocation of votes and resources, quality needs to be assessed not only overall but also for geographic areas and population groups. Thus, for legislative redistricting, the United States has statutory and case law requirements for accurate data for small areas and race and ethnic populations. Quality also needs to be assessed both in absolute terms, relative to a measure of truth, and in comparison with other types of enumerations, such as NRFU household and proxy interviews and whole-person imputations.

Public and Stakeholder Acceptance

In the United States, the census is required by the Constitution, with Congress having authority over the content and methods. Title 13 delegates this responsibility to the U.S. Secretary of Commerce, but Congress can exert, and has exerted, its will, for example, to add race categories and to mandate that the Census Bureau provide small-area data to states for redistricting. According to Title 13, the census of population is also a census of housing (first conducted in 1940). Regarding items in the census, a review of federal agency requirements determined that all items, including age, sex, race, ethnicity, household relationship, and housing tenure, were mandated in legislation for subnational geographic areas, such as census tracts and school districts (National Research Council, 1995:App. M).

Regarding administrative records use, Title 13, Section 6 provides authority for the Secretary of Commerce to obtain records of federal agencies and other entities for use in its censuses and surveys “consistent with the kind, timeliness, quality and scope of the statistics required.” Nonetheless, given the long history of “actual enumeration” for the census, and of adverse reactions to such concepts as data warehouses (see, e.g., Yorsz, 1976), an active and early dialogue with Congress would be essential to achieve buy-in for any Census Bureau plan involving substantial use of administrative records for original census enumeration. Indeed, legislation allowing such use could be desirable to avoid litigation over the meaning of “actual enumeration.”

Dialogue would also be needed with other stakeholders, including federal agencies, state and local governments, and data users in government, academia, the private sector, and the nongovernmental organization sector. Civil rights groups and others would be particularly concerned with how well administrative records cover areas and population groups and what means would be used to fill in gaps. Finally, the general public would need to be consulted regarding perceptions of a combined census approach as opposed to the concept of the census as a civic ceremony in which personal participation by all is important and expected.

A related issue that would need stakeholder and public input concerns the move from a household-based census to a person-based census, which use of administrative records entails. There are many important user needs for data not only on housing units and people, but also on households and families. For example, knowledge of household and family structure in terms of numbers of children and adults is critical for many services, including education. Historically, the census has been a nested data collection, using the address as the basic enumeration unit and determining the occupancy status of housing units, and the people resident in occupied units and their relationships. Administrative records may contain only individual-level data (e.g., Medicare) or data for disparate program units (e.g., income tax filing units or income support program eligible units). The Census Bureau’s models described above link people to housing units and addresses, but their quality needs to be assessed.

One more issue concerns the choice of administrative records. Brown et al. (2023) used a large number of administrative records to simulate a “real-time” 100-percent-administrative-records census (see Section 8.6.4). The records included those authorized under a 2019 Executive Order intended for the Census Bureau to estimate the citizen population, such as U.S. Department of Homeland Security records. For use as census enumerations, it would be essential to take ethical considerations into account in selecting records so that the trust of the public and stakeholders is not undermined by the use of records perceived with distrust and apprehension.

Operational and Technical Considerations

A “combined approach,” which uses administrative records enumerations and surveys to fill in gaps, appears to be the goal of countries such as Canada and Australia. A challenge for a combined approach is how to handle messaging, self-response targeting, and field operations for areas and groups that are not well counted by administrative records and thus require surveys. It is easy to imagine considerable public confusion with likely adverse effects for cooperation. Another possible operational problem involves the potential for an important administrative records source to become unavailable or substantially altered close to the census.

It follows that, to generate confidence in the use of administrative records data for original enumeration of households, it is even more imperative to resolve technical concerns with the models associated with using administrative records to supplement NRFU. As pointed out in this chapter, research is needed on (among other things) bridging the long 10-year gap between U.S. censuses in using the previous census as the training set for inference. We also noted that, logically, the 2020 Census will be unable to serve as a training set for original enumeration via records.

8.6.4 “Real-Time” Simulation of an Administrative Records Census

The Census Bureau recently released Real-Time 2020 Administrative Record Census Simulation: A Design for the 21st Century (Brown et al., 2023), one of the planned evaluations in the formal program of evaluations and experiments for the 2020 Census. Continuing work conducted for over 25 years at the Census Bureau on the application of administrative records for various uses in censuses and surveys (including experiments in the 2000 and 2010 Censuses), Brown et al. (2023) considers the problem of producing a person-based population count from administrative records in place of a traditional census. It is crucial, again, to distinguish this person-based approach from the methodology actually used in the 2020 Census: using administrative records data in a small percentage of NRFU cases for which at least one in-person NRFU attempt was made and for which the quality of administrative records information was judged to be high. By comparison, Brown et al. (2023) did not attempt to enumerate the occupants of residences but instead computed a sum of weighted averages of probabilities of occupation over possibly several residences.

The administrative records census simulated by Brown et al. (2023) exceeded the count achieved in the 2020 Census by 2.3%, and it is certainly a useful analytic exercise to guide further research and development. However, in several respects, it falls short of the quality standards that would be needed for actual implementation in a decennial census. First, the method used in the simulation added into the census people for whom the location of residence was unverified or, if verified, not geocoded. As Brown et al. (2023:xiii) pointed out: “About 14.4 million administrative record people are observed at 9.8 million addresses outside the 2020 Census collection universe, and another 8.7 million are at addresses not linked to the MAF.” Where would such households be placed geographically for reapportionment and redistricting? Also, as mentioned above, the algorithm used often did not place individuals in single households. Instead, it used a set of weights for each person based on probabilities that each individual was a resident of various households. The resulting uncertainty, especially for low levels of geographic aggregation, can be seen in Brown et al. (2023:Table 69) for aggregates at the tract level and below. This uncertainty is inconsistent with data users’ needs for small-area data from the census.
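
To make the idea of weighted placement concrete, the following is a minimal sketch in Python, not the method of Brown et al. (2023). It assumes each person carries a set of weights over candidate tracts that sum to 1, sums those weights to obtain an area count, and, under the added assumption that placements are independent across people, computes a simple variance for each area. The function names, person identifiers, tract labels, and weights are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def expected_counts(person_weights):
    """Sum each person's residence weights by tract.

    person_weights maps a person ID to {tract: weight}; the weights for
    one person are assumed (hypothetically) to sum to 1.
    """
    totals = defaultdict(float)
    for person_id, weights in person_weights.items():
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError(f"weights for {person_id} do not sum to 1")
        for tract, weight in weights.items():
            totals[tract] += weight
    return dict(totals)


def count_variance(person_weights):
    """Variance of each tract count, treating each placement as an
    independent Bernoulli draw with probability equal to the weight.
    Independence across people is an illustrative assumption."""
    variances = defaultdict(float)
    for weights in person_weights.values():
        for tract, weight in weights.items():
            variances[tract] += weight * (1.0 - weight)
    return dict(variances)


# Hypothetical example: three people, two of them split across tracts.
people = {
    "p1": {"tract_A": 1.0},
    "p2": {"tract_A": 0.7, "tract_B": 0.3},
    "p3": {"tract_B": 0.5, "tract_C": 0.5},
}

if __name__ == "__main__":
    print(expected_counts(people))
    print(count_variance(people))
```

In this toy example the total across tracts is still three people, but no tract other than the one containing only fully placed people has a count free of placement uncertainty. That mirrors, in a simplified way, why fractional placement matters most at low levels of geographic aggregation.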

Second, the characteristics information necessary for various applications, including redistricting, often had to be modeled or imputed. It is difficult to compare the degree of modeling or imputation carried out in Brown et al. (2023) for characteristics information with the degree of missingness of characteristics information in the 2020 Census responses. This is because in Brown et al. (2023), a characteristics variable could be missing on all of the administrative files used, or multiple disagreeing values could be available for a characteristic. While comparisons are complicated, the degree to which unique characteristics were unavailable in the group of files was a quality issue that Brown et al. (2023:248) acknowledged: “More research is needed to improve business rules and models for predicting demographic characteristics of people in administrative records. The rules and demographic modeling for administrative records in this report do not match well with as-reported values in the 2020 Census.” Further, Brown et al. (2023:Table 77) showed that characteristics imputation alone was responsible for a considerable amount of uncertainty for race and ethnicity values.

Third, Brown et al. (2023:247) conceded that matching error, both false nonmatches and false matches, is a concern with this method. They recommended that:

Record linkage error should be quantified and minimized. AR [administrative records] census estimates contain record linkage error. Though we believe the remaining duplication in the AR census after unduplicating person records by PIK/EPIK6 is likely to be very low, the magnitude of the false nonmatches (as well as false matches) should be estimated. It would be helpful to conduct research on the duplication rate among PIKs and EPIKs and on how entity resolution could be improved to decrease that rate. It would also be desirable to develop record linkage error measures for the Census Bureau’s production record linkage system and to estimate how the errors propagate in the statistics.
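
As one illustration of what quantifying linkage error could involve, the sketch below computes a false match rate and a false nonmatch rate from a clerically reviewed sample of record pairs. It is a hypothetical example, not the Census Bureau's production measure: the function name, the input layout, the sample counts, and the choice of denominators (declared links for false matches, true matches for false nonmatches) are assumptions, and other conventions exist.

```python
def linkage_error_rates(pairs):
    """Compute linkage error rates from reviewed record pairs.

    pairs is an iterable of (declared_link, true_match) booleans, where
    declared_link is the linkage system's decision and true_match is the
    reviewer's determination (a hypothetical truth sample).
    """
    declared = 0          # pairs the system linked
    true_matches = 0      # pairs that truly refer to the same person
    false_matches = 0     # linked, but not the same person
    false_nonmatches = 0  # same person, but not linked
    for declared_link, true_match in pairs:
        if declared_link:
            declared += 1
            if not true_match:
                false_matches += 1
        if true_match:
            true_matches += 1
            if not declared_link:
                false_nonmatches += 1
    return {
        # Share of declared links that are wrong; None if nothing was linked.
        "false_match_rate": false_matches / declared if declared else None,
        # Share of true matches that were missed; None if there are none.
        "false_nonmatch_rate": (
            false_nonmatches / true_matches if true_matches else None
        ),
    }


# Hypothetical reviewed sample of 200 candidate pairs.
sample = (
    [(True, True)] * 90
    + [(True, False)] * 10
    + [(False, True)] * 5
    + [(False, False)] * 95
)

if __name__ == "__main__":
    print(linkage_error_rates(sample))
```

With these made-up counts, 10 of 100 declared links are false matches and 5 of 95 true matches are missed. Rates such as these are a starting point only; estimating how the errors propagate into published statistics, as the recommendation asks, would require further modeling.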

In addition, as noted previously, some of the administrative record files used in the study could create considerable fear and distrust if used in the census. For instance, the 31 administrative lists used included: (1) Immigration and Customs Enforcement (ICE)-provided data from its Enforcement and Removal Operations; (2) ICE-provided data from the Student Exchange Visitor Information System; (3) U.S. Citizenship and Immigration Services (USCIS)-provided data on non-U.S. citizens receiving Deferred Action for Childhood Arrivals, those classified as Special Immigrant Juveniles, and those otherwise interacting with USCIS and appearing to lack a lawful immigration status; (4) USCIS-provided data on non-U.S. citizens with Temporary Protected Status; and (5) U.S. Customs and Border Protection-provided data from the Arrival and Departure Information System. It is easy to see how use of some of these lists could seriously impact public opinion concerning the census. Brown et al. (2023) noted that these types of records (i.e., from ICE) were available to the Census Bureau pursuant to Executive Order No. 13880 (2019) that was subsequently repealed and that these records have not been approved for future use.

___________________

6 “Enhanced Protected Identification Keys” (EPIKs) are unique identifiers attached to persons, and thereby represent an extension of PIKs, through the use of additional reference files added to the Person Validation Identification System (PVS). EPIKs enable the Census Bureau to provide unique codes for residents of the United States who, without such an extension, would be unable to be provided with such identification codes. Doing this greatly facilitates record linkage for these individuals across various administrative records, which is used to identify their duplicates across administrative records systems.

Finally, and perhaps most importantly, there is currently no comprehensive assessment of the quality of an administrative records-based approach in comparison to that of the current methodology of the decennial census. While the total count of 2.3% above the 2020 Census is encouraging, there are larger discrepancies for smaller geographic areas, and there are sizeable discrepancies for various demographic groups. Beyond what is provided in the Brown et al. (2023) study, fuller understanding is needed of the differences between the two methods as a function of the various types of households and housing units, as well as the impact such an approach would have on public opinion and on the uses of these estimates for standard decennial census purposes.

8.7 CONCLUSIONS AND RECOMMENDATIONS

Conclusion 8.1: Based on the information currently available, the U.S. Census Bureau deserves commendation for the meaningful steps made in the limited use of administrative records as an alternative to the enumeration of some nonresponding households in the 2020 Census. The Census Bureau acted with appropriate levels of both boldness and caution in settling on conditions under which a nonresponding household at a designated residence would be enumerated using records information after only one enumerator visit rather than six, yet still ensuring that single in-person visit. Moreover, this change in approach was preceded by an extensive history of high-quality experimentation and testing.

Recommendation 8.1: The U.S. Census Bureau should continue its research and development program concerning the best ways to use administrative records as a supplement to decennial census operations. Potential uses of administrative records include expanding enumeration of limited subsets of the 2030 Census population in the Nonresponse Followup workload, reducing proxy responses and whole-person imputations, and possibly redressing the long-standing net undercount of children ages 0–4.

Conclusion 8.2: Moving to the use of administrative records to enumerate a substantial proportion of the population without a contact attempt does not seem feasible for the 2030 Census. Rather, it appears to be a long-term proposition, requiring extensive testing, research, and engagement with Congress, other stakeholders, and the public. Not only are there operational and technical issues to resolve and data-quality standards to determine and assess for geographic areas and population subgroups, but there is also the need to consider the ramifications of switching from a household/address-based focus to a person-based focus and to address issues of legal, constitutional, and public acceptability. It will be essential for the U.S. Census Bureau to be as transparent as possible about its research and testing and to have open and constructive dialogue with all parties.
