
Assessing the 2020 Census: Final Report (2023)

Chapter: 11 Impact of New Confidentiality-Protection Methods on 2020 Census Data Products

Suggested Citation: "11 Impact of New Confidentiality-Protection Methods on 2020 Census Data Products." National Academies of Sciences, Engineering, and Medicine. 2023. Assessing the 2020 Census: Final Report. Washington, DC: The National Academies Press. doi: 10.17226/27150.

– 11 –

Impact of New Confidentiality-Protection Methods on 2020 Census Data Products

The panel’s charge was to study the quality of the 2020 Census broadly defined (see Chapter 1) and make recommendations for 2030. Other chapters address the quality of 2020 Census data through the steps to produce the Census Edited File (CEF)—from generating the address list to enumerating residents to capturing and editing their responses, coding write-in answers to the race and ethnicity questions, and imputing values for missing answers. In this chapter, we assess the impact on data quality of the final step in creating tables and other products from the CEF for public use—namely, implementing one or more approaches to protect the confidentiality of the information.

Over the decades, the U.S. Census Bureau has used various methods (e.g., data reduction, cell suppression, swapping, noise injection using formally private algorithms; see Box 11.1 and McKenna, 2018) to reduce the chances that someone could “reidentify” a respondent from published census tables in reports and computer summary files.1 The Census Bureau calls the set of confidentiality-protection methods used for a specific census or survey a Disclosure Avoidance System or DAS. The application of a DAS to data products is essential to honor confidentiality pledges to respondents and preclude misuse of respondent information for administrative or enforcement (i.e., nonstatistical) purposes. By definition, however, disclosure avoidance impairs data quality relative to the collected data. The degree of impairment depends on the disclosure avoidance choices of the data producer. The question is how well the 2020 Census DAS preserved data utility while protecting confidentiality.

___________________

1 “Reidentification” involves “reconstructing” micro records (finest grain, individual person- or household-specific information) from published tables, followed by linking the records with outside sources to find potential matches (see Muralidhar and Domingo-Ferrer, 2023a).

As described in Chapter 1, the decennial census is a national resource and the only reliable source of data for the nation’s many small governmental jurisdictions. The return on the nation’s sizeable investment in the census data collection comes from data products that are available on a timely basis and address users’ needs for accurate data for small geographic areas and population groups, defined by race, ethnicity, and other characteristics. In this context, confidentiality protection is an important objective of data product design and implementation, but not one so overriding that it excludes the public’s needs for useful data.

This chapter reviews the confidentiality-protection methods for the 2020 Census, which include traditional methods such as reduction of table content in most 2020 data products and cell suppression for detailed race and ethnicity tables (termed “adaptive design” by the Census Bureau for the particular form of cell suppression used in 2020). More importantly, the Census Bureau adopted algorithms that satisfy the concept of “formal” or “differential” privacy, developed by computer scientists in the mid-2000s, as the principal protection method for 2020, in place of the data-swapping methods used in the 1990–2010 Censuses.

We first provide an overview of 2020 Census data products that have been released or are still under development. Then we review the Census Bureau’s decision to adopt formally private confidentiality-protection methods for the 2020 data products, describe challenges the Census Bureau encountered in implementation, and draw lessons for 2030. Throughout, we support the intention of the 2020 DAS, namely, to provide added protection in the face of threats to confidentiality from the readily available, sophisticated matching technologies and multiplicity of data on the internet. We find, however, that implementation of the 2020 DAS substantially delayed data release, resulted in sizable cutbacks in available data, and impaired the quality of some data (see Box 11.2).

Three appendixes support this chapter. Appendix F.1 provides a mathematical explanation of differential privacy. Appendix F.2 presents two important reference tables, one comparing the major 2020 Census and 2010 Census data products and the other describing the “demonstration files” generated by the Census Bureau, applying various settings of the 2020 confidentiality-protection methodology to 2010 Census data to enable feedback from data users. Appendix F.3 provides examples of the effects of the 2020 DAS on data quality for a range of uses.


11.1 CENSUS DATA PRODUCTS: OVERVIEW OF CONTENT AND TIMING OF RELEASE

Historically, data from the decennial census were tabulated and made available in the form of printed reports. Publication of the reports could take years after data collection, but it was common practice to issue “preliminary” reports of total population very quickly after completion of data collection and “advance” reports with specific tables that were later bound into books. Although only a fraction of the data collected could be printed in the precomputer age, small-area data were included in census reports from the beginning: for counties and minor civil divisions back to 1790; for incorporated places back to 1880 (and even further back for some large cities); for census tracts in the cities that pioneered these neighborhood areas back to 1910; for enumeration districts (the equivalent of today’s block groups) back to 1930; and for blocks in large cities back to 1940 (housing characteristics) and 1960 (population and housing characteristics) (Census Geographies Project, 2022:Ch. 3).

The 1960 Census saw a major expansion of geographic and subject detail included in printed reports. In addition, Summary Files of aggregate statistics (e.g., a table of single years of age by race and ethnicity for census tracts) and Public Use Microdata Samples (PUMS) were produced at the specific request of and with funding from outside groups (U.S. Census Bureau, 1966). Computer data products became standard beginning in the 1970 Census.

Four major computer summary products from the 1990–2020 Censuses are listed below (see Table F.1 in Appendix F.2 for detailed information on their content and other features for 2010–2020).2 They are:

  • 50 state total population counts (including federal employees overseas) for reapportionment of the U.S. House of Representatives;
  • Redistricting File of race and Hispanic origin tables for the total population and voting-age population (ages 18 and older) for all census geographies, including but not limited to states, counties, towns, places, school districts, American Indian tribal areas, Alaska Native villages, census tracts, block groups, and blocks;
  • Summary File 1 (SF1), renamed for 2020 as the Demographic and Housing Characteristics (DHC) File—a workhorse file for federal, state, and local governments and other users, with such information as single years of age by race and ethnicity, household type, and owner/renter status for all census geographies down to the block level (in 2020, a Supplemental-DHC File is to provide a limited set of tables of linked household and person characteristics with greatly curtailed geographic detail); and
  • Summary File 2 (SF2), renamed for 2020 as the Detailed Demographic and Housing Characteristics (DDHC) File with two subfiles A and B, which provide data for hundreds of race and ethnicity categories on age and sex (DDHC-A) and household type and housing tenure (DDHC-B) with less geographic detail than SF1/DHC.

The 1990–2010 Censuses had processes in place to generate data products on a timely basis; see Table 11.1. The 50 state numbers for reapportionment

___________________

2 Each census has released additional specialized tabular products, such as Demographic Profiles and Congressional District Files, and PUMS files—these files are not further discussed except that Table F.1 summarizes content and release schedules for the 2010 and 2020 Demographic Profile and PUMS files. In addition, the 1970–2000 Censuses released data products from large samples of the population that received a “long-form” questionnaire, which obtained data on education, income, housing costs, and many other variables. These variables are now collected in the American Community Survey.


Table 11.1 Release Dates/Scheduled Dates for Four Census Data Products, 1990–2020 Censuses

Data Product       Census Year   Census Year + 1   Census Year + 2   Census Year + 3
1990 Census
  Reapportionment  Dec           -                 -                 -
  Redistricting    -             Jan–Mar           -                 -
  SF1/DHC          -             Mar–Jun           -                 -
  SF2/DDHC         -             Sep–Dec           -                 -
2000 Census
  Reapportionment  Dec           -                 -                 -
  Redistricting    -             Mar               -                 -
  SF1/DHC          -             Jun–Aug           -                 -
  SF2/DDHC         -             -                 Jan–Apr           -
2010 Census
  Reapportionment  Dec           -                 -                 -
  Redistricting    -             Feb–Mar           -                 -
  SF1/DHC          -             Jun–Aug           -                 -
  SF2/DDHC         -             Dec–              –Apr              -
2020 Census
  Reapportionment  -             April             -                 -
  Redistricting    -             Aug               -                 -
  SF1/DHC          -             -                 -                 May (a)
  SF2/DDHC         -             -                 -                 Sep (b)

a DHC was released on May 25, 2023; Supplemental DHC is scheduled for September 2024 release.

b DDHC-A was released on September 21, 2023; DDHC-B is scheduled for September 2024 release.

NOTES: SF, Summary File; DHC, Demographic and Housing Characteristics (File); DDHC, Detailed DHC (File). Range of months indicates product was released state by state. In 2020, reapportionment and redistricting data were delayed because of delays in enumeration and processing from the COVID-19 pandemic. In 2000 and 2010, a national file for SF2 was released in December of Census Year + 2. SF1 and SF2 were delayed in the 1980 Census (not shown) because of budget cuts in the first term of the Reagan administration.

SOURCE: Compiled by panel from Census Bureau public releases.


were released 9 months after Census Day (April 1), and a Redistricting File (replacing the old preliminary and advance reports) was released 12 months after Census Day as provided by statute (13 U.S.C. § 141(b)–(c)).3 SF1 (now DHC) was typically released state-by-state 2–5 months after the Redistricting File. SF2 (now DDHC-A, B) was typically released 5–8 months after SF1. Data release was essentially complete 2 years after Census Day.

In contrast, 2020 Census products’ preparation and release processes have been far from smooth or timely. Delays in the enumeration consequent to the COVID-19 pandemic led to delays in release of the reapportionment state counts (released April 2021) and the Redistricting File (released August 2021). The 2020 DHC File was released on May 25, 2023, and the DDHC-A File was released on September 21, 2023. The DDHC-B and Supplemental-DHC Files are not scheduled for release until September 2024. The delays in releasing the DHC and subsequent products are primarily due to the Census Bureau’s decision to adopt formal privacy methods at a late stage in census planning. The new approach had not previously been tested in the census context, and the Census Bureau had no backup plan when implementation proved more difficult and time consuming than expected.
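The size of these delays can be made concrete with simple month arithmetic. The sketch below counts whole calendar months between Census Day (April 2020) and each release month given in the text; the function and dictionary names are illustrative and not part of any Census Bureau tool, and the day of the month is ignored.

```python
# Census Day for the 2020 Census, as (year, month).
CENSUS_DAY = (2020, 4)

# Release (or scheduled release) months taken from the text above.
RELEASES = {
    "Reapportionment counts": (2021, 4),
    "Redistricting File": (2021, 8),
    "DHC File": (2023, 5),
    "DDHC-A File": (2023, 9),
    "DDHC-B and Supplemental-DHC Files (scheduled)": (2024, 9),
}


def months_after(census_day, release):
    """Whole calendar months from census_day to release (day of month ignored)."""
    return (release[0] - census_day[0]) * 12 + (release[1] - census_day[1])


delays = {name: months_after(CENSUS_DAY, date) for name, date in RELEASES.items()}

for name, months in delays.items():
    print(f"{name}: {months} months after Census Day")
```

By this count the 2020 reapportionment counts and Redistricting File arrived 12 and 16 months after Census Day, against 9 and 12 months in the 1990–2010 Censuses, and the DHC File arrived 37 months after Census Day.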

With regard to content, the 2010 and 2020 Redistricting Files are virtually the same in terms of meeting the needs of the redistricting community. The 2020 Redistricting File added a table of people in group quarters by type (e.g., nursing home, dormitory).

The 2020 DHC File has reduced content compared with the 2010 SF1.4 For example, the 2020 DHC File has 36 tables for blocks as the lowest level of geography compared with 73 block-level tables in the 2010 SF1 File; similarly, the 2020 DHC File has 19 tables for census tracts as the lowest geographic level compared with 28 tract-level tables in the 2010 SF1 File. Some amount of table deletion is sensible for added confidentiality protection, although only a comprehensive user survey could determine which deletions were broadly acceptable and which caused problems for long-established data uses. Some deletions are known to be problematic. For example, item nonresponse indicators were provided in 2010 for blocks, which users found helpful in deciding which areas to flag as likely having lesser quality data; but these indicators were deleted altogether in 2020, and tables that link household or family and person characteristics were cut back for all geographies. Some of the deleted household or family and person “join” variables will be provided in the Supplemental-DHC File in September 2024—for the nation and states only, and not for any substate geographies as originally planned.

___________________

3 The current form of the deadlines in 13 U.S.C. § 141(b)–(c) dates to amendments in 1975–1976.

4 On the other hand, the Census Bureau did add some tables to the 2020 DHC that were not previously available, including sex-by-age tables for all combinations of race/ethnicity.


The 2020 DDHC-A File and forthcoming DDHC-B File have greatly reduced content compared with the 2010 SF2 (and a companion American Indian or Alaska Native File). DDHC-A has two tables for race and ethnicity groups with at least 22 people in a census tract or other area: (1) total population; and (2) sex by selected age categories depending on the size of the group. Similarly, DDHC-B has two tables: (1) household type (number of categories depending on the size of the group); and (2) tenure (owner/renter). In contrast, SF2 in 2010 had 61 tables for census tracts as the lowest level of geography and 10 tables (for people in group quarters) for counties as the lowest geographic level.5 Also, the DDHC-A and -B Files provide tables for a subset of types of geography including cities but not towns, although towns play an important role in local government in many states.

The 2020 DAS design and implementation process generated data-user concerns beyond time delays and reduced content and geographic detail. These concerns, expressed in a letter to the Census Bureau director (Federal State Cooperative on Population Estimates Steering Committee, 2022), focused on accuracy and utility and included:

  • Anomalies in the 2020 Redistricting File and in “demonstration” files for the DHC File, which call into question the data’s accuracy and utility, particularly for small geographic areas and population groups;6
  • Lack of adequate guidance for how to interpret or explain the error introduced by the DAS;
  • Concerns that the TopDown Algorithm used to protect the 2020 Redistricting and DHC Files cannot provide acceptable accuracy for variables that link or join person characteristics with housing unit characteristics;7
  • Indicators of bias in the demonstration files for the 2020 Redistricting and DHC Files that give rise to concerns about data equity (e.g., for rural versus urban areas, more diverse versus less diverse areas, majority rental versus majority owner-occupied areas); and
  • Concerns that the delays in releasing DHC and DDHC data are affecting other Census Bureau programs, including population estimates and coverage evaluation via Demographic Analysis (DA).

___________________

5 The 2010 SF2 had a higher population threshold (at least 100 people in an area) than the DDHC-A and -B Files but did not vary the number of categories for any tables.

6 On inaccuracies in the Redistricting File, see Kenny et al. (2021) and Appendix F.3; also see Appendix F.3 for examples of anomalies in the DHC File based on using 2010 data to demonstrate the expected operation of the 2020 DAS. The 2020 DAS also impairs comparability with previous censuses for small geographic areas and population groups.

7 The Census Bureau acknowledged this problem from the outset and warned users not to calculate persons per occupied housing unit from the Redistricting File. The Census Bureau included several join variables in the DHC File by adding recodes to the household head’s record for input to tabulations (e.g., size of household, presence of people ages 65 and older, presence of own children under age 18). Additional complex join variables, but many fewer than in 2010 and none for substate geographies, are to be included in the Supplemental-DHC File, developed with a different algorithm; see Table F.1.

11.2 CONFIDENTIALITY PROTECTION IN THE DECENNIAL CENSUS

The Census Bureau collects census and most survey data under a pledge of confidentiality,8 which is meant to respect respondents, establish and maintain public trust, and foster the willingness of respondents to provide accurate data to the census and surveys. Indeed, since 1929, census law has stipulated that census employees must not release data that could “identify” an individual (similar protection for businesses dates to 1909—see Box 11.3). Census employees, along with external researchers, contractors, and others who have passed background checks and have Special Sworn Status, face substantial penalties for disclosure of individual information. The law does not specify what “identify” means—specifically, whether it pertains solely to such individual identifiers as name and address, or whether it also pertains to individual attributes (i.e., characteristics such as age, sex, race, household relationship), or whether it also pertains to group attributes (e.g., everyone in census block 101 is of White non-Hispanic origin). The Census Bureau has come to adopt a broad interpretation of the information protected under census law, including not just personally identifying information in census or survey responses but information about general attributes of respondents and operational paradata on the mechanics of response (e.g., whether a person self-responded or if a household did not respond).9

Census data products are never released in raw form but always include protections to minimize the risk of disclosure of an individual’s responses.10 As summarized in Box 11.1, the Census Bureau has used a wide variety of disclosure-protection methods, including table and cell suppression and, in the 1990, 2000, and 2010 Censuses, data swapping—that is, exchanging some household records among small areas to try to thwart direct one-to-one identification of census returns based on auxiliary data. The Census Bureau came to believe that swapping did not provide sufficient protection, although

___________________

8 The exception is surveys of state and local governments, through which the Census Bureau collects publicly available information on employment and finances.

9 See, for example, the material for annual data stewardship training for Census Bureau staff, the fiscal year 2023 version of which is at https://broadcast.census.gov/main/training/2023/fr/dscui/t13/story.html.

10 The exceptions are that microdata records from the census are released by the National Archives 72 years after a census and that an individual (or their heir) may request their own records from censuses not yet released by the National Archives through the Census Bureau’s Age Search service. See https://www.census.gov/history/www/genealogy/decennial_census_records/census_records_2.html.


there has been no known reidentification of individuals outside the Census Bureau from swapped summary files. Swapping also introduced error, like any disclosure-protection method, but did not change the number of people and households in an area, and swapped data from previous censuses have passed users’ face validity test for reasonable accuracy.
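The count-preserving property of swapping can be illustrated with a toy sketch. It assumes, purely for illustration, that swap partners are households of the same size in different blocks and that the swap exchanges their block codes; the actual swap rates and matching rules used in the 1990–2010 Censuses are not described here, and every name below (`swap_households`, `block_totals`, the record fields) is hypothetical.

```python
import random
from collections import defaultdict


def swap_households(households, swap_rate, seed=0):
    """Toy swap: pair households of equal size and exchange their block codes.

    Assumption for illustration only: partners match on household size,
    so block-level household and person totals cannot change.
    """
    rng = random.Random(seed)
    swapped = [dict(h) for h in households]  # copy; leave the input untouched

    # Group record positions by household size so partners always match on size.
    by_size = defaultdict(list)
    for i, h in enumerate(swapped):
        by_size[h["size"]].append(i)

    for indices in by_size.values():
        rng.shuffle(indices)
        n_pairs = int(len(indices) * swap_rate) // 2
        for k in range(n_pairs):
            a, b = indices[2 * k], indices[2 * k + 1]
            swapped[a]["block"], swapped[b]["block"] = (
                swapped[b]["block"],
                swapped[a]["block"],
            )
    return swapped


def block_totals(households):
    """Return {block: (number of households, number of people)}."""
    totals = defaultdict(lambda: [0, 0])
    for h in households:
        totals[h["block"]][0] += 1
        totals[h["block"]][1] += h["size"]
    return {block: tuple(t) for block, t in totals.items()}


households = [
    {"id": 1, "block": "A", "size": 2, "race": "X"},
    {"id": 2, "block": "A", "size": 3, "race": "Y"},
    {"id": 3, "block": "B", "size": 2, "race": "Y"},
    {"id": 4, "block": "B", "size": 3, "race": "X"},
]
swapped = swap_households(households, swap_rate=1.0)

# Household and person totals per block are unchanged by construction.
assert block_totals(swapped) == block_totals(households)
```

In this sketch the characteristics reported within a block (here, `race`) can change after the swap, which is the error swapping introduces, while the household and person totals for each block stay fixed.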

The Census Bureau decided to adopt a new DAS for the 2020 Census, premised on the cryptographic concept of differential privacy.11 A differential

___________________

11 The relevant computer science literature on formal privacy references “privacy” and not “confidentiality.” Statistical agencies have traditionally distinguished the two, defining “privacy” as control of one’s information, which they respect by asking only for information that is needed for public purposes and does not violate long-standing public concerns (e.g., U.S. statistical agencies do not ask about religion because of separation of church and state). Respondents give up control when they reply to censuses and surveys. In return, they are assured their information will be used for statistical purposes only and not against them as individuals (e.g., not shared with immigration or


privacy-based algorithm would infuse a controlled amount of random noise into all of the tabular cells of each data product, in the context of an overall privacy-loss budget allocated to various tabulations to reflect the tradeoff of disclosure risk and accuracy. The actual noise added to a specific cell count would remain unknown, but the distributional properties of that noise would be both known and publicly shareable. The privacy-loss budget is commonly denoted by ε (epsilon); even though later versions of the Census Bureau’s algorithms are parameterized by rho and delta (ρ and δ), we use ε as shorthand for the parameters in this report because ε has been the basis for most discussion with data users and the public.12 (See Appendix F.1 for a mathematical explanation.)
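The idea of noise infusion under a privacy-loss budget can be sketched in a few lines. The example below uses the textbook Laplace mechanism, swapped in here for illustration; it is not the Census Bureau's production algorithm, which, as noted above, was parameterized differently in later versions. The function names, the budget shares, and the assumption that each person affects one cell by at most 1 (`sensitivity=1.0`) are all hypothetical choices made for this sketch.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def allocate_budget(total_epsilon, shares):
    """Split a total privacy-loss budget across tabulations (shares must sum to 1)."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {table: total_epsilon * share for table, share in shares.items()}


def noisy_table(counts, epsilon, rng, sensitivity=1.0):
    """Add independent Laplace noise with scale sensitivity/epsilon to every cell."""
    scale = sensitivity / epsilon
    return {cell: count + laplace_noise(scale, rng) for cell, count in counts.items()}


rng = random.Random(2020)

# Hypothetical allocation: more budget (less noise) for total population.
budget = allocate_budget(1.0, {"total_population": 0.7, "age_by_sex": 0.3})

block_counts = {"under_18": 12, "18_to_64": 40, "65_plus": 9}
protected = noisy_table(block_counts, budget["age_by_sex"], rng)

for cell, value in protected.items():
    print(f"{cell}: true={block_counts[cell]} noisy={value:.1f}")
```

The noise scale is public (here `sensitivity / epsilon`), while the individual draws are not, which mirrors the distinction drawn in the text. A smaller ε means a larger scale and therefore noisier cells. The noisy values in this sketch can be fractional or negative because no post-processing is applied.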

Any attempt to reduce disclosure risk necessarily compromises data utility. In contrast to data swapping, the new approach promised to be transparent and to explicitly acknowledge the tradeoff between data utility and confidentiality protection.13 It also promised to respond to the threats to confidentiality protection from sophisticated matching technologies and the availability of voluminous data sources about people on the internet, which could be used to try to reidentify individual respondents to the census from published tables.

11.3 THE ROAD TO THE 2020 DISCLOSURE AVOIDANCE SYSTEM

Planning a decennial census is a decade-long (and often longer) enterprise, building on lessons learned from the previous count. The Census Bureau develops an “operational plan,” initially published mid-decade, which sets forth its goals and challenges (see Chapter 2). As the decade progresses, it revises the plan and issues other public statements about decisions made and still to be made.

11.3.1 What Did the Operational Plan and Other Statements Say About Disclosure Avoidance?

The original 2015 operational plan design for disclosure avoidance simply stated that disclosure avoidance was a step in response processing in preparation for production of data products (U.S. Census Bureau, 2015:123). This statement remained as late as Version 3 of the operational plan (U.S. Census Bureau, 2017:128). Both statements located disclosure avoidance at the stage of

___________________

tax authorities). They are also assured that the agency will strive to protect their information against breaches of confidentiality—that is, disclosure of individual identities. The difference between the two terms—privacy and confidentiality—is subtle but important in framing the debate on data utility versus protection.

12 This text was added after the prepublication release to clarify the report’s usage of ε parameter notation due to its prominence in public discussion of disclosure avoidance in the 2020 Census.

13 A recent paper by Bailie et al. (2023) places swapping into the differential privacy framework.


processing the CEF to produce a Hundred Percent Detail File (Version 1) or a Microdata Detail File (Version 3) as input to all the data products, as was done in 2010.

In the original plan, there was no indication that disclosure avoidance would require major changes to the delivery and content of data products. Nor had there been external pressure for changes in the disclosure avoidance method, akin to the complaints about costs, undercounts, and the quality of the Master Address File (MAF) or the calls for greater utilization of administrative records for data collection (see, e.g., National Research Council, 2004a, 2010, 2011).

Concurrently, in the global context of the emergence of Big Data and sophisticated linkage technologies, computer scientists in academe and the private sector were making technical advances in the theory and practice of curating and protecting large datasets. The work was designed to protect the identity of individuals (e.g., people posting on Facebook) with new methods that would allow for research on these rich, new data sources but not leak information that could potentially harm subjects (Herdağdelen et al., 2020).

To produce new data products from highly sensitive economic datasets, the Census Bureau researched and adopted the new protection methods for products such as OnTheMap.14 Late in the 2010s, the Census Bureau’s Research and Methodology Directorate proposed applying those new methods to the 2020 Census. The Census Bureau made a presentation to the Census Scientific Advisory Committee (CSAC) in fall 2016 that argued for a new approach for confidentiality protection; in a fall 2017 presentation to CSAC, the Census Bureau indicated it planned to use differential privacy-based algorithms to infuse noise into the 2020 Census data products (Abowd, 2016; Garfinkel, 2017).

During 2016–2018, the Census Bureau simulated an “attack” on 2010 data products using commercial and publicly available data. The simulation involved first “reconstructing” individual-level records from 2010 published tabulations, then matching the commercial data (which contained name, address, sex, and age) to the reconstructed individual-level records on age, sex, and census block but not race or ethnicity. The Census Bureau estimated that 17% of the U.S. population could have their race or ethnicity identified with a high level of certainty based on comparing the matched commercial-reconstructed records to the actual responses in the CEF (which, however, an outside attacker could not do).15
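The linkage step of such a simulation can be sketched as follows. This is a toy illustration of the general idea only: the record layout, the rule that a match must be unique, and all names are assumptions made here, and the full methodology of the Census Bureau's study is not available to reproduce. The `age_tolerance` parameter is a hypothetical knob for allowing inexact age agreement.

```python
from collections import defaultdict


def link_records(reconstructed, commercial, age_tolerance=0):
    """Attach a reconstructed race to each commercial record with exactly one
    candidate agreeing on block and sex, and on age within age_tolerance years."""
    index = defaultdict(list)
    for rec in reconstructed:
        index[(rec["block"], rec["sex"])].append(rec)

    putative = []
    for person in commercial:
        candidates = [
            r
            for r in index[(person["block"], person["sex"])]
            if abs(r["age"] - person["age"]) <= age_tolerance
        ]
        if len(candidates) == 1:  # keep only unambiguous matches
            putative.append({"name": person["name"], "race": candidates[0]["race"]})
    return putative


def confirmed_rate(putative, truth, population_size):
    """Share of the population whose putative race agrees with the true response.

    Only the data holder can compute this, because `truth` stands in for the
    confidential file."""
    confirmed = sum(1 for p in putative if truth.get(p["name"]) == p["race"])
    return confirmed / population_size


# Records "reconstructed" from published tables (no names).
reconstructed = [
    {"block": "101", "age": 34, "sex": "F", "race": "A"},
    {"block": "101", "age": 34, "sex": "M", "race": "B"},
    {"block": "102", "age": 70, "sex": "F", "race": "A"},
    {"block": "102", "age": 70, "sex": "F", "race": "B"},
]
# Outside data with names but no race.
commercial = [
    {"name": "P1", "block": "101", "age": 34, "sex": "F"},
    {"name": "P2", "block": "101", "age": 34, "sex": "M"},
    {"name": "P3", "block": "102", "age": 70, "sex": "F"},
]
# Stand-in for the confidential responses.
truth = {"P1": "A", "P2": "C", "P3": "A"}

putative = link_records(reconstructed, commercial)
rate = confirmed_rate(putative, truth, population_size=len(reconstructed))
```

In this toy data, P1 and P2 each match a single reconstructed record, P3 is ambiguous and is dropped, and only P1's putative race agrees with the truth, so `rate` is 0.25. The example also shows why a putative match is not the same as a confirmed one: confirmation requires the confidential responses.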

___________________

14 OnTheMap (introduced in 2008) displays where employed people live and work at the block group level. It uses formally private synthetic data on the residence side, based on an early variant of differential privacy called probabilistic differential privacy. See Dajani et al. (2017:Slides 2, 18) and Abowd (2016).

15 The Census Bureau reported a 58% confirmed match rate by using the CEF as the outside data source in place of the commercial data, on the assumption that outside data sources would improve in accuracy in the future (see Abowd, 2021a:Table 3). In subsequent work, the Census Bureau has


The description of this simulation is available only in abbreviated form in presentations and with somewhat more detail in affidavits filed in State of Alabama v. U.S. Department of Commerce (2021) (Abowd, 2021a,c). A full description is said to be in peer review for publication in a journal16 but was not made available to this panel. There are criticisms of the simulation in the literature (e.g., Muralidhar and Domingo-Ferrer, 2023a),17 but detailed criticisms are precluded by the absence of the complete methodology. Similarly, there is not sufficient information to assess the findings in Abowd and Hawes (2023:Table 5) of a lower but not negligible confirmed match rate for the 2020 DAS (noise infusion with the TopDown Algorithm, TDA) compared with the 2010 swapping and an “enhanced” swapping. (These comparisons matched the CEF as the outside data source to a reconstructed DHC demonstration file.) That information is to be in the as-yet-unpublished paper.

Census Bureau executives initially said that the results of the original reconstruction-reidentification study did not represent a major risk for the public in responding to the census.18 At the same time, they said that the simulation provided strong justification for abandoning data swapping for a new disclosure-avoidance method based on formal privacy (e.g., Abowd, 2019b).

In July 2018, the Census Bureau issued a Federal Register notice calling for data users to explain and justify their uses of 2010 Census data tables, to finalize the roster of 2020 Census data products. In hindsight, the notice’s call for feedback could be characterized as an attempt to prioritize use cases, so that the Census Bureau could tune the noise that would eventually be added to 2020 Census tables to ensure greater accuracy for higher-priority uses (at the necessary expense of lesser accuracy for other uses). At the time, however, that purpose was not specified; instead, the notice included only the cryptic warning that “given the need for improved confidentiality protection, we may reduce the amount of detailed data we release to the public” (83 FR 34111, July 19, 2018). Respondents to the notice were unclear about its specific purpose, found it difficult to work with the Excel spreadsheet provided as a template for response, and expressed concern over where and to what extent product content might be cut back (Hotz and Salvo, 2022). Table F.1 indicates that substantial amounts of content were dropped or provided only for higher-level geographies for both the 2020 Census DHC and DDHC products compared with the 2010 Census SF1 and SF2 products.

___________________

reported confirmed match rates of 90% and above by allowing age to vary by ±2–5 years for people older than age 21. See Abowd and Hawes (2023:Section 3).

16 Personal communication from S. Keller, U.S. Census Bureau, to J. Salvo, March 9, 2023.

17 Muralidhar and Domingo-Ferrer (2023a) is accompanied by a comment by Garfinkel (2023) and a rejoinder from the authors (Muralidhar and Domingo-Ferrer, 2023b).

18 For example, Jarmin (2019a) stated in a February 2019 blog post that “the accuracy of the data our researchers obtained from this study is limited, and confirmation of re-identified responses requires access to confidential internal Census Bureau information.”

Finally, in December 2018, Version 4 of the 2020 Census operational plan (U.S. Census Bureau, 2018:139) marked the shift to what would become the new DAS:

Changes Made Since Version 3.0 Operational Plan Release: [The Data Products and Dissemination operation] now includes the application of the 2020 disclosure avoidance methodology to the microdata19 in order to produce the Microdata Detail File. The process was previously part of the Response Processing Operation (RPO). The disclosure avoidance methodology that will be implemented for the 2020 Census is known as differential privacy. Differential privacy is the scientific term for a method that adds “statistical noise” to data tables we publish in a way that protects each respondent’s identity.

The plan continued (U.S. Census Bureau, 2018:140) that “this new methodology will be tested and implemented with the 2018 End-to-End Census Test.” Version 5 of the operational plan repeated the statement (U.S. Census Bureau, 2022b:145). Further public announcements followed in a February 2019 blog by deputy director Ron Jarmin and a presentation to the media in February 2019 by chief scientist John Abowd at a meeting of the American Association for the Advancement of Science. These statements engendered concern, questioning, and criticism from the data-user community (see, e.g., Ruggles et al., 2019).

11.3.2 Implementation Issues

What Versions 4 and 5 of the operational plan did not say, nor did the public announcements in early 2019 convey, was that implementation of the revised DAS was fraught with problems from the outset. Version 5 did note (U.S. Census Bureau, 2022b:145–146):

To complete the implementation of the new disclosure methodology for 2020 Production, the entire 2020 Census data products suite must be defined in advance. The process for determining the 2020 Census data products began with the announcement of the Federal Register Notice to solicit data user feedback. The assessment phase is now underway with a final determination of 2020 Census products expected to be completed in early 2019. The Census Bureau anticipates publishing the plans for the 2020 Census data products in a future notice. . . .

The 2018 and 2020 Disclosure Avoidance System (DAS) algorithms are highly complex, under active development, address subtle though precisely stated mathematical privacy issues, and solve genuinely novel outstanding scientific problems. All of these factors create an environment where only

___________________

19 “Microdata” is a misstatement—the 2020 DAS would be applied to tabular data produced from the CEF, which would then be turned into synthetic microdata for input into the Census Bureau’s tabulation software.

rigorous software engineering standards, code review, and formal external auditing of internal code can be expected to reasonably remove major bugs from the code.

In Box 11.4, we summarize the basic steps in the TDA used in the production of the 2020 Census Redistricting and DHC Files. Tracing through the steps in the process, the complicated procedural issues and challenges (both anticipated and unanticipated) come into clearer relief, from determining the functional form of the noise infusion to allocating portions of the privacy-loss budget ∊ to subsets of variables and geographies. More importantly, the steps in the algorithm speak to the importance of three particularly fundamental disconnects between the methodology and its application in the census context to meet users’ data needs. First, census data users need non-negative, integer numbers for official purposes, with at least some of these basic counts being held immune to any change (the “invariants” issue). Second, data users need variables that intersect person and housing characteristics, such as children in specific household types, and this is greatly complicated by the TDA’s separation of person and housing data tables at the outset (the “joins” issue). Third, data users need variables and data for a plethora of political and statistical geographies, many of which are small in population or do not “nest” (the “off-spine” issue) or both. These three issues, as we discuss below, proved especially challenging for the approach, leading to manifold changes in the DAS implementation plans such that the statement in Version 5 of the operational plan (U.S. Census Bureau, 2022b:145) that “a final determination of 2020 Census products [is] expected to be completed in early 2019” proved aspirational.

The Problem of Invariants

Differential privacy theory for disclosure avoidance is designed to deploy noise to all cell counts in a set of cross-tabulations, to protect against current and future threats to confidentiality, without having to specify attackers’ motives or estimate the plausibility of a particular attack. A requirement for some cell counts to remain “invariant,” that is, reported as their actual collected values, makes it more difficult to inject noise in ways that do not markedly alter other cell counts or, indeed, undercut the theoretical basis of the privacy guarantee. Early on, the Census Bureau decided that the state population totals for congressional reapportionment would have to be invariant because of the constitutional language about “direct enumeration” of the population.

In a September 2017 presentation to CSAC, the Census Bureau further stated that much more would be held invariant—block-level population counts (not just state-level), voting age population counts, and occupied and vacant housing unit counts. These items were to be held invariant because they were also held invariant under confidentiality-protection methods used in the 2000 and 2010 Censuses, and because of a commitment made by the Census

Bureau to the U.S. Department of Justice for the 2000 Census, when statistical adjustment of the census counts for coverage error had been under consideration (Garfinkel, 2017). In a December 2018 presentation to CSAC, however, the Census Bureau proposed eliminating some of the invariants because of the difficulties they posed for privacy protection and implementation of noise infusion and the TDA (Abowd, 2018). The presentation previewed that the only invariants for the 2020 Redistricting File would be state-level population counts and block-level counts of housing units and the number and type of group quarters. The presentation acknowledged that the noise-infused population counts for blocks and other substate areas would need to add up to the invariant state totals. Total housing units and the number and type of group quarters for blocks were to be invariant because such counts are regularly available to localities working with the Census Bureau to update the MAF. Having the number and type of group quarters invariant would also ensure that a correctional facility, for example, would not appear in a block that had no group quarters or had a nursing home instead. Though the reduction in invariants was foreshadowed by Abowd (2018), the decision on invariants for the first set of 2020 Census data products was not finalized for another two years, in November 2020, and continued to generate user concerns and proposals for alternative solutions (Federal State Cooperative on Population Estimates Steering Committee, 2020).20
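
The requirement that noise-infused substate counts be non-negative integers that add up to an invariant total can be illustrated with a toy post-processing step. The sketch below is not the TDA, which solves constrained optimization problems over a geographic hierarchy; it assumes rounded Gaussian noise from the Python standard library in place of the discrete Gaussian mechanism, and the function name, block populations, and noise scale are all hypothetical.

```python
import random


def publish_with_invariant_total(true_counts, sigma, seed=0):
    """Noise each count, then force non-negative integers that sum to the
    true total (the "invariant"). Illustrative only; this is not the TDA."""
    rng = random.Random(seed)
    total = sum(true_counts)  # invariant: published exactly as collected

    # Step 1: noisy measurements. They can be negative and need not sum to total.
    noisy = [c + round(rng.gauss(0, sigma)) for c in true_counts]

    # Step 2: clip negative values to zero.
    clipped = [max(0, v) for v in noisy]
    if sum(clipped) == 0:
        clipped = [1] * len(clipped)  # degenerate case: spread the total evenly
    clipped_sum = sum(clipped)

    # Step 3: rescale to the invariant total with exact integer arithmetic,
    # handing leftover units to the cells with the largest remainders.
    published = [(v * total) // clipped_sum for v in clipped]
    remainders = [(v * total) % clipped_sum for v in clipped]
    shortfall = total - sum(published)
    order = sorted(range(len(clipped)), key=lambda i: remainders[i], reverse=True)
    for i in order[:shortfall]:
        published[i] += 1
    return noisy, published


blocks = [0, 3, 12, 45, 140]  # hypothetical block populations in one "state"
noisy, published = publish_with_invariant_total(blocks, sigma=5, seed=1)
print("noisy:    ", noisy, "sum =", sum(noisy))
print("published:", published, "sum =", sum(published))
```

Even in this simplified form, the published block values depend on both the noise and the adjustment needed to hit the fixed total, which is one reason the effect of post-processing is harder to characterize than the effect of the noise alone.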

The actual extent to which the population and housing invariants undercut the privacy guarantee for the 2020 Redistricting and DHC Files is unclear. From the viewpoint of localities, the lack of substate invariant population totals

___________________

20 The invariants were announced in a November 25, 2020, press release, reflecting a decision by the Census Bureau’s Data Stewardship Executive Policy Committee the day before; see https://www.census.gov/programs-surveys/decennial-census/decade/2020/planningmanagement/process/disclosure-avoidance/2020-das-updates/2020-11-25.html.

contradicted the idea of a census and raised concerns about the quality of the published data.21

The Problem of “Joins”

The census is a count of individuals within housing units, families, or other living situations. It is generated by contacting the addresses on a master file and identifying one person in each living unit as the individual (Person 1) around whom everyone else (Person 2, Person 3, etc.) is arrayed. Person 1, the “household head” or “householder,” is responsible for reporting on that household. In turn, data users have many uses for variables that link or join persons to their households, such as the number of persons per household (occupied housing unit).

Traditional census disclosure-control methods had no problem dealing with the hierarchical nature of the collected data. So, for example, in 2010, swapping was done by moving an entire household to a different geography, not by plucking out one person in a household and swapping them with someone in a different household. In contrast, the TDA used to produce the 2020 Redistricting and DHC Files infused (discrete) Gaussian noise into the person and housing data separately (the Gaussian distribution replacing the discrete Laplace noise infusion originally used—see Table F.2 and Appendix F.1). The TDA post-processing then continued separately for person and housing data, ultimately deriving synthetic microdata for persons and housing units with no direct linkage between the separate files.22 Hence, the TDA had the desirable property of producing non-negative, integer numbers for each table cell that added up geographically (i.e., block populations added to census tract populations to county populations to state populations) but it could not make the person and housing tables consistent. Consequently, calculating persons per occupied household from the Redistricting File often gave anomalous answers (e.g., values of < 1 person per household), while some blocks had people but only vacant housing or only occupied housing but no people.
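
How independent noise in the person and housing tables yields such anomalies can be seen in a small simulation. This is a minimal sketch under stated assumptions, not the Census Bureau's procedure: the block sizes, household sizes, noise scale, and function name are hypothetical, rounded Gaussian noise stands in for the discrete Gaussian mechanism, and post-processing is reduced to clipping at zero.

```python
import random


def simulate_block_anomalies(n_blocks=10_000, sigma=2.0, seed=0):
    """Count blocks whose separately noised person and occupied-unit counts
    are mutually inconsistent. All inputs are hypothetical."""
    rng = random.Random(seed)
    anomalies = {
        "people_but_no_occupied_units": 0,
        "occupied_units_but_no_people": 0,
        "fewer_than_one_person_per_household": 0,
    }
    for _ in range(n_blocks):
        # Consistent "true" data: every occupied unit holds 1 to 4 people.
        households = rng.randint(0, 5)
        persons = sum(rng.randint(1, 4) for _ in range(households))

        # Noise is drawn independently for the two tables, so nothing ties
        # the published person count to the published household count.
        noisy_persons = max(0, persons + round(rng.gauss(0, sigma)))
        noisy_households = max(0, households + round(rng.gauss(0, sigma)))

        if noisy_persons > 0 and noisy_households == 0:
            anomalies["people_but_no_occupied_units"] += 1
        elif noisy_households > 0 and noisy_persons == 0:
            anomalies["occupied_units_but_no_people"] += 1
        elif 0 < noisy_persons < noisy_households:
            anomalies["fewer_than_one_person_per_household"] += 1
    return anomalies


print(simulate_block_anomalies())
```

With the noise scale set to zero the function reports no anomalies, since the simulated true data are internally consistent; any inconsistency in the output therefore comes from noising the two tables separately.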

A different kind of differential privacy-based algorithm was needed to handle variables that required “joining” person and housing characteristics. The Census Bureau is developing the PHSafe algorithm for the join tables to be included in the Supplemental DHC File, but it did not have such an algorithm available at the time of producing the Redistricting File or the DHC File.23 Yet person-household joins are of vital interest to state and local governments and other

___________________

21 Personal communications from members of the Federal-State Cooperative Program for Population Estimates to J. Salvo.

22 The reason for processing the person and housing tables separately is said to involve a technical issue: the sensitivity parameter in the noise infusion model could not be specified for households, due to their varying size. See Abowd et al. (2022:§ 5.3).

23 As noted in an earlier footnote, the Census Bureau included some person-household variables in the DHC File through adding recodes to records for the household head on the CEF prior to creating tables for input to the noise infusion step.

data users for projecting, locating, and evaluating services and programs; for validating their own intercensal estimates of population and housing; and for analyzing demographic and social trends.24

The Problem of Small Population, Off-Spine Geography

The United States has a large and bewildering variety of political and statistical units of geography, with considerable variation across states. Political geographies that are functioning governments range from villages and American Indian reservations with fewer than 100 people to cities, counties, and states with millions of people. Statistical geographies, developed by the Census Bureau with input from users, also range in population size—from blocks with fewer than 10 people to large metropolitan areas. Each functioning government has a need for census data, particularly if the political unit is too small in population to benefit from the American Community Survey (ACS). Figure 11.1 demonstrates the small size of three important political geographies: incorporated places (19,483 in every state but Hawaii); minor civil divisions (MCDs—17,870 in 21 states); and American Indian reservations and trust lands (337 in 40 states). Statistical geographies such as census designated places (12,134 in 50 states, with a median population of about 700 people) and census tracts (83,848 populated tracts in 50 states and the District of Columbia, with a median population of about 3,775 people) are also important for many data users.25

The small population size of many geographies makes it challenging to inject noise while providing acceptable accuracy; the task is harder yet because many geographies are not nested in a neat hierarchy. While blocks nest into block groups, census tracts, counties, and states, other types of geographies are “off-spine” (see Figure 11.2).26 A handful of American Indian reservations cross state boundaries, and almost 40% cross county boundaries. While MCDs nest within counties, 7% of incorporated places cross county boundaries. Moreover, 40% of incorporated places cross census tract boundaries, as do many MCDs (see Census Geographies Project, 2022:32).

___________________

24 Similarly, researchers frequently use public use microdata samples from the census (and surveys) to construct dependent and independent variables to analyze household and family characteristics—for example, differences in home ownership for households of various sizes as related to the race-ethnicity of the household. The separate person and housing privacy-protected microdata files for 2020 (see Table F.1) preclude the ability to carry out such studies unless the household variables of interest to the researcher are already encoded in the records of household heads.

25 Data from the 2020 P.L. 94-171 Redistricting File; see the state tables at the Census Geographies Project, https://mdi.georgetown.edu/census-geographies-project/.

26 The use of the term “spine” instead of “nested” can imply that “off-spine” geographies are less worthy of accurate data, which is not the case given that many off-spine geographies are functioning governments and locally recognized communities.

Figure 11.1 Population at the 10th–90th percentiles, minor civil divisions, incorporated places, and American Indian reservations, 2020 Census.

NOTES: MCDs, minor civil divisions. Population numbers not shown for American Indian reservations and off-reservation trust lands because they are too similar to MCDs to be legible.

SOURCE: Census Geographies Project (2022), Figure 3-4, middle panel, using data from 2020 Census Redistricting File.

Unlike the problems of invariants and joins, the Census Bureau did not appear to fully recognize the off-spine problem until December 2019, when users presented comparisons of 2010 SF1 tables as originally released with TDA-protected 2010 SF1 tables at a Committee on National Statistics workshop (National Academies of Sciences, Engineering, and Medicine, 2020). These comparisons showed much larger differences for geographies that did not nest compared with those that did. The Census Bureau then worked to “optimize” the spine and in other ways ensure greater accuracy for off-spine geographies.
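
One way to see why non-nested geographies fared worse is to compare a total that receives its own noisy measurement with a total that must be assembled from many separately noised pieces. The sketch below assumes independent Gaussian noise with a common standard deviation on each piece and ignores the TDA's post-processing entirely; the function name and parameter values are hypothetical.

```python
import random
import statistics


def aggregation_error_sd(n_pieces, sigma, trials=2000, seed=0):
    """Empirical standard deviation of the error in a total obtained by
    summing n_pieces independently noised counts (illustrative only)."""
    rng = random.Random(seed)
    errors = [
        sum(rng.gauss(0, sigma) for _ in range(n_pieces))
        for _ in range(trials)
    ]
    return statistics.pstdev(errors)


sigma = 3.0
for n in (1, 25, 100):
    observed = aggregation_error_sd(n, sigma)
    expected = sigma * n ** 0.5  # independent errors add in variance
    print(f"pieces={n:>3}  observed sd={observed:6.2f}  sigma*sqrt(n)={expected:6.2f}")
```

Under these assumptions the error of an assembled total grows with the square root of the number of pieces, whereas a unit measured directly carries only a single draw of noise. Changes to the spine can be read in this light as reducing the number of pieces needed to build commonly used off-spine areas.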

11.3.3 Implications for Development of the 2020 Disclosure Avoidance System

By late 2018, as Version 4 of the 2020 operational plan announced that the 2020 Census DAS would use differential privacy, the Census Bureau was implementing its TDA for redistricting data from the June 2018 End-to-End Census Test. The Census Bureau and external critics also began assessing the planned methods on the 1940 Census complete count file, which was available through IPUMS at the University of Minnesota.

In the first half of 2019, calls grew for more formal testing of the new DAS before final implementation (see, e.g., Ruggles et al., 2019). Discussions

Figure 11.2 Hierarchy of census geographic entities.

NOTES: Entities in the dashed rectangle are the “spine” or “on-spine” geographic levels, in the parlance that has developed around the Census Bureau’s TopDown Algorithm; those not in the rectangle are “off-spine.” Counts associated with geographic levels are tallies from the 2020 Census Redistricting File, as derived by Census Geographies Project (2022).

SOURCES: Adapted from “Standard Hierarchy of Census Geographic Entities” published by U.S. Census Bureau at https://www2.census.gov/geo/pdfs/reference/geodiagram.pdf?#.

at a June 2019 CNSTAT workshop27 supported the need for more testing with the opportunity for user feedback (see Hotz and Salvo, 2022, for a chronicle of events during this period). Beginning in fall 2019, the Census Bureau responded by releasing “demonstration products” of noise-infused data from the 2010 Census that could be compared with the official published 2010 releases.28 Each release was followed by an opportunity for user feedback.29 The Census Bureau also offered webinars and developed materials to inform users about the new DAS (see Section 11.4 for more about the Census Bureau’s communication efforts and our assessment of those efforts in relation to user needs). Key decisions resulting from the demonstration file releases and associated user feedback (including feedback from subject matter specialists at the Census Bureau) included:

  • To continue and refine the TDA for the Redistricting and DHC Files, which meant continued separate processing of person and housing tables;30
  • To prioritize development of the DAS for the Redistricting File, given the more pressing statutory deadlines for apportionment and redistricting data (and the lack of progress toward an extension of those deadlines), and to slow active DAS development for the DHC File;
  • To adopt discrete Gaussian rather than discrete Laplace noise infusion to reduce the prevalence of outliers with large differences between the demonstration products and the originally published statistics;
  • To tune the noise infusion in the Redistricting File so that the largest demographic group in any area of more than 500 people would differ by no more than ±5 percentage points from the collected value 95 percent of the time (responding to Voting Rights Act requirements on majority-minority districts);31
  • To optimize the geographic spine, including the creation of a separate spine for American Indian or Alaska Native (AIAN) areas by state, to improve accuracy for the many off-spine geographies;
  • To develop separate algorithms (SafeTab-P and SafeTab-H) for the DDHC-A and B files that could produce better-quality data than the TDA

___________________

27 See https://www.nationalacademies.org/event/06-06-2019/challenges-and-new-approachesfor-protecting-privacy-in-federal-statistical-programs-a-workshop.

28 The Census Bureau also eased user concerns that differential privacy-based algorithms would soon be applied to ACS data products by announcing in summer 2019 that no such change would be made before 2025 at the earliest (Jarmin, 2019b).

29 See Table F.2 in Appendix F.2 for a list of releases beginning with the 2018 End-to-End Redistricting File and related events for the DAS development from 2018–2023.

30 The Census Bureau experimented with a “bottom up” approach with the 1940 Census data, in which noise was injected in all of the smallest geographic areas (enumeration districts) in one operation but quickly ruled out this approach because the noise-infused data would become less rather than more accurate when aggregated for larger geographic units. See Abowd (2019a).

31 See Wright and Irimata (2021) for an empirical analysis.

Table 11.2 Features of the TopDown Algorithm Compared with the SafeTab-P, SafeTab-H, and PHSafe Algorithms

TopDown Algorithm (2020 Redistricting, DHC, Demographic Profile Files) | SafeTab-P, SafeTab-H, PHSafe Algorithms (2020 DDHC-A, B; Supplemental-DHC Files)
Produces privacy-protected microdata | Produces privacy-protected tabulations directly
All geographies add up as expected | No requirements for geographies to add up
When aggregating data, noise generally cancels out and data become more accurate | Aggregating data increases noise the more one aggregates
Consistency across data products | Not consistent with Redistricting or DHC Files
Overall accuracy can be targeted but exact accuracy levels cannot be known in advance | All margins of error determined in advance and met 95% of the time
Same amount of data provided for each type of geography | Detail of data provided depends on population/household size of geographic area or race/ethnicity group (Census Bureau terms this “adaptive design” but it is similar to cell suppression used in censuses prior to 1990)

NOTES: DHC, Demographic and Housing Characteristics (File); DDHC, Detailed DHC (Files).

SOURCE: Adapted from Devine et al. (2023:Slide 21).

    for the large number of race, ethnicity, and AIAN tribes and villages, although at the cost of inconsistency with the Redistricting and DHC Files (see Table 11.2 for a summary of the differences between the TDA and the new algorithms);
  • To develop yet another algorithm (PHSafe) for the Supplemental-DHC File that could handle complex household-person joins; and
  • To successively increase the value of the privacy-loss budget (∊) for tables in the Redistricting File from 4 (persons) and 2 (housing units) in the 2010 Demonstration File, October 2019 version, to 17 (persons) and 2.5 (housing units) in the final file (released August 2021)—these increases responded to user feedback after each demonstration product release that identified large differences for key variables compared with the original 2010 data.32 The Census Bureau increased the value of ∊ for

___________________

32 The Census Bureau has never set a global value for ∊ for the suite of 2020 Census data products, although differential privacy theory calls for setting such a budget and then allocating the budget among data products and then among variables and geographies in each product. As noted in Section 11.2, epsilon (∊) is used to express the privacy-loss budget although rho (ρ) is the parameter in the discrete Gaussian implementation (see Appendix F.1) because users became accustomed to ∊ values.

    the combination of person and housing unit tables in the Redistricting and DHC Files from 40 (2010 DHC Demonstration File, March 2022 version) to 53 in the final DHC File (released May 2023).
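
The privacy-loss values in the list above are quoted as ∊ even though the discrete Gaussian implementation is parameterized by ρ. A standard result for zero-concentrated differential privacy, the framework in which ρ is defined, relates the two once a small failure probability δ is fixed. The block below states that generic relationship and works one example with hypothetical values of ρ and δ; the Census Bureau's published ∊ values may rest on a different δ or a tighter conversion (see Appendix F.1), so the numbers here should not be read as reproducing them.

```latex
% Privacy loss under zero-concentrated differential privacy (zCDP)
% composes additively across releases j = 1, ..., k:
\rho_{\mathrm{total}} = \sum_{j=1}^{k} \rho_j

% A \rho-zCDP guarantee implies (\varepsilon, \delta)-differential privacy
% for any \delta \in (0, 1), with
\varepsilon(\rho, \delta) = \rho + 2\sqrt{\rho \,\ln(1/\delta)}

% Worked example with hypothetical values \rho = 1 and \delta = 10^{-10}:
\varepsilon = 1 + 2\sqrt{\ln\!\left(10^{10}\right)}
            \approx 1 + 2\,(4.80)
            \approx 10.6
```

Because the conversion is not linear in ρ, the ∊ values reported for separate products cannot simply be rescaled to recover their ρ values.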

User comments on successive demonstration products commended the Census Bureau for improved accuracy over earlier products but continued to raise concerns about remaining anomalies, particularly for small off-spine geographies and small race and ethnic groups (see Table F.2). The Federal State Cooperative on Population Estimates Steering Committee (2020) asked that “illogical and implausible values” evident in the demonstration products be addressed before final parameters were set. “At the root of these issues,” they wrote, “is the separation of people and housing in the DAS processing.” They suggested adding invariants, which would have put more constraints on the already-stressed DAS implementation and were not adopted.33

In August 2021, when the official 2020 Redistricting File was released, the “illogical” counts remained, and the larger public learned of the impact of the DAS on small-area counts. The Census Bureau recommended aggregating blocks to address the issue but did not otherwise provide detailed guidance to users. At least two private-sector data-dissemination firms are offering the Redistricting File to users with anomalous data smoothed or otherwise altered (Cassal, 2022; Hodges and Cortes, 2022).

11.4 ASSESSMENT

We assess the development and implementation of the 2020 DAS in three ways. First, we refer to statements of principles, practices, and fundamental responsibilities for federal statistical agencies; second, we contrast the DAS development process with other census innovations; and third, we briefly look at how other countries are responding to contemporary threats to confidentiality protection. We also note the effects of the 2020 DAS on other Census Bureau programs, including DA and population estimates, and on the Census Bureau’s ability to produce informative substate operational quality metrics (e.g., self-return rates) for public use.

We do not pass judgment on the appropriateness of differential privacy-based algorithms in the toolkit of statistical confidentiality protection. We are aware of effective applications and are confident there will be more such applications as the theory and the available suite of algorithms evolve. The issue before us is the application of differential privacy-based algorithms for 2020 Census products.

___________________

33 For example, the committee recommended: “In addition to adding occupied housing units to the list of invariants please: • Limit the distribution of household population to blocks with occupied housing units [and] • Force each occupied housing unit to have at least one person associated with it” (Federal State Cooperative on Population Estimates Steering Committee, 2020).

11.4.1 Fundamental Responsibilities of Statistical Agencies

Box 11.5 reproduces relevant text from three well-known and highly regarded statements of statistical agency responsibilities. All three statements put the production of relevant, timely, accurate, and objective statistics for the public good as the first responsibility of statistical agencies. Each list also includes protection of confidentiality of respondents’ information but as an instrument to enable the agency to provide high-quality data by gaining respondents’ cooperation.

Principles and Practices for a Federal Statistical Agency (National Academies of Sciences, Engineering, and Medicine, 2021:156) also introduces the useful concept of data stewardship in its description of the operational principles and best practices established in the U.S. Office of Management and Budget’s Federal Data Strategy.34 It states that the Federal Data Strategy practices, grouped under the rubrics of “Building a Culture that Values Data and Promotes Public Use,” “Governing, Managing, and Protecting Data,” and “Promoting Efficient and Appropriate Data Use,” “represent aspirational goals that are intended to improve the government’s approach to data stewardship and the leveraging of data to create value.”

The Census Bureau’s original decision to develop the 2020 DAS using formal privacy methods did not contradict the various lists of principles (although it contradicted the tenets of census planning—see Section 11.4.2). Its decision to continue with that approach, however, when it became clear that it would be difficult, if not impossible, to produce timely, accurate, relevant data for users, arguably did. The continuation decision prioritized confidentiality protection above data utility broadly defined.

That no product beyond the 2020 Redistricting File was released until just over three years after Census Day marks a clear departure from data users’ reasonable expectations, based on previous censuses, of timely data release. Moreover, the scheduled release dates for the DDHC-B File (counts of household types and tenure by detailed race, ethnicity, and AIAN tribal and village groups) and the Supplemental DHC (S-DHC) File (household-person join tables) have been pushed back to September 2024, so data release will not be complete until almost the end of the fourth year after Census Day.

With regard to accuracy, users have documented problems with the Redistricting and DHC File demonstration products that remain despite the Census Bureau’s best efforts to fine-tune the TDA and the decision to increase the privacy-loss budget to levels that were previously believed to be disclosive. With regard to relevance, the paucity of person-household join tables relative to 2010 and earlier censuses and their limited availability for geographic areas

___________________

34 See Memorandum M-19-18, Federal Data Strategy—A Framework for Consistency, at https://www.whitehouse.gov/wp-content/uploads/2019/06/M-19-18.pdf.

are problematic for many census data users, who find such tables essential for government services planning and other purposes.

A benefit of formally private approaches is their transparency in many respects. For approaches like data swapping, the percentage of swapped households in an area and other parameters are kept confidential. By comparison, the Census Bureau has released the code and values of key parameters of the 2020 DAS (e.g., the allocation of privacy-loss budget ∊ to tables at various geographic levels). The Census Bureau has also released what are called noisy measurement files (NMFs) for the 2010 demonstration products and the 2020 production Redistricting and DHC data. These files—the result of noise infusion into every table cell—enable analysts to get some sense of the effects of the discrete Gaussian noise infusion per se versus the postprocessing steps in the TDA to eliminate negative values and enforce consistency of the data for smaller geographic areas with larger geographic areas.35
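The distinction between noise infusion and postprocessing can be made concrete with a toy example. The sketch below is illustrative only: it uses rounded Gaussian noise as a stand-in for the discrete Gaussian mechanism, and a simple clip-and-rescale step as a stand-in for the far more elaborate TDA postprocessing. The function names, the noise scale `sigma`, and the cell counts are all hypothetical.

```python
import random


def noisy_measurements(counts, sigma, rng):
    """Add independent rounded Gaussian noise to every cell.

    Assumption: rounded Gaussian noise stands in for the discrete
    Gaussian mechanism; it is not the production algorithm.
    """
    return [c + round(rng.gauss(0, sigma)) for c in counts]


def postprocess(noisy, total):
    """Remove negative values and force cells to sum to a parent total.

    Assumption: clip-and-rescale is a simplified substitute for the
    postprocessing performed in the TDA.
    """
    clipped = [max(0, v) for v in noisy]
    if sum(clipped) == 0:
        # Degenerate case: nothing positive survives, so spread evenly.
        clipped = [1] * len(clipped)
    scale = sum(clipped)
    scaled = [v * total / scale for v in clipped]
    # Largest-remainder rounding keeps integer cells that sum to `total`.
    result = [int(x) for x in scaled]
    shortfall = total - sum(result)
    by_fraction = sorted(
        range(len(scaled)), key=lambda i: scaled[i] - result[i], reverse=True
    )
    for i in by_fraction[:shortfall]:
        result[i] += 1
    return result


# Hypothetical block-level cells nested in a parent area of 1,000 people.
true_counts = [0, 1, 3, 40, 956]
rng = random.Random(2020)
noisy = noisy_measurements(true_counts, sigma=2.0, rng=rng)
post = postprocess(noisy, total=sum(true_counts))
print("noisy measurements:", noisy)
print("after postprocessing:", post)
```

Comparing `noisy` with `post` cell by cell is, in miniature, the kind of analysis the NMFs make possible: in this toy setup, any error that is present in `noisy` comes from the noise alone, whereas differences between `noisy` and `post` come from eliminating negatives and enforcing the parent total.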

___________________

35 Kenny et al. (2023:1) analyzed the 2010 NMF and demonstration files, finding “that the NMF contains too much noise to be directly useful alone, especially for Hispanic and multiracial populations. TopDown’s post-processing dramatically reduces the NMF noise and produces similarly accurate data to swapping in terms of bias and noise. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.” Hopefully, others will replicate

This technical information about the 2020 DAS is invaluable for advancing the field of confidentiality protection using formally private methods. But there remains a lack of the promised transparency in important senses. The code and technical parameters do not help the vast majority of data users interpret and work with the 2020 data products, nor do they explain the effects of invariants and other deviations from the pure application of differential privacy on the actual confidentiality protection obtained by the 2020 system. There are also lingering questions in the absence of full methodological documentation for a key input to the decision to adopt and stick with the new DAS approach—namely, the 2016–2018 reconstruction and reidentification experiment.

Beginning in late 2019, the Census Bureau came to realize that users needed not only more explanatory information, but also materials that would help them assess the 2010 demonstration files to provide informed feedback and get ready to work with the 2020 data products. Box 11.6 illustrates efforts by the Census Bureau to fill these information gaps, citing examples of webinars, blogs, demonstration products, summary metrics, published papers, handbooks, and issue briefs.

Despite the Census Bureau’s efforts, the materials available to date do not address users’ needs for (a) concrete guidance and the equivalent of confidence intervals for working with the data; and (b) an understandable metric for estimating the increased disclosure risk with the higher values of ∊ adopted for the production 2020 Redistricting and DHC Files, which would inform users’ views on the confidentiality-utility tradeoff. The materials are either too complex for most users to analyze (e.g., the NMFs) or they are too general. For example, the practical advice in the primer released with the 2020 Redistricting File (U.S. Census Bureau, 2021e) boils down to two suggestions: (1) aggregate blocks; and (2) compare the most recent 2010 Redistricting File demonstration product with the originally released 2010 file as a guide to the likely accuracy of the 2020 file. Similarly, users do not know what to make of the allocation of privacy-loss budget ∊ to various topics and geographies except in the most general sense.

The Census Bureau recently announced an initiative to expand its guidance on using the 2020 Census differential privacy-protected data products, to provide measures of total error introduced by the 2020 DAS and mechanisms for generating confidence intervals, and to develop other aids for users (Devine et al., 2023:Slides 16–18). As noted in Box 11.6, several short documents, written at a general level, were released March 27, 2023, describing why the Census Bureau chose differential privacy, the TDA, and disclosure avoidance for the Redistricting File.36 For estimating error from the privacy-protected

___________________

and expand on this analysis to produce a better understanding of the effects of formally private confidentiality-protection techniques on census data.

36 See https://www.census.gov/programs-surveys/decennial-census/decade/decennialpublications/2020/census-briefs.html under “2023.”

2020 data products, the Census Bureau is investigating an “approximate Monte Carlo” or “parametric bootstrap” approach. This approach is computer-resource-intensive but, if the Census Bureau can work through the technical and computing issues, it could well provide useful metrics to assist users in understanding the error in 2020 data products.37
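The general idea behind a parametric bootstrap can be sketched in a few lines, although the source does not describe the Census Bureau's actual procedure. The sketch below assumes a drastically simplified protection step (`protect`, a hypothetical stand-in for a DAS run), treats the published counts as if they were the truth, reapplies the protection step many times, and reads off the empirical spread of the resulting errors. All names and parameter values are illustrative.

```python
import random


def protect(counts, sigma, rng):
    """Hypothetical stand-in for one disclosure-avoidance run:
    rounded Gaussian noise with negatives clipped to zero."""
    return [max(0, c + round(rng.gauss(0, sigma))) for c in counts]


def bootstrap_error_interval(published, sigma, n_draws=2000, level=0.90, seed=0):
    """Approximate, per cell, an interval for the protection-induced error.

    The published counts are treated as pseudo-truth; each replicate
    reapplies `protect` and records (replicate - published).
    """
    rng = random.Random(seed)
    errors = [[] for _ in published]
    for _ in range(n_draws):
        replicate = protect(published, sigma, rng)
        for i, (rep, pub) in enumerate(zip(replicate, published)):
            errors[i].append(rep - pub)
    lo_idx = int((1 - level) / 2 * n_draws)
    hi_idx = int((1 + level) / 2 * n_draws) - 1
    intervals = []
    for errs in errors:
        errs.sort()
        intervals.append((errs[lo_idx], errs[hi_idx]))
    return intervals


# Hypothetical published cells; sigma is an assumed noise scale.
for cell, interval in zip([0, 50, 1000], bootstrap_error_interval([0, 50, 1000], 3.0)):
    print(cell, interval)
```

Each replicate here is cheap, but a replicate of a full production run would not be, which is one way to see why the approach is computer-resource-intensive.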

Vink (2023), New York’s state representative to the Federal-State Cooperative for Population Estimates, transformed summary metrics provided by the Census Bureau for the Redistricting and DHC Demonstration Files into fitness-for-use scores (e.g., mean absolute percentage error or MAPE < 10% = 1, MAPE > 10% = 0). The individual scores can be combined into an overall fitness-for-use score for a variable. He illustrated these scores using age in the August 2022 and October 2019 DHC 2010 Demonstration Files, concluding that the August 2022 file was substantially improved over the original October 2019 file, but that age distributions for small-population geographic units were still problematic. His approach may be useful for other variables while waiting for the Census Bureau to provide tools for estimating error bounds.
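The scoring rule described above lends itself to a short illustration. The sketch below is an assumption-laden reading of the approach, not Vink's implementation: the source gives only the 10% threshold example, so the treatment of a MAPE of exactly 10% (scored 0 here), the exclusion of units with a zero reference value, and the use of a simple mean to combine scores are all choices made for illustration.

```python
def mape(protected, reference):
    """Mean absolute percentage error across geographic units.

    Units whose reference value is zero are skipped (an assumption),
    because the percentage error is undefined for them.
    """
    pairs = [(p, r) for p, r in zip(protected, reference) if r != 0]
    if not pairs:
        raise ValueError("no units with a nonzero reference value")
    return 100.0 * sum(abs(p - r) / abs(r) for p, r in pairs) / len(pairs)


def fitness_score(mape_value, threshold=10.0):
    """Return 1 if MAPE is below the threshold, else 0."""
    return 1 if mape_value < threshold else 0


def overall_fitness(scores):
    """Combine individual scores into one value (assumed: simple mean)."""
    return sum(scores) / len(scores)


# Made-up values for two geographic units, for illustration only.
example = mape(protected=[110, 95], reference=[100, 100])
print(example, fitness_score(example))
```

A score near 1 from `overall_fitness` would indicate that most of the underlying metrics met the threshold; the threshold itself is a judgment about what counts as fit for a particular use.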

11.4.2 Census Planning, Testing, and Innovation

The decennial census is the U.S. government’s largest peacetime operation and has tight deadlines, mandated in law, for the delivery of reapportionment and redistricting data, in addition to expectations for timely release of additional data generated from enumerating a large and highly diverse population. The planning process (see discussion of the 2020 operational plan series in Section 11.3) is lengthy, detailed, and meticulous, and involves extensive testing and research, which typically spans multiple censuses. Sometimes, the process has taken longer than need be,38 but, generally, robust experimentation and testing, including under the conditions of an actual census, are very much in order for major census innovations. Examples include the Mailout/Mailback census introduced for two-thirds of housing units in the 1970 Census, after several decades of smaller-scale testing (National Research Council, 2010) and the multidecade development of the modern Post-Enumeration Survey (PES) methodology for coverage measurement (National Research Council, 2009). In contrast, the decision to adopt a markedly different and much more complex method for confidentiality protection was made at virtually the eleventh hour of census planning, without prior testing in the census context or buy-in from the user community.

Problems in the rollout of new methods and technologies are not unexpected. Virtually every census has encountered the unexpected from external

___________________

37 See U.S. Census Bureau (2023c:Slides 71–102); see also Neunhoeffer et al. (2023).

38 For example, the Census Bureau ran out of time to develop the logistics for a second questionnaire mailing in 2000 by conducting its own tests on response-rate effects despite decades of positive survey research findings (National Research Council, 2010).

forces (e.g., COVID-19 in 2020 and the failure of a key contract in 2010). Time and again, the Census Bureau has shown outstanding ability to modify affected operations in real time to cope with such exigencies. The decisions to adopt the new DAS very late in planning for 2020 and not to reconsider in the face of serious implementation challenges, however, were the Census Bureau’s own doing. Moreover, the Census Bureau did not have a backup plan were the new DAS to prove more difficult and time consuming to implement than expected.

The Census Bureau consults often with users and stakeholders about census planning through advisory committees, congressional hearings, presentations at professional conferences, interactions with National Academies’ study panels, U.S. Government Accountability Office staff, and others. For decisions that affect data products in major ways, however, such engagement is rarely enough. Extensive and repeated user interactions are essential to identify key user needs and obtain buy-in when painful trade-offs are needed between, in this case, confidentiality protection and data availability and utility.

Although many Census Bureau staff interact frequently and effectively with data users, the organization apparently did not realize the incompatibilities between differential privacy-based methods, as developed prior to their adoption for the 2020 Census, and user needs for official (invariant) counts, household-person join variables, and accuracy for many small, off-spine political geographies. The Census Bureau made every effort to accommodate those needs, but its solutions are not yet viewed as satisfactory by users, and the costs in delays and undermining of user trust have been great.39

The comment periods following the release of each demonstration product update (see Table F.2) gave users the opportunity to provide feedback that the Census Bureau used to improve the next update. Many users with limited resources, however, found it difficult to meet the short, frequent deadlines, and many others were not aware of the commenting opportunities (O’Hara, 2022).

In addition to using differential privacy-based methods, the Census Bureau cut back on the content of many data products, either by eliminating tables or limiting them to larger geographic areas than in previous years. Such actions are a reasonable way of enhancing confidentiality protection, when used proportionately in concert with other masking or noise infusing methods and when done in consultation with users. The Census Bureau, however, made most of these decisions on its own. For example, the very first “crosswalk” the Census Bureau issued in fall 2019 proposed decreasing the geographic detail in the DHC File for a number of tables compared with their counterparts in 2010. The final DHC File restored this detail for many of the affected tables.40 The Census

___________________

39 See Garner (2023). Abowd and Hawes (2023:23–24) acknowledge underestimating the challenges of effective user communication.

40 This information had been maintained in a “Change Log” tab in the Data Table Guide at https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-table-guide-dhc-dp.xlsx, though it seems to have been dropped from recent versions.

Bureau’s recent decision to drastically reduce geographic detail (eliminating data for all substate geographies) for the already-trimmed set of household-person join tables in the Supplemental DHC File (U.S. Census Bureau, 2023b) was not discussed in advance with data users.

The Census Bureau also could not avail itself of a potential opportunity to cut back on the race detail provided for redistricting, which could have significantly increased privacy protection for that product without impairing utility. Its late start ruled out that opportunity, given that the redistricting community of federal and state agencies, state legislatures and redistricting commissions, the political parties, and the consulting firms that draw up redistricting plans would need to be fully onboard with any potential changes.

11.4.3 Privacy Protection in Other National Statistical Offices

The United Kingdom’s (UK) Office for National Statistics recently investigated the use of differential privacy-based methods on mortality data and concluded (Dove, 2021:19):

The independent noise addition method is best suited to releases with a limited set of outputs, known ahead of time. . . . The top-down method we attempted to apply suffered from significant bias issues, arising from perturbing small counts as well as perturbing zeros. Perturbing zeros increases the noise given and causes additional information loss (less utility). Assigning proportionally more epsilon to lower levels in the hierarchy also slightly reduced utility. This was possibly because as the adjustment to higher level totals were performed sequentially, higher level totals are more important. Assigning more epsilon to high levels may slightly improve results.

The UK adopted a varied suite of confidentiality-protection methods for its 2021 census,41 including swapping, cell suppression, and the cell key method developed by the Australian Bureau of Statistics.42 The UK has a threshold of at least 100 people for the smallest geographic areas (“output areas”) for which it releases census data, which would seem restrictive. As it turns out, the UK has never seen the need for smaller geographic areas—it does not have many

___________________

41 See “Protecting personal data in Census 2021 results” at https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/methodologies/protectingpersonaldataincensus2021results.

42 See Templ and Sariyar (2022:1236). The cell key method “adds noise to frequency and contingency tables using predefined look-up tables consisting of random number values, which allows producing consistent results from queries on dynamically generated tables. Before the tables are generated, a fix[ed] numerical code is assigned to every record. All codes of records falling in a cell are summed, and this sum, the cell key, is used for selecting the random number from the look-up table.” The protection is done on the source microdata so that data products can roll out without further protection needed.

small-population governmental jurisdictions nor does it have legal and court requirements for redistricting that require block-level data.43

Bach (2022) reviewed random noise approaches for statistical disclosure control for official censuses and other population statistics conducted by members of the European Union. He observed (Bach, 2022:669):

  • In most countries, censuses are very costly investments that must be justified by their unique value added for policymakers and researchers. Consequently, user concerns must be taken very seriously. . . .
  • For typical systematic attacks on static outputs . . . strictly ∊-DP mechanisms (with unbounded noise) do not offer significant benefits over utility-driven approaches with bounded noise. . . .
  • Mechanisms based on unbounded noise . . . may ruin unique utility features of censuses, such as small-area accuracy.

Bach (2022:678) noted that statistical disclosure control is the responsibility of member states—Eurostat will not receive or house confidential data. For protection of the 2021 round of censuses, the European Statistical System (see Antal et al., 2017) therefore recommended the use of bounded noise methods, particularly the cell key method, for protection of outputs with limits on the amount of noise injected. Bach (2022:671) stressed the importance that official statistical agencies “invest more in explaining clearly and exhaustively what certain noise protection setups imply for . . . products.”
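The cell key method described in footnote 42 can be illustrated with a minimal sketch. The code below follows the quoted description (a fixed code per record, codes summed within a cell, and the sum used to select a value from a lookup table) but makes several simplifying assumptions that are not from the source: the key range `KEY_SPACE`, the noise bound `MAX_NOISE`, reducing the summed key modulo the table size, leaving empty cells unperturbed, and flooring results at zero. Production lookup tables are more elaborate.

```python
import random

KEY_SPACE = 256  # hypothetical size of the record-key range and lookup table
MAX_NOISE = 2    # hypothetical bound on the perturbation added to any cell


def build_lookup_table(seed=0):
    """Predefined table of bounded perturbations, indexed by cell key."""
    rng = random.Random(seed)
    return [rng.randint(-MAX_NOISE, MAX_NOISE) for _ in range(KEY_SPACE)]


def assign_record_keys(n_records, seed=1):
    """Assign a fixed numerical code to every record, once, up front."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_SPACE) for _ in range(n_records)]


def perturbed_count(record_ids, record_keys, table):
    """Return the protected count for the cell containing `record_ids`."""
    count = len(record_ids)
    if count == 0:
        return 0  # assumption: empty cells are left unperturbed
    # Sum the record codes; the modulo step (an assumption) maps the
    # sum onto the lookup table.
    cell_key = sum(record_keys[i] for i in record_ids) % KEY_SPACE
    return max(0, count + table[cell_key])


table = build_lookup_table()
keys = assign_record_keys(100)
cell = [3, 17, 42, 58]  # made-up record identifiers falling in one cell
print(perturbed_count(cell, keys, table))
```

Because the perturbation depends only on which records fall in a cell, the same cell receives the same protected value in every table in which it appears, and the noise can never exceed `MAX_NOISE` in either direction. These are the consistency and bounded-noise properties noted above.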

11.4.4 Effects on Other Census Bureau Programs

The adoption of the 2020 DAS led to delays and changes in methodology for key Census Bureau programs, including:

  • Demographic Analysis: To date, DA estimates of net coverage error for the 2020 Census are available for single years of age and sex but not for Black and all other races. The latter are essential to extend the time series of differential net undercount estimates between race groups, which are a key high-level indicator of trends in census quality. The reason is a longer-than-anticipated development of a privacy-protected “Modified Race File,” which would permit direct comparisons of census counts and

___________________

43 For example, England has 309 local authorities (compared with over 40,000 functioning local governments in the United States), with a median population of about 143,000. The three smallest have 2,000, 8,600, and 41,000 people, respectively; the largest has 1.1 million people. See “Demography and migration data” at https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/articles/demographyandmigrationdatacontent/2022-11-02#demography-unrounded-population-estimates and the description of geographies in the UK 2021 Census at https://www.ons.gov.uk/methodology/geography/ukgeographies/censusgeographies/census2021geographies.

    DA estimates for Black people and all other races (see Chapter 10 for details).
  • Population Estimates for the Nation, States, and Counties: The population estimates program, mandated in Title 13, also requires a Modified Race File (see Chapter 10). Given the delays in obtaining such a file and to keep to its annual schedule for updates, the population estimates program adopted a “blended base.” The estimates currently start with 2020 Census counts for the nation, states, and counties protected with a small amount of noise infusion; incorporate estimates of race/ethnicity by age and sex starting with the 2010-based estimates updated through 2020 and forward; and control the estimates at the national level using 2020 DA age and sex estimates (see Hartley, 2022).

Another example of a Census Bureau program for which decisions appear to be driven, at least in part, by the nature of the differential privacy-based mechanism for the 2020 Census data is defining urban areas based on housing unit criteria rather than population criteria.44

11.4.5 Effects on Operational Quality Metrics

The Census Bureau deserves credit for releasing operational quality metrics (e.g., percentages of occupied households that self-responded or for which administrative records were used, and percentages of people missing responses for individual items) soon after completion of the census. This was not the historical practice but was prompted by user requests, given the uncertainty about census quality due to the COVID-19 pandemic and other factors that could have affected response.45

Operational metrics provide indirect indicators of differences in quality for areas for which reliable estimates of coverage errors are not available from the PES or DA. For example, self-responses are known to be more accurate in terms of coverage and content than responses obtained by proxy in Nonresponse Followup (NRFU) (see Chapters 3, 4). This knowledge can help users interpret the data and can help the Census Bureau work with communities to improve the next census.

___________________

44 See “Redefining Urban Areas following the 2020 Census,” https://www.census.gov/newsroom/blogs/random-samplings/2022/12/redefining-urban-areas-following-2020-census.html. Housing unit counts are invariant, but occupancy status and population are not.

45 See https://www2.census.gov/programs-surveys/decennial/2020/data/operational-quality-metrics/operational-quality-metrics-technical-documentation.pdf. Since 2000, the Census Bureau has released unperturbed, real-time response rates for states, counties, and census tracts as a spur to respond. These metrics, however, are limited for quality assessment because the denominator is all addresses in the self-response universe (including addresses that turn out to be vacant or nonresidential/nonexistent).

The Census Bureau issued four releases of 2020 Census Operational Quality Metrics in 2021–2022, providing comparable metrics (to the extent possible) for the 2010 Census.46 Release 1 in April 2021, Release 2 in May 2021, and Table 2 of Release 3 in August 2021 covered data-collection modes for addresses, occupied versus vacant housing units by census operations, and item nonresponse rates; these data were protected by rounding or noise infusion. However, states are too internally heterogeneous for operational metrics to be of much utility in assessing quality. Table 1 of Release 3, also in August 2021, provided summary statistics (means, medians, standard deviations) for counties and census tracts within states for selected metrics (e.g., percent addresses responding by internet or proxy), and these data were also protected by rounding. Finally, in October 2022, the Census Bureau’s Release 4 included 6–8 operational metrics for individual counties and tracts, using differential privacy with ∊ = 3 for disclosure avoidance. Unfortunately, these metrics, particularly for tracts, are too noisy for use, as can be seen by adding up the percentages of NRFU enumerations by a household member, a proxy, or administrative records within a tract. Representing the three principal response modes in NRFU, the three percentages should roughly add to 100% but vary widely around that mark.47
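The consistency check described above is simple to express. The sketch below assumes a hypothetical layout in which each tract carries three published percentages (NRFU enumerations by a household member, by proxy, and by administrative records); the tract identifiers, the values, and the 5-point tolerance are made up for illustration and are not taken from Release 4.

```python
def nrfu_share_gap(pct_household, pct_proxy, pct_admin):
    """Return how far the three NRFU shares are from summing to 100%."""
    return (pct_household + pct_proxy + pct_admin) - 100.0


def flag_inconsistent(tracts, tolerance=5.0):
    """List tracts whose three shares miss 100% by more than `tolerance`.

    `tracts` maps a tract identifier to a (household, proxy, admin)
    tuple of percentages; `tolerance` is an assumed cutoff.
    """
    return [
        tract_id
        for tract_id, shares in tracts.items()
        if abs(nrfu_share_gap(*shares)) > tolerance
    ]


# Made-up percentages for three hypothetical tracts.
example_tracts = {
    "A": (55.0, 30.0, 15.0),
    "B": (70.0, 45.0, 20.0),
    "C": (40.0, 20.0, 12.0),
}
print(flag_inconsistent(example_tracts))
```

The share of tracts flagged by a check of this kind gives a rough, user-computable indication of how much the noise infusion has affected the published metrics.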

On metrics, our interim report (National Academies of Sciences, Engineering, and Medicine, 2022) stated:

[Interim Report] Conclusion 4.6: It will not be possible for this panel (or any other evaluator) to understand and characterize the quality of the 2020 Census unless the Census Bureau is forthcoming with informative data quality metrics, including new measures based on operational/process paradata, at substate levels and small-domain spatiotemporal resolution, unperturbed by noise infusion. . . .
[Interim Report] Recommendation 4.1: The Census Bureau should work on ways to make 2020 Census data quality metrics publicly available at small-domain spatiotemporal resolutions, unperturbed by disclosure avoidance, to bolster confidence in the published tabulations. The Census Bureau should also develop ways to enable qualified researchers to access a full range of data quality metrics and report their findings.

___________________

46 All of the Operational Quality Metric releases in public form are available under “Evaluating Quality” at https://www.census.gov/programs-surveys/decennial-census/decade/2020/planningmanagement/process/data-quality.html.

47 C. Dick, Demographic Analytics Advisors (2023, January), Operational Metrics—A Focus on Release 4 and NRFU. PowerPoint presentation to the Census Quality Reinforcement Task Force. The Census Bureau also has not yet included small-area “low response scores” or other operational metrics from the 2020 Census—including Self-Response return rates (self-responses from occupied housing units)—in its Planning Database, presumably because it is working out the privacy-protection issues. This file provides invaluable information for local groups striving to encourage response to the census and other purposes. See “Planning Database” at https://www.census.gov/topics/research/guidance/planning-databases.html.

The Census Bureau justified a low ∊ for operational quality metrics because of a JASON recommendation that the Census Bureau not spend any of its privacy-loss budget on substate metrics.48 Yet the Census Bureau has never provided a convincing argument as to how disclosure of such metrics for small areas increases identification risk for individuals in the census.49 While theoretically there could be a risk, it is difficult to imagine a plausible attack scenario. Importantly, not publishing such metrics closes off a way for data users to determine the confidence to place in data for specific areas and hampers research into improvements for 2030.50

11.5 CONCLUSIONS

Conclusion 11.1: In an era of Big Data, linkage technology, and other aspects of today’s computational and data environment, it is difficult for the decennial census to balance the need for accurate data for small areas and small population groups with an adequate level of confidentiality protection. When evaluating methods and levels of confidentiality protection, stewardship of census data for the public good requires balancing societal needs for the information—particularly for equity in representation, fund allocation, and many other uses—with plausible harms of breaches of confidentiality.

Conclusion 11.2: The decision of the U.S. Census Bureau to respond to the threats to confidentiality protection at a very late date in 2020 Census planning with a new, more complex Disclosure Avoidance System (DAS) using differential privacy-based algorithms went counter to long-standing principles of decennial census planning. The approach had not been tested in a census environment nor had the ability of the algorithms to handle critical user data needs been assessed; the Census Bureau had no backup plan should implementation of the new DAS prove challenging. The decision to continue to deploy the new DAS in the face of serious implementation problems has resulted in marked delays in delivery of data products, with some variables and

___________________

48 JASON (2021:8). JASON is a group of scientists who conduct analyses for the U.S. Department of Defense and other agencies; their activities are managed through the MITRE Corporation.

49 The 2010 Census provided item imputation rates for blocks in SF1 and census tracts in SF2—see Table F.1 in Appendix F.2.

50 The CSAC recommended, May 25, 2021, “that the Census Bureau not apply DP [differential privacy] to substate quality metrics that are being released to the public. . . . Applying DP to the quality metrics could make them largely irrelevant, and will take part of the PLB [privacy-loss budget] away from important future data products. If the Bureau concludes that the quality metrics cannot be released without applying DP, CSAC requests that the Bureau justify their decision by explaining how these metrics could be used in a reconstruction scenario.” See https://www2.census.gov/about/partners/cac/sac/meetings/2021-05/2021-05-25-census-response.pdf.

types of geographic units of questionable utility, and other variables and geographies not provided at all. In addition, it is not clear that the chosen privacy budgets for the various 2020 Census data products, with high values of the ∊ parameter that trades off accuracy against confidentiality protection, provide much actual protection.

Conclusion 11.3: Sustained, interactive communication with data users in advance of major changes to data products from a census or long-running survey is critical for a statistical agency to maintain credibility with and serve its user community. The late decision to adopt new confidentiality-protection techniques for the 2020 Census led to untimely and inadequate user communication on the part of the U.S. Census Bureau. The comment periods after the release of each demonstration file for the Redistricting and Demographic and Housing Characteristics Files were valuable, but many users lacked the staff and resources to respond adequately.

Conclusion 11.4: By not releasing unperturbed operational quality metrics for census tracts, the U.S. Census Bureau missed an opportunity to inform the public of quality differences in the 2020 Census among substate areas and to gain the assistance of interested stakeholders in planning for 2030. Other than the theoretical concept that the release of any piece of information about an individual increases the risk of disclosure, the Census Bureau has not justified its decision to release census tract metrics that are too noisy for meaningful analysis.

Conclusion 11.5: For a major change in data products such as the new confidentiality-protection techniques for the 2020 Census, it is essential for a statistical agency to document fully and publicly the need for the change and the implications for the products. In the case of the 2020 Census, there has not yet been peer review of the crucial 2016–2018 reconstruction-reidentification study that was the public basis for the decision to adopt a formal privacy approach for protecting confidentiality at a late date in 2020 Census planning. Similarly, there has not been an actual demonstration of why more-than-minimal confidentiality protection is needed for substate operational quality metrics (e.g., self-return rates for census tracts and counties) or why it is not possible to provide as-collected population counts to local governments with a reasonable level of confidentiality protection for other data.

11.6 LOOKING AHEAD TO 2030

The use of differential privacy-based algorithms for the 2020 Census data products is a fait accompli. That said, it would be imprudent for the Census Bureau to assume that such algorithms will be appropriate or sufficiently well developed for 2030 data products, as seems to be the assumption in the Census Bureau’s fiscal year 2024 budget submission.51 The differential privacy framework is predicated on protecting against any attack no matter how unlikely the scenario, injecting noise into every cell and every item in datasets save for those quantities that are held invariant.52 “Tuning” can increase data utility for some use cases at the cost of decreasing utility for others. These features make the approach useful for one-off applications, such as protecting the output from a researcher’s use of confidential data in a Federal Statistical Research Data Center, but challenging when it comes to providing widespread, equitable data access to multipurpose, multidimensional data sets such as the decennial census or the ACS. Information providers, such as Google and Microsoft, have used differential privacy-based algorithms for confidentiality protection, but their data products do not need to inform redistricting, fund allocation, or program planning for small geographies and population groups and do not need to deal with an essentially unbounded set of queries.
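The noise-injection and "tuning" points above can be illustrated with a minimal sketch. It assumes a textbook Laplace mechanism with sensitivity 1 applied to a small table of counts, and it is not a description of the 2020 Disclosure Avoidance System. The function names, the two-table budget split, the cell labels, and the epsilon values are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) value as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_table(true_counts, epsilon, rng, sensitivity=1.0):
    """Add independent Laplace noise to every cell, including zero cells."""
    scale = sensitivity / epsilon
    return {cell: count + laplace_noise(scale, rng)
            for cell, count in true_counts.items()}


def expected_abs_error(epsilon, sensitivity=1.0):
    """Mean absolute error per cell, which equals the Laplace scale."""
    return sensitivity / epsilon


def split_budget(total_epsilon, share_a):
    """Divide one privacy-loss budget between two tables (assumes the budgets add)."""
    eps_a = total_epsilon * share_a
    return eps_a, total_epsilon - eps_a


if __name__ == "__main__":
    rng = random.Random(2020)
    # Made-up counts for one hypothetical small area.
    block = {"age_0_17": 12, "age_18_64": 40, "age_65_plus": 0}
    print(noisy_table(block, epsilon=0.5, rng=rng))

    # Shifting budget toward table A lowers its error and raises table B's.
    for share in (0.5, 0.8):
        eps_a, eps_b = split_budget(1.0, share)
        print(share, expected_abs_error(eps_a), expected_abs_error(eps_b))
```

Under these assumptions, every cell receives noise regardless of its content, and any gain in accuracy for one table is paid for by another table drawing on the same budget.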

In December 2022, the Census Bureau issued the following statement about confidentiality protection for the ACS, acknowledging that the differential privacy approach is a work in progress for censuses and surveys intended to serve a wide public (Daily, 2022):

Our current assessment is that the science does not yet exist to comprehensively implement a formally private solution for the ACS. We expect a multiyear development period, including data user review and feedback, that will extend beyond 2025. . . . It’s also not clear that differential privacy would ultimately be the best option. Other formally private disclosure avoidance approaches may end up being a better fit for the ACS.

11.6.1 Framework for Balancing Utility and Confidentiality

Given the reality of contemporary threats to confidentiality protection alongside the imperative for statistical agencies to serve the public, the best way forward remains in question. The Foundations for Evidence-Based Policymaking Act of 2018 and the Year 2 Report of the Advisory Committee on Data for Evidence Building (ACDEB, mandated by the Foundations Act)

___________________

51 See https://www.commerce.gov/sites/default/files/2023-03/Census-FY2024-Congressional-Budget-Submission.pdf, p. 157.

52 This text was revised after the prepublication release in order to clarify previous wording on the applicability of noise infusion, regardless of whether particular variables might popularly be deemed noninvasive.

include useful language and frameworks for making nuanced decisions about disclosure risk and data-utility tradeoffs (see Boxes 11.7 and 11.8). The language in the Foundations Act refers to risk assessments and reasonableness of reidentification, and the ACDEB Year 2 Report has a list of guiding principles that bring a framework of reasonableness to the confidentiality-utility debate. Hotz et al. (2022) call for cost-benefit analysis in decisions about an appropriate DAS for a census or survey, with explicit consideration of the loss to society from data that are unusable or only marginally useful due to noise injection.

The Education Sciences Reform Act of 2002 (see Box 11.9) provides language that speaks to the shared responsibility of statistical agencies and users to protect privacy. It penalizes any person who discloses confidential information, which would include staff of other government agencies, the private sector, and indeed any user, as well as statistical agency staff. The National Center for Education Statistics cites this language on its website for specified datasets. It could be possible to extend this language to all federal statistical data, for example, through an amendment to the Foundations Act.

Laws of member states that implement the European Union’s General Data Protection Regulation of 2018 also provide relevant language. For example, Section 171 of the UK Data Protection Act of 2018, “Re-identification of de-identified personal data,” states “(1) It is an offence for a person knowingly or recklessly to re-identify information that is de-identified personal data

without the consent of the controller [e.g., statistical agency] responsible for de-identifying the personal data.”53

We do not underestimate the complexity of risk-utility analysis or claim the ability of legislation to deter any and all attacks on the confidentiality of federal statistics. The alternative, however, of attempting to use a single technical approach to protect an agency’s data products, chosen for its purported ability to guard against every kind of attack, no matter how implausible, is unlikely to enable a statistical agency to fulfill its mission to provide useful information.

11.6.2 Data Product Plan for 2030

Even as 2020 data products are being developed, it is not too early to begin consultations with users on their data needs for the 2030 Census and to stand up research projects to plan for a timely rollout of 2030 data products. A post hoc analysis could help in developing the 2030 data-products strategy. The Federal State Cooperative on Population Estimates Steering Committee (2022) urged the Census Bureau:

To commission an independent assessment to evaluate: the Census Bureau’s decision processes leading up to the adoption of DP methods for the 2020 Census; its interactions (or lack thereof) with users; the soundness of its reconstruction/reidentification studies, which formed the basis for the move to DP methods and reduction of content; and evidence on the timeliness, accuracy, and relevance (or lack thereof) of the privatized data.

In his reply to the letter from the Federal State Cooperative on Population Estimates (FSCPE) Steering Committee, Census Bureau director Robert Santos said the Census Bureau was “seriously considering this recommendation.”

A useful addition to the evaluation outlined above would be an in-depth assessment of which tables and geographies omitted from the 2020 Census products users would argue to reinstate, and which turned out to be inessential. A discussion with the redistricting community about the race/ethnicity detail on the 2030 Redistricting File is critical, in light of the proposal by the U.S. Office of Management and Budget to adopt a combined race/ethnicity question with Middle Eastern and North African and Hispanic categories (see Chapter 10). A reduction in race/ethnicity detail on the 2030 Redistricting File would pay dividends in confidentiality protection.

Two areas of research for the Census Bureau to pursue vigorously leading up to 2030 are candidate disclosure-protection methods, including differential privacy-based algorithms, and risk-utility, cost-benefit analysis for determining confidentiality-utility tradeoffs. For this work, the Census Bureau could consider issuing one or more challenges to enlist the academic and private-sector research communities, in addition to contracting for specific projects. The Census Bureau could partner on challenges with the National Institute of

___________________

53 See https://www.legislation.gov.uk/ukpga/2018/12/section/171/enacted.

Standards and Technology (NIST).54 An important element of such research would be constructing understandable measures of risk and accuracy for confidentiality-protected data for each candidate method.

Another area of research would be in-depth assessment of the sensitivity of specific types of information in the census to a range of communities. Such research could consider, for example, whether household relationship and type are appropriate to tabulate at the block level as in 2010, given the differing views in the country about same-sex relationships, or whether that information is better provided at a higher level of geography. Similarly, the Census Bureau could assess the actual disclosure risk of reporting unprotected operational metrics at the census tract level.

Finally, bolstering communications expertise and resources could have considerable payoff, both for a post-mortem analysis of 2020 and for research looking toward 2030 on user needs, data sensitivity, cost-benefit analysis, and confidentiality-protection methods. Adequate resources will be essential beginning in the next fiscal year for the Census Bureau to undertake this multipronged program. If the program is well supported, it could improve not only the 2030 Census but also other Census Bureau programs and the entire federal statistical system, to the benefit of policymakers and the public.

Recommendation 11.1: For 2030 Census data products, the U.S. Census Bureau should adopt the risk-utility framework recommended by the Advisory Committee on Data for Evidence Building, which accepts that disclosure risk is a continuum, that not all data items are equally sensitive, and that federal statistical data need to be accessible and useful for a wide range of users and uses.

Recommendation 11.2: At a minimum, the 2030 Census should provide as-collected (i.e., unaltered) total population counts for all governmental units (states, counties and equivalents, minor civil divisions, incorporated places, recognized tribal areas, school districts) and quasi-governmental units (census county divisions, census-designated places, and tribal statistical areas), no matter how small. In addition, census block population totals should add up to block groups, census tracts, counties, and states.
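The additivity requirement in the second sentence of Recommendation 11.2 can be checked mechanically. The sketch below assumes the standard 15-digit census block GEOID (2-digit state, 3-digit county, 6-digit tract, 4-digit block, with the block group given by the first block digit), so that each higher level is a prefix of the block identifier. The function names and all counts are hypothetical, and the check covers only this nested hierarchy, not places, school districts, or other units that do not nest by prefix.

```python
from collections import defaultdict

# Number of leading GEOID digits that identify each level above the block.
PREFIX_WIDTH = {"state": 2, "county": 5, "tract": 11, "block_group": 12}


def roll_up(block_counts, level):
    """Sum block-level population counts to the requested geographic level."""
    width = PREFIX_WIDTH[level]
    totals = defaultdict(int)
    for geoid, count in block_counts.items():
        totals[geoid[:width]] += count
    return dict(totals)


def additivity_gaps(block_counts, published, level):
    """Return {geoid: (sum_of_blocks, published_total)} wherever the two differ."""
    summed = roll_up(block_counts, level)
    return {geoid: (summed.get(geoid, 0), total)
            for geoid, total in published.items()
            if summed.get(geoid, 0) != total}


if __name__ == "__main__":
    # Made-up block counts for one hypothetical tract.
    blocks = {
        "010010201001000": 30,
        "010010201001001": 12,
        "010010201002000": 8,
    }
    print(roll_up(blocks, "block_group"))
    # An empty dict means the published tract total equals the sum of its blocks.
    print(additivity_gaps(blocks, {"01001020100": 50}, "tract"))
```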

Recommendation 11.3: For the 2030 Census data product plan, the U.S. Census Bureau should begin immediately on a multipronged research program with ample testing and opportunities for feedback and dialogue with the data user and

___________________

54 See Ridgeway et al. (2021) and the overview of NIST differential privacy challenges at https://lish.harvard.edu/nist-differential-privacy-challenges.

stakeholder community, broadly defined. The goal should be an end-to-end plan by 2027–2028 for producing a suite of 2030 data products that serve user needs, are appropriately protected, and meet the time schedule of the 1990–2010 Censuses.

Recommendation 11.4: U.S. Census Bureau research and dialogue with the user community, broadly defined, on the end-to-end plan for the 2030 Census suite of data products should include:

  • A review of the 2020 Census experience, including rigorous assessment of the Census Bureau’s reconstruction-reidentification studies under plausible scenarios of potential attack and comprehensive debriefing of users.
  • Consultation with the redistricting community on race/ethnicity and geographic detail (accounting for the possibility of a combined race/ethnicity question) that best trade off accuracy and confidentiality protection for the 2030 Redistricting File.
  • Issuance of challenges to the research community, in addition to focused contracts, for research and development on a range of confidentiality-protection methods and understandable metrics of risk and accuracy to accompany those methods.
  • Research on practical methods for users to account for noise injected into 2030 Census data by the selected confidentiality-protection techniques.
  • Research on the application of cost-benefit and risk-utility analysis for making tradeoffs between confidentiality protection and utility of census data products.
  • Research on the sensitivity of individual data items to relevant communities at differing levels of geographic aggregation and the implications for confidentiality protection.
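As one illustration of what a practical method for users to account for injected noise might look like, the sketch below computes an approximate margin of error for a sum of protected counts. It assumes that each cell carries independent, zero-mean noise with a known standard deviation and that a normal approximation is adequate. These assumptions may not hold for published census data, particularly after post-processing, and the function names and numbers are hypothetical.

```python
import math


def noise_margin(n_cells, cell_sd, z=1.96):
    """Approximate margin from injected noise alone for a sum of n_cells counts.

    With independent noise, variances add, so the standard deviation of the
    sum grows with the square root of the number of cells.
    """
    return z * cell_sd * math.sqrt(n_cells)


def interval_for_sum(noisy_counts, cell_sd, z=1.96):
    """Return (low, high) around the sum of noisy counts under the assumptions above."""
    total = sum(noisy_counts)
    margin = noise_margin(len(noisy_counts), cell_sd, z)
    return total - margin, total + margin


if __name__ == "__main__":
    # Four made-up protected block counts, each with an assumed noise SD of 2.
    print(interval_for_sum([10, 12, 9, 11], cell_sd=2.0))
```

The interval reflects only the injected noise; it says nothing about coverage error, imputation, or other sources of census error discussed elsewhere in the report.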

Recommendation 11.5: The U.S. Census Bureau should welcome initiatives to add language to appropriate legislative vehicles, such as the Foundations for Evidence-Based Policymaking Act of 2018, that prescribes responsibilities and penalties for data users, in addition to agency staff, for willful, harmful disclosure of confidential information.

Since 1790, the U.S. census has been a recurring, essential civic ceremony in which everyone counts; it reaffirms a commitment to equality among all, as political representation is explicitly tied to population counts. Assessing the 2020 Census looks at the quality of the 2020 Census and its constituent operations, drawing appropriate comparisons with prior censuses. The report acknowledges the extraordinary challenges the Census Bureau faced in conducting the census and provides guidance as it plans for the 2030 Census. In addition, the report encourages research and development as the goals and designs for the 2030 Census are developed, urging the Census Bureau to establish a true partnership with census data users and government partners at the state, local, tribal, and federal levels.
