Suggested Citation:"1 Introduction." National Academies of Sciences, Engineering, and Medicine. 2020. 2020 Census Data Products: Data Needs and Privacy Considerations: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25978.

– 1 –
Introduction

The seed for the December 11–12, 2019, Workshop on 2020 Census Data Products was actually planted 110 years earlier when the act authorizing the 1910 Census was signed into law on July 2, 1909. That act, which enabled the first U.S. decennial census to be conducted by a permanent U.S. Census Bureau, made important provisions for more professional census-taking. In particular, Section 25 of the act (at 36 Stat. 9) outlined three basic tenets that remain pillars of the Census Bureau’s reputation today: census returns are only intended to be used for statistical purposes; only sworn employees are permitted to examine individual census returns; and “no publication shall be made by the Census Office whereby the data furnished by any particular establishment can be identified.” This language was repeated in law authorizing the 1920 Census (40 Stat. 1300), and it would be modified only slightly—protecting any “individual” as well as “establishment” from information disclosure—in the act that would govern the 1930–1950 Censuses (46 Stat. 25). When Congress codified census law in 1954, crafting the ongoing Title 13 of the U.S. Code, the ordering of the tenets was revised but the messages endured. Today’s 13 U.S.C. § 9(a) continues to forbid the secretary of commerce to “make any publication whereby the data furnished by any particular establishment or individual under this title can be identified.”

The practical import of this is that, for over a century and for nearly as long as the Census Bureau has existed in its present form, it has had to balance its inherent, ingrained mission of collecting and producing high-quality statistical information for the public good with a staunch, rigorous mandate to avoid disclosing information about any individual. For decades, the Census Bureau has taken steps to preserve respondent privacy ranging from the crude (suppressing cells in data tables) to the more sophisticated (“data swapping” of records within small areas). These steps were taken to protect the confidentiality of respondent information, yet ironically, the details and extent of the record swapping and other techniques had to remain confidential, lest they be subject to reverse engineering.

The events that precipitated the Census Bureau’s development of a new disclosure avoidance system are described in more detail in Chapter 2, but a rough sketch must suffice here: Census Bureau researchers conducted a simulated database reconstruction attack on the bureau itself, using the published tabulations from the 2010 Census. The reconstruction attack, particularly when its results were combined with commercially available data, convinced the Census Bureau that the current disclosure avoidance methodology left open a disturbingly high level of reidentification risk. The old methods, in short, are no longer satisfactory in an era of advanced computing. The Census Bureau researchers turned to the concept of differential privacy (also referred to as formal privacy) that was taking shape in the computer science literature as the anchor of a new Disclosure Avoidance System (DAS), offering proven probabilistic bounds on disclosure risk. Critically, the need was judged to be so dire that the Census Bureau decided in 2018 to commit to a solution based on differential privacy for the 2020 Census, even though no such solution had yet been developed at the scale of, or designed for the needs of, a national census.
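
As a rough illustration of what a probabilistic bound on disclosure risk can mean in the simplest case, the sketch below adds Laplace noise to a single count. It is a textbook mechanism, not the Census Bureau's Disclosure Avoidance System, whose details are summarized in Chapter 2; the function names, the example count, and the epsilon values are assumptions chosen only for illustration. Under this mechanism, adding or removing one person changes the probability of any released value by at most a factor of e^epsilon, so a smaller epsilon gives stronger privacy and noisier counts.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A count changes by at most 1 when one person is added or removed, so the
    noise scale is 1 / epsilon. The difference of two independent exponential
    draws with rate epsilon follows a Laplace distribution with that scale.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def mean_abs_error(true_count: int, epsilon: float, draws: int, seed: int) -> float:
    """Average absolute difference between noisy and true counts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / draws


if __name__ == "__main__":
    # Hypothetical block with 37 residents; epsilon values are illustrative.
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(37, eps, draws=5000, seed=2010)
        print(f"epsilon={eps}: mean absolute error is about {err:.2f}")
```

The expected absolute error of this mechanism equals 1/epsilon, which makes the privacy-accuracy tradeoff discussed throughout the workshop concrete: every gain in privacy protection is paid for in noise. A production system would also need integer-valued, nonnegative, and mutually consistent tables, none of which this sketch attempts.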

The need for a more holistic discussion of the tradeoffs between accuracy and privacy began to emerge when the Census Bureau produced a differential privacy-treated version of 1940 Census returns, the most recent decennial census for which the person-level microdata are released to the public in accordance with the 72-year data embargo rule.1 Though only conducted for a limited subset of 1940 census variables and categories, interrogation of the privatized data by researchers suggested enough quirks and anomalies to begin to heighten concern about the approach. As discussions continued at professional meetings and other forums through the summer of 2019, the Census Bureau pledged to produce a set of 2010 Demonstration Data Products (DDP): running the proposed 2020 Census DAS on the data from the 2010 Census to create alternate versions of many of the published 2010 tabulations. The intent was to spur census data users and stakeholders to replicate their analyses using the DDP to determine how the results compare to the 2010 Census tables that were produced using the data swapping and other disclosure avoidance techniques employed in that census. (The Census Bureau alone has the authority and access necessary to compare the results to census returns without disclosure avoidance.) Simultaneously, the Census Bureau began work with the National Academies of Sciences, Engineering, and Medicine on convening a workshop as a major forum for harnessing data user feedback and discussing privacy-accuracy tradeoffs for the coming census.

___________________

1 The release of personally identifiable information from census records 72 years after enumeration is governed by an October 1952 agreement between the Census Bureau and what is now the National Archives and Records Administration and supported by reference to said agreement in 44 U.S.C. § 2108(b), law enacted in 1978.

1.1 ABOUT THE WORKSHOP

Both development tracks came to a head in late 2019. The 2010 DDP were released on October 29, 2019, and this workshop was held on December 11–12. The workshop was formally initiated weeks earlier, with the following statement of task:

A planning committee of the National Academies of Sciences, Engineering, and Medicine will organize and execute a 2-day public workshop for the U.S. Census Bureau to discuss the suite of data products the Census Bureau will generate from the 2020 Census. The workshop will feature presentations by users of decennial census data products to help the Census Bureau better understand the uses of the data products and the importance of these uses. The workshop will focus extensively on data-product use cases outside the legally mandated apportionment and redistricting settings; however, the uses of and demand for the block-level counts included in the data files produced pursuant to P.L. 94-171 are important to elicit in considering the full suite of 2020 census data products. An important consideration of the workshop will be the overall level of noise that is injected into the results, which is being done to preserve the confidentiality of responses, as well as the allocation of that noise across data products. The discussion will be focused to help inform the Census Bureau’s decisions on the final specification of 2020 data products. A proceedings of the presentations and discussions at the workshop will be prepared by a designated rapporteur in accordance with institutional guidelines.

In furtherance of this charge, the Committee on National Statistics (CNSTAT) of the National Academies assembled an eight-member planning committee, cochaired by V. Joseph Hotz, professor of economics at Duke University, and Joseph Salvo, chief demographer of the New York City Department of City Planning. The date of the workshop was chosen by reconciling Census Bureau schedules with the availability of adequate space at the National Academies—and shifted twice, from October to November to December—based on projected availability of the 2010 Demonstration Data Products. Active recruitment for workshop sessions began well before the release of those data products.

Trying to replicate a process that worked well for the 2012 Workshop on the Benefits and Burdens of the American Community Survey (National Research Council, 2013), the workshop committee promulgated as widely as possible a call for input to census data users, asking about the ways in which they use decennial census data and whether they were planning to analyze the 2010 DDP. The idea, as with that predecessor workshop, was that the call for input might enable a virtual poster session at the workshop, extending the range of user case studies covered by the workshop. As the workshop presentations would later reinforce, the call for input yielded very little input because for nearly all users, the possible effects of a new 2020 DAS on day-to-day work are very abstract and ethereal until the release of tangible data. Almost all of the responses that were received to the call for input were converted by the planning committee into presentation slots at the workshop proper.

Two brief points that follow from the charge and design of the workshop, and that constrain its content, are important to raise at the outset. First, the main thrust of the workshop was to elicit use cases of census data and replicate previous analyses and findings using the 2010 DDP data. It follows that the workshop was necessarily constrained to the content of the 2010 DDP. For instance, the Census Bureau had already determined detailed tabulation of race and ethnicity categories (including tribal affiliation for American Indian and Alaska Native persons) to be out of scope for the core 2020 DAS, so they were accordingly out of scope for the workshop. (These detailed tabulations are to be produced using a separate, as yet unspecified system.) Likewise, as discussed later, the 2020 DAS treats person-level data and housing unit-level data separately, with no direct linkage possible between the two. However, some historical (and planned for 2020) census tabulations rely on “joins” between persons and housing units, such as to study household composition. Because such person-household joins were out of scope for the core 2020 DAS, they could not be covered in the workshop. Second, the charge for the workshop clearly emphasizes that it should elicit use cases external to the Census Bureau, yet there are ways that the Census Bureau uses decennial census data within its own walls that are of keen interest to the broader data user community. Notable among these is the Census Bureau’s program for producing population estimates between decennial censuses, using the most current census as a base. The implications of the differential privacy approach for these intramural uses of census data are potentially profound, but they rely on operational decisions not yet made, and so could only be flagged as an important concern at the workshop rather than discussed in depth.

The workshop was held in the auditorium of the National Academy of Sciences in Washington, where the in-person attendance list (counting presenters and staff) numbered roughly 140. It was also webcast live, generating approximately 2,200 plays and 2,400 hours of viewing from 450 IP addresses. At the time of this writing, the video clips from the workshop and the presentation slides are available at https://sites.nationalacademies.org/DBASSE/CNSTAT/DBASSE_196518 and generally through the CNSTAT site. The workshop agenda and a list of registered in-person participants appear in Appendix A, and biographical sketches of the planning committee and invited speakers appear in Appendix B.

1.2 STRUCTURE OF THESE PROCEEDINGS

As a workshop synopsis, this proceedings document directly tracks the workshop agenda (Appendix A), with two deviations, the first of which directly follows this section in Chapter 2. In shaping the agenda, a bargain was struck (to maximize time for discussion) that the workshop would not devote significant time to providing background and a step-by-step walkthrough of the Census Bureau’s new disclosure avoidance methodology, provided that such information was made available prior to the workshop. In response, the Census Bureau staff produced a short fact sheet, the main text of which was woven into the FAQs that accompanied the release of the 2010 DDP and the closing section of the technical documentation for the products (U.S. Census Bureau, 2019). Under that arrangement, the Census Bureau’s opening presentation (Section 2.1) by Philip Leclerc skipped briefly through the nuts and bolts of how the Census Bureau’s methodology works. In the following session at the workshop, the extended-length collaborative presentation by David Van Riper and Seth Spielman (Section 3.1) also included some methodological background that introduced the hierarchy of geographic levels that constitutes the “spine” of processing in the Census Bureau’s approach. For clarity of exposition and a better understanding of all the presentations that follow, then, we begin Chapter 2 with a synopsis of what is understood about the Census Bureau’s TopDown Algorithm (TDA), drawing elements from the Leclerc, Van Riper/Spielman, and other presentations, as well as the documentation of the fact sheet and 2010 DDP. The balance of the chapter summarizes the Census Bureau’s presentations on the formulation of the 2010 DDP.

Chapters 2–9 are the core of the workshop and these proceedings, taking major “use cases” of decennial census data in turn, describing the issues raised by a change in the disclosure avoidance methodology and exploring data user analysis of the 2010 DDP relative to those use cases. The second deviation from the workshop agenda arises because, for the sake of pacing, the workshop divided the use case of identification of small and special populations into two blocks, split between the two days, with another, shorter presentation block between them. Chapter 7 and its examination of the use of census data as denominators for rates and populations is moved forward a slot, then, so that the two pieces of small and special populations (Chapter 8 on American Indian and Alaska Native lands and Chapter 9 on the young, the elderly, and other special groups) are presented consecutively.

The fundamental tension between privacy and accuracy runs through all the workshop sessions, but the workshop by design emphasized data users and their needs, which tend overwhelmingly to focus on the data accuracy side of the proposition. It was critical, then, to balance things with a lengthy discussion block devoted to aspects of the privacy side of the argument. Chapter 10 summarizes the opening statements of five privacy policy experts as well as the subsequent discussion. Census Bureau staff then gave a thorough review of their research plans going forward, in part reacting to what they heard in the preceding workshop presentations, which is summarized in Chapter 11. For the final session of the workshop, the participants and the audience were assigned to three different rooms for a fuller, small-group discussion of the workshop and the takeaways participants gained from it. Chapter 12 lists the summary points from the three rooms, read and discussed in plenary as a way of closing the workshop.

This proceedings document has been prepared by the workshop rapporteur as a factual summary of what occurred at the workshop. The planning committee’s role was limited to planning and convening the workshop. The views contained in the proceedings are those of individual workshop participants and do not necessarily represent the views of all workshop participants, the planning committee, or the National Academies of Sciences, Engineering, and Medicine.
