2020 Census Data Products
Data Needs and Privacy Considerations
Proceedings of a Workshop
Daniel L. Cork, Constance F. Citro, and Nancy J. Kirkendall, Rapporteurs
Committee on National Statistics
Division of Behavioral and Social Sciences and Education
THE NATIONAL ACADEMIES PRESS
Washington, DC
www.nap.edu
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
The project that is the subject of this report was supported by the U.S. Census Bureau through Contract No. YA1323-14-CN-0033. Support of the work of the Committee on National Statistics is provided by a consortium of federal agencies through a grant from the National Science Foundation (No. SES-1024012). Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of the organizations or agencies that provided support for the project.
International Standard Book Number-13: 978-0-309-68484-2
International Standard Book Number-10: 0-309-68484-6
Digital Object Identifier: https://doi.org/10.17226/25978
Additional copies of this publication are available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2020 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Academies of Sciences, Engineering, and Medicine. (2020). 2020 Census Data Products: Data Needs and Privacy Considerations: Proceedings of a Workshop. Washington, DC: The National Academies Press. https://doi.org/10.17226/25978.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
For information about other products and activities of the National Academies, please visit nationalacademies.org/whatwedo.
PLANNING COMMITTEE FOR THE WORKSHOP ON 2020 CENSUS DATA PRODUCTS
V. JOSEPH HOTZ (Cochair), Duke University
JOSEPH J. SALVO (Cochair), New York City Department of City Planning
CATHERINE FITCH, Minnesota Population Center
DANIEL GOROFF, Alfred P. Sloan Foundation
EDDIE HUNSINGER, Transamerica
LINDA JACOBSEN, Population Reference Bureau
MICHAEL MCDONALD, University of Florida
C. MATTHEW SNIPP, Stanford University
DANIEL CORK, Study Director
CONSTANCE F. CITRO, Senior Scholar
ANTHONY MANN, Senior Program Associate
BRIAN HARRIS-KOJETIN, Director, Committee on National Statistics
COMMITTEE ON NATIONAL STATISTICS
ROBERT M. GROVES (Chair), Office of the Provost, Department of Mathematics and Statistics, and Department of Sociology, Georgetown University
ANNE C. CASE, Woodrow Wilson School of Public and International Affairs, Princeton University
MICK P. COUPER, Institute for Social Research, University of Michigan
JANET CURRIE, Woodrow Wilson School of Public and International Affairs, Princeton University
DIANA FARRELL, JPMorgan Chase Institute, Washington, D.C.
ROBERT GOERGE, Chapin Hall at the University of Chicago
ERICA L. GROSHEN, The ILR School, Cornell University
HILARY HOYNES, Goldman School of Public Policy and Department of Economics, University of California, Berkeley
DANIEL KIFER, Department of Computer Science, Pennsylvania State University
SHARON LOHR, School of Mathematical and Statistical Sciences, Arizona State University (emerita)
JEROME P. REITER, Department of Statistical Science, Duke University
JUDITH A. SELTZER, Department of Sociology, University of California, Los Angeles
C. MATTHEW SNIPP, Department of Sociology, Stanford University
ELIZABETH A. STUART, Department of Mental Health, Johns Hopkins Bloomberg School of Public Health
JEANNETTE WING, Data Science Institute and Computer Science Department, Columbia University
BRIAN HARRIS-KOJETIN, Director
CONSTANCE F. CITRO, Senior Scholar
Acknowledgments
These proceedings are the main product of the workshop. This report was prepared by a rapporteur whose charter was to distill the gist of the presentations and the essence of the discussions. The planning committee’s role was limited to planning and convening the workshop. The views contained in this report are those of individual workshop participants and do not necessarily represent the views of all workshop participants, the planning committee, or the National Academies of Sciences, Engineering, and Medicine.
This Proceedings of a Workshop was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published proceedings as sound as possible and ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the charge. The review comments and draft manuscript remain confidential to protect the integrity of the process.
We thank the following individuals for their review of this proceedings: Linda A. Jacobsen, U.S. Programs, Population Reference Bureau, and Timothy A. Kuhn, Boyd Center for Business and Economic Research, The University of Tennessee, Knoxville.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the content of the proceedings nor did they see the final draft before its release. The review of this proceedings was overseen by Judith A. Seltzer, Department of Sociology, University of California, Los Angeles. She was responsible for making certain that an independent examination of this proceedings was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the rapporteur and the National Academies.
Contents
1.2 Structure of These Proceedings
2 Disclosure Avoidance in the 2020 Census
2.1.1 Reason for the Change: The Simulated Database Reconstruction Attack
2.1.2 Structure of the TopDown Algorithm
2.1.3 Why Differential Privacy and Why the TopDown Algorithm in Specific?
2.1.4 Choosing a Privacy-Loss Budget ϵ
2.2 Setting the Privacy-Loss Budget for the 2010 Demonstration Data Products
3 Geospatial Analyses of Social and Demographic Conditions
3.1 Geographic Review of Differentially Private Demonstration Data
3.2 Implications for Municipalities and School Enrollment Statistics
4 Redistricting and Related Legal Uses
4.1 Redistricting and the Voting Rights Act
4.1.3 Polarization of Voting by Race
4.1.4 Language Access Determinations
4.2 Impacts on Redistricting: The Case of New Rochelle, New York
4.2.1 Redistricting the City of New Rochelle
4.3 Redistricting and Differential Privacy
5 Delivery of Government Services
5.1 Privatized Data in City Planning
5.2 Decennial Census, Rural Housing Data, and Differential Privacy
5.2.2 Two Use Cases: Affordable Rural Housing and Lending to Underserved Markets
5.3 Importance of Decennial Census for Regional Planning in California
5.3.2 Housing and Population Consistency
6 Business and Private Sector Applications
6.1 Effects of Differentially Private Noise Injection on Survey Operations
6.2 Census Differential Privacy and Private Sector Data Products
6.3 Calculating Floodplain Weights and Benchmarking Needs
6.3.1 Floodplain Housing Estimates
6.3.2 Benchmarking Existing Home Sales
7 Use as Denominators for Rates and Baseline for Estimates
7.1 Public Health and Health Equity Questions
7.1.2 Census-Derived Area-Based Metrics
7.1.4 Population Health and Health Equity
7.2 Rates of Cancer Incidence and Mortality
7.2.1 Surveillance, Epidemiology, and End Results (SEER) Program
7.2.2 Features of Population Estimates
7.2.3 Implications of Differential Privacy
7.3 Impact on Critical Rate Calculations, Particularly for Small Areas and Demographic Communities
7.3.1 Effects of Differential Privacy on Mortality Rates
7.4 Housing and Population Counts: Implications for Local Estimates and Projections
7.4.1 State-Produced Population Estimates
7.4.2 Households with Different Service Needs
7.4.3 Concluding Comments and Questions
8 Identification of Rural and Special Populations: American Indians and Alaska Natives
8.1.1 Differences for American Indians on Reservations
8.1.2 Differences for Alaska Natives in Villages
8.1.3 Differences for Native Hawaiians and Other Pacific Islanders (NHOPI)
8.2 Impact of Differential Privacy on American Indian and Alaska Native Tribes
8.2.2 Comparisons of 2010 Summary File 1 and the Demonstration Data Products
9 Identification of Rural and Special Populations: Small Communities, the Young, and the Elderly
9.1 Privatized Data for Alaska Communities
9.1.2 Impacts of Differential Privacy
9.1.3 Effects on State Programs
9.2 Children Ages 0–4, States and Counties
9.2.5 Uses of Data for Young Children
9.3 Elementary School Enrollment
9.3.1 School District Use Cases
9.4 Uses of Census Data on Age in Local Planning
9.4.1 Emergency Preparedness in New York City
9.4.2 Age Data as Input for Many Local Uses
9.5 Child Poverty by Local School District and Allocation of Federal Title I Funds
9.5.2 Number of School-Aged Children in Poverty
9.5.3 Title I Funding: Eligibility and Amounts
10 Panel Discussion on Key Privacy Issues
10.1 Privacy and Census Participation
10.2 Severity of the Reidentification Threat
10.4 Legal Protections of Privacy
11 Census Bureau’s Responses and Own Analyses of 2010 Demonstration Data Products
11.1 Demographic Findings of the 2010 Census Demonstration Data Products
11.2 Known Issues and Next Steps in Disclosure Avoidance System Development
11.3 Next Steps for User Engagement
12 Summary of Breakout Discussion Sessions
List of Figures and Tables
FIGURES
2.5 Effect of privacy-loss budget ϵ on age pyramids, Fairfax County, Virginia.
Acronyms and Abbreviations
1 − TVD | One minus average total variation distance (metric used in assessing TDA runs, scaling between 0 and 1, corresponding roughly to the proportion of table entries in the Microdata Detail File, MDF, that are exactly as enumerated in the Census Edited File, CEF)
2010 DDP | See DDP
ACS | American Community Survey
AIAN | American Indian/Alaska Native
CEF | Census Edited File (the compiled set of census returns after performing editing and imputation that is the basis for census tabulation)
CNSTAT | Committee on National Statistics
CQR | Count Question Resolution
CUF | Census Unedited File (the compiled set of raw census returns prior to editing and imputation, the result of which is the Census Edited File, CEF)
DAS | Disclosure Avoidance System
DDP | Also 2010 DDP; 2010 Census Demonstration Data Products (the set of census tables released in October 2019, applying the proposed 2020 DAS to the 2010 Census CEF)
DHC | Demographics and Housing Characteristics (proposed name of core 2020 Census data product, replacing previous Summary File 1–2 nomenclature; differentiated as DHC-P for persons and DHC-H for housing)
DP | differential privacy (also described as formal privacy)
DSEP | Data Stewardship Executive Policy [Committee, of the U.S. Census Bureau]
FIPPs | Fair Information Practice Principles
GQ | group quarters
HH | households
HU | housing units
MDF | Microdata Detail File (synthetic data file that is the output from applying the 2020 Disclosure Avoidance System, DAS, to the Census Edited File, CEF, and that serves as the input to census tabulation); after the workshop, the Census Bureau adopted revised terminology for these output files from the DAS, branding them Privacy-Protected Microdata Files (PPMF) instead
NHPI | Native Hawaiian and Pacific Islander
P.L. 94-171 | Public Law 94-171 (enacted December 23, 1975, amending Title 13 of the U.S. Code to set the process for producing the decennial census data file used for legislative redistricting, requiring that those data be produced within one year of the census date; “P.L. 94-171 file” or variants are used as shorthand reference to the redistricting data files)
PLB | privacy-loss budget (denoted by the parameter ϵ [epsilon])
PPMF | Privacy-Protected Microdata Files (see MDF)
RHNA | [Housing Element and] Regional Housing Needs Allocation (fund allocation program to regional planning authorities, mandated by California law)
SF1, SF2 | Summary File 1, Summary File 2 (name of major data products from the 2010 Census, the primary difference being that SF2 included more detailed tabulations by race and Hispanic origin categories)
TDA | TopDown Algorithm, more completely the 2020 Decennial Census TopDown Disclosure Limitation Algorithm (methodology of the 2020 Census Disclosure Avoidance System, DAS)
Title 13 | Title 13 of the U.S. Code (the section of law governing the conduct of the census)