FEDERAL STATISTICS, MULTIPLE DATA SOURCES, AND PRIVACY PROTECTION
Next Steps
Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods
Robert M. Groves and Brian A. Harris-Kojetin, Editors
Committee on National Statistics
Division of Behavioral and Social Sciences and Education
A Consensus Study Report of
THE NATIONAL ACADEMIES PRESS
Washington, DC
www.nap.edu
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
This activity was supported by a grant from the Laura and John Arnold Foundation with additional support from the National Academy of Sciences Kellogg Fund. Support for the work of the Committee on National Statistics is provided by a consortium of federal agencies through a grant from the National Science Foundation, a National Agricultural Statistics Service cooperative agreement, and several individual contracts. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-46537-3
International Standard Book Number-10: 0-309-46537-0
Digital Object Identifier: https://doi.org/10.17226/24893
Additional copies of this report are available for sale from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu/.
Copyright 2017 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Academies of Sciences, Engineering, and Medicine. (2017). Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/24893.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. C. D. Mote, Jr., is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
PANEL ON IMPROVING FEDERAL STATISTICS FOR POLICY AND SOCIAL SCIENCE RESEARCH USING MULTIPLE DATA SOURCES AND STATE-OF-THE-ART ESTIMATION METHODS
ROBERT M. GROVES (Chair), Office of the Provost, Department of Mathematics and Statistics, and Department of Sociology, Georgetown University
MICHAEL E. CHERNEW, Department of Health Care Policy, Harvard Medical School
PIET DAAS, Department of Corporate Services, Information Technology and Methodology, Statistics Netherlands
CYNTHIA DWORK, John A. Paulson School of Engineering and Applied Sciences, and Radcliffe Institute for Advanced Study, Harvard University
OPHIR FRIEDER, Department of Computer Science, Georgetown University
HOSAGRAHAR V. JAGADISH, Computer Science and Engineering, University of Michigan
FRAUKE KREUTER, Joint Program in Survey Methodology, University of Maryland, and Statistics and Methodology, University of Mannheim and Institute for Employment Research
SHARON LOHR, Westat, Rockville, MD
JAMES P. LYNCH, Department of Criminology and Criminal Justice, University of Maryland
COLM O’MUIRCHEARTAIGH, Harris School of Public Policy Studies, University of Chicago
TRIVELLORE RAGHUNATHAN, Institute for Social Research, University of Michigan
ROBERTO RIGOBON, Sloan School of Management, Massachusetts Institute of Technology
MARC ROTENBERG, Electronic Privacy Information Center, Washington, DC
BRIAN HARRIS-KOJETIN, Study Director
HERMANN HABERMANN, Senior Program Officer
GEORGE SCHOEFFEL, Research Assistant
AGNES GASKIN, Administrative Assistant
COMMITTEE ON NATIONAL STATISTICS
ROBERT M. GROVES (Chair), Office of the Provost, Department of Mathematics and Statistics, and Department of Sociology, Georgetown University
FRANCINE BLAU, School of Industrial and Labor Relations, Cornell University
MARY ELLEN BOCK, Department of Statistics, Purdue University (emerita)
ANNE C. CASE, Woodrow Wilson School of Public and International Affairs, Princeton University
MICHAEL CHERNEW, Department of Health Care Policy, Harvard Medical School
JANET CURRIE, Woodrow Wilson School of Public and International Affairs, Princeton University
DONALD DILLMAN, Social and Economic Sciences Research Center, Washington State University
CONSTANTINE GATSONIS, Center for Statistical Sciences, Brown University
JAMES HOUSE, Survey Research Center, Institute for Social Research, University of Michigan
THOMAS MESENBOURG, Retired, formerly U.S. Census Bureau
SARAH NUSSER, Office of the Vice President for Research and Department of Statistics, Iowa State University
COLM O’MUIRCHEARTAIGH, Harris School of Public Policy Studies, University of Chicago
JEROME P. REITER, Department of Statistical Science, Duke University
ROBERTO RIGOBON, Sloan School of Management, Massachusetts Institute of Technology
JUDITH A. SELTZER, Department of Sociology, University of California, Los Angeles
EDWARD SHORTLIFFE, Department of Biomedical Informatics, Columbia University/Arizona State University
BRIAN A. HARRIS-KOJETIN, Director
CONSTANCE F. CITRO, Senior Scholar
Acknowledgments
This report of the Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods is the product of contributions from many colleagues, whom we thank for their generous sharing of their time and expertise.
The panel is grateful to the Laura and John Arnold Foundation for funding this study, and to foundation staff Stuart Buck and Meredith McPhail for their help and guidance throughout the study. The panel also is grateful for the supplemental funding provided by the National Academy of Sciences Kellogg Fund.
The panel thanks the many individuals who participated in the panel’s workshops and open meetings and shared their research, their challenges, and their creative approaches to using administrative and private-sector data sources. We also thank Steve Eglash (Stanford University) for his work examining issues of data access for private-sector companies.
At the National Academies of Sciences, Engineering, and Medicine, the panel would not have been able to complete its work efficiently without a capable staff. Constance F. Citro, former director of the Committee on National Statistics (CNSTAT), had the vision and perseverance to make this study a reality. The division’s Kirsten Sampson-Snyder was extremely helpful in coordinating the review process, and Eugenia Grohman provided meticulous and thorough editing that greatly improved the readability of the report. For CNSTAT, Agnes Gaskin, administrative assistant, provided assistance in managing the logistics of this panel and our meetings. Hermann Habermann, senior program officer, provided valuable feedback and guidance on drafts of this report. George Schoeffel, research assistant, assisted with every aspect of the study, including creating and managing a database of references, creating figures and tables, researching and drafting items for the report, carefully reviewing drafts, and performing whatever tasks needed to be done for the panel and the report.
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their review of this report: Cynthia Z.F. Clark, independent consultant, McLean, VA; Mick P. Couper, Institute for Social Research, University of Michigan; Jeremy Freese, Department of Sociology, Stanford University; Pamela Herd, Robert M. La Follette School of Public Affairs, University of Wisconsin–Madison; Thomas L. Mesenbourg, U.S. Census Bureau (retired); Stephen W. Raudenbush, Department of Sociology, University of Chicago; Jerome P. Reiter, Department of Statistical Science, Duke University; and Larry A. Wasserman, Department of Statistics and Machine Learning Department, Carnegie Mellon University.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the report’s conclusions or recommendations, nor did they see the final draft of the report before its release. The review of this report was overseen by Michael Hout, Department of Sociology, New York University, and Alicia L. Carriquiry, Department of Statistics, Iowa State University. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring panel and the National Academies.
Robert M. Groves, Chair
Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods
and Brian A. Harris-Kojetin, Study Director
Preface
This is the second Consensus Study Report of the Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods. Our first report, Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy, was released in January 2017. In that report, the panel noted that there has been increasing attention in recent years to using data already collected by government entities for statistical purposes, such as evaluation of government programs. These data include such records as employment and earnings information on state unemployment insurance, income reported on federal tax forms, Social Security earnings and benefits, medical conditions and payments made for services from Medicare and Medicaid records, and food assistance program benefits.
We also noted that after the panel had begun its work, Congress had established an Evidence-Based Policymaking Commission (P.L. 114-140) and charged it with examining arrangements for integrating federal survey and administrative data and making those data available to researchers for program evaluation. The commission issued its final report on September 7, 2017, after the panel had completed its deliberations.
The commission’s focus was somewhat different from that of the panel. It addressed using statistical analysis to evaluate government programs and alternative policy options. The panel was more specifically focused on improvement in federal statistics through the use of multiple data sources. However, there was clearly overlap in the two activities.
Since the panel had completed its work when the commission’s report was released, we could not consider the similarities and differences between the commission’s recommendations and our own, so we leave that to the readers of the two reports. It is our hope that this report is useful to federal agencies and their stakeholders, as well as to the broader research community. It attempts to identify key challenges to sample surveys, which have long been the mainstay of federal statistics, and offer approaches to using the wealth of administrative and private-sector data that exist and that are being created every day.
Robert M. Groves, Chair
Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation Methods
4 LEGAL AND COMPUTER SCIENCE APPROACHES TO PRIVACY
Personally Identifiable Information and Privacy Law
Legal View of Privacy in the Context of Statistical Data Analysis
Examples Elucidating the PII/Non-PII Issue
Synthesis: A Proposed Liability Rule for PII
Implications for Federal Statistical Agencies
Two Avenues to a Breach of Privacy
Implications for Federal Statistical Agencies
6 QUALITY FRAMEWORKS FOR STATISTICS USING MULTIPLE DATA SOURCES
A Quality Framework for Survey Research
Broader Frameworks for Assessing Quality
Assessing the Quality of Administrative and Private-Sector Data
The Quality of Alternative Data Sources: Two Illustrations
7 A NEW ENTITY TO PROVIDE VITAL INFORMATION THROUGH ENHANCED FEDERAL STATISTICS