EXPANDING ACCESS TO RESEARCH DATA
Reconciling Risks and Opportunities
THE NATIONAL ACADEMIES PRESS
Washington, D.C.
www.nap.edu
THE NATIONAL ACADEMIES PRESS
500 Fifth Street, NW, Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.
This study was supported by Contract No. NO1-OD-4-2149 between the National Academy of Sciences and the National Institute on Aging. Support of the work of the Committee on National Statistics is also provided by a consortium of federal agencies through Grant No. SBR-0112521 from the National Science Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.
Library of Congress Cataloging-in-Publication Data
Expanding access to research data : reconciling risks and opportunities.—1st ed.
p. cm.
Includes bibliographical references.
ISBN 0-309-10012-7 (pbk.) — ISBN 0-309-65340-1 (pdf) 1. Privacy, Right of—United States. 2. Public records—Access control—United States. 3. Freedom of information—United States. 4. Research—Information services. I. National Academies Press (U.S.)
JK468.S4E96 2005
323.44′830973—dc22
2005030054
Additional copies of this report are available from The National Academies Press, 500 Fifth Street, NW, Lockbox 285, Washington, DC 20055; (800) 624-6242 or (202) 334-3313 (in the Washington metropolitan area); Internet, http://www.nap.edu
Printed in the United States of America
Copyright 2005 by the National Academy of Sciences. All rights reserved.
Suggested citation: National Research Council. (2005). Expanding Access to Research Data: Reconciling Risks and Opportunities. Panel on Data Access for Research Purposes, Committee on National Statistics, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
THE NATIONAL ACADEMIES
Advisers to the Nation on Science, Engineering, and Medicine
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Wm. A. Wulf is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy’s purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Wm. A. Wulf are chair and vice chair, respectively, of the National Research Council.
PANEL ON DATA ACCESS FOR RESEARCH PURPOSES
ELEANOR SINGER (Chair), Survey Research Center, Institute for Social Research, University of Michigan
JOHN M. ABOWD, School of Industrial and Labor Relations, Cornell University
JOE S. CECIL, Division of Research, Federal Judicial Center, Washington, DC
GEORGE T. DUNCAN, Heinz School of Public Policy and Management, Carnegie Mellon University
V. JOSEPH HOTZ, Department of Economics, University of California at Los Angeles
MICHAEL HURD, RAND Corporation, Santa Monica, CA
DIANE LAMBERT, Bell Labs, Lucent Technologies, Murray Hill, NJ
KENNETH PREWITT, School of International and Public Affairs, Columbia University
RICHARD ROCKWELL, The Roper Center for Public Opinion Research, University of Connecticut
EUGENIA GROHMAN, Study Director
CHRISTOPHER MACKIE, Study Director (through October 2004)
MARISA GERSTEIN, Research Associate
AGNES GASKIN, Senior Program Assistant
ALLISON SHOUP, Senior Program Assistant (through November 2004)
COMMITTEE ON NATIONAL STATISTICS 2004-2005
WILLIAM F. EDDY (Chair), Department of Statistics, Carnegie Mellon University
KATHARINE ABRAHAM, Department of Economics, University of Maryland, and Joint Program in Survey Methodology
ROBERT BELL, AT&T Research Laboratories, Florham Park, NJ
LAWRENCE D. BROWN, Department of Statistics, Wharton School, University of Pennsylvania
ROBERT M. GROVES, Survey Research Center, Institute for Social Research, University of Michigan, and Joint Program in Survey Methodology
JOHN HALTIWANGER, Department of Economics, University of Maryland
PAUL W. HOLLAND, Educational Testing Service, Princeton, NJ
JOEL L. HOROWITZ, Department of Economics, Northwestern University
DOUGLAS MASSEY, Department of Sociology, Princeton University
VIJAY NAIR, Department of Statistics and Department of Industrial and Operations Engineering, University of Michigan
DARYL PREGIBON, Google, Inc., New York
KENNETH PREWITT, School of International and Public Affairs, Columbia University
LOUISE RYAN, Department of Biostatistics, Harvard University
NORA CATE SCHAEFFER, Department of Sociology, University of Wisconsin–Madison
CONSTANCE F. CITRO, Director
Preface
Neither the issue of access to research data nor that of privacy and confidentiality is new; it is not even new to the National Research Council, which has considered one topic or the other, or the two in conjunction, on numerous occasions in the past. Why, then, this reconsideration?
Chapters 1 and 2 offer several answers to this question, based on what has changed in the external environment, especially since the 1993 publication of Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. But perhaps the most immediate cause is discontent on the part of researchers with the speed and scope of access to the very rich research data that have been collected by federal statistical and research agencies.
Juxtaposed against researchers’ demands for increased access are heightened concerns on the part of the agencies and their grantees and contractors about maintaining the confidentiality of their data files. These concerns arise in part from convictions, borne out by research, that perceived risks to privacy and confidentiality reduce survey participation.
In this report, the Panel on Data Access for Research Purposes has tried to reconcile the risks and opportunities arising from increased access by urging a variety of rational solutions, for example: making data available through multiple modes, tailored to the needs of different types of users; undertaking research to improve both the utility and the confidentiality protections of some newer access modes; and measuring the level of research data use as well as the frequency with which confidentiality breaches occur.
Given the panel’s relatively narrow charge and limited resources, however, the report does not address certain less obvious, but not necessarily less important, contributors to the problem. One of these is the lack of resources and structural incentives for making data more readily available. At present, outside researchers appear to gain more than agencies do from the prompt and generous release of confidential data, and they stand to lose less than agencies do if such releases lead to documented breaches of confidentiality. Nor does the report examine the reward structures within the statistical agencies, though we suspect that those structures favor data collection over data dissemination. In short, the report proposes solutions that do not require an in-depth look at, and perhaps change of, the motivational structures that undergird the current system of data collection and dissemination.
Nor does the report attempt to decide how much disclosure risk is acceptable in order to achieve the benefits of greater access to research data. Such a decision involves weighing the potential harm posed by disclosure against the benefits potentially foregone. The panel believes that this decision appropriately belongs to the wider community of those potentially affected by it—users, data collectors, and the people who provide the data.
In framing the response to its charge, the panel drew heavily on existing reports and supplemented these reports by commissioning a series of papers on outstanding issues, written by experts and presented at a workshop open to the public. The workshop was held in October 2003 at the National Academies. A summary of the papers presented, together with a list of participants, is included as Appendix A to our report, which we tried to keep quite brief.
Even brief reports, however, make substantial demands on panel members and staff. The panel thanks Christopher Mackie, who served as study director for much of the panel’s life, for his critical role in guiding the discussions during its four meetings, for organizing the workshop and writing the summary of the presentations, and for his initial drafting of Chapter 3 of the report. We also thank Eugenia Grohman, associate executive director of the Division of Behavioral and Social Sciences and Education (DBASSE), who was the study director for the final stages of the panel’s work and without whose skill, experience, and patience the final report could not have been written. Connie Citro, director of the Committee on National Statistics, provided invaluable guidance and help during the entire process. We are especially indebted to her for incisive contributions to Chapters 2 and 5. We also appreciate the interest and support of Michael Feuer, executive director of DBASSE, and Miron Straf, its deputy director.
Much appreciation is due to the many people who wrote and presented papers at the Workshop on Access to Research Data: Assessing Risks and Opportunities. Their work contributed substantially to our formulation of both the problems and the proposed solutions, as is evident from the citations to their work throughout the report. We are also appreciative of a letter from the Task Force on Confidentiality of the Association of Public Data Users (March 17, 2004) that raised important issues of methods for protecting confidentiality that facilitate data access.
Richard Suzman, associate director of Behavioral and Social Research at the National Institute on Aging, commissioned this study in order to stimulate more creative approaches to the dissemination of research data. We gratefully acknowledge the financial support of the National Institute on Aging.
Finally, I thank the members of the Panel on Data Access for Research Purposes, themselves an extraordinarily knowledgeable, engaged, and vocal group. Together, they represent many disciplines intimately involved in the use and production of research data—economics, political science, statistics, sociology, survey methodology, and law. They represent, as well, differing perspectives, with some being more concerned about expanding access, others about maintaining confidentiality. The panel’s discussions reflected these differing experiences and perspectives, and the report tries to balance the competing demands. I appreciate the contributions of all the panel members to this report, but three, in particular, generously contributed to its writing: I thank Joe Cecil, George Duncan, and Kenneth Prewitt for their substantial and indispensable help.
This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the Report Review Committee of the National Research Council. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making the published report as sound as possible and to ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their participation in the review of this report: Martin David, The Urban Institute; Gerald W. Gates, Policy Office, U.S. Census Bureau; Douglas Massey, Woodrow Wilson School of Public and International Affairs, Princeton University; Trivellore Raghunathan, Department of Biostatistics and Institute for Social Research, University of Michigan; and Avi C. Singh, Methodology Research, Statistics Canada, Ottawa, Ontario.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations, nor did they see the final draft of the report before its release. The review of this report was overseen by Richard A. Kulka, Center for Demographic Studies, Duke University. Appointed by the National Research Council, he was responsible for making certain that an independent examination of this report was carried out in accordance with institutional procedures and that all review comments were carefully considered. Responsibility for the final content of this report rests entirely with the authoring panel and the institution.
Eleanor Singer, Chair
Panel on Data Access for Research Purposes