ENSURING THE INTEGRITY, ACCESSIBILITY, AND STEWARDSHIP OF RESEARCH DATA IN THE DIGITAL AGE

Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age

Committee on Science, Engineering, and Public Policy

NATIONAL ACADEMY OF SCIENCES, NATIONAL ACADEMY OF ENGINEERING, AND INSTITUTE OF MEDICINE OF THE NATIONAL ACADEMIES

THE NATIONAL ACADEMIES PRESS

Washington, D.C.
www.nap.edu



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This study was supported by the National Research Council, United States Department of Agriculture, National Aeronautics and Space Administration, United States Geological Survey, United States Department of Health and Human Services, United States Department of Energy, Eli Lilly and Company, Burroughs Wellcome Fund, Nature Publishing Group, The Rockefeller University Press, New England Journal of Medicine, American Chemical Society, Federation of American Societies for Experimental Biology, American Association for the Advancement of Science, American Geophysical Union, and IEEE.

The material is based upon work supported by NASA under award #NNX07AP21G. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Aeronautics and Space Administration.

This material is also based upon work supported by the Department of Energy [Office of Science] under Award Number DE-FG02-08ER15926.

Disclaimer: This report was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or any agency thereof. Library of Congress Cataloging-in-Publication Data Committee on Science, Engineering, and Public Policy (U.S.). Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age. Ensuring the integrity, accessibility, and stewardship of research data in the digital age / Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age, Committee on Science, Engineering, and Public Policy. p. cm. Includes bibliographical references and index. ISBN-13: 978-0-309-13684-6 (pbk.); ISBN-10: 0-309-13684-9 (pbk.); ISBN-13: 978-0-309-13685-3 (pdf); ISBN-10: 0-309-13685-7 (pdf) 1. Research—Technological innovations. 2. Information technology—Scientific applications. 3. Electronic information resources—Management—United States. 4. Electronic information resources—Access control. I. Title. Q180.55.I45C66 2009 001.40285′58—dc22 2009036322 Cover graphic provided by Well-Formed.Eigenfactor (http://well-formed.eigenfactor.org/), a cooperation between Moritz Stefaner (visualization design) and the Eigenfactor Project (data analysis). Additional copies of this report are available from the National Academies Press, 500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800) 624-6242 or (202) 334-3313 (in the Washington metropolitan area); Internet, http://www.nap.edu. Copyright 2009 by the National Academy of Sciences. All rights reserved. Printed in the United States of America

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Charles M. Vest is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy’s purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Charles M. Vest are chair and vice chair, respectively, of the National Research Council.

www.national-academies.org

COMMITTEE ON ENSURING THE UTILITY AND INTEGRITY OF RESEARCH DATA IN A DIGITAL AGE

DANIEL KLEPPNER (Co-Chair), Lester Wolfe Professor of Physics, Emeritus, Massachusetts Institute of Technology, Cambridge
PHILLIP A. SHARP (Co-Chair), Institute Professor, The David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge
MARGARET A. BERGER, Professor of Law, Brooklyn Law School, Brooklyn, New York
NORMAN M. BRADBURN, Tiffany & Margaret Blake Distinguished Service Professor Emeritus, University of Chicago, Washington, DC
JOHN BRAUMAN, J. G. Jackson–C. J. Wood Professor Emeritus, Department of Chemistry, Stanford University, Stanford, California
JENNIFER T. CHAYES, Managing Director, Microsoft Research New England, Cambridge, Massachusetts
ANITA JONES, Lawrence R. Quarles Professor of Engineering and Applied Sciences, School of Engineering and Applied Sciences, University of Virginia, Charlottesville
LINDA P. B. KATEHI, Provost and Vice Chancellor for Academic Affairs, University of Illinois, Urbana-Champaign
NEAL F. LANE, Malcolm Gillis University Professor and Senior Fellow of the James A. Baker III Institute for Public Policy, Rice University, Houston, Texas
W. CARL LINEBERGER, E.U. Condon Distinguished Professor of Chemistry and Fellow, Joint Institute for Laboratory Astrophysics, University of Colorado, Boulder
RICHARD LUCE, Vice Provost and Director of University Libraries, Robert W. Woodruff Library, Emory University, Atlanta, Georgia
THOMAS O. MCGARITY, Joe R. and Teresa Lozano Long Endowed Chair in Administrative Law, School of Law, University of Texas, Austin
STEVEN M. PAUL, Executive Vice President, S&T and President, Lilly Research Laboratories, Eli Lilly & Company, Indianapolis, Indiana
TERESA A. SULLIVAN, Provost and Executive Vice President for Academic Affairs and Professor of Sociology, University of Michigan, Ann Arbor
MICHAEL S. TURNER, Bruce V. and Diana M. Rauner Distinguished Service Professor and Chair, Department of Astronomy and Astrophysics, University of Chicago, Chicago, Illinois
J. ANTHONY TYSON, Distinguished Professor of Physics, Department of Physics, University of California, Davis
STEVEN C. WOFSY, Abbott Lawrence Rotch Professor of Atmospheric and Environmental Sciences, Department of Earth and Planetary Sciences, Harvard University, Cambridge, Massachusetts

Principal Project Staff

THOMAS ARRISON, Study Director (after July 2007)
DEBORAH D. STINE, Study Director (up to July 2007)
STEVE OLSON, Consultant Writer
NEERAJ P. GORKHALY, Senior Program Assistant
ALBERT SWISTON, Christine Mirzayan Science & Technology Policy Graduate Fellow
SAGE ARBOR, Christine Mirzayan Science & Technology Policy Graduate Fellow

COMMITTEE ON SCIENCE, ENGINEERING AND PUBLIC POLICY

GEORGE M. WHITESIDES (Chair), Woodford L. and Ann A. Flowers Professor of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
CLAUDE R. CANIZARES, Vice President for Research, Associate Provost, Bruno Rossi Professor of Physics, Massachusetts Institute of Technology, Cambridge
RALPH J. CICERONE (Ex officio), President, National Academy of Sciences, Washington, DC
EDWARD F. CRAWLEY, Professor of Aeronautics and Astronautics and of Engineering Systems, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge
RUTH A. DAVID, President and CEO of ANSER Institute for Homeland Security (Analytic Services, Inc.), Arlington, Virginia
HAILE T. DEBAS, Chancellor Emeritus, University of California, San Francisco
HARVEY FINEBERG (Ex officio), President, Institute of Medicine, Washington, DC
JACQUES S. GANSLER, Roger C. Lipitz Chair in Public Policy and Private Enterprise, School of Public Policy, University of Maryland, College Park
ELSA M. GARMIRE, Sydney E. Junkins Professor of Engineering, Dartmouth College, Hanover, New Hampshire
M. R. C. GREENWOOD (Ex officio), Chair, PGA, and Professor of Nutrition and Internal Medicine, University of California, Davis
W. CARL LINEBERGER, Professor of Chemistry, University of Colorado, Boulder
C. DAN MOTE, JR. (Ex officio), President, University of Maryland, College Park
ROBERT M. NEREM, Professor and Director, Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta
LAWRENCE T. PAPAY, CEO and Principal, PQR, LLC, Maineville, Ohio
ANNE C. PETERSEN, Deputy Director, Center for Advanced Study in the Behavioral Sciences, Stanford University, Palo Alto, California
SUSAN C. SCRIMSHAW, Interim President, Sage Colleges, Troy, New York
WILLIAM J. SPENCER, Chairman Emeritus, SEMATECH, Austin, Texas
LYDIA THOMAS (Ex officio), Co-Chair, GUIRR, and Chairman and CEO, Mitretek Systems, Falls Church, Virginia
CHARLES M. VEST (Ex officio), President, National Academy of Engineering, Washington, DC
NANCY S. WEXLER, Higgins Professor of Neuropsychology, Columbia University, New York, New York
MARY LOU ZOBACK, Vice President for Earthquake Risk Applications, Risk Management Solutions, Inc., Newark, California

Staff

WILLIAM OSTENDORFF, Director
MARION RAMSEY, Administrative Associate
PETER HUNSBERGER, Financial Associate

Preface

Data are the foundation on which scientific, engineering, and medical knowledge is built. The generation, analysis, communication, and preservation of data are in a period of profound change, and research is being similarly transformed. The development and rapid advance of digital technologies have enabled immense quantities of data to be created, processed, and disseminated around the world. These data can capture the characteristics of phenomena in far greater detail and with a dynamic verisimilitude never before possible. Data from different fields are being combined, yielding deep insights into formerly intractable problems. The open sharing of data, tools, and services over the Internet is creating new ways of carrying out research and new relationships among researchers. New research topics and fields are emerging between the boundaries of traditional disciplines, and the questions that investigators can address are rapidly expanding.

These changes in the nature and conduct of research are greatly enhancing the capabilities of researchers. However, these changes also are posing challenges, and in some cases they have had negative consequences. A major impetus for this study was a letter sent from the editors of several leading journals to National Academy of Sciences President Ralph Cicerone (see Appendix C) pointing out that the improper manipulation of digital images submitted to scholarly journals has become a significant issue for editors and publishers.

More broadly, changes in the use of research data have raised the stakes for the methods traditionally used to ensure the integrity and utility of data. Research data and results are increasingly critical inputs to a widening variety of policy debates and decisions. Transparency on the part of investigators with regard to the collection of data, methods of analysis, and presentation of results is essential for the research enterprise to serve the public as an objective source of unbiased information. In that regard, another major impetus for this report was the recent controversy over the interpretation and use of data to reconstruct historical changes in global temperatures. In this case, the combination of an important policy topic, differences in data-sharing expectations between fields, and unclear expectations among researchers and members of the public opened researchers to heightened scrutiny, skepticism, and even harassment.

As plans for this study took shape, it became clear that the issues involving research data extend well beyond the most immediate connotations of the term “data integrity.” Thus, the charge issued to our committee asked us to look at several critical issues:

An ad hoc committee will conduct a study of issues that have arisen from the evolution of practices in the collection, processing, oversight, publishing, ownership, accessing, and archiving of research data. The key questions to be addressed are:

1. What are the growing varieties of research data? In addition to issues concerned with the direct products of research, what issues are involved in the treatment of raw data, prepublication data, materials, algorithms, and computer codes?
2. Who owns research data, particularly that which results from federally funded research? Is it the public? The research institution? The lab? The researcher?
3. To what extent is a scientist responsible for supplying research data to other scientists (including those who seek to reproduce the research) and to other parties who request them? Is a scientist responsible for supplying data, algorithms, and computer codes to other scientists who request them?
4. What challenges do the science and technology community face arising from actions that would compromise the integrity of research data? What steps should be taken by the science and technology community, research institutions, journal publishers, and funders of research in response to these challenges?
5. What are the current standards for accessing and maintaining research data, and how should these evolve in the future? How might such standards differ for federally funded and privately funded research, and for research conducted in academia, government, nongovernmental organizations, and industry?

The study will not address privacy issues and other issues related to human subjects.

At our committee’s first meeting, it quickly became apparent that even this wide-ranging charge did not encompass the full range of pressing issues involving research data. Digital technologies have been changing research at a pace that would have been hard to predict even a decade ago. Practices and expectations for data sharing vary considerably from field to field and are rapidly evolving. National and homeland security concerns affect the policy environment governing access to various types of data. In some areas the costs of maintaining collections and transferring them to new digital media raise questions about who is responsible for undertaking and financing long-term stewardship. A growing variety of investigators and research fields face difficult choices involving trade-offs between sustaining existing data collections and performing new research.

The purpose of this report is to explore the evolving roles and responsibilities of researchers, research institutions, research sponsors, journals, publishers, and others in generating, analyzing, disseminating, and preserving research data. Many of the methods used to validate the quality of data, make data available to other researchers, and preserve data for future uses are unique to specific disciplines. Focusing on these discipline-specific methods would yield a report that is both too narrow and too transitory given the transformative influence of rapidly changing technologies. Instead, we decided to base our report on the broad principles that have characterized science and engineering research for hundreds of years and will continue to do so in the future. In particular, we decided to focus on three broad and intertwined issues that we have characterized as integrity, access, and stewardship. For each of these issues, we state a general principle that applies throughout the research enterprise. We then use these three broad principles to formulate recommendations that apply in more specific circumstances.
We have also highlighted, within the text and in sidebars in each chapter, useful efforts by researchers, institutions, research fields, research sponsors, professional societies, and journals to facilitate the realization of our broad objectives. And we have identified issues, some new and some old, that will need continued attention as technology continues to reshape the research enterprise.

Although this report addresses all of the components of the research enterprise, its primary focus is on the roles and responsibilities of the investigator. This is appropriate, given the composition of the committee and the nature of the task. The actions of researchers inevitably influence all the other parts of the research enterprise, and each of these parts also has responsibilities in maintaining the integrity, accessibility, and stewardship of research data. However, researchers must take the lead in addressing new and pressing issues involving research data. In general, the report attempts to reflect the perspectives of individual researchers in different fields with respect to the generation, preservation, and sharing of research data in science as a whole and in specific fields.

Following the Executive Summary, Chapter 1 introduces the main issues covered in the report by examining the terms used in the report and the varieties of research data. Chapter 2, on the integrity of research data, looks at the challenges to data integrity created by rapidly changing technologies and at responses to those challenges. Chapter 3 discusses the responsibility for researchers to make publicly available the data on which research results are based, and the variety of challenges this poses in different fields and settings. And Chapter 4 describes the long-term value of research data and methods to preserve data for future uses. The changes in the daily practices and activities of researchers due to the rapidly changing technologies provide a unique opportunity to reinforce and extend the traditional openness and collaborative nature of science.

In preparing this report, our committee has taken advantage of a number of studies by the National Academy of Sciences, the National Academy of Engineering, the Institute of Medicine and the National Research Council. Appendix B provides a list of recent reports on relevant subjects. For example, the committee spent some time reviewing and discussing a recent controversy over the interpretation and use of data to reconstruct historical changes in global temperatures, as described in the 2006 NRC report Surface Temperature Reconstruction for the Last 2,000 Years.

The importance of data in research and in societal decisions will continue to increase as science and engineering exert an ever greater influence on society and as digital technologies continue to remake our world. The committee and the members of the Committee on Science, Engineering, and Public Policy hope and trust that this report will stimulate further dialogue to strengthen science and engineering in a data-rich world.

Phillip A. Sharp, Massachusetts Institute of Technology
Daniel Kleppner, Massachusetts Institute of Technology

Acknowledgment of Reviewers

This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the National Academies’ Report Review Committee. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the process.

We wish to thank the following individuals for their review of this report: Frederick Anderson, McKenna, Long & Aldridge LLP; Michael Carroll, American University; Ian Foster, Argonne National Laboratory; John Graham, Indiana University; Myron Gutmann, Inter-University Consortium for Political and Social Research; Henry Horbaczewski, Reed Elsevier, Inc.; Jerome Kassirer, Tufts University; Michael Keller, Stanford University; Joan Lippincott, Coalition for Networked Information; David Moorman, Social Sciences and Humanities Research Council of Canada; James Ostell, National Library of Medicine; Robert Pike, Google; David Robinson, Rutgers University; Sandford Shattil, University of California, San Diego; and John White, University of Arkansas.

Although the reviewers listed above have provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations, nor did they see the final draft of the report before its release. The review of this report was overseen by William Press, University of Texas, Austin and Warren Washington, National Center for Atmospheric Research. Appointed by the National Academies, they were responsible for making certain that an independent examination of this report was carried out in accordance with institutional procedures and that all review comments were carefully considered. Responsibility for the final content of this report rests entirely with the authoring committee and the institution.

Contents

SUMMARY

1 RESEARCH DATA IN THE DIGITAL AGE
   Challenges Posed by Research Data in a Digital Age
   Descriptions of Terms Used in the Report
   The Varieties of Research Data
   Structure of the Report

2 ENSURING THE INTEGRITY OF RESEARCH DATA
   The Roles of Data Producers, Providers, and Users
   The Collective Scrutiny of Research Data and Results
   Peer Review and Other Means for Ensuring the Integrity of Data
   Data Integrity in the Digital Age and the Role of Data Professionals
   General Principle for Ensuring the Integrity of Research Data
   The Obligations of Researchers to Ensure the Integrity of Research Data
   The Importance of Training
   Producing Clear, Up-to-Date Standards for Data Integrity: A Shared Responsibility of the Research Enterprise
   The Roles of Data Professionals

3 ENSURING ACCESS TO RESEARCH DATA
   Barriers to Sharing Data
   The Costs of Limiting Access to Data
   Data Access Issues in Research Affecting Public Policy or Private Interests
   Ownership of Research Data and Related Products
   Legal and Policy Requirements for Access to Data
   The International Dimensions of Access to Research Data
   General Principle for Enhancing Access to Research Data
   Responsibilities of Researchers
   Responsibilities of Research Fields
   Responsibilities of Research Institutions, Research Sponsors, Professional Societies, and Journals

4 PROMOTING THE STEWARDSHIP OF RESEARCH DATA
   The Loss and Underutilization of Research Data
   Infrastructure and Incentives for the Stewardship of Data
   Annotating Data for Long-Term Use
   Fostering Data Stewardship for the Broad Research Enterprise
   General Principle for Enhancing the Stewardship of Research Data
   Responsibilities of Researchers
   Responsibilities of Research Institutions, Research Sponsors, and Journals

5 DEFINING ROLES AND RESPONSIBILITIES
   Assigning Roles and Responsibilities
   Researchers
   Research Institutions
   Research Sponsors
   Professional Societies and Journals
   Conclusion

APPENDIXES
   A Biographical Information on the Committee Members
   B Relevant National Academy of Sciences, National Academy of Engineering, Institute of Medicine, and National Research Council Reports
   C Letters from Journals

INDEX