
For Attribution—

Developing Data Attribution and
Citation Practices and Standards

Summary of an International Workshop

Paul F. Uhlir, Rapporteur

Board on Research Data and Information

Policy and Global Affairs

NATIONAL RESEARCH COUNCIL
OF THE NATIONAL ACADEMIES

THE NATIONAL ACADEMIES PRESS
Washington, D.C.
www.nap.edu







THE NATIONAL ACADEMIES PRESS
500 Fifth Street, NW
Washington, DC 20001

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This project was supported by the Alfred P. Sloan Foundation under Grant No. 2011-3-19, and by the Institute of Museum and Library Services under Grant No. 1042078. This report was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or any agency thereof.

Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Academies or the organizations or agencies that provided support for the project.

International Standard Book Number-13: 978-0-309-26728-1
International Standard Book Number-10: 0-309-26728-5

Additional copies of this report are available for sale from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; Internet, http://www.nap.edu/.

Copyright 2012 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Charles M. Vest is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Charles M. Vest are chair and vice chair, respectively, of the National Research Council.

www.national-academies.org

Steering Committee, Developing Data Attribution and Citation Practices and Standards: An International Workshop

Christine Borgman (Chair)
Professor and Presidential Chair
Graduate School of Education and Information Studies
University of California, Los Angeles

Steven Jackson
Assistant Professor, School of Information, and Director, Technology Policy Culture Research Lab
University of Michigan

Gary King
Albert J. Weatherhead, III. Professor, Department of Government, and Director, Institute for Quantitative Social Science
Harvard University

David Kochalko
Vice President, Business Strategy and Development, IP & Science
Thomson Reuters

Allen Renear
Associate Dean for Research
University of Illinois at Urbana-Champaign
Graduate School of Library and Information Science

Herbert van de Sompel
Research Scientist
Los Alamos National Lab

John Wilbanks
Vice President, Creative Commons, Director, Science Commons
Creative Commons

Project Staff at the National Academies

Paul F. Uhlir, Director, Board on Research Data and Information
Daniel Cohen, Program Officer (on detail from Library of Congress)
Cheryl Williams Levey, Senior Program Associate

BOARD ON RESEARCH DATA AND INFORMATION MEMBERSHIP
(as of the date of this workshop)

Michael Lesk, Chair, Rutgers University
Roberta Balstad, Vice Chair, Columbia University
Maureen Baginski, Serco
Francine Berman, Rensselaer Polytechnic Institute
R. Steven Berry, University of Chicago
Christine Borgman, University of California, Los Angeles
Norman Bradburn, University of Chicago
Bonnie Carroll, Information International Associates
Michael Carroll, American University, Washington College of Law
Paul A. David, Stanford Institute for Economic Policy Department of Economics
Barbara Entwisle, University of North Carolina
Michael Goodchild, University of California, Santa Barbara
Alyssa Goodman, Harvard University
Margaret Hedstrom, University of Michigan
Michael Keller, Stanford University
Michael R. Nelson, Georgetown University
Daniel Reed, Microsoft Research
Cathy H. Wu, University of Delaware and Georgetown University Medical Center

BOARD ON RESEARCH DATA AND INFORMATION MEMBERSHIP
(as of the date of this report)

Francine Berman, Cochair, Rensselaer Polytechnic Institute
Clifford Lynch, Cochair, Coalition for Networked Information
Laura Bartolo, Kent State University
Philip Bourne, University of California, San Diego
Henry Brady, University of California, Berkeley
Mark Brender, GeoEye Foundation
Bonnie Carroll, Information International Associates
Michael Carroll, Washington College of Law, American University
Sayeed Choudhury, Johns Hopkins University
Keith Clarke, University of California, Santa Barbara
Paul David, Stanford Institute for Economic Policy Research
Kelvin Droegemeier, University of Oklahoma
Clifford Duke, Ecological Society of America
Barbara Entwisle, University of North Carolina
Stephen Friend, Sage Bionetworks
Margaret Hedstrom, University of Michigan
Alexa McCray, Harvard Medical School
Alan Title, Lockheed Martin Advanced Technology Center
Ann Wolpert, Massachusetts Institute of Technology

EX OFFICIO

Robert Chen, Columbia University
Michael Clegg, University of California, Irvine
Sara Graves, University of Alabama in Huntsville
John Faundeen, Earth Resources Observation and Science Center
Eric Kihn, National Geophysical Data Center
Chris Lenhardt, Oak Ridge National Laboratory
Kathleen Robinette, Air Force Research Laboratory
Alex de Sherbinin, Columbia University

Board on Research Data and Information Staff

Paul F. Uhlir, Board Director
Subhash Kuvelker, Senior Program Officer
Daniel Cohen, Program Officer (on detail from Library of Congress)
Cheryl Williams Levey, Senior Program Associate

Preface and Acknowledgments

The growth of electronic publishing of literature has created new challenges, such as the need for mechanisms for citing online references in ways that can assure discoverability and retrieval for many years into the future. The growth in online datasets presents related, yet more complex challenges. Meeting them depends upon the ability to reliably identify, locate, access, interpret, and verify the version, integrity, and provenance of digital datasets. Data citation standards and good practices can form the basis for increased incentives, recognition, and rewards for scientific data activities that are currently lacking in many fields of research. The rapidly expanding universe of online digital data holds the promise of allowing peer examination and review of conclusions or analysis based on experimental or observational data, the integration of data into new forms of scholarly publishing, and the ability for subsequent users to make new and unforeseen uses and analyses of the same data, either in isolation or in combination with other datasets.

The problem of citing online data is complicated by the lack of established practices for referring to portions or subsets of data. As funding sources for scientific research have begun to require data management plans as part of their selection and approval processes, it is important that the necessary standards, incentives, and conventions to support data citation, preservation, and accessibility be put into place. There are, in fact, a number of initiatives in different organizations, countries, and disciplines already underway. An important set of technical and policy approaches has already been launched by the U.S. National Information Standards Organization (NISO) and other standards bodies regarding persistent identifiers and online linking. Another important group is DataCite. The World Data System is also focusing on these issues, but other initiatives remain ad hoc and uncoordinated.

The workshop summarized here was organized by a steering committee under the National Research Council's (NRC's) Board on Research Data and Information, in collaboration with an international CODATA-ICSTI Task Group on Data Citation Standards and Practices. The purpose of the symposium was to examine a number of key issues related to data identification, attribution, citation, and linking, to help coordinate activities in this area internationally, and to promote common practices and standards in the scientific community.

More specifically, the statement of task for this project asked the following questions:

1. What is the status of data attribution and citation practices in the natural and social (economic and political) sciences in the United States and internationally?
2. Why is the attribution and citation of scientific data important and for what types of data? Is there substantial variation among disciplines?
3. What are the major scientific, technical, institutional, economic, legal, and socio-cultural issues that need to be considered in developing and implementing scientific data citation standards and practices? Which ones are universal for all types of research and which ones are field or context specific?
4. What are some of the options for the successful development and implementation of scientific data citation practices and standards, both across the natural and social sciences and in major contexts of research?

The workshop that was organized pursuant to these questions was held in Berkeley, CA on August 22-23, 2011. The presentations and discussions that are summarized from this meeting in the volume that follows are part of this effort.

This report has been prepared by the workshop rapporteur as a factual summary of what occurred at the workshop. The committee's role was limited to planning and convening the workshop. The views contained in the report are those of the individual workshop participants and do not necessarily represent the views of all workshop participants, the planning committee, or the National Academies.

Acknowledgments

We are grateful to the following for support of this project: Institute of Museum and Library Services, grant number IMLS LG-00-11-0123-11; Sloan Foundation, grant number 2011-3-19; the Committee on Data for Science and Technology (CODATA); and Microsoft Research. Any views, findings, conclusions or recommendations expressed in this publication do not necessarily represent those of the Institute of Museum and Library Services, or the other sponsors.

This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the National Academies' Report Review Committee. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets institutional standards for quality and objectivity. The review comments and draft manuscript remain confidential to protect the integrity of the process.

We wish to thank the following individuals for their review of this report: Suzanne Allard, University of Tennessee; Anne Fitzgerald, Queensland University, Australia; Charles Humphrey, University of Alberta; Brian McMahon, International Union of Crystallography, United Kingdom; and John Rumble, Information International Associates (retired).

Although the reviewers listed above have provided many constructive comments and suggestions, they were not asked to endorse the content of the report, nor did they see the final draft before its release. Responsibility for the final content of this report rests entirely with the rapporteur and the institution.

Many people devoted many months of effort to organizing this event. Dan Cohen and Cheryl Levey of the staff of the Board on Research Data and Information spent much of their 2011 summer working on the Workshop project. Christine Borgman, Paul Uhlir, and Dan Cohen had conference calls with each session panel to ensure synthesis and continuity.

The Workshop was coordinated with the activities of the CODATA-ICSTI Task Group on Data Citation Standards and Practices, whose co-chairs are Bonnie Carroll, Jan Brase, and Sarah Callaghan. Members of that Task Group are (in alphabetical order) Micah Altman, Elisabeth Arnaud, Christine Borgman, Dora Ann Lange Canhos, Todd Carpenter, Vishwas Chavan, Michael Diepenbroek, John Helly, Jianhui Li, Brian McMahon, Karen Morgenroth, Yasuhiro Murayama, Helge Sagen, Eefke Smit, Martie van Deventer, John Wilbanks, and Koji Zettsu. Paul Uhlir, Dan Cohen, and Franciel Linares are staff consultants to the Task Group.

Special thanks also are due to the Workshop Steering Committee, consisting of Christine Borgman (Chair), Allen Renear, Herbert van de Sompel, Gary King, Steven Jackson, David Kochalko, and John Wilbanks, as well as to the young scientists who served as rapporteurs in the final afternoon sessions: Franciel Linares, Matthew Mayernik, Jillian Wallis, and Laura Wynholds.

Christine Borgman, Steering Committee Chair
Paul F. Uhlir, Project Director

Contents

1- Why Are the Attribution and Citation of Scientific Data Important?
   Christine Borgman

PART ONE - TECHNICAL CONSIDERATIONS

2- Formal Publication of Data: An Idea Whose Time Has Come?
   Jean-Bernard Minster
3- Attribution and Credit: Beyond Print and Citations
   Johan Bollen
4- Data Citation--Technical Issues--Identification
   Herbert van de Sompel
5- Maintaining the Scholarly Value Chain: Authenticity, Provenance, and Trust
   Paul Groth

DISCUSSION BY WORKSHOP PARTICIPANTS
   Moderated by John Wilbanks

PART TWO - DISCIPLINE-SPECIFIC ISSUES

6- Towards Data Attribution and Citation in the Life Sciences
   Philip Bourne
7- Data Citation in the Earth and Physical Sciences
   Sarah Callaghan
8- Data Citation for the Social Sciences
   Mary Vardigan
9- Data Citation in the Humanities: What's the Problem?
   Michael Sperberg-McQueen

DISCUSSION BY WORKSHOP PARTICIPANTS
   Moderated by Herbert van de Sompel

PART THREE - LEGAL, INSTITUTIONAL, AND SOCIO-CULTURAL ASPECTS

10- Three Legal Mechanisms for Sharing Data
   Sarah Hinchliff Pearson
11- Institutional Perspective on Credit Systems for Research Data
   MacKenzie Smith
12- Issues of Time, Credit, and Peer Review
   Diane Harley

DISCUSSION BY WORKSHOP PARTICIPANTS
   Moderated by Paul F. Uhlir

PART FOUR - EXAMPLES OF DATA CITATION INITIATIVES

13- The DataCite Consortium
   Jan Brase
14- Data Citation in the Dataverse Network
   Micah Altman
15- Microsoft Academic Search: An Overview and Future Directions
   Lee Dirks
16- Data Center-Library Cooperation in Data Publication in Ocean Science
   Roy Lowry
17- Data Citation Mechanism and Service for Scientific Data: Defining a Framework for Biodiversity Data Publishers
   Vishwas Chavan
18- How to Cite an Earth Science Dataset?
   Mark Parsons
19- Citable Publications of Scientific Data
   John Helly
20- The SageCite Project
   Monica Duke

DISCUSSION BY WORKSHOP PARTICIPANTS
   Moderated by David Kochalko

PART FIVE - INSTITUTIONAL PERSPECTIVES

21- Developing Data Attribution and Citation Practices and Standards: An Academic Institution Perspective
   Deborah L. Crawford
22- Data Citation and Data Attribution: A View from the Data Center Perspective
   Bruce E. Wilson
23- Roles for Libraries in Data Citation
   Michael Witt
24- Linking Data to Publications: Towards the Execution of Papers
   Anita De Waard
25- Linking, Finding, and Citing Data in Astronomy
   Michael J. Kurtz

DISCUSSION BY WORKSHOP PARTICIPANTS
   Moderated by Bonnie Carroll

26- Standards and Data Citations
   Todd Carpenter
27- Data Citation and Attribution: A Funder's Perspective
   Sylvia Spengler

DISCUSSION BY WORKSHOP PARTICIPANTS
   Moderated by Christine Borgman

PART SIX - SUMMARY OF BREAKOUT SESSIONS

Breakout Session on Technical Issues
   Moderator: Martie van Deventer
   Rapporteur: Franciel Linares
Breakout Session on Scientific Issues
   Moderator: Sarah Callaghan
   Rapporteur: Matthew Mayernik
Breakout Session on Institutional, Financial, Legal, and Socio-cultural Issues
   Moderator: Vishwas Chavan
   Rapporteur: Laura Wynholds
Breakout Session on Institutional Roles and Perspectives
   Moderator: Bonnie Carroll
   Rapporteur: Jillian Wallis

Appendix A: Agenda
Appendix B: Speaker and Moderator Biographical Information
