For Attribution—
Developing Data Attribution and
Citation Practices and Standards
Summary of an International Workshop
Paul F. Uhlir, Rapporteur
Board on Research Data and Information
Policy and Global Affairs
NATIONAL RESEARCH COUNCIL
OF THE NATIONAL ACADEMIES
THE NATIONAL ACADEMIES PRESS
Washington, D.C.
www.nap.edu
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.
This project was supported by the Alfred P. Sloan Foundation under Grant No. 2011-3-19, and by the Institute of Museum and Library Services under Grant No. 1042078. This report was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or any agency thereof. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Academies or the organizations or agencies that provided support for the project.
International Standard Book Number-13: 978-0-309-26728-1
International Standard Book Number-10: 0-309-26728-5
Additional copies of this report are available for sale from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; Internet, http://www.nap.edu/.
Copyright 2012 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
THE NATIONAL ACADEMIES
Advisers to the Nation on Science, Engineering and Medicine
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Charles M. Vest is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy’s purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. Charles M. Vest are chair and vice chair, respectively, of the National Research Council.
Steering Committee, Developing Data Attribution and Citation Practices and Standards: An International Workshop
Christine Borgman (Chair)
Professor and Presidential Chair
Graduate School of Education and Information Studies
University of California, Los Angeles
Steven Jackson
Assistant Professor, School of Information, and
Director, Technology Policy Culture Research Lab
University of Michigan
Gary King
Albert J. Weatherhead, III. Professor, Department of Government, and
Director, Institute for Quantitative Social Science
Harvard University
David Kochalko
Vice President, Business Strategy and Development, IP & Science
Thomson Reuters
Allen Renear
Associate Dean for Research
University of Illinois at Urbana-Champaign
Graduate School of Library and Information Science
Herbert van de Sompel
Research Scientist
Los Alamos National Lab
John Wilbanks
Vice President, Creative Commons,
Director, Science Commons
Creative Commons
Project Staff at the National Academies
Paul F. Uhlir, Director, Board on Research Data and Information
Daniel Cohen, Program Officer
(on detail from Library of Congress)
Cheryl Williams Levey
Senior Program Associate
BOARD ON RESEARCH DATA AND INFORMATION
MEMBERSHIP (as of the date of this workshop)
Michael Lesk, Chair, Rutgers University
Roberta Balstad, Vice Chair, Columbia University
Maureen Baginski, Serco
Francine Berman, Rensselaer Polytechnic Institute
R. Steven Berry, University of Chicago
Christine Borgman, University of California, Los Angeles
Norman Bradburn, University of Chicago
Bonnie Carroll, Information International Associates
Michael Carroll, American University, Washington College of Law
Paul A. David, Stanford Institute for Economic Policy Department of Economics
Barbara Entwisle, University of North Carolina
Michael Goodchild, University of California, Santa Barbara
Alyssa Goodman, Harvard University
Margaret Hedstrom, University of Michigan
Michael Keller, Stanford University
Michael R. Nelson, Georgetown University
Daniel Reed, Microsoft Research
Cathy H. Wu, University of Delaware and Georgetown University Medical Center
BOARD ON RESEARCH DATA AND INFORMATION
MEMBERSHIP (as of the date of this report)
Francine Berman, Cochair, Rensselaer Polytechnic Institute
Clifford Lynch, Cochair, Coalition for Networked Information
Laura Bartolo, Kent State University
Philip Bourne, University of California, San Diego
Henry Brady, University of California, Berkeley
Mark Brender, GeoEye Foundation
Bonnie Carroll, Information International Associates
Michael Carroll, Washington College of Law, American University
Sayeed Choudhury, Johns Hopkins University
Keith Clarke, University of California, Santa Barbara
Paul David, Stanford Institute for Economic Policy Research
Kelvin Droegemeier, University of Oklahoma
Clifford Duke, Ecological Society of America
Barbara Entwisle, University of North Carolina
Stephen Friend, Sage Bionetworks
Margaret Hedstrom, University of Michigan
Alexa McCray, Harvard Medical School
Alan Title, Lockheed Martin Advanced Technology Center
Ann Wolpert, Massachusetts Institute of Technology
EX OFFICIO
Robert Chen, Columbia University
Michael Clegg, University of California, Irvine
Sara Graves, University of Alabama in Huntsville
John Faundeen, Earth Resources Observation and Science Center
Eric Kihn, National Geophysical Data Center
Chris Lenhardt, Oak Ridge National Laboratory
Kathleen Robinette, Air Force Research Laboratory
Alex de Sherbinin, Columbia University
Board on Research Data and Information Staff
Paul F. Uhlir, Board Director
Subhash Kuvelker, Senior Program Officer
Daniel Cohen, Program Officer (on detail from Library of Congress)
Cheryl Williams Levey, Senior Program Associate
Preface and Acknowledgments
The growth of electronic publishing of literature has created new challenges, such as the need for mechanisms for citing online references in ways that can assure discoverability and retrieval for many years into the future. The growth in online datasets presents related, yet more complex, challenges. Meeting them depends upon the ability to reliably identify, locate, access, interpret, and verify the version, integrity, and provenance of digital datasets.
Data citation standards and good practices can form the basis for increased incentives, recognition, and rewards for scientific data activities that are currently lacking in many fields of research. The rapidly expanding universe of online digital data holds the promise of allowing peer examination and review of conclusions or analyses based on experimental or observational data, the integration of data into new forms of scholarly publishing, and the ability for subsequent users to make new and unforeseen uses and analyses of the same data, either in isolation or in combination with other datasets.
The problem of citing online data is complicated by the lack of established practices for referring to portions or subsets of data. As funding sources for scientific research have begun to require data management plans as part of their selection and approval processes, it is important that the necessary standards, incentives, and conventions to support data citation, preservation, and accessibility be put into place.
There are, in fact, a number of initiatives already underway in different organizations, countries, and disciplines. An important set of technical and policy approaches has already been launched by the U.S. National Information Standards Organization (NISO) and other standards bodies regarding persistent identifiers and online linking. Another important group is DataCite. The World Data System is also focusing on these issues, but other initiatives remain ad hoc and uncoordinated.
The workshop summarized here was organized by a steering committee under the National Research Council’s (NRC’s) Board on Research Data and Information, in collaboration with an international CODATA-ICSTI Task Group on Data Citation Standards and Practices. The purpose of the workshop was to examine a number of key issues related to data identification, attribution, citation, and linking, to help coordinate activities in this area internationally, and to promote common practices and standards in the scientific community. More specifically, the statement of task for this project asked the following questions:
1. What is the status of data attribution and citation practices in the natural and social (economic and political) sciences in the United States and internationally?
2. Why is the attribution and citation of scientific data important and for what types of data? Is there substantial variation among disciplines?
3. What are the major scientific, technical, institutional, economic, legal, and socio-cultural issues that need to be considered in developing and implementing scientific data citation standards and practices? Which ones are universal for all types of research and which ones are field or context specific?
4. What are some of the options for the successful development and implementation of scientific data citation practices and standards, both across the natural and social sciences and in major contexts of research?
The workshop organized in response to these questions was held in Berkeley, CA, on August 22-23, 2011. The presentations and discussions from that meeting, summarized in the volume that follows, are part of this effort.
This report has been prepared by the workshop rapporteur as a factual summary of what occurred at the workshop. The committee’s role was limited to planning and convening the workshop. The views contained in the report are those of the individual workshop participants and do not necessarily represent the views of all workshop participants, the planning committee, or the National Academies.
Acknowledgments
We are grateful to the following for support of this project: Institute of Museum and Library Services, grant number IMLS LG-00-11-0123-11; Sloan Foundation, grant number 2011-3-19; the Committee on Data for Science and Technology (CODATA); and Microsoft Research. Any views, findings, conclusions or recommendations expressed in this publication do not necessarily represent those of the Institute of Museum and Library Services, or the other sponsors.
This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the National Academies’ Report Review Committee. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets institutional standards for quality and objectivity. The review comments and draft manuscript remain confidential to protect the integrity of the process.
We wish to thank the following individuals for their review of this report: Suzanne Allard, University of Tennessee; Anne Fitzgerald, Queensland University, Australia; Charles Humphrey, University of Alberta; Brian McMahon, International Union of Crystallography, United Kingdom; and John Rumble, Information International Associates (retired).
Although the reviewers listed above have provided many constructive comments and suggestions, they were not asked to endorse the content of the report, nor did they see the final draft before its release. Responsibility for the final content of this report rests entirely with the rapporteur and the institution.
Many people devoted many months of effort to organizing this event. Dan Cohen and Cheryl Levey of the staff of the Board on Research Data and Information spent much of their 2011 summer working on the Workshop project. Christine Borgman, Paul Uhlir, and Dan Cohen had conference calls with each session panel to ensure synthesis and continuity. The Workshop was coordinated with the activities of the CODATA-ICSTI Task Group on Data Citation Standards and Practices, whose co-chairs are Bonnie Carroll, Jan Brase, and Sarah Callaghan. Members of that Task Group are (in alphabetical order) Micah Altman, Elisabeth Arnaud, Christine Borgman, Dora Ann Lange Canhos, Todd Carpenter, Vishwas Chavan, Michael Diepenbroek, John Helly, Jianhui Li, Brian McMahon, Karen Morgenroth, Yasuhiro Murayama, Helge Sagen, Eefke Smit, Martie van Deventer, John Wilbanks, and Koji Zettsu. Paul Uhlir, Dan Cohen, and Franciel Linares are staff consultants to the Task Group. Special thanks also are due to the Workshop Steering Committee, consisting of Christine Borgman (Chair), Allen Renear, Herbert van de Sompel, Gary King, Steven Jackson, David Kochalko, and John Wilbanks, as well as to the young scientists who served as rapporteurs in the final afternoon sessions: Franciel Linares, Matthew Mayernik, Jillian Wallis, and Laura Wynholds.
Christine Borgman, Steering Committee Chair
Paul F. Uhlir, Project Director
Contents
1- Why Are the Attribution and Citation of Scientific Data Important?
Christine Borgman
PART ONE - TECHNICAL CONSIDERATIONS
2- Formal Publication of Data: An Idea Whose Time Has Come?
Jean-Bernard Minster
3- Attribution and Credit: Beyond Print and Citations
Johan Bollen
4- Data Citation—Technical Issues—Identification
Herbert Van de Sompel
5- Maintaining the Scholarly Value Chain: Authenticity, Provenance, and Trust
Paul Groth
DISCUSSION BY WORKSHOP PARTICIPANTS
Moderated by John Wilbanks
PART TWO - DISCIPLINE-SPECIFIC ISSUES
6- Towards Data Attribution and Citation in the Life Sciences
Philip Bourne
7- Data Citation in the Earth and Physical Sciences
Sarah Callaghan
8- Data Citation for the Social Sciences
Mary Vardigan
9- Data Citation in the Humanities: What’s the Problem?
Michael Sperberg-McQueen
DISCUSSION BY WORKSHOP PARTICIPANTS
Moderated by Herbert van de Sompel
PART THREE - LEGAL, INSTITUTIONAL, AND SOCIO-CULTURAL ASPECTS
10- Three Legal Mechanisms for Sharing Data
Sarah Hinchliff Pearson
11- Institutional Perspective on Credit Systems for Research Data
MacKenzie Smith
12- Issues of Time, Credit, and Peer Review
Diane Harley
DISCUSSION BY WORKSHOP PARTICIPANTS
Moderated by Paul F. Uhlir
PART FOUR - EXAMPLES OF DATA CITATION INITIATIVES
Jan Brase
14- Data Citation in the Dataverse Network ®
Micah Altman
15- Microsoft Academic Search: An Overview and Future Directions
Lee Dirks
16- Data Center-Library Cooperation in Data Publication in Ocean Science
Roy Lowry
Vishwas Chavan
18- How to Cite an Earth Science Dataset?
Mark Parsons
19- Citable Publications of Scientific Data
John Helly
Monica Duke
DISCUSSION BY WORKSHOP PARTICIPANTS
Moderated by David Kochalko
PART FIVE - INSTITUTIONAL PERSPECTIVES
Deborah L. Crawford
22- Data Citation and Data Attribution: A View from the Data Center Perspective
Bruce E. Wilson
23- Roles for Libraries in Data Citation
Michael Witt
24- Linking Data to Publications: Towards the Execution of Papers
Anita De Waard
25- Linking, Finding, and Citing Data in Astronomy
Michael J. Kurtz
DISCUSSION BY WORKSHOP PARTICIPANTS
Moderated by Bonnie Carroll
26- Standards and Data Citations
Todd Carpenter
27- Data Citation and Attribution: A Funder’s Perspective
Sylvia Spengler
DISCUSSION BY WORKSHOP PARTICIPANTS
Moderated by Christine Borgman
PART SIX - SUMMARY OF BREAKOUT SESSIONS
Breakout Session on Technical Issues
Moderator: Martie van Deventer
Rapporteur: Franciel Linares
Breakout Session on Scientific Issues
Moderator: Sarah Callaghan
Rapporteur: Matthew Mayernik
Breakout Session on Institutional, Financial, Legal, and Socio-cultural Issues
Moderator: Vishwas Chavan
Rapporteur: Laura Wynholds
Breakout Session on Institutional Roles and Perspectives
Moderator: Bonnie Carroll
Rapporteur: Jillian Wallis