For Attribution -- Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop (2012)
Board on Research Data and Information (BRDI)

"17- Data Citation Mechanism and Service for Scientific Data: Defining a Framework for Biodiversity Data Publishers." For Attribution -- Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop. Washington, DC: The National Academies Press, 2012.


17- Data Citation Mechanism and Service for Scientific Data: Defining a Framework for Biodiversity Data Publishers

Vishwas Chavan [1]
Global Biodiversity Information Facility

I am going to focus on how we are working to resolve the issue of data citation for biodiversity data at the Global Biodiversity Information Facility (GBIF), located in Copenhagen, Denmark. For those who have not heard of GBIF, it is a multilateral intergovernmental initiative established in 2001, with 52 countries and 47 international organizations as members. GBIF's main objective is to facilitate free and open access to biodiversity data. Our data are available through a portal that currently includes 312,000,000 data records about the existence of plants and animals across the globe, drawn from over 1,800 data resources contributed by 342 data publishers.

Why do we think that data citation is important? We believe that data citation will encourage our data publishers to publish more and more datasets. Therefore, it will improve data discovery. It will also provide some encouragement for data preservation. Furthermore, it will provide incentives to those who use the data by improving the credibility of the interpretations that are based on the data.

What is the current practice of data citation in the GBIF network? Let me explain this with an example. A user comes to GBIF's portal and searches for the term "Panthera tigris." She gets 696 records from 37 different datasets, which are published by 31 different publishers. The current citation style just says "access through GBIF data portal" and lists all the access points of those 37 datasets.
The problem with this practice is that it does not record what the search string was (unless the user states it explicitly), how many records were retrieved, how many data publishers contributed to the retrieved data, when the search was carried out, who the original contributors of the data are, or who played what role in the process from collection to publication of the data. So there is certainly a need to address these challenges.

What is needed is a data citation mechanism with a defined citation style that recognizes all the stakeholders involved and their roles: who is the producer of the data, who is the publisher, who is the aggregator, and who provided curation services for the data. Given the complexity of our network, we require cascading citations, which are citations within citations. Furthermore, we need a data citation service where a publisher can register its citation and all of its metadata documents. Finally, we need a discovery service that resolves to the full-text citation and links to the underlying data.

One of the first things that we think we need is a best practices guide for how to cite data. For that, we require two types of recommended styles: one for publisher-supplied dataset citations and the other for query-based citations. The publisher-supplied dataset citation would obviously need to consider the type of publisher (e.g., an individual, a group of individuals, or an institution). We also need to recognize each individual's role in creating the dataset, and to identify when the dataset was first released and whether it is a one-time release or frequently updated. The citation should link back to the primary URI of the dataset, and the citation itself needs a persistent identifier (preferably a DOI) so that the entire citation string can be resolved. We also need to consider the date of the first release, the latest update, and the number of data records that can actually be accessed from a particular dataset. Table 17-1 provides a sampling of GBIF's potential citation styles for publisher-supplied citations.

[1] Presentation slides are available at http://sites.nationalacademies.org/PGA/brdi/PGA_064019.

TABLE 17-1 Sample citation styles for publisher-supplied citations (complete formulation)

Style 1: Publisher (individual) with one-time release of dataset
Publisher (YEAR), <Title of the data resource>, <number of records>, published <modes of publishing>, <access point>, released on <release date>, <Persistent Identifier>.

Style 2: Publisher (individual) with frequent update or release of dataset
Publisher (YEAR). <Title of the data resource>, <number of records>, published <modes of publishing>, <access point>, first released on <release date>, <Version no., or last updated (date)>, <Persistent Identifier>.

Style 3: Publisher (group of individuals) with one-time release of dataset
Publisher 1, ..... and Publisher n (YEAR). <Title of the data resource>, <number of records>, published <modes of publishing>, <access point>, released on <release date>, <Persistent Identifier>.

Style 4: Publisher (group of individuals) with frequent update or release of dataset
Publisher 1, ..... and Publisher n (YEAR). <Title of the data resource>, <number of records>, published <modes of publishing>, <access point>, first released on <release date>, <Version no., or last updated (date)>, <Persistent Identifier>.

Style 5: Institute/consortium (multiple contributors) with one-time release of dataset
<Publisher as Institution / Research Group / Consortium> (YEAR), <Title of the data resource>, <number of records>, <Contributed by contributor 1 (role), contributor 2 (role) ..... contributor n (role)>, published <modes of publishing>, <access point>, released on <release date>, <Persistent Identifier>.

Style 6: Institute/consortium (multiple contributors) with frequent update or release of dataset
<Publisher as Institution / Research Group / Consortium> (YEAR), <Title of the data resource>, <number of records>, <Contributed by contributor 1 (role), contributor 2 (role) ..... contributor n (role)>, published <modes of publishing>, <access point>, first released on <release date>, <Version no., or last updated (date)>, <Persistent Identifier>.

Each style also has a short formulation that keeps elements such as the publisher (for groups, "Publisher 1 et al."), <YEAR (Year first published / released -)>, <Version no., or last updated (date)> for frequently updated datasets, and the Persistent Identifier.

In the case of query-based citations, where we need to have citations within citations, there are two types of citations that we think are required: the query-based citation itself and the publisher-supplied dataset citations that it contains. Such a citation needs to support the multiple types of persistent identifiers that have been assigned or used by the publishers themselves. Going back to the example of the user who searched for the term "Panthera tigris", a hypothetical illustration of this search is presented in Figure 17-1. The query-based citation will resolve to the complete composite citation, and it can also link back to the snapshot of the retrieved data that are cited. This is what it will look like when you resolve the DOI:

http://data.gbif.net (2010). user doi:09.1111/gbif.9.11.444.

Full text composite citation

http://data.gbif.net (2010). Search string: Panthera tigris, 696 records, contributed by 37 data resources, user doi: 09.1111/gbif.9.11.444, accessed on 04/11/2010, 10:03:30. [User driven citation]

1. Louisiana State University (2007), Museum of Natural Science: Collection of Mammal, 36000 records. Contributed by Patterson DN (Principal Investigator, architect, author), Sandeep PK (author, curator), Fieldman LN (author, developer), Remsen D (curator, validator), published online http://www.museum.lsu.edu/MNS/mammcoll.hml, released on October 2007, doi: 09.1111/lsu.9.11.559. [Institutional dataset, one-time release, doi]

2. Michigan State University (2001 -), MSU Vertebrate Collection, 76523 records. Contributed by Cook DK (Principal Investigator, author, curator, validator), Hirsh L (author, architect, developer), Lane MP (manager, author, curator) ..., Morris JH (curator), published online http://musuem.msu.edu/ResearchandCollections/DVNH, first released on 01/10/2001, last updated on 18/01/2010, urn:lsid:msu.org:observation:541. [Institutional dataset, frequent update, lsid]

3. Cursada PK, Bello J, and AJK Moelicker (2006), Natural History Museum Rotterdam: Mammal collection, 1123 records, published online, http://www.nlbif.nl/nhmr_mc/, released on 7 July 2006, http://nhmr.nl/ark:/1205/693xz693. [Multiple authors, frequent update, ARK]

...

37. Rumble KJ (1998 -). Vertebrate collection of Rumble 1960-1999. 786 records, published online, http://www.sbnature.org/rumble_collection/, first released on 13/09/1998, last updated on 27/01/2010, http://hdl.oclc.gov/sbnature/5678. [Single author, frequent update, handle]

FIGURE 17-1 Hypothetical search result.

Let me conclude with a summary of the implementation and next steps. The main challenge in implementing this mechanism is the complexity of data management itself. How do we make sure that all our data publishers follow the citation style that is being proposed? There is also the complexity of the data network, because many publishers publish their data through more than one access point. Therefore, we urgently need to have all these citation styles propagated in the form of a best practice guide. However, we also need to remember that there are social challenges related to changing current practices. Finally, somebody has to come forward to run the data citation service.
These are some of the challenges that we are currently trying to address.
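
As a rough illustration of the two citation types discussed in this chapter, the following Python sketch models a publisher-supplied dataset citation and a query-based citation that cascades to the dataset citations it contains. It is only a sketch under stated assumptions: the class names, field names, and exact punctuation are hypothetical and are not part of any GBIF service or of the proposed styles themselves. The two dataset entries reuse values from the hypothetical search result in Figure 17-1, and the retrieved-record count is made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Publisher-supplied dataset citation (all field names are hypothetical)."""
    publisher: str
    year: int
    title: str
    record_count: int
    access_point: str           # primary URI of the dataset
    released: str               # date of (first) release, kept as free text
    persistent_id: str          # DOI, LSID, ARK, handle, ...
    last_updated: Optional[str] = None  # set only for frequently updated datasets

    def format(self) -> str:
        frequent = self.last_updated is not None
        # Frequently updated datasets carry an open-ended year, e.g. "(1998 -)".
        year = f"{self.year} -" if frequent else str(self.year)
        parts = [
            f"{self.publisher} ({year})",
            self.title,
            f"{self.record_count} records",
            f"published online {self.access_point}",
        ]
        if frequent:
            parts.append(f"first released on {self.released}")
            parts.append(f"last updated on {self.last_updated}")
        else:
            parts.append(f"released on {self.released}")
        parts.append(self.persistent_id)
        return ", ".join(parts) + "."


@dataclass
class QueryCitation:
    """Query-based citation that cascades to the dataset citations it contains."""
    portal: str
    year: int
    search_string: str
    records_retrieved: int
    accessed: str
    user_doi: str
    datasets: List[DatasetCitation]

    def short(self) -> str:
        # The compact string a user would put in a paper.
        return f"{self.portal} ({self.year}). user doi:{self.user_doi}."

    def full(self) -> str:
        # The composite citation the user DOI would resolve to.
        header = (
            f"{self.portal} ({self.year}). Search string: {self.search_string}, "
            f"{self.records_retrieved} records, contributed by "
            f"{len(self.datasets)} data resources, user doi: {self.user_doi}, "
            f"accessed on {self.accessed}."
        )
        entries = [f"{i}. {d.format()}" for i, d in enumerate(self.datasets, 1)]
        return "\n".join([header] + entries)


# Two dataset entries taken from the hypothetical Figure 17-1.
cursada = DatasetCitation(
    publisher="Cursada PK, Bello J, and AJK Moelicker", year=2006,
    title="Natural History Museum Rotterdam: Mammal collection",
    record_count=1123, access_point="http://www.nlbif.nl/nhmr_mc/",
    released="7 July 2006", persistent_id="http://nhmr.nl/ark:/1205/693xz693",
)
rumble = DatasetCitation(
    publisher="Rumble KJ", year=1998,
    title="Vertebrate collection of Rumble 1960-1999",
    record_count=786, access_point="http://www.sbnature.org/rumble_collection/",
    released="13/09/1998", persistent_id="http://hdl.oclc.gov/sbnature/5678",
    last_updated="27/01/2010",
)
query = QueryCitation(
    portal="http://data.gbif.net", year=2010, search_string="Panthera tigris",
    records_retrieved=12,  # illustrative value only, not from the source
    accessed="04/11/2010, 10:03:30", user_doi="09.1111/gbif.9.11.444",
    datasets=[cursada, rumble],
)

print(query.short())
print(query.full())
```

Keeping the short string and the full composite text as two views of the same object mirrors the idea that a single user identifier resolves to the complete citation, while each nested dataset citation retains whatever persistent identifier its publisher already uses.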