23- Roles for Libraries in Data Citation
Michael Witt1
Purdue University
As a practicing librarian, I will be focusing on the roles for librarians and information professionals in data citation and attribution. I would like to start by answering the question: why are librarians involved in data, and why are they interested in data citation? If we go back to the workshop on “New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe” that was sponsored by the Association of Research Libraries (ARL) and the National Science Foundation (NSF) in September 2006, an important need was identified “… for new partnerships and collaborations among domain scientists, librarians, and data scientists to better manage digital data collections; necessary infrastructure development to support digital data; and the need for sustainable economic models to support long-term stewardship of scientific and engineering digital data for the nation’s cyberinfrastructure.”2
To follow up, in August 2010, the ARL surveyed its member institutions (approximately 130), and 57 of them responded. Among the findings: (1) 21 respondents currently provide infrastructure and services for e-Science and data support, and (2) 23 are in the planning stages.3
This shows that libraries are involved in this area of data curation, at least in the context of academic and research libraries. That is not to say that any of these issues are exclusive to those libraries. In fact, I think that a lot of these needs extend to public libraries and citizen science, and other libraries outside of the university context.
I propose that data citation has “a last mile problem.” In communication networks it is usually easier to connect countries and cities than it is to connect to individual end-nodes, such as houses, especially in rural areas. In the data citation arena, the challenge is: how do we reach and effect a change in practice among end-users of data? How can we reach people who will be writing papers and citing the data? Those users could be students, faculty researchers, citizens, government agencies, and others.
I believe that a role that librarians can play here is rooted in libraries’ tradition of information literacy outreach and instruction. Information literacy is a set of abilities requiring individuals to recognize when information is needed and have the ability to locate, evaluate, and use effectively the needed information.4 This includes the proper citation and attribution of sources.
______________________
1 Presentation slides are available at http://www.sites.nationalacademies.org/PGA/brdi/PGA_064019.
2 Available at: http://www.arl.org/pp/access/nsfworkshop.shtml.
3 C. Soehner, C. Steeves, and J. Ward, E-Science and Data Support Services: A Study of ARL Member Institutions. Association of Research Libraries, 2010. http://www.arl.org/bm~doc/escience_report2010.pdf.
4 American Library Association. 1989, Presidential Committee on Information Literacy. Final Report.
If you look at the Information Literacy Competency Standards for Higher Education5 from the Association of College and Research Libraries (ACRL), you can replace the word “information” with “data” and the competencies make sense and remain relevant.
Where can users look for information on how to cite data? One natural place to turn would be style guides. I did a study with two colleagues, where we looked at 20 different style guides and performed content analysis to see what kind of instructions they are providing users explicitly to cite digital data. The answer is: they do not consistently address data citation and attribution.
FIGURE 24-1 A Description of Data Citation Instructions in Style Guides.
SOURCE: Newton, Mooney, & Witt (2010). International Digital Curation Conference, Chicago, IL. Retrieved from http://www.docs.lib.purdue.edu/lib_research/121/.
If you look at the above grid, it covers instructions for digital data, data in other formats (e.g., paper-based tables), and other electronic resources. The dark purple indicates the areas where the style guide provides explicit instructions for citation. The light colors (i.e., aqua or white) indicate that there are no explicit instructions. So, generally speaking, some style guides do a better job than others—but if this is where students and others are turning for instructions to properly cite data, they will undoubtedly be frustrated.
______________________
5 Available at: http://www.ala.org/ala/mgrps/divs/acrl/standards/informationliteracycompetency.cfm.
One thing that we see happening on our university campuses is that librarians are stepping in to address this need by creating resource guides. It is a common practice for librarians to develop bibliographies and pathfinders that introduce topics and tools to users. Here are some examples of data citation resource guides that university libraries have published:
• MIT: http://www.libraries.mit.edu/guides/subjects/data/access/citing.html
• MSU: http://www.libguides.lib.msu.edu/citedata
• Minnesota: http://www.lib.umn.edu/datamanagement/cite
• Purdue: http://www.guides.lib.purdue.edu/datacitation
• Oregon: http://www.libweb.uoregon.edu/datamanagement/citingdata.html
• Cambridge: http://www.lib.cam.ac.uk/dataman/pages/citations.html
• Virginia: http://www2.lib.virginia.edu/brown/data/citing.html
These guides are written by librarians in most cases and tailored for their particular audience. They may be tailored for undergraduate or graduate students, faculty researchers, or others.
One project that I would like to briefly talk about is Databib.6 This project was funded through the Institute of Museum and Library Services (IMLS). Here is the description of the project:
The libraries of Purdue University and Penn State University will partner to create a new online information resource for research data producers, users, publishers, librarians, and funding agencies. This resource, Databib, will be an annotated online bibliography of research data repositories, created and maintained by an online community of librarians. Databib will be an important focal point for connecting librarians more closely with other research data stakeholders and demonstrating the significant contributions libraries can make to solving the challenges posed by digital datasets. The Databib platform will also serve as a testbed for linking, integrating, and presenting information about datasets in new ways.7
Databib is essentially a bibliography that describes data repositories. What we are doing is creating a platform for librarians to submit and enhance bibliographic entries that describe these data repositories and do it in a way that is maximally open, using the Creative Commons Zero public domain dedication. If someone wants the list or the metadata, they are free to download and use them. Also, if someone wants to enhance the metadata or annotate them, that is also possible.
We are creating this resource for the community to help users find data, to help data producers identify repositories where they can submit their data, and to share this information with funding agencies that mandate data management and tell them where data have been submitted, because these directions are unclear in many cases. We also want to test the notion of a bibliography. We will have bibliographic records that can be exported as MARC records, so if someone wants to download them into their library catalogue, they can. Also, if someone wants to integrate them with other Web 2.0 tools, such as social tagging and social bookmarking, Databib will facilitate sharing links and citations. Finally, we want to use this platform to experiment with linked data. We want to create a platform where the descriptions of these data repositories can be linked in as many ways as possible to other things, whether it is in the same subject area, same agency that supports the data repository, or any other level of linkage. This project is a nine-month project, and Databib will be going online in the spring of 2012.
______________________
6 Databib website, http://databib.lib.purdue.edu.
7 IMLS press release, http://www.imls.gov/grant_awards_announcement_sparks_ignition_grants.aspx.
Going back to the potential role of libraries and librarians, libraries are a primary actor in the scholarly communication chain. I believe that libraries can promote persistence for links to data. Jan Brase talked about DataCite yesterday. There are many libraries that are participating in this effort. I think that libraries need to adopt URI policies. We are creating a lot of digital content and making it available in a lot of different ways with links that break. So, in addition to minting and maintaining unique, global, and persistent identifiers, we can have more general URI policies, which we can advocate for web content across our institutions.
Are libraries presenting our own data in ways that facilitate or encourage citation? Libraries maintain institutional repositories and other digital libraries where they are presenting digital objects, but do we have supporting documentation and FAQs that give users instructions for citation? Do we provide embedded, structured metadata within the web page, such as COinS, microformats, or RDF? Do we facilitate exportable citations? Many of our libraries have data services that are doing outreach to faculty members to help them understand data management plans. Before projects are funded and data are generated, there is the opportunity to have a conversation about data-sharing with the different stakeholders. There is an opportunity for advocacy.
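To make the idea of embedded, structured metadata concrete, the following is a minimal sketch of how a repository page might generate a COinS element for a dataset. It is an illustration only, not a description of any particular library's system: the function name `coins_span`, the choice of the Dublin Core key/value format, and the example values are all assumptions made for this sketch.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creators, year, publisher, identifier):
    """Build an HTML <span> carrying a COinS description of a dataset.

    COinS places an OpenURL ContextObject (key/value pairs) in the
    title attribute of an empty span with class "Z3988". The Dublin
    Core metadata format is used here as an assumption of this sketch.
    """
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),
        ("rft.type", "Dataset"),
        ("rft.title", title),
    ]
    # One rft.creator key per creator, in the order given.
    fields += [("rft.creator", name) for name in creators]
    fields += [
        ("rft.date", str(year)),
        ("rft.publisher", publisher),
        ("rft.identifier", identifier),
    ]
    # URL-encode the pairs, then escape "&" so the attribute is valid HTML.
    query = urlencode(fields)
    return '<span class="Z3988" title="{}"></span>'.format(escape(query, quote=True))


if __name__ == "__main__":
    # Hypothetical dataset used only for illustration.
    print(coins_span(
        title="Example Dataset",
        creators=["Doe, Jane", "Roe, Richard"],
        year=2011,
        publisher="Example Repository",
        identifier="https://example.org/dataset/1",
    ))
```

Reference managers that recognize COinS can read such a span from the page and offer to save the dataset as a citation, which is one way a repository could make its holdings easier to cite.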
I would like to raise awareness of the work being done by the International Association for Social Science Information Services and Technology (IASSIST). I co-chair a special interest group on data citation with Mary Vardigan. Among the over 300 members that IASSIST has, about 40 or 50 are involved in this special interest group. Some of the activities that we have been engaged in include an effort to derive a common set of user instructions for citing data. We realize that we would not necessarily be able to produce a perfect set of instructions for all cases, but if we can come up with a core set of instructions, that would be very useful. Also, there has been some work to integrate datasets as a resource type in citation management software such as EndNote or RefWorks. Moreover, we are doing some advocacy. We have been writing letters to style guide editors and publishers to encourage them to articulate policies and instructions for data citation to their authors. Also, like many other special interest groups, we are generating resources such as a website and brochure that are publicly available for use.
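As a rough illustration of what treating a dataset as a resource type in citation management software can look like, the sketch below writes a dataset description as a RIS record, a tagged text format that many reference managers can import. This is not the special interest group's work or any vendor's implementation: the function name `dataset_to_ris`, the selection of fields, and the example values are assumptions, and how a given tool interprets the `DATA` reference type may vary.

```python
def dataset_to_ris(title, creators, year, publisher, url):
    """Return a RIS record describing a dataset.

    Each RIS line is a two-letter tag, two spaces, a hyphen, a space,
    and the value. "TY" opens the record and "ER" closes it.
    """
    lines = ["TY  - DATA"]  # reference type: data file / dataset
    lines += ["AU  - " + name for name in creators]  # one line per creator
    lines += [
        "PY  - " + str(year),
        "TI  - " + title,
        "PB  - " + publisher,
        "UR  - " + url,
        "ER  - ",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataset used only for illustration.
    print(dataset_to_ris(
        title="Example Dataset",
        creators=["Doe, Jane", "Roe, Richard"],
        year=2011,
        publisher="Example Repository",
        url="https://example.org/dataset/1",
    ))
```

A repository could offer such a record as a downloadable file next to each dataset, so that users can import the citation rather than retype it.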
To conclude, librarians and information professionals can play important roles in advocacy and outreach, and in the integration and citation of data. This includes data citation in reference services and information literacy instruction and standards. Librarians should ask themselves: if we are publishing data, are we making our data citable, and are we incorporating data into information literacy?
One last observation: many libraries are creating new data services units that can help raise awareness of and address issues related to data attribution and citation for their communities. Promoting proper data use and citation should be a part of what we normally do in libraries, a part of our regular practices. There seems to be a trend of libraries addressing research data in a specialized manner, e.g., “data reference” and “data information literacy”. I suggest that, after a period of time, the library profession will become more comfortable with data and will not need to qualify “data” services as such. The same principles of library science that apply to traditional formats can be applied to data.
The timing seems to be perfect for people to connect and collaborate to address data citation and attribution issues.