Breakout Session on Institutional Roles and Perspectives
Moderator: Bonnie Carroll
Rapporteur: Jillian Wallis
Several participants began by focusing on the stakeholders and low-level details about the
interaction between the stakeholders and the data citations. Others then raised several questions:
Who is cited: the data center hosting the data, the data producer, or anyone who has added value
to the data? This is really a question of whether the citation is for assigning credit or finding data.
It should be noted that there are many stakeholders who add value to the data and it may not be
feasible to acknowledge everyone. Who is responsible for generating a citation: the data center
hosting the data, some collaboration between the producer and archivist, or the data user
consulting with the data producer to create a citation? The credit aspects of citation thus may
conflict with the location and discoverability aspects, which have very different sets of
requirements.
A number of the participants identified issues that pulled apart the roles of data citation
stakeholders. Who should be the citation creator: the data creator responsible for providing a
citable thing, or the data user responsible for citing that thing? Who is responsible for collecting
metrics? This led to plotting out the events that happen during the life of a data citation and
assigning responsible parties. Figure S-1 presents one understanding of how data citations will
come to be. Rather than representing the lifecycle of an individual data citation, it
depicts how data citation practices in general will be created. In this case,
"lifecycle" is perhaps a misnomer; what the figure captures is a timeline for
organizing all of the interested parties.
It is important to further define the data citation lifecycle and the roles and responsibilities of
institutions and people who act at each stage, in order to determine who is missing from this
discussion and how we can get them involved.
FIGURE S-1 Data citation lifecycle.
DEVELOPING DATA ATTRIBUTION AND CITATION PRACTICES AND STANDARDS
Prior to the actual creation and adoption of data citations, several participants suggested, one
option is to develop an understanding of the social ramifications of the data citation and the
frameworks with which data citations would need to interact. This understanding could come
from academic research on data practices. At the top level, research funders, universities, and
journal publishers could think about developing a data citation policy that supports their
respective needs and creates incentives to encourage data citation.
Using such a base of understanding and policy, many parties may wish to work in parallel to
make data citation a reality. Research communities can define the data citation elements that are
meaningful to them. Journal publishers and standards bodies can define general data citation
layouts that are both machine-readable and human-readable. In order for a data citation to be created: (i)
the data need to have been generated by someone, and (ii) the data need to be available with
enough information attached in order to create the data citation. The data generator or the data
center hosting the data will then make the actual citation content available. The data users are
responsible for actually using the data citation in their publications. The derivative data cycle
here refers to the practice of creating derivative datasets from other datasets. A new form of data
citation could be developed in order to take this practice into account, and could involve some
combination of the original data generators or hosts and the data users in a new data citation or a
data citation that expands into multiple data citations.
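As a rough illustration of the ideas in this paragraph, the sketch below shows how one set of citation elements might be rendered in both a human-readable and a machine-readable layout, and how a derivative dataset might point back to its source. Everything here is an assumption made for illustration: the element names (creators, year, title, host, identifier, derived_from), the text layout, and the placeholder identifiers are hypothetical and are not drawn from the session or from any existing standard.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DataCitation:
    # Hypothetical citation elements; a research community would define its own.
    creators: List[str]
    year: int
    title: str
    host: str          # e.g., the data center hosting the data
    identifier: str    # placeholder for whatever locator the host assigns
    derived_from: List[str] = field(default_factory=list)  # source dataset identifiers

    def to_text(self) -> str:
        """Human-readable layout (one possible style, chosen for illustration)."""
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}). {self.title}. {self.host}. {self.identifier}"

    def to_json(self) -> str:
        """Machine-readable layout carrying the same elements."""
        return json.dumps(asdict(self), sort_keys=True)


# A dataset made available by its generator or host.
original = DataCitation(
    creators=["Example, A."],
    year=2011,
    title="Example Survey Dataset",
    host="Example Data Center",
    identifier="example-id-0001",
)

# A derivative dataset whose citation records where it came from.
derived = DataCitation(
    creators=["User, B."],
    year=2012,
    title="Example Survey Dataset, Cleaned Subset",
    host="Example Data Center",
    identifier="example-id-0002",
    derived_from=[original.identifier],
)

print(original.to_text())
print(derived.to_json())
```

Keeping a single record and generating both layouts from it is only one possible design; it reflects the point that the citation content has to be available with the data before a user can cite it in either form.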
Once the various standards are in play, several participants remarked, training and education
about how and when data citations can be used would be useful. The university libraries are
perhaps well positioned to reach out to the academic communities they support. Finally,
commercial parties can aggregate data citations, much as citations are aggregated to
characterize scholarly communication in the literature.