Breakout Session on Institutional Roles and Perspectives
Moderator: Bonnie Carroll
Rapporteur: Jillian Wallis
Several participants began by focusing on the stakeholders and on the low-level details of how those stakeholders interact with data citations. Others then raised several questions. Who should be cited: the data center hosting the data, the data producer, or anyone who has added value to the data? This is really a question of whether the purpose of a citation is to assign credit or to help users find data. Many stakeholders add value to data, and it may not be feasible to acknowledge all of them. Who is responsible for generating a citation: the data center hosting the data, the producer and archivist working together, or the data user in consultation with the data producer? The credit aspects of citation may thus conflict with its location and discoverability aspects, which have very different sets of requirements.
A number of participants identified issues that helped distinguish the roles of data citation stakeholders. Who should be the citation creator: the data creator, who is responsible for providing a citable object, or the data user, who is responsible for citing that object? Who is responsible for collecting metrics? This discussion led the group to map out the events that happen during the life of a data citation and to assign a responsible party to each. Figure S-1 presents one understanding of how data citations will come to be. Rather than representing the lifecycle of an individual data citation, it depicts how data citation practices in general will be created. In this sense, "lifecycle" is perhaps a misnomer; the figure is better understood as a timeline for organizing all of the interested parties.
It is important to further define the data citation lifecycle and the roles and responsibilities of the institutions and people who act at each stage, in order to determine who is missing from this discussion and how to involve them.
FIGURE S-1 Data citation lifecycle.
Several participants suggested that, before data citations are created and adopted, it would help to develop an understanding of the social ramifications of data citation and of the frameworks with which data citations would need to interact. This understanding could come from academic research on data practices. At the top level, research funders, universities, and journal publishers could consider developing data citation policies that support their respective needs and create incentives to encourage data citation.
Building on such a base of understanding and policy, many parties could work in parallel to make data citation a reality. Research communities can define the data citation elements that are meaningful to them. Journal publishers and standards bodies can define general data citation layouts that are both machine- and human-readable. For a data citation to be created, (i) the data must have been generated by someone, and (ii) the data must be available with enough attached information to construct the citation. The data generator or the data center hosting the data then makes the actual citation content available, and data users are responsible for including the citation in their publications. The derivative data cycle here refers to the practice of creating new datasets from existing ones. A new form of data citation could be developed to account for this practice. It might credit some combination of the original data generators or hosts and the data users, either in a single new data citation or in a data citation that expands into multiple data citations.
Once the various standards are in place, several participants remarked, training and education on how and when to use data citations would be useful. University libraries may be well positioned to reach out to the academic communities they serve. Finally, commercial parties can aggregate data citations, much as citations are aggregated to characterize scholarly communication in the literature.