1. Scholarly Communication
One participant noted the usefulness of integrating data citation practices with the existing system of scholarly publications, which are themselves used to measure and track scholarly output. There has been increasing awareness of the importance of data publication, and increasing pressure from funders to make research data available. However, while a number of models of data publication exist, the practices are still unstable. Some journals are investing in supporting data in conjunction with articles, while others are discontinuing supplemental submissions after a trial period of a few years. Institutions are also acting as publishers via institutional repositories and need to get credit, but they cannot enforce compliance in the same way that journals can. The importance of the disciplinary community defining data citation policies came up again and again. The degree of uptake and implementation varies across disciplines, and cross-disciplinary issues lack attention. It was also noted that getting buy-in from key editors would be important.
It was posited that the transaction costs for data publishing are currently too high, requiring too much work from too few users. Where network effects could be realized from aggregating data, it could become worthwhile for journals or societies to archive data.
Some participants mentioned that data citation and publication are themselves metaphors taken from scholarly publication. There are tensions around applying print publication models to data, especially since IP rights are different for data and the protections offered vary significantly between countries. Moreover, the law does not match what is being done in practice. For the metaphors of data citation and publishing to be useful, it helps to understand what it is that we want to count and how it differs from other kinds of publications.
2. Data Sharing
Understanding who shares what data and why is an underlying factor for understanding data citation practices. Christine Borgman’s “Conundrum” paper (JASIST, in press?) discusses these incentives and disincentives. Some participants in the group observed a fair amount of good will towards sharing across the domains, with comments such as “scholars will share because it is the right thing to do, as long as it is not too much work or too risky” and “every time I share data I learn something”. Data sharing is seen as part of moving the field forward, although funding agencies are requiring it as well.
A large part of the discussion on data sharing was spent airing concerns about disincentives to such sharing. Foremost were concerns about the cost of curating data. Part of this was the observation that not all data are equal, nor should all data be shared. Scientists have a general fear that their data will be misused, misrepresented, misconstrued, or used for purposes antithetical to the scientist's own. One of the discussants noted that there is currently a public relations attack going on about chronic fatigue syndrome that has escalated to threats against personal safety.
Among the issues about data sharing are also concerns about incentivizing data reuse to drive demand. Data-intensive fields may have more incentives to reuse data. There are some common issues across many disciplines, but as one approaches the next level of detail, the constraints for