Center for Studies in Higher Education
University of California at Berkeley
I will be speaking today as an anthropologist who has spent a large part of the last decade thinking deeply about and conducting research on issues of scholarly communication, the future of publishing, and academic values and traditions in a digital age. I am a social science scholar. I am not an advocate for particular approaches, nor an administrator or librarian. That being said, I will put my comments today in the context of our research findings regarding the drivers of faculty behavior, the importance of peer review in academic life, and the various incentives and barriers to scholars regarding where and when to share and publish the results of research (including data) over the entire scholarly communication life cycle (not just in final archival publications such as journal articles and books).
First, an important note about our research and methods. Our work is based upon the rigorous qualitative interview, observational, and textual data collected during the six-year Future of Scholarly Communication Project (2005-2011),2 funded by the Andrew W. Mellon Foundation. More detailed information on our sample population and research design can be found in Harley et al. (2010: 13-15) and Harley and Acord (2011: 12-13); I include more specific references throughout this paper. I refer readers to the “thick descriptions” of 12 disciplinary case studies3 and the more extensive literature reviews in these publications. In brief, our sample spans 45 elite research institutions, more than 12 disciplines, and includes the results of more than 160 formal interviews with scholars, administrators, publishers, and others.4 My comments today will be almost exclusively focused on an elite class of research institutions. One of our motivations has been to analyze what roles universities and faculties play in the resolution of the perceived “crises” in scholarly communication. (And there are of course a number of crises that are field-dependent.) Our premise is that disciplinary traditions and culture matter significantly in both predicting possible futures and the success or failure of policies that attempt to dictate scholarly behavior.
1 Presentation slides are available at http://www.sites.nationalacademies.org/PGA/brdi/PGA_064019.
2 The Future of SC Project Website and Associated Links:
Project site: http://www.cshe.berkeley.edu/research/scholarlycommunication.
Many of the arguments around sharing, time, and credit made here are given in more detail in Acord and Harley (in press), Harley et al. (2010), and Harley and Acord (2011).
3 Disciplines included Anthropology, Biostatistics, Chemical Engineering, Law and Economics, English-language Literature, Astrophysics, Archaeology, Biology, Economics, History, Music, and Political Science. All the interviews have been published as part of thickly described disciplinary case studies. The entire research output is online and open access at http://www.escholarship.org/uc/cshe_fsc.
4 Interview protocols covered a variety of broad questions: Tenure and promotion, making a name; Criteria for disseminating research at various stages (publication practices, new publication outlets, new genres); Sharing (what, with whom, when, why or why not?); Collaboration (with whom, when, why or why not?); Resources created and consumed: needs, discoverability, priorities, data creation and preservation; Public engagement; The future.
As I was exploring the issues related to this particular workshop and the various references the organizers had assembled for the meeting, I found this quote from the Australian National Data Service website of particular interest:
Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to printed resources. The need to cite data is starting to be recognised as one of the key practices underpinning the recognition of data as a primary research output rather than as a by-product of research. While data has often been shared in the past, it is seldom cited in the same way as a journal article or other publication might be. This culture is, however, gradually changing. If datasets were cited, they would achieve a validity and significance within the cycle of activities associated with scholarly communications and recognition of scholarly effort.5
The last statement of this quote is actually quite complex and fraught. I will argue today that it rests on at least two questionable underlying assumptions. The first is that, by virtue of being citable, data achieve an equal footing with traditional publications in institutional merit review of scholars. The second is that data standing alone, without an interpretive layer (such as an article or book) and without having been peer reviewed, will be weighted in tenure and promotion decisions the same as traditional publications.
The centrality of career advancement in a scholar’s life
My argument (which I hope is not too circuitous) is that these two assumptions are contrary to what our research would suggest. As we have demonstrated (Harley et al., 2010), the primary drivers of faculty scholarly communication behavior in competitive institutions are career self-interest, advancing the field, and receiving credit and attribution. Although the institutional peer-review process allows flexibility for differences of discipline and scholarly product, a stellar record of high-impact peer-reviewed publications continues to be the most important criterion for judging a successful scholar in tenure and promotion decisions. The formal process of converting research findings into academic discourse through publishing is the concrete way in which research enters the scholarly canons that record progress in a field. And, as the formal version “of record,” peer-reviewed publication establishes proof of concept, precedence, and credit to scholars for their work and ideas in a way that can be formally tracked and cited by others. Accordingly, data sets, exhibitions, tools/instruments, and other ‘subsidiary’ products are awarded far less credit than standard publications unless they are themselves ‘discussed’ in an interpretive peer-reviewed publication.
The importance placed by tenure and promotion committees, grant review committees, and scholars themselves on publication in the top peer-reviewed outlets is growing, not decreasing, in competitive research universities (Harley et al., 2010: 7; Harley and Acord, 2011). There is a concomitant pressure on everyone in the academy, including scholars at aspirant institutions globally, to model this singular focus on “publish or perish,” which we and others would argue translates into a growing glut of low-quality publications and publication outlets. This proliferation of outlets has placed a premium on separating prestige outlets (with their imprimatur as proxies for quality) from those that are viewed as less stringently refereed. Consequently, most scholars choose outlets to publish their work based on three factors: (1) prestige (perceptions of rigor in peer review, selectivity, and “reputation”), (2) relative speed to publication, and (3) highest visibility within a target audience (Harley et al., 2010: 10). As we determined, this system is not likely to disappear soon and certainly will not be overturned by the adoption of new practices by young scholars, who hew, often slavishly, to the norms of their discipline in the interests of personal career advancement.

5 Australian National Data Service: http://www.ands.org.au/guides/data-citation-awareness.pdf
Securing credit and attribution
We note a continuum by field in how scholars receive attribution for their ideas. In highly competitive fields like molecular biology, where there is a race to publish and a fear of being scooped, early sharing of ideas, data, and working papers is almost unheard of. The archival journal article takes precedence and seals a scholar’s claim on ideas. In relatively smaller (and often high-paradigm and/or emerging) fields with low internal competition, informal mechanisms for reputation management can enforce attribution, because academic communities are centrally organized and maintained through face-to-face interaction (often via conferences and workshops). This sharing culture can change, however, with funding and other exigencies of a field. To give just one example, although economics is commonly described as “a big sharing group where we are very open about ideas” (and has a thriving working-paper culture; Harley et al., 2010: 357), the subfield of neuroeconomics is moving toward less sharing and mirrors practices in some of the biological sciences.
The importance of filters and managing time
As scholars prioritize their core research activities, they struggle to keep up to date, and they look for more filters, not fewer, in determining what to pay attention to. In fact, time, and the related need for filters, is cited as one of the most influential variables in a scholar’s decision whether to adopt new scholarly communication practices (Harley et al., 2007, 2010: 97). Most scholars rely on the familiar filters of peer review, perceived selectivity, reputation, and personal networks to decide what merits their attention.
We would argue that, given this background and an exceptionally heavy reliance on peer-reviewed publications to aid tenure and promotion committees, it comes as no surprise that competitive scholars divert much of their energy toward the development and production of archival publications and the circulation of the ideas contained within them, rather than focusing, for example, on curating and citing data sets.
Variation in how different scholarly outputs are weighed
Lest you think the way in which scholarship is credited in tenure and promotion decisions in research universities is binary, it is important to note that teaching, service, and the creation of non-peer-reviewed scholarship such as databases or tools are most certainly credited, but they do not receive as much emphasis as peer-reviewed articles or books published in prestigious outlets. Some things are weighted more heavily than others (and what is weighted is field-dependent). For example, people can get credit for developing software or an important database, but that would rarely be the sole criterion in most fields, nor would it be given equal weight with publications. We heard again and again that new genres of scholarship in a field are acceptable as long as they are peer reviewed. Examples abound of the emphasis in tenure and promotion on interpretive work over cataloging or curating. Data curation and preservation alone are simply not considered to be a high level of scholarship.
Let me give you some examples. In biology, new and emerging forms of scholarship are not valued much in their own right, but supplemental articles describing a novel tool or database may be considered positively in review. “Just” developing software or data resources can be perceived as less valuable tool development rather than scholarship. In history, a scholar who published only footnoted sources, but no interpretation of those sources in the form of a peer-reviewed book published by a prestigious press, would not be promoted. That is, new and emerging forms of scholarship (e.g., curating data sets, creating Web-based resources, blogs) are valued only insofar as they are ancillary to the book/monograph.
In political science, scholars often create and publish datasets. Similar to the biological sciences, these efforts can earn a scholar increased visibility when other researchers use their data. Significant institutional credit is received for this work, however, only if a strong peer-reviewed publication record, based on the dataset, accompanies it. In archaeology, developing and maintaining databases or resource websites, which is common, is considered a research technique or a service to scholarship, but not a substitute for the monograph. In astrophysics (and despite the reliance on arXiv for circulation of early drafts, data, and so on), developing astronomical instrumentation or software, posting announcements, and creating databases are considered support roles and are usually ascribed a lower value in advancement decisions than peer-reviewed publications.
How will datasets be peer reviewed?
What, you may ask, does any of this discussion have to do with motivating scholars to abandon their traditional data creation, citation, and sharing practices in the face of calls (and sometimes mandates) from some journals and funding bodies to publish data sets, particularly in the sciences and quantitative social sciences? These are powerful calls, motivated by the desire for more transparency in research practice, greater returns on funders’ investments, and claims that the growing availability of digital primary source material is creating novel opportunities for research that is significantly different from traditional forms of scholarship. We predict, however, that despite this power, changes in data management practices, as with in-progress scholarly communication, will be heavily influenced by matters of time, credit, personality, and discipline.
I would suggest that the most important question in motivating scholars to adopt new practices fully is: How will data creation and curation ever be weighted similarly to traditional publications if they are not peer reviewed? And who will do the peer review, and how? I cannot emphasize enough that we simply do not know how or when data will be formally peer reviewed in the same way that journals and books currently are.
Some presume that “open” peer review, a free-for-all, crowd-sourced system, will solve this problem.6 I would reply that such a system would be loaded with intractable problems, not least of which is that scholars, perhaps especially senior scholars, already spend an enormous amount of their time conducting peer review in its myriad forms: evaluating grants, writing letters of reference, mentoring graduate students, responding to emails for feedback on work, and so on. The result is that even established publishers have an exceptionally difficult time recruiting competent reviewers (Harley and Acord, 2011: 25), and most scholars find it difficult to spare the time to conduct these formal reviews, let alone engage in “optional” volunteer and open reviews that will not likely contribute to career advancement.
In our opinion (which we review at length in Harley and Acord, 2011: 45-48, and Acord and Harley, 2011), open peer review, in which commentary is openly solicited and shared by random readers, colleagues, and sometimes editor-invited reviewers (rather than being exclusively organized by editors), is not likely to be embraced by the academic community anytime soon; the lack of uptake in a variety of experiments bears this out. The results of the few such experiments to date indicate that open peer review might have the potential to add value to the traditional closed peer-review process, but that it also exacts large (probably unsustainable) costs in terms of editor, author, and reviewer time. Scholars are likely to avoid such experiments en masse because many do not have the time to sort through existing publisher-vetted material, let alone additional “unvetted” material or material vetted by unknown individuals. In sum, open peer review simply adds one more time-consuming circle of activity to a scholar’s limited time budget. Telling support is provided by the recent policy shifts of The Journal of Neuroscience (Maunsell, 2010) and the Journal of Experimental Medicine (Borowski, 2011), which announced their decisions to cease the publication of supplementary data because reviewers cannot realistically spend the time necessary to review that material closely, and because critical information on data or methods needed by readers can be lost in a giant, time-consuming “data dump.” An editorial in the Journal of Experimental Medicine, titled “Enough is Enough,”7 makes the case:
Complaints about the overabundance of supplementary information in primary research articles have increased in decibel and frequency in the past several years and are now at cacophonous levels. Reviewers and editors warn that they do not have time to scrutinize it. Authors contend that the effort and money needed to produce it exceeds that reasonably spent on a single publication. How often readers actually look at supplemental information is unclear, and most journal websites offer the supplement as an optional download…
6 Or, many will argue that alternative “bibliometrics” measuring popularity and use are the solution. A consequence of the ‘inflationary currency’ in scholarly communication is a growing reliance on bibliometrics, such as the impact factor, and an increasing ‘arms race’ among scholars to publish in the highest-impact outlets. As detailed by Harley and Acord (2011: 48-53), there is widespread concern that, at this time and taken alone, alternative (quantitative) metrics for judging scholarly work are much more susceptible to gaming and popularity contests than traditional peer-review processes.
7 “Enough is Enough,” Christine Borowski, July 4, 2011, editorial published in the Journal of Experimental Medicine (JEM).
In sum, data sharing is greatly impeded by scholars’ lack of personal time to prepare the data and necessary metadata for deposit and reuse (which includes the sometimes Herculean efforts of converting analog data to digital formats, migrating old digital formats to new ones, or standardizing messy data). For scholars focused on personal credit and career advancement, narrowly defined, there is no advantage to spending time (and grant funding) curating or peer reviewing data when that same time can be applied to garnering support for the next research project and/or publishing and peer reviewing books and articles. While data sharing may be facilitated by the development of new tools and instruments that ensure standardization (such as in gene sequencing), the idiosyncratic ways in which scholars work, and the extreme heterogeneity of data types in most non-computational fields, do not lend themselves to one-size-fits-all models of data sharing. The escalation of funder requirements (e.g., NSF, NIH) for data management plans points to an important space for social scientists to track. We, and others, predict that faculty will not be doing the actual work; rather, a new professional class and academic track (perhaps akin to museum curators, specialist librarians, or tool-builders) may emerge to take on these new scholarly roles (cf. Borgman, 2007; Nature, 2008; Science, 2011; Waters, 2004). They, of course, will need to be paid and regularized in some fashion. Until issues of time, credit, and peer review are worked out, we predict an uneven and slow adoption by scholars of sharing, curating, and publishing data openly, and hence of the citation and attribution of the same.
Acord, S.K. and Harley, D. (in press) “Credit, Time, and Personality,” invited by New Media and Society. Available for open peer review at: nms-theme.ehumanities.nl/manuscript/credit-time-and-personality-acord-and-harley.
Borgman, C.L. (2007) Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, MA: The MIT Press.
Borgman, C.L. (2011) The conundrum of sharing research data. Journal of the American Society for Information Science and Technology. (accessed 19 October 2011)
Borowski, C. (2011) Enough is enough. Journal of Experimental Medicine 208 (7): 1337.
Harley, D. and Acord, S.K. (2011) Peer Review in Academic Promotion and Publishing: Its Meaning, Locus, and Future. University of California, Berkeley: Center for Studies in Higher Education. Available at: http://www.escholarship.org/uc/item/1xv148c8.
Harley, D., Acord, S.K., Earl-Novell S., Lawrence, S., and King, C.J. (2010) Assessing the Future Landscape of Scholarly Communication: An Exploration of Faculty Values and Needs in Seven Disciplines, University of California, Berkeley: Center for Studies in Higher Education. Available at: http://www.escholarship.org/uc/cshe_fsc.
Maunsell, J. (2010) Announcement regarding supplemental material. The Journal of Neuroscience 30 (32): 10599-10600.
Nature (2008) Special Issue: Big Data. Nature 455 (7209).
Science (2011) Special Online Collection: Dealing with Data. Science 331 (6018). (Accessed 19 October 2011) http://www.sciencemag.org/site/special/data/.
Waters, D.J. (2004) Building on success, forging new ground: The question of sustainability. First Monday 9 (5).
Moderated by Paul F. Uhlir
PARTICIPANT: I was pleased that Sarah Callaghan mentioned the European database directive. What is the status of the database protection legislation in the United States? In the House of Representatives there were the Moorehead Bill in 1996, and the Coble Bill in 1997 and again in 1999, and in the Senate, the Hatch bill in 1998. Then such legislation just disappeared. What happened?
MR. UHLIR: From 1996, there was a coalition of internet service providers, the universities, the libraries, and the academics opposed to the database protection bills. They were just barely hanging on in terms of keeping the legislation from becoming enacted. In 2001, I believe, the Chamber of Commerce made an assessment of the costs and benefits of the law and, since they are not primarily database providers but users, they determined that it would raise the cost of doing business. They said they would keep track of every vote in favor and keep that in mind when they dole out the re-election money. So, that was the end of those legislative proposals and I have not heard about any new proposal since.
PARTICIPANT: I want to ask a question about credit and why anyone would want to share their data. There are real dangers involved in sharing data, not just the work. I thought I would raise it and let someone expound.
DR. HARLEY: Not all data are created equal, and not all data want to be shared. I think Christine Borgman has done good research on the reasons why data are not shared.
We think personality has a lot to do with it, and that is why some people are sharing and some people are not. It has to do with preprints and early writings as well. When you submit a paper and it is accepted, you have to have some kind of data management plan. Even when journals require data to be available and shared, however, we found that fewer than 10 percent of requests for data were honored. It all goes back to the attitude of “when I am finished with it, I am happy to share it.” In archaeology, for example, that can go on for twenty years.
MS. SMITH: I agree. I think there are strong disciplinary differences in attitudes related to the sharing of data. Some fields have the habit of sharing more than others. There are certainly some fields where sharing is not even discussed. The other issue is that even if it is something researchers are willing to do, they are not going to do it if they have to spend time on it. So, if we made it easy, they might be more inclined to do it.
PARTICIPANT: I have asked investigators from biology if they think their data will be cited if they are forced by journal publishing policies to make them available. Most of them said yes, they thought that they would be cited. My next question was: do you think that citation will be valued by your institutions in peer review of your work? The answer was no.
MS. SMITH: My comment is not based on any actual research, just my impression from talking to a lot of people. They are sharing their data simply because it is the right thing to do. When you can share and there are no negative consequences, you will do it, as long as you do not have to spend a huge amount of time on it. I see this culture becoming more common, and policies like those of the NSF are helping. Maybe it will just happen naturally, and we are overemphasizing citation as an incentive. It may be that researchers just want easier ways to share data because they realize it is the right thing to do.
DR. CHAVAN: As I keep going through all these presentations, and especially in this particular session, it becomes clearer to me what the challenges for data citation are. Technology can certainly help in moving in the right direction, but I think that the mindset of those who publish the data and those who are involved in the data management life cycle itself is critical in moving forward. It is a social and cultural consideration, and there is no single solution to it. I think it has to be worked on at every level of the data life cycle.
MS. SMITH: I do not disagree with that, but I think the technology does have a big effect on the costs of this process, and social practices depend a lot on costs.
PARTICIPANT: How much of the data that we care about to drive science is not generated by people who are academic faculty looking for tenure and promotion? Coming from a Federally Funded Research and Development Center (FFRDC), I can say that the intellectual property issues there are drastically different from those at a university. What are the social factors that affect those generating data in other types of institutions? Those people also want credit.
MS. SMITH: There are many other players in this ecosystem, such as libraries, but they do not depend on citation that much, which I think is why we are not talking about it here. There are different reward mechanisms for librarians and for data curators in national archives.
PARTICIPANT: Do you have any examples of people saying “we have these citation norms, but we do not want to enforce them”? Are there societies with articulated policies that are not legally binding, so that they can get around the notion of enforcing credit via the law and instead have clearer norms?
MS. SMITH: That is how citation works now. I am not legally obliged to cite your article when I use your ideas. There is no intellectual property law that requires me to do that.
DR. GROTH: So then why do we need to discuss these issues?
MS. SMITH: There nonetheless are a lot of researchers using contracts, Creative Commons (CC) licenses, and data usage license agreements who want to be able to mandate credit.
MR. PARSONS: Just a couple of comments on the issue of open data citation as an incentive. I spent too much of my life working on a big international project called the “International Polar Year,” and it had what I think was a landmark data policy pushing for not only open data but also timely release of data. This caused quite a bit of controversy in a lot of different disciplines. We developed citation norms, and there is a document associated with the Polar Information Commons where we have actually documented some of these norms, but that is not what motivates people to share. I think there are two things that motivate people to share. Every time I share data, I learn something, and I think that is partly the personality aspect that was mentioned earlier. Sharing also moves the field forward. So if people can share in a way that makes them feel that they are collaborators, they will be happy to share. However, the most effective way of getting people to share was when the funding agencies required it by saying that researchers will not get their next year’s funding or the next grant if their data are not available.
DR. HARLEY: I think you are right. I think the stick is probably your best tool, given the way people react, and given their personal motivations, concerns, and fears.
MS. SMITH: If there is a requirement from an agency to share the data in order to get the next grant, researchers will need to prove to the agency that they met the requirement. So citation does come back as perhaps a simpler way of proving that the data were actually shared, because the data have a URI or are deposited at a reputable data archive.
MR. WILBANKS: The main reason why we are talking about these issues is that there is a growing push from the open science movement to put CC waivers or licenses on data, and to draft other licenses for data and databases.
DR. MINSTER: This notion that data work is good science is fine, but it does not obviate the need to address the other issues that Diane talked about. We have to change the mindset of those engaged in the scientific process and give proper recognition and advancement to people who spend their lives doing this, if they do it well.
DR. HARLEY: I think it starts with investigators’ mentoring of their graduate students and postdocs. It is the senior scientists setting an example of good practice for the younger individuals in their labs and projects. For example, I know a researcher at UCSF who makes very clear his guidelines with regard to good scholarship in his lab: what he will and will not publish, and what are good and bad practices. I also want to emphasize that it is not a black-and-white issue. It should be acceptable to have different gradations of credit for different cases.
DR. HELLY: The issue of carrots and sticks is, in some sense, a false dichotomy. The sticks are helpful in encouraging proper data practices, and they indicate that the agency values and gives credit for delivering the data products. However, they do not in any way do enough to get the data products to a quality high enough to ensure that other people can actually use those data constructively. People will just take their data and dump them in these repositories. The real motivation that was discussed here is the basic scientific realization that there has to be good practice, and that this practice has to be learned and taught. This has not been done. Once we get to that stage, community-minded people will adhere to these policies, especially in the sciences where community data resources are a very powerful tool for people doing individual science. I think in the earth sciences, in particular, this is a very strong motivation, because there are global issues and data have to be put together from across the globe.
DR. BORGMAN: I just want to add two policy issues that I am surprised have not come up yet, and I would like the panel to address them. One is the role of embargoes, which are in place in many fields. Embargoes protect the investigators and the funding agencies. Those who impose embargo periods generally pick a time period that is long enough for people to get their publications out, but short enough to encourage getting the data out. The other point is about data registries. We have found in our research that even if people are not willing to deposit their data, they are willing to register their data with metadata, and then one can at least find them. The pointer might be a telephone number or a URI, but the data registry turns out to be a lower bar.
PARTICIPANT: Whether embargoes are imposed from outside or internally depends on the discipline itself.
MS. SMITH: Yes, embargoes are very important to people; I agree that is key. The registry is an interesting point, because I actually cannot remember a case in which that was an option for people. If registries exist, they are new. I know there are efforts to build them, but they are not on the radar of most of the researchers I have talked to. Researchers think that sharing means putting their data on the web or giving them to an archivist. Registries might change the culture a little bit, but we have to have better examples that people can see.
PARTICIPANT: I am just wondering whether, in any of your studies, you have found any correlation between how much data sharing there is and how much money is in the system. Because funding levels fluctuate over the years, my intuition would tell me that people are less likely to share when there is less money.
MS. SMITH: This is a very interesting question. I would have guessed the same thing, but that is not what I see in practice. In the health sciences, for example, where there is a lot of money, we are seeing more data sharing than in fields where there is less funding.
DR. HARLEY: One of the things we asked people in our research is: What do you see in the future? There appear to be different trajectories, depending on the history and economics of the discipline, and they can be rather divergent. Whether the value systems are going to change with the way data are captured and described in these fields is unknown at this point.
PARTICIPANT: There is an important question here about data citation and whether that is what we are asking people to do. I think we should keep in mind that data publication and data citation are both metaphors taken from the journal publication system, and metaphors are tricky. The danger with any metaphor is to assume that the two things are exactly the same. We want to define what it is that we want to consider about data and how we want data to be used differently from journals, while still using that same metaphor.