A Contractually Reconstructed Research Commons for Scientific Data: International Considerations1
Jerome Reichman and Paul Uhlir
Duke University Law School, United States, and The National Academies, United States
The importance of public-domain data for scientific research is taken so much for granted that it is difficult to identify the precise boundaries of this domain, to describe its operations, and to evaluate the normative and legal infrastructure that supports it. Individual researchers know a little about the public domain as it affects their discipline, but very few understand how the overall system works.
Two new challenges emerge in this context. The first is the advent of digital networks, which transforms the traditional modes of exchanging scientific data and increases the payoffs that both science and industry can draw from open access. The other challenge is a growing array of economic, legal, and technical pressures that threaten to impede and even to disrupt the continued operations of this same public domain for scientific data.2
Our investigation reveals that the policy of open access to public research data rests on a surprisingly fragile foundation in both legal and normative senses. As universities and scientists increasingly aim to commercialize their research products and profit from them, their willingness to exchange data, along with other research tools, has been seriously compromised.
There is evidence that informal exchanges of data between individual scientists and laboratories in some fields, such as biomedical research, usually at the prepublication stage, have become severely compromised, with as many as 50 percent of requests for information being denied.3 At the same time, interuniversity exchanges of data are increasingly subject to high transaction costs, delays, and a growing risk of anticommons effects,4 that is, so many intellectual property rights and commercial interests that it becomes difficult to put together complex transactions. As relations between universities and industry become more intense, the industrial partner gains a growing ability to impose the terms of exchange on the universities.
The advent of strong new intellectual property rights, such as the European Commission’s directive on the legal protection of databases, strains this already delicate situation and can disproportionately affect scientists’ access to and use of factual data historically in the public domain. This database right allows the scientist or the university to publish an article and still retain ownership of the underlying data after publication. Similarly, it makes it possible for scientists or universities to apply for patents and disclose the data in the patent, yet still retain ownership of the supporting data even after the patent expires.
Such a database right, when combined with other economic and legal pressures, could become the hub of a growing enclosure movement that progressively fences off the public domain for scientific data. It could reduce the flow of data as an input into national systems of innovation, and into the international system. Policy makers must therefore take steps now to address these challenges and to ward off this threat of undue enclosure if they are to exploit the new benefits of digitally linked databases.
THE CHALLENGE FOR SCIENCE
These trends could elicit one of two types of responses. One response is essentially reactive, in which the scientific community adjusts to the pressures as best it can without organizing a response to the increasing encroachment of a commercial ethos upon its upstream data resources. The other would require science policy to confront the challenge by formulating a strategy that would enable the scientific community to take charge of its basic data supply and to manage the resulting research commons in ways that preserve its public-good functions without impeding socially beneficial commercial opportunities.
Under the first alternative the research community can join the enclosure movement and profit from it. Thus both universities and independent laboratories or investigators that already transfer publicly funded technology to the private sector can also profit from the licensing of databases. In that case data flows supporting public science will have to be constructed deal by deal with all the transaction costs this entails and with the further risk of bargaining to impasse. The ability of researchers to access and aggregate the information they need to produce discoveries and innovations may be compromised both by the shrinking dimensions of the public domain and by the demise of the sharing ethos in the nonprofit research community, as these same universities and research centers increasingly see each other as competitors rather than partners in a common venture. Carried to an extreme this competition of research entities against one another, conducted by their respective legal offices, could obstruct and disrupt the scientific data commons.
To avoid these outcomes the other option is for the scientific community to take its own data management problems in hand. The idea is to reinforce and recreate, by voluntary means, a public space in which the traditional sharing ethos can be preserved and insulated from the commoditizing trends identified above. In approaching this option the community’s assets are the formal structures that surround government-funded data and the ability of the funding agencies to regulate the terms on which data are disseminated and used. The first programmatic response would look to the strengthening of existing institutional, cultural, and contractual mechanisms that already support the research commons, with a view to better addressing the new threats to the public domain identified above. The second logical response is to react collectively to new information laws and related economic and technical pressures by negotiating contractual agreements between stakeholders to preserve and enhance the research commons.
In the United States the government generates a vast public domain for its own data by a creative use of three instruments: intellectual property rights, contracts, and new technologies of communication and delivery. By long tradition the federal government has used these instruments differently from the rest of the world. It waives its property rights in government-generated information, it contractually mandates that such information should be provided at the marginal cost of dissemination, and it has been a major proponent and user of the Internet to make its information as widely available as possible. In other words, it has deliberately made use of existing intellectual property rights, contracts, and technologies to construct a research commons for the flow of scientific data as a public good. The unique combination of these instruments is a key aspect of the success of our national research enterprise.
Now that the research commons has come under attack in the United States and elsewhere, the challenge is not only to strengthen a demonstrably successful system at the governmental level but also to extend and adapt this methodology to the changing university environment and to the new digitally networked research environment. In other words, universities, not-for-profit research institutes, and academic investigators, all of whom depend on the sharing of data, should enter into their own treaties or contractual arrangements to ensure unimpeded access to and unrestricted use of commonly needed raw materials in a public or quasi-public space, even though many such institutions or actors may separately engage in transfers of information for economic gain. This initiative in turn will require the federal government as the primary funder—acting through the science agencies—to join with the universities and scientific organizations in developing suitable contractual templates that can be used to regulate or influence the research commons.
Implementing these proposals will require nuanced solutions tailor-made to the needs of government, academia, and industry in general and to the specific exigencies of different scientific disciplines. The following sections briefly summarize our proposals for preserving and promoting the public-domain status of government-generated scientific data and of government-funded scientific data, respectively. Absent such measures, data flows between scientists will have to be constructed deal by deal, with all the risks of bargaining to impasse that we already see in biotechnology today.
PROPOSALS FOR THE GOVERNMENT SECTOR
To preserve and maintain the traditional public-domain functions of government-generated data the United States will have to adjust its existing policies and practices to take account of new information regimes and the growing pressures for privatization. At the same time government agencies will have to find ways of coping with bilateral data exchanges with other countries that exercise intellectual property rights in their own data collections.
We do not mean to imply a need to totally reinvent or reorganize the existing universe in which scientific data are disseminated and exchanged. The opposite is true. A vast system of institutional mechanisms for the diffusion of scientific data as a public good—especially government-generated data—exists and continues to operate, and much government-funded data emerging from the academic communities continues to be disseminated through these well-established mechanisms.
In the European Union a strong database right exists, and governments already exercise this right. Wise statesmanship would require these governments to renounce part of that right in order to better promote scientific progress. Governments in all countries, however, should consider imposing contractual templates in their own relations with the private sector. When they license data to the private sector for exploitation, they should ensure that such exploitation does not harm scientific activities. Contractual templates are needed to govern those kinds of relations and to help ensure the more efficient operation of the public-sector research community. These underlying contractual templates could implement the following research-friendly guidelines:
A general prohibition against legally or technically hindering access to the data in question for nonprofit scientific research and educational purposes;
A further prohibition against hindering or restricting the reuse of data lawfully obtained in the furtherance of nonprofit scientific research activities; and
An obligation to make data available for nonprofit research and educational purposes on fair and reasonable terms and conditions, subject to impartial review and arbitration of the rates and terms actually applied.
Governments from high-protectionist and low-protectionist areas will have to come to some form of international treaty that allows minimum protection of databases without requiring every country to adopt the highest form of protection, while keeping in mind the special requirements for the circulation of scientific data between countries with different levels of protection.5
PROPOSALS FOR THE ACADEMIC SECTOR
The primary focus of this presentation is on proposals for the academic sector, because so much of the data are government funded and flow through the universities, which means they already benefit from a rudimentary regulatory regime. We suggest that science policy should treat data produced with government funds as a collective resource for research purposes, and we offer proposals for how to do that.6
It is possible to differentiate between two “zones” of government-funded data. The first is a zone of formally regulated data exchanges, for which the regulations are imposed by the funding agency and generally kick in at the time of publication. The second is a zone of informal data exchanges, which typically occur in the prepublication research phase, as well as in situations in which the terms of making data available are not formally specified in a research contract or grant. In Europe it would also include the postpublication phase because of the existing database right. The ability of government funding agencies to influence data exchange practices will be much greater in the formal zone than in the informal one.
The Zone of Formal Data Exchanges
When no significant proprietary interests come into play, the optimal solution for government-generated data and for data produced by government-funded research is a formally structured archival data center also supported by government. Many such data centers have already been formed around large facility research projects. Building on the opportunities afforded by digital networks, it has now become possible to extend this time-tested model to highly distributed research operations conducted by groups of academics in different countries.
The traditional model entails a “bricks-and-mortar” centralized facility into which researchers deposit their data unconditionally. In addition to academics, contributors may include government and even private-sector scientists, but in most cases the true public-domain status of any data deposited is usually maintained. Examples in the United States include the National Center for Biotechnology Information, directly operated by the National Institutes of Health, and the National Center for Atmospheric Research, operated by a university consortium and funded primarily by the National Science Foundation. Hundreds of such data centers already exist.
A second, more recent model enabled by improved Internet capabilities also envisions a centralized administrative entity, but this entity governs a network of highly distributed smaller data repositories, sometimes referred to as “nodes.” Together the nodes constitute a virtual archive whose relatively small central office oversees agreed technical, operational, and legal standards to which all member nodes adhere. Examples of such decentralized networks, which operate on a public-domain basis in the United States, are the National Aeronautics and Space Administration’s Distributed Active Archive Centers under the Earth Observing System Program and the Long Term Ecological Research Network funded by the National Science Foundation.
These virtual archives, known as “federated” data management systems, extend the benefits and practices of a centralized bricks-and-mortar repository to the outlying districts and suburbs of the scientific enterprise. They help to reconcile practice with theory in the sense that the investigators—most of whom are funded by government anyway—are encouraged to deposit their data in such networked facilities. The very existence of these formally constituted networks thus helps to ensure that the resulting data are effectively made available to the scientific community as a whole, which means that the social benefits of public funding are more perfectly captured and the sharing ethos is more fully implemented.
At the same time some of the existing “networks of nodes” have already adopted the practice of providing conditional availability of their data, a feature of considerable importance for our proposals. “Conditional availability” means that the members of the network have agreed to make their data available for public science purposes on mutually acceptable terms, but the members also permit suppliers to restrict uses of their data for other purposes, typically with a view to preserving their commercial opportunities.
The networked systems thus provide prospective suppliers with a mix of options to accommodate deposits ranging from true public-domain status to fully proprietary data made available subject to rules the member nodes have adopted. The element of flexibility that conditional deposits afford makes these federated data management systems particularly responsive to the realities of current university research in areas of scientific investigation where commercial opportunities are especially prominent.
We suggest several proposals for universities, including the following:
Develop interinstitutional agreements and cooperative institutional approaches to ensure unimpeded access to and liberal uses of scientific data and information in a not-for-profit research commons, while allowing for commercial exploitation of those resources in the private sector, when this is considered necessary and appropriate.
Develop model contractual provisions for interuniversity and interresearcher relationships and for cooperative research with the private sector that protect access to and unrestricted use of publicly funded research data by not-for-profit scientists.
Vigorously promote nonexclusive licensing by authors of their scientific articles to scientific, technical, and medical journals rather than assigning exclusive copyrights.
Initiate and review pilot projects for certain disciplines or categories of data to test the results.
We are aware that considerable thought has recently been given to the construction of voluntary social structures to support the production of large, complex information projects. Particularly relevant in this regard are the open-source software movement, which has collectively developed and managed the GNU/Linux operating system, and the Creative Commons, which seeks to encourage authors and artists to dedicate some or all of their exclusive rights to the public domain. In both these pioneering movements, agreed contractual templates have been experimentally developed to reverse or constrain the exclusionary effects of strong intellectual property rights. Although neither of these models was developed with the needs of public science in mind, both provide helpful examples of how universities, federal funding agencies, and scientific bodies might contractually reconstruct a research commons for scientific data that could withstand the legal, economic, and technological pressures on the public domain.7