A CONTRACTUALLY RECONSTRUCTED RESEARCH COMMONS FOR SCIENCE AND INNOVATION
The presentations in Session 2 of this symposium described the growing efforts under way to privatize and commercialize scientific and technical information that was heretofore freely available in the public domain or on an “open-access” basis. If these pressures continue unabated, they will disrupt long-established scientific research practices and foreclose new research opportunities that digital networks and related technologies make possible. We do not expect these negative effects to occur all at once, however, but rather to manifest themselves incrementally, and the opportunity costs they engender will be difficult to discern.
Particularly problematic is the uncertainty regarding the specific type of database protection that Congress may enact and any exceptions favoring scientific research and education that such a law might contain. As we have tried to demonstrate, however, the economic pressures to privatize and commercialize upstream data resources will continue to grow in any event. Moreover, legal means of implementing these pressures already exist, regardless of the adoption of a sui generis database right. Therefore, given enough economic pressure, that which could be done to promote strategic gains will likely be done by some combination of legal and technical means.
If one accepts this premise, then the enactment of some future database law could make it easier to impose restrictions on access to and use of scientific data than at present, but the absence of a database law or the enactment of a lower protectionist version would not necessarily avoid the imposition of similar restrictions by other means. In such an environment, the existing elements of risk or threat to the sharing norms of public science can only increase, unless the scientific community adopts countervailing measures.
We, accordingly, foresee a transitional period in which the negative trends identified above will challenge the cooperative traditions of science and the public institutions that have reinforced those traditions in the past, with uncertain results. In this period, a new equilibrium will emerge as the scientific community becomes progressively more conflicted between its members’ private interests and its communal needs for data and technical information as a public resource. This transitional period provides a window of opportunity that should be used to analyze the potential effects of a shrinking public domain and to take steps to preserve the functional integrity of the research commons.
The Challenge to Science
The trends described above could elicit one of two types of responses. One is essentially reactive, in which the scientific community adjusts to the pressures as best it can without organizing a response to the increasing encroachment of a commercial ethos upon its upstream data resources. The other would require science policy to address the challenge by formulating a strategy that would enable the scientific community to take charge of its basic data supply and to manage the resulting research commons in ways that preserve its public-good functions without impeding socially beneficial commercial opportunities.
Under the first alternative, the research community can join the enclosure movement and profit from it. Thus, universities, independent laboratories, and individual investigators that already transfer publicly funded technology to the private sector can also profit from the licensing of databases. In that case, data flows supporting public science will have to be constructed deal by deal, with all the transaction costs this entails and with the further risk of bargaining to impasse. The ability of researchers to access and aggregate the information they need to produce discoveries and innovations may be compromised both by the shrinking dimensions of the public domain and by the demise of the sharing ethos in the nonprofit community, as these same universities and research centers increasingly see each other as competitors rather than partners in a common venture. Carried to an extreme, this competition of research entities against one another, conducted by their respective legal offices, could obstruct and disrupt the scientific data commons.
To avoid these outcomes, the other option is for the scientific community to take its own data management problems in hand. The idea is to reinforce and recreate, by voluntary means, a public space in which the traditional sharing ethos can be preserved and insulated from the commoditizing trends identified above. In approaching this option, the community’s assets are the formal structures that surround federally funded data and the ability of federal funding agencies to regulate the terms on which data are disseminated and used. The first programmatic response would look to the strengthening of existing institutional, cultural, and contractual mechanisms that already support the research commons, with a view to better addressing the new threats to the public domain identified above. The second logical response is to react collectively to new information laws and related economic and technical pressures by negotiating contractual agreements among stakeholders to preserve and enhance the research commons.
As matters stand, the U.S. government generates a vast public domain for its own data by a creative use of three instruments: intellectual property rights, contracts, and new technologies of communication and delivery. By long tradition, the federal government has used these instruments differently from the rest of the world. It waives its property rights in government-generated information, it contractually mandates that such information should be provided at the marginal cost of dissemination, and it has been a major proponent and user of the Internet to make its information as widely available as possible. In other words, it has deliberately made use of existing intellectual property rights, contracts, and technologies to construct a research commons for the flow of scientific data as a public good. The unique combination of these instruments is a key aspect of the success of our national research enterprise.
Now that the research commons has come under attack, the challenge is not only to strengthen a demonstrably successful system at the governmental level, but also to extend and adapt this methodology to the changing university environment and to the new digitally networked research environment. In other words, universities, not-for-profit research institutes, and academic investigators, all of whom depend on the sharing of data, will have to stipulate their own treaties or contractual arrangements to ensure unimpeded access to, and unrestricted use of, commonly needed raw materials in a public or quasi-public space, even though many such institutions or actors may separately engage in transfers of information for economic gain. This initiative, in turn, will require the federal government as the primary funder—acting through the science agencies—to join with the universities and scientific bodies in an effort to develop suitable contractual templates that could be used to regulate or influence the research commons.
Implementing these proposals would require nuanced solutions tailor-made to the needs of government, academia, and industry in general and to the specific exigencies of different scientific disciplines. The following sections describe our proposals for preserving and promoting the public-domain status of government-generated
scientific data and of government-funded and private-sector scientific data, respectively. We do not, however, develop detailed proposals for separate disciplines and subdisciplines here, as these would require additional research and analysis.
THE GOVERNMENT SECTOR
To preserve and maintain the traditional public-domain functions of government-generated data, the United States will have to adjust its existing policies and practices to take account of new information regimes and the growing pressures for privatization. At the same time, government agencies will have to find ways of coping with bilateral data exchanges with other countries that exercise intellectual property rights in their own data collections.
We do not mean to imply a need to totally reinvent or reorganize the existing universe in which scientific data are disseminated and exchanged. The opposite is true. As we have explained, a vast public domain for the diffusion of scientific data—especially government-generated data—exists and continues to operate, and much government-funded data emerging from the academic communities continues to be disseminated through these well-established mechanisms.
Facilities for the curation and distribution of government-generated data are well organized in a number of research areas. They are governed by long-established protocols that maintain the function of a public domain, and in most cases ensure open access (either free or at marginal cost) and unrestricted use of the relevant data collections. These collections are housed in brick-and-mortar data repositories, many of which are operated directly by the government, such as the NASA National Space Science Data Center. Other repositories are funded by the government to carry out similar functions, such as the archives of the Hubble Space Telescope Science Institute at Johns Hopkins University.
Under existing protocols, most government-operated or government-funded data repositories do not allow the conditional deposits that look to commercial exploitation of the data in question. Anyone who uses the data deposited in these holdings can commercially exploit their own versions and applications of them without needing any authorization from the government. However, no such uses, including costly value-adding uses, can remove the original data from the public repositories. In this sense, the value-adding investor obtains no exclusive rights in the original data, but is allowed to protect the creativity and investment in the derived information products.
The ability of these government institutions to make their data holdings broadly available to all potential users, both scientific and other, has been greatly increased by direct online delivery. However, this potential is undermined by a perennial and growing shortage of government funds for such activities; by technical and administrative difficulties that impede long-term preservation of the exponentially increasing amounts of data to be deposited; and by pressures to commoditize data, which are reducing the scope of government activity and tend to discourage academic investigators from making unconditional deposits of even government-funded data to these repositories.
The long-term health of the scientific enterprise depends on the continued operations of these public data repositories and on the reversal of the negative trends identified earlier in this chapter. Here the object is to preserve and enhance the functions that government data repositories have always performed, notwithstanding the mounting pressures to commoditize even government-generated data.
Implementing any recommendations concerning government-generated data will, of course, require adequate funding, and this remains a major problem. In most cases, however, it is not the big allocations needed to collect or create data that are lacking; it is the relatively small but crucial amounts to properly manage, disseminate, and archive data already collected that are chronically insufficient. These shortsighted practices deprive taxpayers of the long-term fruits of their investments in the scientific enterprise. Science policy must give higher priority to formulating workable measures to redress this imbalance than it has in the past.
Policymakers should also react to the pressures to privatize government-generated research data by devising objective criteria for ascertaining when and how privatization truly benefits the public interest. At times, privatization will advance the public interest because the private sector can generate particular datasets more efficiently or because other considerations justify this approach. Very often, however, the opposite will be true, especially when the costs of generating the data are high in relation to known, short-term payoffs. Two recent
National Research Council studies have attempted to formulate specific criteria for evaluating proposed privatization initiatives concerning scientific data.1 The science agencies should make the formulation of such criteria for different areas of research a top agenda item. In so doing, the agencies also need to analyze the results of past privatization initiatives with a view to assessing their relative costs and benefits.
Once the validity of any given privatization proposal has been determined by appropriate evaluative criteria, the next crucial step is to build appropriate, public-interest contractual templates into that deal to ensure the continued operations of a research commons. The public research function is too important to be left as an afterthought. It must figure prominently in the planning stage of every legitimate privatization initiative precisely because the data would previously have been generated at public expense for a public purpose. After all, the process of privatization aims to shift the commercial risks and opportunities of data production or dissemination to private enterprise under specified conditions that promote efficiency and economic growth. However, the process should not pin the functions of the research enterprise to the success of any given commercial venture; and it must not allow such ventures to otherwise compromise these functions by the charging of unreasonable prices or by the imposition of contractual conditions unduly restricting public, scientific uses of the data in question.
There are two situations in which model contractual templates, developed through interagency consultations, could play a critical role. One is where data collection and dissemination activities previously conducted by a government entity are transferred to a private entity. The other is where the government licenses data collected by a private entity for public research purposes. In both cases, the underlying contractual templates should implement the following research-friendly legal guidelines:
a general obligation not to legally or technically hinder access to the data in question for nonprofit scientific research and educational purposes;
a further obligation not to hinder or restrict the reuse of data lawfully obtained in the furtherance of nonprofit scientific research activities; and
an obligation to make data available for nonprofit research and educational purposes on fair and reasonable terms and conditions, subject to impartial review and arbitration of the rates and terms actually applied, to avoid research disasters such as the Landsat deal in the 1980s.
In cases where the public data collection activity is transferred to the private sector, care must be taken to ensure that the private entity exercises any underlying intellectual property rights, especially some future database right, in a manner consistent with the public interest—including the interests of science. To this end, a model contractual template should also include a comprehensive misuse provision like that embodied in Section 106 of H.R. 1858:
SEC. 106. LIMITATIONS ON LIABILITY
(b) MISUSE—A person or entity shall not be liable for a violation of section 102 if the person or entity benefiting from the protection afforded a database under section 102 misuses the protection. In determining whether a person or entity has misused the protection afforded under this title, the following factors, among others, shall be considered:
(1) the extent to which the ability of persons or entities to engage in the permitted acts under this title has been frustrated by contractual arrangements or technological measures;
(2) the extent to which information contained in a database that is the sole source of the information contained therein is made available through licensing or sale on reasonable terms and conditions;
(3) the extent to which the license or sale of information contained in a database protected under this title has been conditioned on the acquisition or license of any other product or service, or on the performance of any action, not directly related to the license or sale;
(4) the extent to which access to information necessary for research, competition, or innovation purposes has been prevented;
(5) the extent to which the manner of asserting rights granted under this title constitutes a barrier to entry into the relevant database market; and
(6) the extent to which the judicially developed doctrines of misuse in other areas of the law may appropriately be extended to the case or controversy.
The larger principle is that, in managing its own public research data activities, the government can and should develop its own database law in a way that promotes science without unduly impeding commerce. This principle is not new; the government already has a workable information regime, as described in the first session of this symposium. However, the government will need to adapt that regime to the pressures arising from the new high-protectionist legal environment and ensure that its agencies are consistently applying rational and harmonized public-interest principles. Otherwise, the traditional public-domain functions of government-generated data could be severely compromised. This in turn would violate the government’s fiduciary responsibilities to taxpayers and raise conflicts of interest and questions concerning sham transactions.
THE ACADEMIC SECTOR
In putting forward our proposals concerning the preservation of a research commons for government-funded data, it is useful to follow the distinction between a zone of formal data exchanges and a zone of informal data exchanges previously discussed in Session 1.2 Consistent with our earlier analysis, we emphasize that the ability of government funding agencies to influence data exchange practices will be much greater in the formal than the informal zone.
The Zone of Formal Data Exchanges
Where no significant proprietary interests come into play, the optimal solution for government-generated data and for data produced by government-funded research is a formally structured, archival data center also supported by government. As discussed, many such data centers have already been formed around large-facility research projects. Building on the opportunities afforded by digital networks, it has now become possible to extend this time-tested model to highly distributed research operations conducted by groups of academics in different countries.
The traditional model entails a “bricks-and-mortar” centralized facility into which researchers deposit their data unconditionally. In addition to academics, contributors may include government and even private-sector scientists, but in virtually all cases the true public-domain status of any data deposited is maintained. Examples include the National Center for Biotechnology Information, directly operated by the National Institutes of Health, and the National Center for Atmospheric Research, operated by a university consortium and funded primarily by the National Science Foundation (NSF).
A second, more recent model, enabled by improved Internet capabilities, also envisions a centralized administrative entity, but this entity governs a network of highly distributed smaller data repositories, sometimes referred to as “nodes.” Taken together, the nodes constitute a virtual archive whose relatively small central office oversees agreed technical, operational, and legal standards to which all member nodes adhere. Examples of such decentralized networks, which operate on a public-domain basis, are the NASA Distributed Active Archive Centers under the Earth Observing System program and the NSF-funded Long Term Ecological Research Network.
These virtual archives, known as “federated” data management systems, extend the benefits and practices of a centralized brick-and-mortar repository to the outlying districts and suburbs of the scientific enterprise. They help to reconcile practice with theory in the sense that the investigators—most of whom are funded by government anyway—are encouraged to deposit their data in such networked facilities. The very existence of these formally constituted networks thus helps to ensure that the resulting data are effectively made available to the scientific community as a whole, which means that the social benefits of public funding are more perfectly captured and the sharing ethos is more fully implemented.
At the same time, some of the existing “networks of nodes” have already adopted the practice of providing conditional availability of their data: a feature of considerable importance for our proposals. By “conditional availability,” we mean that the members of the network have agreed to make their data available for public science purposes on mutually acceptable terms, but they also permit suppliers to restrict uses of their data for other purposes, typically with a view to preserving their commercial opportunities.3

2. See Chapter 1 of these Proceedings, “Session 1 Discussion Framework,” by Paul Uhlir.
The networked systems thus provide prospective suppliers with a mix of options to accommodate deposits ranging from true public-domain status to fully proprietary data that has been made available subject to rules the member nodes have adopted. The element of flexibility that conditional deposits afford makes these federated data management systems particularly responsive to the realities of present-day university research in areas of scientific investigation where commercial opportunities are especially prominent.
Our first proposition is that the government funding agencies should encourage unconditional deposits of research data, to the fullest extent possible, into both centralized repositories and decentralized network structures. The obvious principle here is that, because the data in question are government funded, improved methods should be devised for capturing the social benefits of public funding, lest commercial temptations produce a kind of de facto free ride at the taxpayers’ expense.
When unconditional deposits occur in a true public-domain environment removed from proprietary concerns, the legal mechanisms to implement these expanded data centers need not be complicated. Single researchers or small research teams could contribute their data to centers serving their specific disciplines, with no strings attached other than measures to ensure attribution and professional recognition. Alternatively, as newly integrated scientific communities organize themselves, they could seek government help in establishing new data centers or nodes that would accept unrestricted deposits on their behalf. Private companies could also contribute to a true public-domain model or organize their own variants of such a model; these practices should be encouraged as a matter of public policy.
If the unrestricted data were deposited in federal government-sponsored repositories, existing federal information law and associated protocols would define the public access rights. The maintenance of public-interest data centers in academia, however, is problematic without government support. These data centers can become partly or fully self-supporting through some appropriate fee structure, but resorting to a fee structure based on payments of more than the marginal cost of delivery quickly begins to defeat the public good and positive externality attributes of the system, even absent further use restrictions.
Leaving aside the funding issue, the deeper question that this first proposal raises is how the universities and other nonprofit research entities will resolve the potential conflict between the pressures to disclose and deposit their government-funded data and the valuable proprietary interests that are increasingly likely to surface in a high-protectionist intellectual property environment. One cannot ignore the risk that the viability and effectiveness of these centers could be undermined to the extent that the beneficiaries of government funding can resist pressures to further implement the sharing ethos and even decline to deposit their research data because of their commercial interests.
Despite their educational missions and nonprofit status, both universities and individual academics are increasingly prone to regard their databases as targets of opportunity for commercialization. This tendency will become more pronounced as more of the financial burden inherent in the generation and management of scientific data is shouldered by the universities themselves or by cooperative research arrangements with the private sector. In this context, the universities are likely to envision split uses of their data and will prefer to make them available on restricted conditions. They will logically distinguish between uses of data for basic research purposes by other nonprofit institutions and uses by purely commercial applicants. Even this apparently clear-cut distinction might break down, moreover, if universities treat databases whose principal user base is other nonprofit research institutions as commercial research tools.
The point is that the universities may not want to deposit data in designated repositories, even with government support, unless the repositories can accommodate these interests, and the repositories could compromise their public research functions if they are held hostage to too many demands of this kind. The same potential situation exists for individual databases made available by universities (as opposed to their contributions to larger, multisource repositories). This state of affairs will accordingly require still more creative initiatives to parry the economic and legal pressures on universities and academic researchers to withhold data.

3. An example of an international network that operates on the basis of conditional deposits is the Global Biodiversity Information Network, headquartered in Denmark, which is substantially supported by U.S. government funding. For additional information, see http://www.gbif.org.
With these factors in mind, our second major proposal is to establish a zone of conditionally available data to reconstruct and artificially preserve functional equivalents of a public domain. This strategy entails using property rights and contracts to reinforce the sharing norms of science in the nonprofit, transinstitutional dimension without unduly disrupting the commercial interests of those entities that choose to operate in the private dimension.
To this end, the universities and nonprofit research institutions that depend on the sharing ethos, together with the government science funding agencies, should consider stipulating suitable “treaties” and other contractual arrangements to ensure unimpeded access to commonly needed raw materials in a public or quasi-public space. From this perspective, one can envision the accumulation of shared scientific data as a community asset held in a contractually reconstructed research commons to which all researchers have access for purposes of public scientific pursuits.
One can further imagine that this public research commons exists in an ever-expanding “horizontal dimension,” as contrasted with the commercial operations of the same data suppliers in what we shall call the “vertical” or private dimension. The object of the exercise would be to persuade the government, as primary funder, to join with universities and scientific bodies in an effort to develop suitable contractual templates that could be used to regulate the research commons. These templates would ensure that data held in the quasi-public or “horizontal” dimension would remain accessible for scientific purposes and could not be removed or otherwise appropriated to the private or “vertical” dimension. At the same time, these contractual arrangements would expressly contemplate the possibilities for commercial exploitation of the same data in the private or vertical dimension, and they would clarify the depositor’s rights in that regard and ensure that the exercise of those rights did not impede or disrupt access to the horizontal space for research purposes.
In fashioning these proposals, we are aware that considerable thought has recently been given to the construction of voluntary social structures to support the production of large, complex information projects. Particularly relevant in this regard are the open-source software movement that has collectively developed and managed the GNU/Linux Operating System and the Creative Commons, which seeks to encourage authors and artists to conditionally dedicate some or all of their exclusive rights to the public domain.4 In both these pioneering movements, agreed contractual templates have been experimentally developed to reverse or constrain the exclusionary effects of strong intellectual property rights.
Although neither of these models was developed with the needs of public science in mind, both provide helpful examples of how universities, federal funding agencies, and scientific bodies might contractually reconstruct a research commons for scientific data that could withstand the legal, economic, and technological pressures on the public domain identified in this paper. In what follows, we draw on these and other sources to propose the contractual regulation of government-funded data in two specific situations: (1) where government-funded, university-generated data are licensed to the private sector and (2) where such data are made available to other universities for research purposes.
Licensing Government-Funded Data to the Private Sector
In approaching this topic, one must consider that the production of scientific databases in academia is not always dominated by activities funded by the federal government. It may also entail funding by universities themselves, foundations, and the private sector. Although these sources seem likely to grow in the future, especially if Congress adopts a database protection right, the government’s role in funding academic data production will nonetheless remain a major factor, at least in the near term (although its role will vary from project to project). As we discussed earlier in this symposium, this presence gives the federal funding agencies unique opportunities to influence the data-sharing policies of their beneficiary institutions.

4. See Chapter 23 of these Proceedings, “New Legal Approaches in the Private Sector,” by Jonathan Zittrain, for a description of the Creative Commons initiative.
Ideally, funders and universities would agree on the need to maintain the functions of a public domain to the fullest extent possible, to provide open access for nonprofit research activities, and to encourage efficient technological applications of available data. At the same time, technological applications and other opportunities for commercial exploitation of certain types of databases will push the universities to enter into private contractual transactions that, if left totally unregulated, could adversely affect the availability of the relevant data for public research purposes. Reconciling the conflict between the public research interest and freedom of contract will require carefully formulated policies and institutional adjustments.
Assuming the existence of sufficient funds, the maximum availability of academic data for research purposes is assured if those data have been deposited in the public data centers. To the extent that agencies successfully encourage academics and their universities to deposit government-funded data into either old or new repositories established for this purpose, the research-friendly policies of these centers should automatically apply. As long as these policies are not themselves watered down by commercial and proprietary considerations, they should generally immunize the research function from conflicts deriving from private transactions.
However, the universities or their researchers may very well balk at depositing commercially valuable data in these repositories unless a relative degree of autonomy is preserved for depositors to negotiate the terms of their private transactions and to impose restrictions on the uses of the data deposited for commercial purposes. This raises two important questions. The first concerns the willingness of data centers themselves—whether of the centralized brick-and-mortar variety or virtual networks—to accept conditional deposits that impose restrictions on use for certain purposes in the first place. The second question, closely tied to the first, concerns the extent to which federal funding agencies should further seek to define and influence the relations between universities and the private sector to protect the public research function—especially when the data in question have not been deposited in an appropriate repository or when they have been so deposited but the repository permits conditional deposits.
Regarding the first of these questions, we previously observed that the emerging “network of nodes” model is more likely to accommodate conditional deposits or availability than are the traditional centralized data centers. Nevertheless, the practice remains controversial in scientific circles in that it deviates from the traditional norm of “full and open access.” For present purposes, we shall simply state our view that the possibilities for maximizing access to scientific data for public nonprofit research will not be fully realized in a highly protectionist legal and economic environment unless the scientific community agrees to experiment with suitably regulated conditional deposits.
The second question, concerning the need to regulate the interface between universities and the private sector with regard to government-funded data, acquires important contextual nuances when viewed in the light of the policies and practices that currently surround the Bayh-Dole Act and related legislation. The Bayh-Dole Act encourages universities to transfer the fruits of federally funded research to the private sector by means of the patent system. In a somewhat similar vein, federal research grants and contracts allow researchers to retain copyrights in their published research results. By extension, the same philosophy could apply to databases produced with federal funding, especially if Congress were to adopt a sui generis database protection right. Unless steps were taken to reconcile the goals of Bayh-Dole with the dual nature of data as both an input and an output of scientific research and of the larger system of technological innovation, the results could be incalculably negative.
It would also be a mistake for the science policy establishment to wait for the enactment of database legislation before considering the implications of blindly applying the spirit of Bayh-Dole to any database law that Congress may adopt. Because databases differ significantly from either patented inventions or copyrighted research results, policy makers should anticipate the advent of some database legislation and address the problems it may cause for science—particularly in regard to government-funded data. Special consideration must be given to how the power to control uses of scientific data after publication would be exercised once a database protection law was enacted.
We do not mean to question the underlying philosophy or premise of Bayh-Dole, which has produced socially beneficial results. Its very success, however, has generated unintended consequences and raised new questions that require careful consideration. In advocating a program for a contractually reconstructed research commons, one of our explicit goals is, indeed, to ensure that academics and their universities benefit from new opportunities to exploit research data in an industrial context. This goal reflects the policies behind Bayh-Dole. At the same time, it would hardly be consistent with the spirit of Bayh-Dole to allow the commercial partners of academic institutions to dictate the terms on which government-funded data are made available for purposes of nonprofit scientific research.
On the contrary, a real opportunity exists for government funders and universities to develop agreed contractual templates that would apply to commercial users of government-funded data in general. In effect, the public scientific community would thus develop a database protection scheme of its own that would override the less research-friendly provisions of any sui generis regime that Congress might adopt. In so doing, the scientific community could also significantly influence the data-licensing policies and practices of the private sector, before that sector ends up influencing the data-licensing practices of university technology transfer offices.
If one takes this proposal seriously, a capital point of departure would be to address the problem of follow-on applications, which has greatly perturbed the debate about database protection in general. The critical role of data as inputs into the information economy weighs heavily against endowing database proprietors with any exclusive right to control follow-on applications. This principle becomes doubly persuasive when the government itself has defrayed the costs of generating the data in question, in which case an exclusive right to control value-added applications takes on a cast of reverse free-riding. The solution is to allow second-comers freely to extract and use data from any given collection for bona fide value-adding purposes in exchange for adequate compensation of the initial investor based on an expressly limited range of royalty options. If the rules developed by universities and funding agencies imposed this kind of “compensatory liability” regime on follow-on applications of government-funded academic data, in lieu of any statutorily created exclusive right, there is reason to believe it would significantly advance both technological development and the larger public interest in access to scientific data.
Universities and funding agencies could also adopt clauses similar to those proposed above in the context of government-generated data, including a general prohibition against legally or technically hindering access to any database built around government-funded data for purposes of nonprofit scientific research. Clauses that obligate private partners not to hinder the reuse of data in the construction of new databases to address new scientific research objectives seem particularly important, as do clauses requiring private partners to license their commercial products on fair and reasonable terms and conditions. Also desirable are clauses forbidding misuse of any underlying intellectual property rights and establishing guidelines that courts should apply in evaluating specific claims of misuse.
Moreover, when considering relations with the private sector, attention should be given to the high costs of managing and archiving data holdings for scientific purposes and to the possibilities of defraying some of these costs through commercial exploitation. Although government support ought to increase, especially as the potential gains from a horizontal e-commons become better understood, the costs of data management will also increase with the success of the system. For this reason, universities may want to levy charges against private-sector users in the vertical dimension, in order to help defray the costs of administering operations in the horizontal domain and to make the overall approach more economically feasible.
Finally, care must be taken to reduce friction between the scientific data commons as we envision it and universities’ patenting practices under the Bayh-Dole Act. For example, any agreed contractual templates might have to allow for deferred release of data, even into repositories operating as a true public domain, at least for the duration of the one-year novelty grace period during which relevant patent applications based on the data could be filed. Other measures to synchronize the operations of the e-commons with the ability of universities to commercialize their holdings under Bayh-Dole would have to be identified and carefully addressed. We also note that there is an interface between our proposals for an e-commons for science and antitrust law, which would at least require consultation with the Federal Trade Commission and might also require enabling legislation.
In sum, to successfully regulate relations between universities and the private sector in the United States, where most of the scientific data in question are government funded (if not government generated), considerable thought must be given to devising suitable contractual templates that universities could use when licensing such data to the private sector. These templates, which should aim to promote the smooth operation of a research commons and to facilitate general research and development uses of data as inputs into technological development, could themselves constitute a model database regime that optimally balances public and private interests in ways any federally enacted law might not. To succeed, however, these templates must be acceptable to the universities, the funding agencies, the broader scientific community, and the specific disciplinary subcommunities—all of whom must eventually weigh in to ensure that academics themselves observe the norms that they would thus have collectively implemented.
In so doing, the participating institutions could avoid a race to the bottom in which individual universities might otherwise trade away open access and research safeguards to attract more and better deals from the private sector. Unless science itself takes steps of this kind, there is a serious risk that, under the impetus of Bayh-Dole, the private sector will gradually impose its own database rules on all government-funded data products developed with their university partners.
Inter-University Licensing of Scientific Data
Whatever the merits of our proposals for regulating transfers of scientific data from universities to the private sector, the need for science policy to regulate inter-university transfers of such data seems irrefutable. In this context, most of the data are generated for public scientific purposes and at public expense, and the progress of science depends on continued access to, and further applications of, such data. Not to construct a research commons that could withstand the pressures to privatize government-funded data at the inter-university level would thus amount to an indefensible abdication of the public trust by encumbering nonprofit research with high transaction and exclusion costs. All the same, implementing this task poses very difficult problems that are likely to exacerbate the conflicts of interest between the open and cooperative norms of science and the quest for additional funding sources we previously identified.
One may note at the outset that these conflicts of interest are rooted in the Bayh-Dole approach to the transfer of technology itself. This legislative framework stimulates universities to protect basic research results through intellectual property rights and to license those rights to the private sector for commercial applications. If Congress enacted a strong database protection law, it could extend Bayh-Dole to this new intellectual property right. In such a case, Bayh-Dole would simply pass the relevant exclusive rights to extract and utilize collected data straight through the existing system to the same universities and academic researchers who now patent their research results and who would thus end up owning all the government-funded data they had generated.
Moreover, the Bayh-Dole legislation makes no corresponding provision for beneficiary universities to give differential and more favorable treatment to other universities when licensing patented research products. On the contrary, there is evidence that in transactions concerning patented biotechnology research tools, universities have viewed each other’s scientists as a target market. In these transactions, universities have virtually the same commercial interests as private producers of similar tools for scientific research. Such inter-university deals have accordingly been constructed on a case-by-case basis, often with considerable difficulty, by technology transfer offices striving to maximize all their commercial opportunities.
Without any agreed restraints on how universities are to deal with collections of data in which they had acquired statutorily conferred ownership and exclusive exploitation rights, their technology transfer offices could simply treat databases like patented inventions—despite the immensely greater impact this could have on both basic and applied research. In this milieu, reliance on good-faith accommodations hammered out by the respective technology transfer offices would, at best, make inter-university exchanges resemble the complicated transactions that already characterize relations between highly distributed laboratories and research teams in the zone of informal exchanges of scientific data. All the vices of that zone would soon be imported into the more formal zone of inter-university relationships. At worst, this would precipitate a race to the bottom as universities tried to maximize their returns from these rights, in which case some technology transfer offices could be expected to contractually override any modest research exceptions a future database law might have codified.
At the same time, the Bayh-Dole legislative framework may itself suggest an antidote for resolving these potential conflicts of interest, or, at least, a sound point of departure for addressing them. Section 203 of the Bayh-Dole Act explicitly recognizes that the public interest in certain patented inventions may outweigh the benefits usually anticipated from private exploitation under exclusive property rights. In such cases, it authorizes the government to impose a compulsory license or otherwise to exercise “march in” rights and take control of the invention it has paid to produce. In fact, these public-interest adjustments have never been successfully exercised in practice.
Nevertheless, the principle (if not the actual practice) behind these provisions presents a platform on which universities and federal funding agencies can build their own mutually acceptable arrangements to promote their common interest in full and open access to government-funded collections of data. Our goal, indeed, is to persuade them to address this challenge now, before a database protection law is enacted, by examining how to ensure the smooth and relatively frictionless exchange of scientific data between academic institutions, regardless of any exclusive property rights they may eventually acquire and notwithstanding any other commercial undertakings with the private sector they may pursue. Absent such a proactive approach, we fear a slow unraveling of the traditional sharing norms in the inter-university context and an inevitable race to the bottom.
• Structuring Inter-University Data Exchanges in a High-Protectionist Legal Environment. Because the issues under consideration here pertain to uses of government-funded data produced by academics for university-sponsored programs, one looks to “full and open access” as the optimal guiding principle and to the sharing norms of science as the foundation of any arrangement governing inter-university licensing of data. On this approach, the government-funded data collections held by universities would be viewed as a single common resource for inter-university research purposes. The operational goal would be to nurture and extend this common resource within a horizontally linked administrative framework that facilitated every university’s public research functions, without unduly disrupting commercial relations with the private sector that some departments of some universities will undertake in the vertical dimension.
To achieve this goal, universities, funding agencies, and interested scientific bodies would have to negotiate an acceptable legal and administrative framework, analogous to a multilateral pact, that would govern the common resource and provide day-to-day logistical support. Ideally, the participating universities or their designated agents would operate as trustees for the horizontally constructed common resource, much as the Free Software Foundation does with the GNU system. In this capacity, the trustees would assume responsibility for ensuring access to the holdings on the agreed terms and for restraining deviant uses that violate those terms or otherwise undermine the integrity of the commons. The full weight of the federal granting structure could then be made to support these efforts by mandating compliance with agreed terms and by directly or indirectly imposing sanctions for noncompliance.
Alternatively, a less formal administrative structure could be built around a set of agreed contractual templates regulating access to government-funded data collections for public research purposes. On this approach, the participating universities would retain greater autonomy, there would be less need for a fully fleshed out “multilateral pact,” and the monitoring and other transaction costs might be reduced.
In a less than perfect world, however, there are formidable obstacles standing in the way of a negotiated commons project, over and above inertia, that would have to be removed. Initially, the very concept of an e-commons needs to be sold to skeptical elements of the scientific community whose services are indispensable to its development. Academic institutions, science funders, the research community, and other interested parties must then successfully negotiate and stipulate the pacts needed to establish it, as well as the legal framework to implement it. Transaction costs would need to be monitored closely and, whenever possible, reduced throughout the various development phases.
Once the research universities became wholeheartedly committed to the idea of a regime that guaranteed them universal access to, and shared use of, the government-funded data that they had collectively generated, these organizational problems might seem relatively minor. The difficulties of winning such a commitment, however, should not be underestimated in a world where university administrators are already conflicted about the efforts of their technology transfer offices to exploit commercially valuable databases in the genomic sciences and other disciplines with significant potential for commercial development. The prospect that Congress will eventually adopt a hybrid intellectual property right in collections of data could make these same administrators reluctant to lock their institutions into a kind of voluntary pool of any resulting exclusive property rights, even for public scientific research purposes.
Conceptually, the problems inherent in organizing a pool of intellectual property rights so as to preserve access to, and use of, a common resource have become much better understood than in the past—owing to the experience gained from both the open-source software movement and the new Creative Commons initiative. These projects demonstrate that there are few, if any, technical obstacles that cannot be overcome by adroitly directing relevant exclusive rights and standard-form contracts to public, rather than private, purposes.
The deeper problem is persuading university administrators that they stand to gain more from open access to each other’s databases in a horizontally organized research commons than they stand to lose from licensing data to each other under more restrictive, case-by-case transactions. Although we believe this proposition to be true, acting on it could amount to an act of faith, albeit one that resonates with the established norms of science and with the primary mission of universities.
To the extent that the universities may have to be sold on the benefits of an e-commons for data, with a view to rationalizing and modifying their disparate licensing policies, this project would require statesmanship, especially on the part of the leading research universities. It may also require pressure from the major government funders and standard-setting initiatives by scientific subcommunities. Funding agencies, in particular, must be prepared to discipline would-be holdouts and to discourage other forms of deviant strategic behavior that could undermine the cohesiveness of those institutions willing to pool their resources. In this regard, account will have to be taken as well of the universities’ patenting interests, which will need to be suitably accommodated.
Assuming a sufficient degree of organizational momentum, there remains the thorny problem of establishing the terms and conditions under which participating universities could contribute their data to a horizontally organized research commons. The bulk of the departments and subdisciplines involved would almost certainly prefer a bright-line rule that required all deposits of government-funded data to be made without conditions and subject to no restrictions on use. This preference follows from the fact that most science departments currently see no realistic prospects for licensing basic research data, even to the private sector, and have not yet experienced the proprietary temptations of exclusive ownership that a sui generis intellectual property right in noncopyrightable databases might eventually confer.
At the same time, such a bright-line rule could utterly deter those subdisciplines that already license data on commercial terms to either the private or public sectors, or that contemplate doing so in the near future. These subdisciplines would not readily forego these opportunities and would, on the contrary, insist that any multilateral negotiations to establish a horizontal commons devise contractual templates that protected their commercial interests in the vertical dimension. If, moreover, Congress enacts a de facto exclusive property right in collections of data, it would probably deter other components of the scientific community, who might become unwilling to forego either the prospective commercial opportunities or other strategic advantages such rights might make possible.
In a word, a bright-line rule requiring unconditional deposits in all cases could thus defeat the goal of linking all university generators of government-funded data in a single, horizontally organized research commons. At the same time, the goal of universality could, paradoxically, require negotiators seeking to establish the system to deviate from the norm of full and open access by allowing a second type of conditional deposit of data into the horizontal domain by those disciplines or departments that were unwilling to jeopardize present or future commercial opportunities.
• Resolving the Paradox of Conditional Deposits. Science policy in the United States has long disfavored a two-tiered system for the distribution of government-funded data. Such two-tiered systems for government or academic data distribution have been favored and promoted by the scientific community in the European Union, but these initiatives have been strongly opposed by U.S. science agencies and academics. Under such a system, database proprietors envision split (or two-tier) uses of certain data and will only make them available on conditions that govern the different types of uses they have expressly permitted.
Typically, split-level arrangements distinguish between relatively unrestricted uses for basic research purposes by nonprofit entities and more restricted uses for commercial applications by private firms that license data from scientific entities. The latter conditions may range from a simple menu of price-discriminated payment options to more complicated provisions that regulate certain data extractions, seek grant-backs of follow-on applications by second-comers, or impose reach-through clauses seeking legal or equitable rights in subsequent products. In some cases, moreover, the distinction between profit and nonprofit uses of scientific data becomes blurred, and the two categories may overlap, which adds to the costs and complications of administration. For example, universities may treat some databases as commercial research tools and impose a price discrimination policy that provides access to the research community at a lower cost than to for-profit entities. This becomes more likely when there is a private-sector partner.
We recognize that a decision to allow participating universities to make conditional deposits of government-funded data to a collectively managed research consortium represents a second-best solution, one that conflicts with the goal of establishing a true public domain based on the premise of full and open access to all users. The allowance of restrictions on use breaks up the continuity of data flows across the public sector and necessitates administrative measures and transaction costs to monitor and enforce differentiated uses. It also entails measures to prevent unacceptable leakage between the horizontal and vertical planes, and it may result in charges for public-interest uses that exceed the marginal cost of delivery, even in the horizontal plane.
We nonetheless doubt that a drive for totally unconditional deposits of government-funded data could succeed in the face of mounting worldwide pressures to commoditize scientific data, and we fear that excessive reliance on the orthodox position would, in the end, undermine—rather than save—the sharing ethos. Even if one disregards the prospects for strengthened intellectual property protection of noncopyrightable databases, too many universities have already begun to perceive the potential financial benefits they might reap from commercial exploitation of genomic databases in particular and biotechnology-related databases in general. Their reluctance to contribute such data to a research commons that allowed private firms freely to appropriate that same data could not easily be overcome. Adoption of a database protection law would then magnify this reluctance and encourage the respective technology transfer offices to find more ways to commercially exploit more of the government-funded data products that were subsequently invested with proprietary rights.
Even if a consortium of universities were to formally consent to such an unconditional arrangement, their technology transfer offices might soon be demanding an exceptional status for any databases that contained components produced without government funds. They could persuasively argue that private funds for most jointly created data products could decrease or even dry up if both customers and competitors could readily obtain the bulk of the data from the public domain. Once it became clear that an admixture of privately funded data could give rise to a right to deposit data in a research commons on conditions that protected commercial exploitation of the databases in question, academics with an eye to cost recovery and profit maximization would logically make persistent efforts to qualify for this treatment. They would thus seek more private investment for this purpose or seek to obtain the university’s own funds for the project. Either way, there would be a perverse incentive to privatize more data than ever if the only legitimate way to avoid dedicating it all to the public domain was to show that some of it had been privatized.
In other words, if the quasi-public research space accommodated only unconditional deposits of data, it could foster an insuperable holdout problem as participating universities found ways to detach and isolate their commercially valuable databases from such a system. In these circumstances, a failure to obtain a best-case scenario premised on “full and open access” would quickly degenerate into a worst-case scenario, characterized by growing gaps in the communally accessible collection and an unraveling of the sharing ethos, that would require case-by-case construction of inter-university data flows and could sometimes culminate in bargaining to impasse. In our estimation, the worst-case scenario is so bad, and the pressures to commoditize could become so great in the presence of a strong database right, that steps must be taken to ensure universal participation in a contractually reconstructed research commons from the outset by judiciously allowing conditional deposits of government-funded data on standard terms and conditions to which all the stakeholders had previously agreed. Indeed, the goal is to develop negotiated contractual templates that clearly reinforce and implement terms and conditions favorable to public research without unduly compromising the ability of the consortium’s member universities to undertake commercial relations with the private sector.
At stake in this process is not just a few thousand patentable inventions, but, rather, every government-funded data product that has potential commercial value to other universities as a research tool or educational device. Sound data management policies thus point to a second-best solution that would preserve the integrity of the inter-university commons by disallowing the principal ground on which concerted holdout actions might take root, by ensuring that only research-friendly terms and conditions apply in both the horizontal and vertical dimensions, and by making it too costly for any institution to deviate from the agreed regulatory framework governing the two-tiered regime.
Those who object to this proposal will argue that it unduly undermines the “full- and open-access” principle by tempting more and more university departments or subdisciplines to opt for conditional deposits than would otherwise have been the case. On this view, once a negotiated two-tiered model were set in place, universities would come under intense pressures to avoid the true public domain or open-access option even when there was no need to do so.
However, a universal and functionally effective inter-university research commons simply cannot be constructed with a bright-line, true public-domain rule applied across the board, for the reasons we previously set out. A bright-line rule also carries with it the well-recognized difficulty of distinguishing for-profit from not-for-profit research activities when single laboratories increasingly engage in both. In contrast, a regime based on conditional deposits overcomes this problem by allowing a scientific entity to contribute to and benefit from the data commons so long as it respects the agreed norms bearing on that arrangement. In this respect, a normative accommodation will have displaced legal distinctions that cannot feasibly be enforced.
Moreover, the very contractual templates that make the construction of such a commons feasible in a two-tiered system should also mitigate its social costs. Even if conditional deposits are allowed, many subdisciplines will continue to have no commercial prospects and no need to invoke the contractual templates that regulate them. When this is the case, peer pressures reinforced by the funding agencies should make it difficult, if not impractical, for members of those communities to opt out of the traditional practice of making data available unconditionally.
When, instead, given communities find themselves forced to deal with serious commercial pressures, the negotiated contractual solutions that enabled them to make the data conditionally available for public research purposes should also tend to preserve and implement the norms of science. In particular, the applicable contractual templates should immunize deposited data from the vagaries of case-by-case transactions under the aegis of universities’ technology transfer offices and would also limit the kinds of restrictions private-sector partners might otherwise seek to impose on universities.
At the end of the day, a set of agreed contractual templates permitting conditional deposits in the interests of a horizontally linked research commons would provide a tool universities could use with more or less wisdom. If used wisely, this tool should ensure that more data are made available to a contractually reconstructed research commons than would be possible if member universities could not protect the interests of their commercial partners. This same tool may also provide incentives for the private sector to work with universities in producing better data products than the latter alone could generate with their limited funds.
• Other hard problems. Allowing universities to deposit government-funded data into a contractually reconstructed research commons, on conditions designed to protect their commercial relations with the private sector, solves two difficult problems. First, it avoids the risk that large quantities of government-funded data would remain outside the system on the grounds that they had been commingled with privately funded components. Second, it ensures that any negotiated contractual templates the research consortium adopts to govern its horizontal space will apply to all the data holdings subject to its jurisdiction, including databases to which the private sector had contributed. However, it does not automatically determine the precise conditions that the agreed contractual templates should apply to inter-university licensing of data subject to their collective jurisdiction. In the process of defining these conditions, moreover, those who negotiated the “multilateral pact” among universities, federal funders, and scientific bodies needed to launch the consortium would have to resolve a number of contentious issues.
The guiding principle that should apply to inter-university licensing of data available from the quasi-public space is that depositors may not impose any conditions that impede the customary and traditional uses of scientific data for nonprofit research purposes. A logical corollary is that they should affirmatively adopt the measures that may prove necessary to extend and apply this principle to the online environment. Because the data under discussion are government funded for academic purposes to begin with, the “open access” and sharing norms of science should then color any specific implementing templates that regulate access and use.
With regard to access, the customary mode of implementing these norms would be to make data available to other nonprofit institutions at no more than the marginal cost of delivery. In the online environment, these marginal costs are essentially zero. This represents the preferred option whenever the costs of maintaining the data collection are defrayed by public subsidy or by nonexclusive licenses to private firms in the vertical dimension.
If, however, the policy of free or marginally priced access appears unable to sustain the costs of managing a given project at the inter-university level, an incremental pricing structure may become unavoidable. The options for such a pricing structure range from a formula allowing partial incremental cost recovery when a project is partially subsidized to a formula providing full cost recovery when this is necessary to keep the data collection alive. This may be accomplished through a paying inter-university consortium, such as the Inter-university Consortium for Political and Social Research at the University of Michigan, or by means of more ad hoc cost-recovery methods. Examples of subcommunities that have found it necessary to rely on the second option are largely in the laboratory physical sciences.
The prices charged other nonprofit users to access data in the research commons should never exceed the full incremental cost of managing the collective holdings. This premise follows from the fact that the initial costs of collecting or creating the data were defrayed by the government or by some combination of sources (including private sources) that normally subscribe to the open-access principle.
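The tiered cost-recovery principle sketched above (marginal-cost access when fully subsidized, charges capped at full incremental cost otherwise) can be summarized in a small illustrative function. The function name, parameters, and figures are our own hypothetical assumptions for exposition, not part of any actual consortium policy:

```python
def access_charge(incremental_cost_per_use, subsidy_fraction,
                  marginal_delivery_cost=0.0):
    """Hypothetical per-access charge to a nonprofit research user.

    incremental_cost_per_use: full incremental cost of managing the
        collective holdings, allocated per access.
    subsidy_fraction: share of management costs defrayed by public
        subsidy or by nonexclusive vertical licenses (0.0 to 1.0).
    marginal_delivery_cost: cost of delivery itself, which is
        essentially zero in the online environment.
    """
    if not 0.0 <= subsidy_fraction <= 1.0:
        raise ValueError("subsidy_fraction must lie between 0 and 1")
    # Only the unsubsidized share of costs needs to be recovered.
    recovery = incremental_cost_per_use * (1.0 - subsidy_fraction)
    # The charge never exceeds full incremental cost and never falls
    # below the marginal cost of delivery (zero online).
    return max(marginal_delivery_cost, min(recovery, incremental_cost_per_use))
```

Under full subsidy the charge collapses to the marginal cost of delivery, reproducing the preferred zero-price online case; with no subsidy it rises to, but never exceeds, full incremental cost recovery.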
However, when private firms have defrayed a substantial part of the costs of generating the database in question, there are few, if any, standard solutions. Occasionally, even a private partner might view the collective holdings as a valuable resource for its own pursuits, to which it agrees to contribute on an eleemosynary basis. In the more typical cases, the private partner is likely to view the research community as the target market for a database it paid to create and from which it must derive its expected profits.
In that event, the collection of additional revenues from private-sector access charges should depend entirely on freedom of contract, although a likely demand that public research users pay access charges that exceeded data management costs would pose a hard question. On the one hand, as beneficiaries of government funding, the universities should forego “profits” from charges levied to access their partly publicly funded databases for public research purposes. On the other hand, a private partner will not readily forego such profits, especially if it had invested in the project precisely because of its potential commercial value as a research tool. If the university shared these “profits” with its private partner, this practice would deviate from the basic principle governing inter-university access generally, and it would encourage other universities to seek private partners for this purpose, which in turn would yield both social costs and benefits.
In these cases, care must be taken to avoid adopting policies that would discourage either public-private partnerships for the development of socially beneficial data products or the inclusion of such products in a horizontal, quasi-public research space. At the same time, there is a potential loophole here: universities could deviate from the general rules applicable to that space if the private partner imposed market-driven access charges for nonprofit research purposes and its partner university shared in the resulting profits.
We know of no standard formula for resolving this problem. If the database is also of interest to the private sector, price discrimination and product differentiation are the preferred techniques for reducing access charges levied for public research. In any event, the trustees that manage the inter-university system should monitor and evaluate these charges, and their powers to challenge unreasonable or excessive demands would become especially important in the absence of any alternative or competing sources of supply.
This strategy, however, raises the question of whether and to what extent the universities should be allowed to retain their share of the “profits” from access charges levied against public research users. As matters stand, this is an issue that can be addressed only by the relevant discipline communities themselves, in the absence of some general norm that would not pose insuperable administrative burdens to implement.
Once access to databases available to the research commons has legitimately been gained, further restrictions on uses of the relevant data should be kept to a minimum. In principle, contractual restrictions on reuses of publicly funded data for nonprofit research purposes should not be permitted. This principle need not impede the use of conditions that require attribution or credit from researchers who make use of such data, and it can also be reconciled with provisions that defer access by certain users for specified periods of time or that restrict competing publications for a certain period.
This ideal principle runs into trouble, however, when confronted with the difficult problem posed by commercially valuable follow-on applications derived from databases made available to the research commons. It is one thing to posit that the academic beneficiaries of publicly funded research should be limited to the recovery of costs through access charges and should not be entitled to additional claims for follow-on uses by other nonprofit researchers. Quite different situations arise when the funding is public, but a private firm has invested its own resources to develop a follow-on application for commercial pursuits or when the initial data-generating project entailed a mix of private and public funds and the product subsequently gives rise to a commercially valuable follow-on application. These hard cases become even harder if the follow-on product primarily derives its commercial value from being a research tool universities themselves need to acquire.
Assuming, as we do, that a primary objective of any negotiated solution is to avoid gaps in the data made available for public research purposes in the horizontal domain, there is an obvious need for agreed contractual templates that would respect and preserve the commercial interests in the vertical plane identified above. This goal directly conflicts, however, with the most idealistic option set out above, which is to freely allow all follow-on applications based on data made available to the research commons, regardless of the commercial prospects or purposes and without any compensatory obligation beyond access charges (if any).
This option would represent a true public-domain approach to government-funded data, and it would fit within the traditional legal framework applied in the past to collections of data. However, it might be expected to discourage public-private partnerships formed to exploit follow-on applications of publicly funded databases, contrary to the philosophy behind the Bayh-Dole Act, although this risk is tempered by the fact that all would-be competitors who invested in such follow-on applications would find themselves on equal footing in this respect. This option would certainly discourage public-private partnerships formed to produce scientific databases from making them available to the commons if that decision automatically deprived them of any rights to follow-on applications.
A second option is to leave the problem of commercially valuable follow-on applications to freedom of contract, in which case universities and their private partners could license whom they please and exclude the rest. This solution is consistent with proposals to enact a de facto exclusive property right in noncopyrightable databases and with the philosophy behind Bayh-Dole. It would also alleviate disincentives to make databases derived from a mix of public and private funds available to the nonprofit research community.
However, this second option would relegate the problem of follow-on applications to the universities’ technology transfer offices once again, which might be tempted routinely to impose the kind of grant-back and reach-through clauses that are already said to generate anticommons effects in biotechnology and that are inconsistent with the dual nature of data as both inputs and outputs of innovation. Just as a true public-domain approach tends unintentionally to impoverish the commons we seek to construct, so too a true laissez-faire approach undermines the effectiveness of that same commons and triggers a race to the bottom, as universities seek private partners solely for the purpose of occupying a privileged position with respect to follow-on applications.
A third option is to allow freely follow-on applications of databases made available to the research commons for commercial purposes while requiring their producers to pay reasonable compensation for such uses under a predetermined menu that fixes a range of royalties for a specified period of time. For maximum effect, a corollary “no holdout” provision should obligate all universities engaged in public-private database initiatives to make the resulting databases available to the research commons under this “compensatory liability” framework.
This approach enables investors in public-private database initiatives to make their data available for public research purposes without depriving them of revenue flows from follow-on applications needed to cover the costs of research and development or of the opportunities to turn a profit. At the same time, it avoids impeding access to the data for either commercial or noncommercial purposes, in which aspect it mimics a true public domain, and it creates no barriers to entry. Moreover, a compensatory liability approach implements the policies behind the Bayh-Dole Act without the overkill that occurs when publicly funded research results are subjected to exclusive property rights that impoverish the public domain and create barriers to entry to boot.
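The compensatory-liability option just described (follow-on commercial applications are automatically permitted, but owe a royalty drawn from a predetermined menu for a fixed period) can be sketched concretely. The rates, period length, and function names below are purely illustrative assumptions, not figures proposed anywhere in this discussion:

```python
# Hypothetical royalty menu: a permitted rate range and a fixed
# liability period, after which follow-on uses owe nothing.
ROYALTY_MENU = {"min_rate": 0.03, "max_rate": 0.09, "period_months": 60}

def royalty_due(follow_on_revenue, negotiated_rate, months_since_launch,
                menu=ROYALTY_MENU):
    """Royalty owed to database contributors for a follow-on application."""
    if months_since_launch >= menu["period_months"]:
        # The liability period has expired; the use is then free,
        # mimicking a true public domain at that point.
        return 0.0
    # Clamp any negotiated rate into the menu's permitted range, so no
    # contributor can hold out for supracompetitive royalties.
    rate = min(max(negotiated_rate, menu["min_rate"]), menu["max_rate"])
    return follow_on_revenue * rate
```

The clamping step is what distinguishes a liability rule from freedom of contract: access is never blocked, and the only open variable is a bounded payment.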
These, or other, options would require further study and analysis as part of the larger process of reconstructing the research commons we propose. It should be clear, moreover, that any solutions adopted at the outset must be viewed as experimental, subject to review in light of actual results.
The Zone of Informal Data Exchanges
The zone of informal data exchanges is populated by single researchers or laboratories or by small teams of associated researchers whose work is typically expected to lead to future publications. Because this zone operates largely in a prepublication environment or outside the ambit of federal granting agencies, the constraints of government funders on uses of data are relatively less prescriptive, and a considerable amount of the data being produced may not be funded by federal agencies at all. If funding is provided by other nonprofit sources or by state governments, the end results still pertain to public science and its ultimate disclosure norms, but the controls are not standardized. To the extent that private-sector funding is also involved, even the norms of public science may not apply.
Quantitatively, the amount of scientific data held in this informal zone appears large. Despite the relative degree of invisibility that prepublication status confers, these holdings are also of immense qualitative importance for cutting-edge research endeavors. Although these data may not be as well prepared as those released for broad, open use in conjunction with a publication, they typically will reflect the most recent findings. Moreover, this informal sector seems destined to grow even more important in the near future as it increasingly absorbs scientific data that were not released at publication as well as the data researchers continue to compile after publication. If Congress were to adopt a strong intellectual property right in noncopyrightable databases, this informal zone could expand further to include all the published data covered by an exclusive property right that had not otherwise been dedicated to the public domain.
As previously discussed, actual secrecy is taken for granted in this zone, and disclosure depends on individually brokered transactions often based on reciprocity or some quid pro quo. These data streams, always tenuous owing to personal and strategic considerations, have increasingly broken down in the face of outright denials of access and of a trading mentality, steeped in commercial concerns, that is displacing the sharing ethos.
Left to themselves, the legal and economic pressures operating in the informal zone are likely to further reduce disclosures over time and to make the informal data exchange process resemble that of the private sector. That trend, in turn, undermines the new opportunities to link even highly distributed data holdings in virtual archives or to experiment with new forms of collaborative research on a distributed, autonomous basis, as digital networks have recently made possible. The positive synergies expected from organized peer-to-peer file sharing on an open-access basis cannot be realized if researchers decline to make data available at all out of a fear of sacrificing newfound commercial opportunities or other strategic advantages. Nor will these new opportunities fully develop if those who are nominally willing to make data available impose onerous licensing terms and conditions—reinforced by intellectual property rights—that multiply transaction costs, unduly restrict the range of scientific uses permitted, or otherwise embroil those uses in anticommons effects.
Here, the immediate goal of science policy should be to reduce the technical, legal, and institutional obstacles that impede electronic peer-to-peer file exchanges and to generally facilitate exchanges of data on the most open terms possible across a horizontal or quasi-public space. At the same time, the measures adopted to implement this policy must avoid compromising or inhibiting the interests of individual participants who seek commercial applications of their research results in a private or vertical sphere of operations. This two-pronged approach could stabilize the status quo and reinvigorate the flagging cooperative ethos in the zone of informal data exchanges as more individual researchers and small communities experienced the benefits of electronically linked access to virtual archives and discovered the productive gains likely to flow from collaborative, interdisciplinary, and cross-sectoral uses.
From an institutional perspective, however, organizing and implementing such a two-pronged approach to data exchanges in the informal zone presents certain difficulties not encountered in the formal zone of inter-university relations. Here the playing field is much broader, the players are more autonomous and unruly, and the power of federal funders directly to impose top-down regulations has traditionally been weak or underutilized. The moral authority of these funders nonetheless remains strong, and peer pressures in support of the sharing ethos would become more effective if a consensus developed that the two-pronged approach we envision actually yielded tangible benefits at acceptable costs.
Much, therefore, depends on short-term, bottom-up initiatives that rely on individual decisions to opt for standardized, research-friendly licensing agreements in place of the defensive, ad hoc transactions that currently hinder the flow of data streams in this sector. Here, the solution is to provide individual researchers with a tool kit for constructing prefabricated exchange transactions on community-approved terms and conditions. The tool kit would contain a menu of standard-form contractual templates that individual researchers could use to license data, and the templates adopted would be posted online to facilitate electronic access across networked nodes. These templates would cover a variety of situations and offer a range of ad hoc choices, all aimed at maximizing disclosure in both digital and nondigital media for public research purposes.
For this endeavor to succeed, however, the templates in question would clearly need to allow participating researchers and their communities to make data available on conditions that expressly precluded licensees from unauthorized commercial uses or follow-on applications. Although this suggests the need to deviate from true public-domain principles once again, one should remember that, in the informal zone as it stands today and is likely to develop, secrecy and denial of access are already well-established, countervailing practices. One can hardly argue that permitting conditional availability would undermine the norms of science in this zone, given the inability of those norms to adequately defend the interests of public research in unrestricted flows of data at the present time.
The object, rather, is to invigorate those sharing norms by reconciling them with the commercial needs and opportunities of the researchers operating in the informal zone, so as to elicit more overall benefits for public science under a second-best arrangement than could be expected to emerge from brokered individual transactions in a high-protectionist legal environment. This strategy requires a judicious resort to conditionality that would make it possible to forge digitally networked links between individual data suppliers and that would let their data flow across those links into a quasi-public space relatively free of restrictions on access and use for public research purposes.
Given the larger number of players and the disparity of interests at stake, a logical starting premise is that only a small number of standard contractual templates seems likely to win the support of the general scientific community, at least initially. A true public-domain option, of course, should be available for all willing to use it. For the rest, a limited menu of conditional public-domain provisions, such as those offered by the Creative Commons, should be sufficient. Clauses that delay certain uses for a specified period, or that delay competing publications based on, or derived from, a particular database, should also pass muster, so long as they remain consistent with the practices of the relevant scientific subcommunity. In the absence of any underlying intellectual property right, an additional clause reserving all other rights and excluding unauthorized commercial uses and applications would complete the limited, “copyleft” concept. We believe that even a small number of standard contractual templates that facilitated access and use of scientific data for public research purposes could exert a disproportionately large impact on the increasingly open, collaborative work in the networked environment.
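The limited menu just described could be represented as a small, machine-readable structure, which would suit online posting of the adopted templates. Everything below, including the template names, fields, and embargo lengths, is a hypothetical sketch, not an actual legal instrument or an existing Creative Commons offering:

```python
from dataclasses import dataclass

@dataclass
class DataLicenseTemplate:
    """One entry in a hypothetical menu of standard-form data licenses."""
    name: str
    requires_attribution: bool = False
    use_embargo_months: int = 0           # delay before certain uses
    commercial_use_allowed: bool = True   # False reserves commercial rights

# Illustrative four-item menu: true public domain, attribution-only,
# delayed use, and a noncommercial "copyleft"-style reservation.
MENU = [
    DataLicenseTemplate("true-public-domain"),
    DataLicenseTemplate("attribution-only", requires_attribution=True),
    DataLicenseTemplate("delayed-use", requires_attribution=True,
                        use_embargo_months=12, commercial_use_allowed=False),
    DataLicenseTemplate("noncommercial-copyleft", requires_attribution=True,
                        commercial_use_allowed=False),
]

def permits(template, commercial, months_since_deposit):
    """Would this template permit a proposed use at a given time?"""
    if commercial and not template.commercial_use_allowed:
        return False
    return months_since_deposit >= template.use_embargo_months
```

Keeping the menu this small is deliberate: a handful of well-understood options is easier for individual researchers to apply, and for peers to enforce, than bespoke ad hoc licenses.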
In the scientific milieu, however, difficult problems of leakage and enforcement could also arise. To address these problems, the scientific community, perhaps under the auspices of the American Association for the Advancement of Science, would need to consider developing institutional machinery capable of assisting individual researchers who feared that their data had been used in ways that violate the terms and conditions of the standard-form licensing agreements they had elected to employ.
More complex or refined contractual templates are also feasible, but their use should normally depend less on individual choice and more on the consensus approval of discipline-specific communities. Moreover, in the informal zone, efforts to influence the terms and conditions applicable to private-sector uses seem much less likely to succeed than are similar efforts in the inter-university context.
Attempts to overregulate the zone of informal data exchanges should generally be avoided at this stage, lest they stir up unwarranted controversy and deter the more ambitious efforts to regulate inter-university transactions described above. The success of those efforts in the zone of formal data exchanges should greatly reinforce the norms of science generally. It would also exert considerable indirect pressure on those operating in the informal zone to respect those norms and to emulate at least the spirit of any agreed contractual templates that had proved their merit in that context. The more that universities succeed in amalgamating their government-funded holdings into an effective virtual archive or repository, the more pressure would be brought to bear on individual researchers, research teams, and small communities to make their data similarly available in more formally constituted repositories. As a body of practice develops in both the formal and the informal zones, the most successful approaches and standards would become broadly adopted, and the desire to obtain the greater benefits likely to flow from more formalized arrangements should grow.
Meanwhile, efforts to regulate the zone of informal data exchanges should be viewed as an opportunity to strengthen the norms of science and to facilitate the creation of virtual networked archives electronically linking disparate and highly distributed data holders. The overall objective should be to generate more disclosure than would otherwise have been possible if all the players exercised their proprietary rights in total disregard of the need for a functioning research commons for nonprofit scientific pursuits. If successful, these modest efforts in the informal zone could alleviate some of the most disturbing erosions of the sharing ethos that have already occurred, and they could encourage federal funding agencies to take a more active role in regulating broader uses of research data. A successful application of “copyleft” techniques to the informal zone of academic research could also serve as a model for encouraging disclosure for public research purposes of more data generated in the private sector.
THE PRIVATE SECTOR
Scientific data produced by the private sector are logically subject to any and all of the proprietary rights that may become available. Here, the policy behind a contractually reconstructed research commons is not to defend the norms of science so much as to persuade the private sector of the benefits it stands to gain from sharing its own data with the scientific community for public research purposes. The goal is thus to promote voluntary contributions that might not otherwise be made to the true public domain or to the conditional domain for public research purposes on favorable terms and conditions.
From the perspective of public-interest research, of course, corporate contributions of otherwise proprietary data to a true public domain are the preferred option. Although the copyright paradigm reflected in the Supreme Court’s Feist decision presumably made the factual contents of commercially valuable compilations published in hard copy formats available for such purposes, the federal appellate courts have lately rebelled against Feist and made it harder for second-comers to separate noncopyrightable facts and information from the elements of original selection and arrangement that still attract copyright protection. Online access to noncopyrightable facts and data is further restricted by the stronger regime that prohibits tampering with technological fences that was embodied in the Digital Millennium Copyright Act of 1998, although the full impact of these provisions on scientific pursuits remains to be seen. Meanwhile, many commercial database publishers may be expected to continue to lobby hard for a strong database protection law on the E.U. model that would limit unauthorized extraction or use of the noncopyrightable contents of factual compilations, and it appears likely that Congress will again seek to enact a database protection statute in 2004.
In contrast to the research-friendly legal rules under the print paradigm, the factual data and noncopyrightable information collected in proprietary databases are increasingly unlikely to enter the public domain and will instead come freighted with the restrictive licensing agreements, digital rights management technologies, and sui generis intellectual property rights that characterize a high-protectionist legal environment. Under such a regime, open access and unrestricted use become possible only if private-sector database compilers donate their data to public repositories or contractually agree to waive proprietary restrictions and controls that would otherwise impede access and use for public research purposes.
Some examples of both donated and contractually stipulated public-domain data collections from the private sector already exist. An example in the first category is provided in the presentation by Shirley Dutton.5 An example of the second type of arrangement is also provided by Michael Morgan in his presentation.6
Although pure public-domain models initiated by industry will no doubt continue to be the exception rather than the rule, the availability of data on a conditional public-domain basis, or at least on preferential terms and conditions to the not-for-profit research community, should enjoy far broader acceptance and ought to be promoted. Certainly, the existence of contractual templates, along the lines being developed by the Creative Commons, could help to encourage private-sector entities to make conditional deposits of data for relatively unrestricted access and use by public-interest researchers.
Scientific publications by private-sector scientists provide another valuable source of research data. However, these scientists labor under increasing pressure either to limit such publications altogether or to insist that publishers allow supporting data to be made available only on conditions that aim to preserve their commercial value. Although many academics in the scientific community oppose this practice, it is exactly what would proliferate if private-sector scientists held exclusive property rights in the data that allowed them to retain control even after publication. This sobering observation might induce the scientific community to reconsider whether private-sector scientists should be permitted to modify the bright-line disclosure rules otherwise applied to public-sector scientists, as a means of encouraging them to disclose more of their data for nonprofit research purposes.
Even when companies remain unwilling to make their data available to nonprofit researchers on a conditional public-domain basis, there is ample experience with price discrimination and product differentiation measures favorable to academics. To the extent that the public research community does not constitute the primary market segment of the commercial data producer, either of these approaches will help promote access and use by noncommercial researchers without undue risks to the data vendor’s bottom line. The conditions under which such arrangements might be considered acceptable by commercial data producers will vary according to discipline area and type of data product, but it is in the interest of the public research community to identify such producers in each discipline and subdiscipline area and to negotiate favorable access and use agreements on a mutually acceptable basis.
The terms and conditions acceptable to private firms operating in the vertical dimension that opt into a public access commons arrangement might be fairly restrictive in their allowable uses, as compared with the conditions applicable under the standard-form templates implementing any of the other options discussed above. However, the goal of securing greater access to privately generated data with fewer restrictions justifies this approach because it makes data available to the research community that would otherwise be subject to commercial terms and conditions in a more research-unfriendly environment.
Finally, the importance of regulating the interface between university-generated data and private-sector applications was treated at length above, with a view to ensuring that the universities’ eagerness to participate in commercial endeavors did not compromise access to, and use of, federally funded data for public research purposes. Here, in contrast, it is worth stressing the benefits that can accrue from data transfers to the private sector whenever a framework for reducing the social costs of such transfers has been worked out to the satisfaction of both the research universities and the public funding sources. These arrangements are especially important if the exploitation, or applications, of any given database by the private sector would not otherwise occur in a nonproprietary environment.
Price discrimination and product differentiation can also facilitate socially beneficial interactions between the private sector and universities. For example, companies might consider licensing certain data to commercial customers on an exclusive-use basis for a limited period of time, after which the data in question would be licensed on preferential terms to nonprofit users or even revert to an open-access status. This strategy may work well for certain environmental data, where most commercially valuable applications are produced in real time or near-real time, and the data can then be made available at lower cost and with fewer restrictions for retrospective research that is less time dependent. Such an approach might not work in other research areas, such as biotechnology, however, where a delay in access may not be an acceptable trade-off or where the delay required would be too long to preserve competitive research values.
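The time-tiered licensing strategy described above (an exclusive commercial window, then preferential nonprofit terms, then open access) reduces to a simple schedule. The tier names and window lengths in this sketch are illustrative assumptions; real windows would vary by discipline and data type, and, as noted, may not exist at all in fields like biotechnology:

```python
def access_tier(months_since_collection, exclusive_months=6,
                preferential_months=24):
    """Hypothetical licensing tier for data of a given age.

    Illustrates time-based product differentiation for data (e.g.,
    environmental observations) whose commercial value is concentrated
    in real-time or near-real-time use.
    """
    if months_since_collection < exclusive_months:
        return "exclusive-commercial"    # real-time commercial window
    if months_since_collection < preferential_months:
        return "preferential-nonprofit"  # reduced price, fewer restrictions
    return "open-access"                 # retrospective research use
```

The design point is that each tier captures the value remaining at that age: the vendor's profits come from the early window, so later research access costs it little.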