In the first session of this symposium, we described some of the potentially limitless possibilities for research and innovation that might ensue from using digital technologies to exploit scientific data available from the public domain as it was traditionally constituted. However, these prospects dim the moment we consider the ramifications for science from the economic, legal, and technological assaults on the public domain that are described in the Session 2 presentations. Here, we explore some of the likely negative implications of these trends for science and innovation unless science policy directly addresses these risks.
In the interests of clarity, I outline the effects of present trends on a sectoral basis, in keeping with the functional map of public-domain data flows presented previously.1 I begin with the government’s role as primary producer of such data and then consider the implications of present trends for academia and for our broader innovation system.
If a basic trend is to shift more data production and dissemination activities from government to the private sector, one should recognize at the outset that the social benefits can exceed the costs under the right set of circumstances. In principle, private database producers may operate more efficiently and attain qualitatively better results than government agencies. Positive results are especially likely when markets have formed; competition occurs; and the public interest, including the needs of the research community that was previously served by the government activity, continues to be met.
There are also numerous drawbacks associated with this trend, however, that require careful consideration. To begin with, the private data supplier will seldom be in a position to produce the same quantity and range of data as a government agency and still make a profit while charging prices that users can afford. In other words, the government agency has typically taken on the task of data production and dissemination as a public good precisely because the social need outweighs the market opportunities. Social costs begin to rise if the profit motive induces the private supplier to reduce the quantity and range of data to be produced or made available. For example, a private data producer typically markets highly refined data products to end users in relatively small quantities, whereas basic research, particularly in the observational sciences, generally requires raw or less commercially refined data in voluminous quantities. On the whole, overzealous privatization of the government’s data production capabilities poses real risks for both science and innovation, because the private sector simply cannot or will not duplicate the government’s public-good functions and still make a profit, not to say extract maximum rents.
Moreover, unless the private sector can demonstrably produce and distribute much the same data more effectively and with higher-quality standards than a government agency, privatization may become little more than a sham transaction. In this scenario, the would-be entrepreneur merely captures a government function and then licenses data back to a captive market at much higher prices and with greatly increased restrictions on access and use. In the absence of market-induced competition, there is a very high risk of trading one monopolist with favorable policies toward science and the broader society (the government) for another monopolist driven entirely by the profit motive and by the restrictions that motive makes necessary.
Absent a sham transaction, one cannot say a priori that any given privatization project necessarily results in a net social loss. The outcome will depend on the contracts the agency stipulates and on the steps it is willing to take to ensure continued access to data for research purposes on reasonable terms and conditions. In contrast to buying data collection services, the licensing of data and information products from the private sector raises serious questions about the types of controls the private sector places on the redistribution and uses of such data and information that the government can subsequently undertake. If the terms of the license are onerous to the government and access, use, and redistribution are substantially restricted, as they almost always are, neither the agency nor the taxpayer is well served. This is particularly true in those cases where the data that need to be collected are for a basic research function or serve a key statutory mission of the agency.
A classic example of what can go wrong was the privatization of the Landsat earth remote sensing program in the mid-1980s. Following the legislatively mandated transfer of this program to the Earth Observation Satellite (EOSAT) Company, the price per scene rose more than 1,000 percent, and significant restrictions were imposed even on nonprofit research uses. Use by both government and academic scientists plummeted, and subsequent studies showed the extent to which both basic and applied research in environmental remote sensing was set back. This experiment also failed in commercial terms, as EOSAT became unable to continue operations after a few years.
The legal and technological pressures identified in this symposium will also affect the uses that are made of government-funded data in academic and other nonprofit institutions. They will intensify the tensions that already exist between the sharing norms of science and the need to restrict access to data in pursuit of increased commercial opportunities.
Although the enhanced opportunities for commercial exploitation that new intellectual property rights (IPRs) and related developments make possible are clear, they will affect the normative behavior of the scientific community gradually and unevenly. Academics are already conflicted in this emerging new environment, and these conflicts are likely to grow. As researchers in public science, they need continued access to a scientific commons on acceptable terms, and they are expected to contribute to it in return. As members of academic institutions, however, they are increasingly under pressure to transfer research results to the private sector for gain, and they themselves may want to profit from the new commercial opportunities.
The government itself fuels these conflicts by the potentially contradictory policies that underlie its funding of research. One message reminds scientists of their duties to share and disclose data, in keeping with the traditional norms of science. The other, more recent, message delivered by the Bayh-Dole Act urges them to transfer the fruits of their research to the private sector or to otherwise exploit the intellectual property protection their research may attract.
At the moment, these conflicts are strongest where the line between basic and applied science has collapsed, and where commercial opportunities are inherent in most projects. Obvious examples are biotechnology and computer science. In the future, the enactment of a powerful IPR in collections of data might be expected to push these tensions into other areas where the lines between basic and applied research remain somewhat clearer and the pressures to commercialize research results have been less noticeable thus far. In exploring the implications of these developments for academic research, we continue to focus attention on the two distinct, but overlapping, research domains we previously characterized as “formal” and “informal.”
In what we term the formal sector, science is conducted within structured research programs that establish guidelines for the production and dissemination of data. Typically, data are released to the public in connection with the publication of research results. Data may also be disclosed in connection with patent applications and supporting documentation. One should recall that, even without regard to the mounting legal and technological pressures, strong economic pressures already limit the amount of data investigators are inclined to release at publication or in patent applications. There are also growing delays in releasing those data as researchers consider commercialization options, and more of the data that are released come with various restrictions.
The enactment of a hybrid IPR in collections of data such as the E.C. Database Directive would introduce a disruptive new element into an already troubled academic environment. To some extent, this development would tend to erase some of the previous distinctions between the “formal” and “informal” domains. In both domains, access to data might nonetheless have to be secured by means of brokered, negotiated transactions, and this outcome is rife with implications. For present purposes, it seems clear that any database protection law, coupled with the other legal and technological measures discussed previously, will further undermine the sharing ethos and encourage the formation of a strategic trading mentality, based on self-interest, that already predominates in the informal domain.
We also predict that these pressures will necessarily tend to blur and dilute the importance of publication as the line of demarcation between a period of exclusive use in relative secrecy and ultimate dedication of data to the public. Suddenly, such a right would make it possible to publish academic research for credit and reputation while retaining ownership and control of the underlying data, which would no longer automatically lapse into the public domain. Once databases attract an exclusive property right valid against the world, the legal duty of scientists publishing research results to disclose the underlying data would depend on codified exceptions permitting use for verification and for certain “reasonable” nonprofit research and educational purposes. We recognize that this new proprietary default rule must ultimately be reconciled in practice with the disclosure obligations of the federal funding agencies. Our point is that the new default rule nonetheless places even published data outside of the public domain, and we note further that much academic research is not federally funded or is not funded in ways that waive such disclosure requirements.
Moreover, the role of academic journal publishers in this new legal environment bears consideration. At present, scientists tend to assign their copyrights to such publishers on an exclusive basis, and many of these journals now produce electronic versions, sometimes exclusive of a print version. This already complicates matters because, as discussed in Session 2, the data that traditional copyright law puts into the public domain may be fenced to a still unknown extent by the technological measures that the Digital Millennium Copyright Act reinforces. If, in addition, a database law is enacted, any data that the scientist assigns to the publisher with the article will become subject to the statutory regime. The publisher would then be in a position to control subsequent uses of the data and to make them available online under a licensed subscription or pay-per-use basis, and with additional restrictions on extraction or reuse.
Even if individual scientists are willing and able to resist the demands for exclusive assignments of both their copyrights and any new database rights, the fact remains that publication of the article in a journal will no longer automatically release the data into the public domain as before. On the contrary, unless the scientist waives the new restrictive default rule, even the data, revealed in the publication itself, will remain subject to the scientist’s exclusive right of extraction and reuse, at least as formulated under the E.C. database protection model.
With or without a new statutory database right in the United States, scientists in public research also appear certain to come under increasing pressure to retain data for commercial exploitation. The research universities are already deeply committed to maximizing income under Bayh-Dole, with varying degrees of success, and they will logically extend these practices and procedures to the commercialization of databases as valuable research tools. A key question is whether they will make the commercialized data available for academic research on reasonable terms and conditions.
As with government-generated data, university efforts to commercially exploit their databases could produce net social gains under the right set of circumstances. In addition to the incentives to generate new and more refined data products that an IPR may promote, greater efforts may be made to enhance the quality and utility of selected databases than would otherwise be the case. Absent such incentives, many scientists may not take pains to organize and document their data for easy use by others, particularly outside their immediate discipline, and they may not refine their data beyond the level needed to support their own research needs and related publication objectives. Legal incentives may thus stimulate the production of more refined databases, especially where markets for such products have formed.
At the same time, these new commercial opportunities tempt university administrators and academics to attenuate or modify the sharing and open-access norms of science and to circumvent obligations in this regard that the federal agencies have established. Were this to occur on a large-scale basis, the unintended harms to research could greatly exceed those we are accustomed to coping with concerning patented inventions under Bayh-Dole. The licensing of academic databases, reinforced by a codified IPR, would thus limit the quantity and quality of data heretofore available from the public domain.
At present, the primary bulwarks against such a breakdown of the sharing ethos are the formal requirements of the federal funding agencies, which in many cases continue to require that data from the research projects they fund should be transferred at some point to public repositories, or made available upon request. To avoid the negative results we envision, the agencies would have to strengthen these requirements, and their enforcement, and adapt them to the emerging high-protectionist intellectual property environment. We elaborate further on this topic in the next session. The point for now is that, absent express overrides that universities voluntarily adopt or that funding agencies impose in their research grants and contracts, the new restrictive default rules of ownership and control will automatically take effect if Congress enacts a database protection law. Indeed, they could become general practice even without such a law as the result of routine, unregulated database licensing practices.
In the informal zone, researchers are not yet ready to publish, or they are working independently on “small science” projects beyond the formal controls and requirements of a federal research program that requires open access or public deposit. This includes research funded by state governments, foundations, and the universities themselves, which leave more discretion in these matters to researchers, and by private companies, which normally require secrecy.
Much of what has been said about the effects of the new legal and technological pressures on the formal academic zone thus applies with even greater force to the informal zone because the impetus to commercialize data will encounter fewer regulatory constraints. The changing mores likely to undermine disclosure and open access in the formal zone will make it ever harder to organize cooperative networks in the less structured and more unruly informal domain.
These tendencies would predictably become more pronounced over time as more scientists became aware of the new possibilities to retain ownership and control of data, even after publication of research results. Indeed, one would logically expect that strategic behavior in the informal zone would increasingly be geared to efforts to maximize advantages from postpublication opportunities. Should this occur, academics themselves would exert pressure on the federal system that defends open access and on their universities to fall in line with the needs of commercial partners.
One can thus project a kind of cascading effect if a strong database protection right is enacted and the scientific community fails to take steps to preserve and reinforce the research commons. On this view, today’s formal zone built around release of data into the public domain at publication would begin to resemble the informal zone, while that same informal zone would look more and more like the private sector. Under these circumstances, one cannot necessarily assume that the open-access policies currently supporting the formal sector would continue in force, in which case, even basic research could be adversely affected, as occurred in the United Kingdom in the 1980s.
No one can predict with any degree of certainty what the new equilibrium will look like once the conflict plays out between these privatizing and commercializing pressures, on the one hand, and the traditional norms of public science, on the other. In previous articles, however, we have outlined the cumulative negative effects that such tendencies likely would have on scientific endeavor. For the sake of brevity, we recall them here in summary form:
less effective domestic and international scientific collaboration, with serious impediments to the use, reuse, and transformation of factual data that are the building blocks of research;
increased transaction costs driven by the need to enforce the new legal restrictions on data obtained from different sources, by the implementation of new administrative guidelines concerning institutional acquisitions and uses of databases, and associated legal fees;
monopoly pricing of data and anticompetitive practices by entities that acquire market power, or by first entrants into niche markets that predominate in many research areas; and
less data-intensive research and lost opportunity costs.
The greatest casualty could well be the new opportunities that digital networks provide to create virtual information commons within and across discipline-specific communities that are built around optimal access to and exchange of scientific data. To the extent that public science becomes dominated by brokered intellectual property transactions, the resulting combination of high transaction costs, unbridled self-interest, and anticommons effects would defeat the fragile cooperative arrangements needed to create and maintain such virtual information commons and the distributed research opportunities they make possible.
Finally, to see why some critics in the United States harbor deep concerns about the long-term consequences of the E.U.’s approach, it suffices to grasp how radical a change it would introduce into the domestic system of innovation and to consider how great the risks of such change really are. Traditionally, U.S. intellectual property law has not protected investment as such, a tradition that still has constitutional underpinnings. At the same time, the national system of innovation depends on enormous flows of mostly government-generated or government-funded scientific and technical data and information upstream, which everyone is free to use, and on free competition with respect to downstream information goods.
The domestic intellectual property laws protected downstream bundles of information in two situations only: copyrightable works of art and literature and patentable inventions. However, the following conditions apply in both cases:
these regimes require relatively large creative contributions based on free inputs of information and ideas;
they presuppose a flow of unprotected information and data upstream; and
they presuppose free competition with regard to the products of mere investment that are neither copyrightable nor patentable.
As previously observed, the E.C.’s Database Directive changes this approach, as would the last parallel proposal, H.R. 354, to enact strong database rights in the United States. Specifically, these sui generis regimes confer a strong and, in the European Union, potentially perpetual exclusive property right on the fruits of mere investment, without requiring any creative contribution. They also convert data and information—the previously unprotectible raw materials and basic inputs of the modern information economy—into the subject matter of this new exclusive property right.
The sui generis database regimes would thus effectuate a radical change in the economic nature and role of IPRs. Until now, the economic function of IPRs was to make markets possible where previously there existed a risk of market failure due to the public-good nature of intangible creations. Exclusive rights make embodiments of intangible public goods artificially appropriable, they create markets for those embodiments, and they make it possible to exchange payment for access to these creations.
In contrast, an exclusive IPR in the contents of databases breaks existing markets for downstream aggregates of information, which were formed around inputs of information largely available from the public domain. In effect, the sui generis database regimes create new and potentially serious barriers to entry to all existing markets for intellectual goods owing to the multiplicity of new owners of upstream information in whom they vest exclusive rights, any one of whom can hold out and all of whom can impose onerous transaction costs analogous to the problem of multimedia transactions under copyright law. This thicket of rights fosters anticommons effects, and the database laws appear to be ideal generators of this phenomenon.
Under the new sui generis database regime, in short, there is a built-in risk that too many owners of information inputs will impose too many costs and conditions on all the information processes we now take for granted in the information economy. At best, the costs of research and development activities might be expected to rise across the entire economy, well in excess of benefits, owing to the potential stranglehold of data suppliers on the raw materials. This stranglehold will increase with market power if databases are owned by sole-source providers. Over time, the comparative advantage from owning a large, complex database will tend progressively to elevate these barriers to entry.
Supporters of strong database protection laws and of strong contractual regimes to reinforce them believe that the benefits of private property rights are without limit, and that more is always better. They expect that these powerful legal incentives will attract huge resources into the production of electronic databases and information goods. In contrast, critics fear that an exclusive property right in noncopyrightable collections of data, coupled with the proprietors’ unlimited power to impose adhesion contracts in the course of online delivery, will compromise the operations of the national system of innovation, which depend on the relatively free flow of upstream data and information. In place of the explosive production of new databases that proponents envision, opponents of a strong database right predict a steep rise in the costs across the information economy and a progressive balkanization of that economy, in which fewer knowledge goods may be produced as more tithes have to be paid to more information rent seekers along the way.