The digital revolution has made investors acutely aware of the heightened value that collections of data and information may acquire in the new information economy. Attention has logically focused on the incentive and protective structures for generating and disseminating digital information products, especially online. Although most of the legal and economic initiatives have centered on—and been driven by—the entertainment sector, software, and large publishing concerns, considerable attention has also been devoted to the possibility that commoditization of even public-sector and public-domain data would stimulate substantial investments by providing new means of recovering the costs of production. Moreover, investors have increasingly understood the economic potential that awaits those who capture and market data and information as raw materials or inputs into the upstream stages of the innovation process.
What follows focuses first on pressures to commoditize data in the public sector and then on legal and technological measures that endow database producers with new proprietary rights and with novel means of exploiting the facts and data that copyright law had traditionally left in the public domain. These pressures arise both from within the research community itself and from forces extraneous to it. How that community responds to these pressures will over time determine the future metes and bounds of the information commons that support scientific endeavors.
If, as we have reason to fear, current trends greatly diminish the amount of data available in the public domain, this decrease could initially compromise the scientific community’s ability to fully exploit the promise of the digital revolution. Moreover, if these pressures continue unabated and become institutionalized at the international level, they could disrupt the flow of upstream data to both basic and applied science and undermine the ability of academia and the private sector to convert cumulative data streams into innovative products and services.
The pressures discussed here also pose serious conflicts between the norms of public science and the norms of private industry. We contend that failure to resolve these conflicts and to properly balance the interests at stake in preserving an effective information commons could eventually undermine the national system of innovation.
COMMODITIZATION OF DATA IN PUBLIC SCIENCE
During the past 10 years, there has been a marked tendency to shift the production of science-relevant databases from the public to the private sector. This development occurs against the background of a broader trend in which the government’s share of overall funding for research and development vis-à-vis that of the private sector has decreased from a high of 67 percent in the 1960s to 26 percent in 2000. Furthermore, since the passage of the Bayh-Dole Act in 1980, the results of federally funded research at universities have increasingly been commercialized either by public–private partnerships with industry or directly by the universities themselves.
Reducing the Scope of Government-generated Data
The budgetary pressures on the government are both structural and political in nature. On the whole, mandated entitlements in the federal budget, such as Medicare and Medicaid, are politically impossible to reduce; as their costs mount, the money available for other discretionary programs, including federally sponsored research, has shrunk as a percentage of total expenditures.
This structural limitation is compounded by the rapidly rising costs of state-of-the-art research, including some researcher salaries, scientific equipment, and major facilities. With specific regard to the information infrastructure, researchers earmark the lion’s share of expenses for computing and communications equipment, with the remainder devoted to managing, preserving, and disseminating the public-domain data and information that result from basic research and other federal data collection activities. The government’s scientific and technical data and information services are thus the last to be funded and are almost always the first to suffer cutbacks.
For example, the National Oceanic and Atmospheric Administration’s (NOAA) budget for its National Data Centers remained flat and actually decreased in real dollars between 1980 and 1994, whereas its data holdings increased exponentially and the overall agency budget doubled (mostly to pay for new environmental satellites and a ground-based weather radar system that are producing the exponential data increases). Information managers at most other science agencies have complained about reductions in funding, for both their data management and scientific and technical information budgets.
These chronic budgetary shortfalls for managing and disseminating public-domain scientific data and information have been accompanied by recurring political pressures on the scientific agencies to privatize their outputs. Until recently, for example, the common practice of the environmental and space science agencies was to procure data collection systems, such as observational satellites or ground-based sensor systems, from private companies. Such procurements were made under contract and pursuant to government specifications based on consensus scientific requirements recommended by the research community. Private contractors would thus build and deliver the data collection systems, which the agencies would then operate pursuant to their mission. All data from the system would then belong to the government and would enter the public domain.
Today, however, industry has successfully pursued a strategy of providing an independent supply of the government’s needs for data and information products rather than building and delivering data collection systems for government agencies to operate. This solution leaves the control and ownership of the resulting data in the hands of the company, and allows it to license them to the government and to anyone else willing to pay. Because of this new-found role of the government agency as a cash cow, there has recently been a great deal of pressure on the science agencies, particularly from Congress, to stop collecting or disseminating data in-house and to obtain them from the private sector instead.
This approach previously resulted in at least one well-documented fiasco, namely, the privatization of the NASA–NOAA Landsat program in 1985, which seriously undermined basic and applied research in environmental remote sensing in the United States for the better part of a decade. More recently, the Commercial Space Act of 1998 directed NASA to purchase space and earth science data collection and dissemination services from the private sector and to treat data as commercial commodities under federal procurement regulations. The meteorological data value-adding industry has directed similar lobbying pressures at NOAA. The photogrammetric industry has likewise indicated a desire to expand the licensing of data products to the U.S. Geological Survey and to other federal agencies.
Efforts have also been made by various industry groups to limit the online information dissemination services of several federal science and technology agencies. In the cases of the patent database of the U.S. Patent and Trademark Office, the PubMed Central database of peer-reviewed life science journal literature provided on a free and unrestricted basis by the National Institutes of Health National Library of Medicine, and certain types of weather information disseminated by the National Weather Service, such efforts have proved unsuccessful to date. However, publisher groups did succeed in terminating the Department of Energy’s PubScience Web portal for physical science information.
Commercial Exploitation of Academic Research
Turning to government-funded research activities, the trend of greatest concern for purposes of this chapter is the progressive incorporation of data and data products into the commercialization process already under way in academia. The original purpose of the Bayh–Dole Act and related legislation was primarily to enable universities to obtain patents on applications of research results. More recently, this activity has expanded to securing both patents and copyrights in computer programs. Now, databases used in molecular biology have themselves become sources of patentable inventions, and the potential commercial value of these databases as research tools has attracted considerable attention and controversy.
These and other databases have increasingly been subject to licensing agreements prepared by university technology transfer offices, which may be prone to treat databases like other objects of Material Transfer Agreements. The default rules that such licensing agreements tend to favor are exclusive arrangements under onerous terms and conditions that include restrictions on use, and even grant-back and reach-through clauses claiming interests in future applications.
Moreover, there is a growing awareness in academic circles generally that data and data products may be of considerable commercial value, and individual researchers have become correspondingly more wary of making them as available as before. This trend, together with the pressures on government agencies described previously, would pose serious problems for the research community’s ability to access and use needed data resources under any circumstances. In reality, these problems could become much greater as new legal and technological fencing measures become more broadly implemented.
INTELLECTUAL PROPERTY, E-CONTRACTS, AND TECHNOLOGICAL FENCES
Traditional copyright law was friendly to science, education, and innovation by dint of its refusal to protect either facts or ideas as eligible subject matter; by limiting the scope of protection for compilations and other factual works to the stylistic expression of facts and ideas; by carving out express exceptions and immunities for teaching, research, and libraries; and by recognizing a catchall, fall-back “fair-use” exception for nonprofit research and other endeavors that advanced the public interest in the diffusion of facts and ideas at relatively little expense to authors. Reinforcing these policies were various judicial decisions and partially codified exceptions for functionally dictated components of literary works, which take the form of nonprotectible methods, principles, processes, and discoveries. On the whole, these principles tended to render facts and data as such ineligible for protection and to allow researchers to access and use facts and data otherwise embodied in protectible works of authorship without undue legal impediments.
In contrast, recent legal developments in intellectual property law and contracts law have radically changed the preexisting regime. These and other related developments now make it possible to assert and enforce proprietary claims to virtually all the factual matter that previously entered the public domain the moment it was disclosed.
Some of the earliest changes were intended to bring U.S. copyright law into line with longstanding norms of protection recognized in the Berne Convention. For example, the principle of automatic copyright protection, the abolition of technical forfeiture due to lack of formal prerequisites, such as notice, and the provision of a basic term of protection lasting for the life of the creator plus 50 years were all measures adopted in the predigital era for this reason.
Beginning in the 1980s, however, the United States took the lead in reshaping the Berne Convention itself to accommodate computer programs, which many commentators and governments had preferred to view as “electronic information tools” subject to more procompetitive industrial property laws, including patents, unfair competition, and hybrid (or sui generis) forms of protection. By the 1990s, a coalition of “content providers” concerned about online copying of movies, music, and software in the new digital environment had persuaded the U.S. government to press for still more far-reaching changes of international copyright and related laws. These efforts led to the codification of universal copyright norms in the Trade-Related Intellectual Property Rights (TRIPS) Agreement of 1994 and to two 1996 World Intellectual Property Organization (WIPO) treaties on copyrights and related rights in cyberspace, which endowed authors with a bevy of new exclusive rights tailor-made for online transmissions and imposed unprecedented obligations on participating governments to prohibit electronic equipment capable of circumventing these rights. All of these new norms and obligations, ostensibly adopted to discourage market-destructive copying of literary and artistic works, then became domestic law, often with no regard for their impact on science and sometimes with deliberate disregard of measures adopted to safeguard science and education at the international level.
At the same time, and as part of the same overall movement, the coalition of content providers that had captured Congress’ attention took aim at two closely related areas in which much more than market-destructive copying was actually at stake. The first of these was to validate standard-form electronic contracts used to regulate online dissemination of works in digital form, whose legal status remained uncertain. Because traditional contracts and sales laws can be interpreted in ways that limit both the kinds of terms that can be imposed through “shrinkwrap” or “click-on” licenses and the one-sidedness of the resulting “adhesion contracts,” the coalition pushing the high-protectionist digital agenda has also sponsored a new uniform law, the Uniform Computer Information Transactions Act, to validate such contracts in the form they desire, and it has lobbied state legislatures to adopt it.
The last major component of the high-protectionists’ digital agenda was an attempt by some of the largest database companies to obtain a sui generis exclusive property right in noncopyrightable collections of information, even though facts and data had hitherto been off-limits even to international copyright law as reformed under the TRIPS Agreement of 1994. These efforts culminated in the European Community’s Directive on the Legal Protection of Databases adopted in 1996; in a proposed WIPO treaty on the international protection of databases built on the same model, which was barely defeated at the WIPO Diplomatic Conference in December 1996; and in a series of database protection bills that have been introduced in the U.S. Congress and that attempt to enact similar measures into United States law.
Most of the developments outlined above resulted from efforts that were not undertaken with science in mind, although publishers who profit from distributing commercialized scientific products promoted some of the changes that appear most threatening for scientific research, especially database protection laws. The following subsections show that all these measures—whatever their ostensible purpose—have the cumulative effect of shrinking the research commons.
We will first briefly note the impact of selected developments in both federal statutory copyright law and in contract laws at the state level. We then discuss current proposals to confer strong exclusive property rights on noncopyrightable collections of data, which constitute the clearest and most overt assault on the public domain that has fueled both scientific endeavors and technological innovation in the past.
Expanding Copyright Protection of Factual Compilations: The Revolt Against Feist
The quest for a new legal regime to protect databases was triggered in part by the U.S. Supreme Court’s 1991 decision in Feist Publications, Inc. v. Rural Telephone Service Co., which denied copyright protection to the white pages of a telephone directory. As discussed above, that decision was notable for reaffirming the principle that facts and data as such were ineligible for copyright protection as “original and creative works of authorship.” It also limited the scope of copyright protection to any original elements of selection and arrangement that otherwise met the test of eligibility. In effect, this meant that second-comers who developed their own criteria of selection and arrangement could in principle use prior data to make follow-on products without falling afoul of the copyright owner’s strong exclusive right to prepare derivative works. Taken together, these propositions supported the customary and traditional practices of the scientific community and facilitated both access to and use of research data.
In recent years, however, judicial concerns about compilers’ inability to appropriate the returns from their investments have induced leading federal appellate courts to broaden copyright protection of low-authorship compilations in ways that significantly deform both the spirit and the letter of Feist. At the eligibility stage, so little in the way of original selection and arrangement is now required that the only print media still certain to be excluded from protection are the white pages of telephone directories.
More tellingly, the courts have increasingly perceived the eligibility criteria of selection and arrangement as pervading the data themselves to restrain second-comers from using preexisting datasets to perform operations that are functionally equivalent to those of an initial compiler. In the Second Circuit, for example, a competitor could not assess used car values by the same technical means as those embodied in a first-comer’s copyrightable compilation, even if those means turned out to be particularly efficient.2 Similarly, the Ninth Circuit prevented even the use of a small amount of data from a copyrighted compilation that was essential to achieving a functional result.3
Copyright law provides a very long term of protection, and it endows authors, including eligible database proprietors, with strong rights to control follow-on applications of the protectible contents of their works. Stretching copyright law to cover algorithms and aggregates of facts (and even so-called “soft ideas”), as these recent decisions have done, collapses the idea-expression dichotomy and indirectly extends protection to facts as such.
Opponents of sui generis database protection in the United States cite these and other cases as evidence that no sui generis database protection law is needed. In reality, these cases suggest that, in the absence of a suitable minimalist regime of database protection to alleviate the risk of market failure without impoverishing the public domain, courts tend to convert copyright law into a roving unfair competition law that can protect algorithms and other functional matter for very long periods of time and that could create formidable barriers to entry. This tendency, however, ignores the historical limits of copyright protection and ultimately jeopardizes access to the research commons.
The Digital Millennium Copyright Act of 1998: An Exclusive Right to Access Copyrightable Compilations of Data?
With regard to copyrightable compilations of data distributed online, amendments to the Copyright Act of 1976, known as the Digital Millennium Copyright Act of 1998 (DMCA), may have greatly reduced the traditional safeguards surrounding research uses of factual works. Technically, Section 1201(a) establishes a right to prevent the direct circumvention of any electronic fencing devices that a content provider may have employed to control access to a copyrighted work delivered online. Section 1201(b) then perfects the scheme by exposing manufacturers and suppliers of equipment capable of circumventing electronic fencing devices to liability for copyright infringement when such equipment can be used to violate the exclusive rights traditionally held by copyright owners.
In enacting these provisions, Congress seems to have detached the prohibition against gaining unauthorized direct access to electronically fenced works under Section 1201(a) from the balance of public and private interests otherwise established in the Copyright Act of 1976. As Professor Jane Ginsburg interprets this provision, a violation of Section 1201(a) is not an “infringement of copyright” because it attracts a separate set of distinct remedies set out in Section 1203 and because it constitutes “a new violation” for which those remedies are provided.4 On this reading, unlawful access is not subject to the traditional defenses and immunities of the copyright law, and one is “not . . . permitted to circumvent the access controls, even to perform acts that are lawful under the Copyright Act,” including presumably the user’s rights to extract unprotectible facts and ideas or to invoke the “fair use” defense.5 On the contrary, “Congress may in effect have extended copyright to cover ‘use’ of works of authorship, including minimally original databases . . . because ‘access’ is a prerequisite to ‘use,’ [and] by controlling the former, the copyright owner may well end up preventing or conditioning the latter.”6
While the precise contours of these provisions remain to be worked out in future judicial decisions, they could potentiate the ability of both publishers and scientists to protect online collections of data that were heretofore unprotectible in print media. If, for example, a database provider combined a noncopyrightable collection of data with a nominally copyrightable component, such as an analytical explanation of how the data were compiled, the “fig leaf” copyrightable component might suffice to trigger the “no direct access” provisions of Section 1201(a).7 In that event, later scientific researchers could not circumvent the electronic fence in order to extract or use the noncopyrightable data, even for nonprofit scientific research, because Section 1201(a) does not recognize the normal exceptions to copyright protection that would allow such use, and scientific research is not among the few very limited exceptions codified in Section 1201(d)-(j).
Later researchers would thus have to acquire lawful access to the electronically fenced database under Section 1201(a) and then attempt to extract the noncopyrightable data for nonprofit research purposes under Section 1201(b), which does in principle recognize the traditional user defenses as well as the privileges and immunities codified in Sections 107-122 of the Copyright Act of 1976. Even here, however, later scientists could discover that the technical devices they had used to extract nonprotectible data from minimally copyrightable databases independently violated Section 1201(b) of the DMCA because those devices were otherwise capable of substantial infringing uses.8 In practice, moreover, these scientists’ theoretical opportunity to extract noncopyrightable data by technical devices that did not violate Section 1201(b) could already have been compromised by the electronic contracts they will have accepted in order to gain lawful access to the online database in the first place and thus to avoid the crushing power of Section 1201(a). In that event, the scientists would almost certainly have waived any user rights they had retained under Section 1201(b), unless the electronic contracts themselves became unenforceable on one ground or another, as discussed below.
In effect, the DMCA allows copyright owners to surround their collections of data with technological fences and electronic identity marks buttressed by encryption and other digital controls that force would-be users to enter the system through an electronic gateway. To pass through the gateway, users must accede to non-negotiable electronic contracts, which impose the copyright owner’s terms and conditions without regard to the traditional defenses and statutory immunities of the copyright law.
The DMCA indirectly recognized the potential conflict between proprietors and users of ineligible material, such as facts and data, that Section 1201(a) of the statute could thus trigger, and it empowered the Copyright Office, which reports to the Librarian of Congress, to exempt categories of users whose activities could be adversely affected.9 While representatives of the educational and library communities have petitioned for relief on various grounds, including the need of researchers to access and use noncopyrightable facts and ideas transmitted online, the authorities have so far declined to act. It is too soon to know how far owners of copyrightable compilations can push this so-called “right of access” at the expense of research, competition, and free speech without incurring resistance based on the misuse doctrine of copyright law, on the public policy and unconscionability doctrines of state contract laws, and on First Amendment concerns that have in the past limited copyright protection of factual works. For the foreseeable future, nonetheless, the DMCA empowers owners of copyrightable collections of facts to contractually limit online access to the pre-existing public domain in ways that contrast drastically with the traditional availability of factual contents in printed works.
ONE-SIDED ELECTRONIC LICENSING CONTRACTS
Data published in print media traditionally entered the public domain under the classical intellectual property regime described above. Further ensuring that result is an ancillary copyright doctrine, known as “exhaustion” or the “first-sale doctrine,” which limits the authors’ powers to control the uses that third parties can make of copyrighted literary works distributed to the public in hard copies.
Under this doctrine, the copyright owner may extract a profit from the first sale of the copy embodying an original and protectible compilation of data, but cannot prevent a purchaser from reselling that physical copy or from using it in any way the latter deems fit, say, for research purposes, unless such uses amount to infringing reproductions, adaptations, or performances of the expressive components of the copyrighted compilation. In effect, copyright law not only made it difficult to protect compilations of data as such, it denied authors any exclusive right to control the use of a protected work once it had been distributed to the public.
The first-sale doctrine thus complements and perfects the other science-friendly provisions described above, unless individual scientists, libraries, or scientific entities were to contractually waive their rights to use copies of purchased works in the manner described above. Such contractual waivers always remain theoretically possible, and publishers have increasingly pressed them upon the scientific and educational communities in the online environment for reasons discussed below. Nevertheless, it was not generally feasible to impose such waivers against scientists who bought scientific works distributed to the public in hard copies, and even when attempts to do so were made, such contracts could not bind subsequent purchasers of the copies in question. The upshot was that, precisely because authors and publishers could not rely on contractual agreements, they depended on the default rules of copyright law, which are binding against the world. These default rules, in turn, impose legislatively enacted “contracts,” which balance public and private interests by, for example, defining the uses that libraries can make of their copies and by further allowing a set of “fair uses” that scientists and other researchers can invoke.
Against this background, online delivery of both copyrightable and noncopyrightable productions possesses the inherent capabilities of changing the preexisting relationship between authors and readers or between “content providers” and “users.” By putting a collection of data online and surrounding it with technological fencing devices, publishers can condition access to the database on the would-be user’s acquiescing to the terms and conditions of the former’s “click-on,” standard-form, nonnegotiable contract (known as a “contract of adhesion”). In effect, online delivery solves the problems that the printing press created for contractually restricting the use of published works and it thus restores the “power of the two-party deal” that publishers lost in the sixteenth century.
The power of the two-party deal that online delivery makes possible is conceptually and empirically independent of statutory intellectual property rights, which makes it of capital importance for the theses discussed here. It means that anyone who makes data available to the world at large can control access to them and control their use by contract in ways that were inconceivable only a few years ago. Nevertheless, statutory intellectual property rights can reinforce the contractual powers of online vendors to prohibit would-be users from disarming encryption devices to gain entry or to limit the ability of would-be users to extract uncopyrightable facts and ideas from copyrightable works delivered online, or even to limit their ability to invoke the statutory defense of fair use. The DMCA lends itself to these ends, ostensibly with a view to impeding market-destructive copying, but with the result of strengthening the copyright monopoly at the expense of the public domain.
Online delivery, coupled with technological fencing devices, potentially confers these same contractual powers on content providers in the absence of supporting intellectual property regimes, such as the DMCA discussed above, and the new database protection rights to be discussed below. The highly restrictive digital rights management technologies that are being developed include hardware- and software-based “trusted systems,” online database access controls, and increasingly effective forms of encryption. These emerging technological controls on content, when combined with the statutory intellectual property and contractual rights, can supersede long-established user rights and exceptions under copyright law for print media and thereby eliminate large categories of data and information from public-domain access.
Moreover, because electronic contracts are enforceable in state courts, they provide private rights of action that tend to either substitute for or override statutory intellectual property rights. Electronic contracts become substitutes for intellectual property rights to the extent that they make it infeasible for third parties to obtain publicly disclosed but electronically fenced data without incurring contractual liability for damages. They may override statutory intellectual property rights, for example, by forbidding the uses that libraries could otherwise make of a scientific work under federal copyright law, or by prohibiting follow-on applications or the reverse engineering of a computer program that both federal copyright law and state trade secret law would otherwise permit.
To the extent that these contracts are allowed to impose terms and conditions that ignore the goals and policies of the federal intellectual property system, they would establish privately legislated intellectual property rights unencumbered by concessions to the public interest. By the same token, a privately generated database protected by technical devices and electronic adhesion contracts is subject to no federally imposed duration clause and, accordingly, will never lapse into the public domain.
Whether electronic contracts—especially the nonnegotiable, standard-form “click-on” and “shrinkwrap” contracts—are in fact enforceable remains an open and controversial question. In addition to technical obstacles to formation based on general contracts law principles, courts may deem such contracts unenforceable under the “public policy” defense of state contracts law, under the preemption doctrine that supports the integrity of the federal intellectual property system, or under some combination of the two. In practice, however, courts appear reluctant to exercise such powers even when their right to do so is clear. The most recent line of cases, led by the Seventh Circuit’s opinion in ProCD v. Zeidenberg,10 has tended to validate such contracts in the name of “freedom of contract.”
In this same vein, the National Conference of Commissioners on Uniform State Laws has proposed a Uniform Computer Information Transactions Act (UCITA), which, if state legislatures enacted it, would broadly validate electronic contracts of adhesion and largely immunize them from legal challenge. For example, UCITA permits vendors of information products to define virtually every transaction as a “license” rather than a “sale,” and it tolerates perpetual licenses. It could thus override the first-sale doctrine of copyright law and any analogous doctrine that might be embodied in the proposed database protection laws discussed below. The proposed uniform law would then proceed to broadly validate mass-market “click-on” and “shrink-wrap” licenses that imposed all the provisions vendors could hope for, with little regard for the interests of scientific and educational users, or the public in general.
A detailed analysis of UCITA’s provisions is beyond the scope of this discussion. Suffice it to say, however, that its less than transparent drafting process so favored the interests of sellers of software and other information products at the expense of consumers and users generally that a coalition of 16 state attorneys general vigorously opposed its adoption, and the American Law Institute withdrew its cosponsorship of the original project. Nonetheless, two states—Maryland and Virginia—have adopted nonuniform versions of UCITA, and major software and information industry firms continue to lobby assiduously for its enactment by other state legislatures.
If present trends continue unabated, privately generated information products delivered online—including databases and computer software—may be kept under a kind of perpetual, mass-market trade secret protection, subject to no reverse engineering efforts or public-interest uses that are not expressly sanctioned by licensing agreements. Contractual rights of this kind, backed by a one-sided regulatory framework, such as UCITA, could conceivably produce an even higher level of protection than that available from some future federal database right subject to statutory public-interest exceptions. The most powerful proprietary cocktail of all, however, would probably emerge from a combination of a strong federal database right with UCITA-backed contracts of adhesion.
New Exclusive Property Rights in Noncopyrightable Collections of Data
The problem of protecting collections of information that fail to meet the technical eligibility requirements of copyright law has existed for a half-century or longer, and at least three different approaches have emerged over time. One solution was to stretch domestic copyright law to accommodate “low authorship” literary productions, with some adjustments to the bundle of rights at the margins. A second approach, adopted in the Nordic countries, was to enact a short-term sui generis regime, built on a distinctly copyrightlike model, that would protect catalogs, directories, and tables of data against wholesale duplication, without conferring on proprietors any exclusive adaptation right like that afforded to authors of true literary and artistic works. A third approach, experimented with at different times and to varying degrees in different countries, including the
United States, was to protect compilers of information against wholesale duplication of their products under different theories rooted in the “misappropriation” branch of unfair competition law.
What changed in the 1990s was the convergence of digital and telecommunications networks, which potentiated the role of electronic databases in the information economy generally and which made scientific databases in particular into agents of technological innovation whose economic potential may eventually outstrip that accruing from the patent system. Notwithstanding the robust appearance of the present-day database industry under free-market conditions, analysts asked whether inadequate investment in complex digital databases would not inevitably hinder that industry’s long-term growth prospects if free-riding second-comers could appropriate the contents of successful new products without contributing to their costs of development and maintenance over time. In other words, if copyright, contract law, digital rights management technologies, residual unfair competition laws, and various protective business practices failed to adequately fill a gap in the law, then regulatory action to enhance investment might be justified. This utilitarian rationale, however, raised new and still largely unaddressed questions about the unintended social costs likely to ensue if intellectual property rights were injudiciously bestowed upon the raw materials of the information economy in general and on the building blocks of scientific research in particular.
Any serious effort to find an appropriate sui generis solution to the question of database protection accordingly should have engendered an investigation of the comparative economic advantages and disadvantages of regimes based on exclusive property rights as distinct from regimes based on unfair competition laws and other forms of liability rules. This investigation also should have taken account of larger questions about the varying impacts of different legal regimes on freedom of speech and on the conditions of democratic discourse, which, in the United States at least, are of primary constitutional importance. Instead, the Commission of the European Communities cut the inquiry short by adopting the Directive on the Legal Protection of Databases in 1996.11 This directive required all E.U. member countries (and affiliated states) to pass laws that confer a hybrid exclusive property right on publishers who make substantial investments in noncopyrightable compilations of facts and information.
The European Union Database Directive in Brief
The hybrid exclusive right that the European Commission ultimately crafted in its Directive on the Legal Protection of Databases does not resemble any preexisting intellectual property regime. It protects any collection of data, information, or other materials that are arranged in a systematic or methodical way, provided that they are individually accessible by electronic or other means.12 To become eligible for protection, the database producer must demonstrate a “substantial investment,” as measured in either qualitative or quantitative terms,13 which leaves the courts to develop this criterion with little guidance from the legislative history. The drafters explicitly recognized that the qualifying investment may consist of no more than simply verifying or maintaining the database.
In return for this investment, the compiler obtains exclusive rights to extract or to reutilize all or a substantial part of the contents of the protected database. The exclusive extraction right pertains to any transfer in any form of all or a substantial part of the contents of a protected database;14 the exclusive reutilization right, by contrast, covers only the making available to the public of all or a substantial part of the same database.15 In every case, the first-comer obtains an exclusive right to control uses of collected data as such, as well as a powerful adaptation (or derivative work) right along the lines that U.S. copyright law bestows on “original works of authorship,”16 even though such a right is alien to the protection of investment under existing unfair competition laws. In a recent interpretation of this provision, a U.K. court vigorously enforced this right to control follow-on applications of an original database against a value-adding second-comer.17 It took this position even though the proprietor was the sole source of the data in question and there was no feasible way to generate them by independent means.
The directive contains no provision expressly regulating the collections of information that member governments themselves produce. This lacuna leaves European governments that generate data free to exercise either copyrights or sui generis rights in their own productions in keeping with their respective domestic policies. This result contrasts sharply with the situation in the United States, where the government cannot claim intellectual property rights in the data it generates and must normally make such data available to the public for no more than a cost-of-delivery fee.
The directive provides no mandatory public-interest exceptions comparable to those recognized under domestic and international copyright laws. An optional, but ambiguous, exception concerning “illustrations for teaching or scientific research” applies to extractions but not reutilization.18 It may be open to flexible interpretation, and some member countries, notably the Nordic countries, have implemented it in a relatively broad form. Other countries, notably France, Italy, and Greece, have simply ignored this exception altogether, which defeats the commission’s supposed concerns to promote uniform law.
The directive’s sui generis regime does exempt from liability anyone who extracts or uses an insubstantial part of a protected database.19 However, such a user bears the risk of accurately drawing the line between a substantial and an insubstantial part, and any repeated or systematic uses of even an insubstantial part will forfeit this exemption.20 Judicial interpretation has so far taken a very restrictive view of this exemption, and one cannot effectively make unauthorized extractions or uses of an insubstantial part of any protected database without serious risk of triggering an action for infringement.
Qualifying databases are nominally protected for a 15-year period.21 In reality, each new investment in a protected database, such as the provision of updates, will requalify that database as a whole for a new term of protection.22 In this and other respects, the scope of the sui generis adaptation right exceeds that of U.S. copyright law, which attaches only to the new matter added to an underlying, preexisting work and expires after a fixed term.23
Finally, the directive carries no national treatment requirement into its sui generis component. Foreign database producers become eligible only if their countries of origin provide a similar form of protection or if they set up operations within the European Union.24 Nonqualifying foreign producers, however, may nonetheless seek protection for their databases under residual domestic copyright and unfair competition laws, where available.25
The E.C.’s Directive on the Legal Protection of Databases thus broke radically with the historical limits of intellectual property protection in at least three ways. First, it overtly and expressly conferred an exclusive property right on the fruits of investment as such, without predicating the grant of protection on any predetermined level of creative contribution to the public domain. Next, it conferred this new exclusive property right on aggregates of information as such, which had heretofore been considered as unprotectible raw material or as basic inputs available to creators operating under all other preexisting intellectual property rights. Finally, it potentially conferred the new exclusive property right in perpetuity, with no concomitant requirement that the public ultimately acquire ownership of the object of protection at the end of a specified period. The directive thus effectively abolished the very concept of a public domain that had historically justified the grant of temporary exclusive rights in intangible creations.
The Database Protection Controversy in the United States
The situation in the United States differs markedly from that which preceded the adoption of the E.C.’s Directive on the Legal Protection of Databases. In general, the legislative process in the United States has become relatively transparent. Since the first legislative proposal, H.R. 3531, which was modeled on the E.C. Directive and
introduced by the House Committee on the Judiciary in May 1996, this transparency has generated a spirited and often high-level public debate. Very little progress toward a compromise solution had been reached as of the time of writing, however, which is hardly surprising given the intensity of the opposing views, the methodological distance that divides them, and the political clout of the opposing camps.
We are, accordingly, left with the two basic proposals that were still on the table when the legislative session ended at an impasse in 2000. These proposals, as refined during that session, represent the baseline positions that each coalition carried into the current round of negotiations. One bill, H.R. 354, as revised in January 2000, embodied the proponents’ last set of proposals for a sui generis regime built on an exclusive property rights model (although some effort was made to conceal that solution behind a facade that evoked unfair competition law). The other bill, H.R. 1858, set out the opponents’ views of a so-called minimalist misappropriation regime as it stood on the eve of the current round of negotiations.
The Exclusive Rights Model. The proposals embodied in H.R. 354 attempted to achieve levels of protection comparable to those of the E.C. Directive by means that are more congenial to the legal traditions of the United States. The changes introduced at the end of the 2000 legislative session softened some of the most controversial provisions at the margins, while maintaining the overall integrity of a strongly protectionist regime.
The bill in this form continued to define “collections of information” very broadly as “information . . . collected and . . . organized for the purpose of bringing discrete items of information together in one place or through one source so that persons may access them.”26 Like the E.C. Directive, this bill then cast eligibility in terms of an “investment of substantial monetary or other resources” in the gathering, organizing, or maintaining of a “collection of information.”27 It conferred two exclusive rights on the investor: first, a right to make all or a substantial part of a protected collection “available to others,” and second, a right “to extract all or a substantial part to make available to others.” Here the term “others” was manifestly broader than “public” in ways that remained to be clarified.
H.R. 354 then superimposed an additional criterion of liability on both exclusive rights that is not present in the E.C. model. This was the requirement that, to trigger liability for infringement, any unauthorized act of “making available to others” or of “extraction” for that purpose must cause “material harm to the market” of the qualifying investor “for a product or service that incorporates that collection of information and is offered or intended to be offered in commerce.” The crux of liability under the bill thus derived from a “material harm to markets” test that was meant to cloud the copyrightlike nature of the bill and to shroud it in different terminology.
Here a number of concessions were made to the opponents’ concerns in the last public iteration of the bill (January 11, 2000), some of them real, others nominal in effect. The addition of “material” to the market harm test may, for example, have addressed complaints that proponents viewed “one lost sale” as constituting actionable harm to the market.
At the same time, the revised bill contained convoluted and tortuous definitions of “market” that the Clinton administration hoped would reduce the scope of protection in the case of follow-on applications.28 On closer inspection, however, these definitions provided a static picture of a moving target that amounted to a mostly illusory limitation on the investor’s broad adaptation right. Notwithstanding these so-called concessions, the bill effectively assigned most follow-on applications to any initial investor whose dynamic operations expand the range of potentially protectible matter with every update, ad infinitum.
The bill then introduced a “reasonable use” exception that was intended to benefit the nonprofit user communities, especially researchers and educators,29 and that conveyed a sense of similarity to the “fair-use” exception in copyright law.30 Once again, this became largely illusory on closer analysis, because under the proposed bill, the very facts, data, and information that copyright law excludes would themselves have become the objects of protection, and there were no other significant exceptions. Hence, virtually every customary or traditional use of facts or data compiled by others that copyright law would presumably have allowed scientists, researchers, or other nonprofit entities to make in the past would have become a prima facie instance of infringement under H.R. 354. These users would in effect have had either to license such uses or to be prepared to seek judicial relief on grounds of “reasonableness” on a continuing basis. Because university administrators dislike litigation and are risk averse by nature, and because this provision put the burden of showing reasonableness on them, there is reason to expect a chilling effect on these institutions’ customary uses of data heretofore in the public domain.
The bill recognized an “independent creation” norm, which presumably exempted any database, however similar to an existing database, that was not the fruit of “copying.”31 This provision codified a fundamental norm of copyright law, and the European Commission made much of a similar norm in justifying its own regulatory scheme. In reality, this “independent creation” principle would produce unintended and socially deleterious consequences when transposed to the database milieu precisely because many of the most complex and important databases inherently cannot be independently regenerated. Sometimes the database cannot be reconstituted because the underlying phenomena are one-time events, as often occurs in the observational sciences. In other instances, key components of a complex database can no longer be reconstituted with certainty at a later date. Any independently regenerated database suffering from these defects would necessarily contain gaps that made it inherently less reliable than its predecessor.
These problems point to a more general phenomenon that affects competition in large or complex databases. Even when, in principle, such databases could be reconstituted from scratch, the high costs of doing so—as compared with the add-on costs of existing producers—will tend to make the second-comer’s costs so high as to constitute a barrier to entry. Meanwhile, the first-comer’s comparative advantage from already owning a large collection that is too costly to reconstitute will only grow more formidable over time, an economic reality that progressively strengthens the barriers to entry and tends to reinforce (and, indeed, to explain) the predominance of sole-source data suppliers in the marketplace.
Government-generated data would have remained excluded, in principle, from protection, in keeping with current U.S. practice,32 which differs from E.U. practice in this important respect. However, there is considerable controversy surrounding the degree of protection to be afforded government-generated data that subsequently become embodied in value-adding, privately funded databases. All parties agree that a private, value-adding compiler should obtain whatever degree of protection is elsewhere provided, notwithstanding the incorporation of government-generated data. The issue concerns the rights and abilities of third parties to continue to access the original, government-generated data sets. The proponents of H.R. 354 have been little inclined to accept measures seeking to preserve access to the original data sets, despite pressures in this direction.
H.R. 354 imposed no restrictions whatsoever on licensing agreements, including agreements that might overrule the few exceptions otherwise allowed by the bill.33 Despite constant remonstrations from opponents about the need to regulate licensing in a variety of circumstances—and especially with respect to sole-source providers—the bill itself did not budge in this direction. On the contrary, new provisions added to H.R. 354 in 2000 would have prohibited tampering with encryption devices (“anti-circumvention measures”) and with electronically embedded “watermarks” in a manner that paralleled the provisions adopted for online transmissions of copyrighted works under the DMCA. Because these provisions would have effectively secured a database against unauthorized access (and tended to create an additional “exclusive right of access” without expressly so declaring), they would only have added to the database owner’s market power to dictate contractual terms and conditions without regard to the public interest. These powers were further magnified by the imposition of criminal sanctions in addition to strong civil remedies for infringement.34
The one major concession that was made to the opponents’ constitutional arguments concerned the question of duration. As previously noted, the E.C. Directive allows for perpetual protection of the whole database so long as any substantial part of it is updated or maintained by virtue of a new and substantial investment, and the proponents’ early proposals in the United States echoed this provision. However, the U.S. Constitution clearly prescribes a limited term of duration for intellectual property rights,35 and the proponents finally bowed to pressures from many directions by limiting the term of duration to 15 years.36
Any update to an existing database would have qualified for a new term of 15 years, but this protection would apply, at least in principle, only to the material added in the update. In practice, however, the inability to clearly separate old from new matter in complex databases, coupled with ambiguous language concerning the scope of protection against harm to “likely, expected, or planned” market segments, could still have left a loophole for an indefinite term of duration.
The Unfair Competition Model. The opponents’ bill, the Consumer and Investor Access to Information Act of 1999, H.R. 1858, was introduced by the House Commerce Committee in 1999, as a sign of good faith, in response to critics’ claims that the opponents’ coalition sought only to block the adoption of any database protection law. H.R. 1858 began with a definition of databases that was not appreciably narrower than that of H.R. 354, except for an express exclusion of traditional literary works that “tell a story, communicate a message,” and the like.37 In other words, it attempted to draw a clearer line of demarcation between the proposed database regime and copyright law, to reduce the overlapping or cumulative protection that might occur under H.R. 354.
The operative protective language in H.R. 1858 was short and direct, but it relied on a series of contingent definitions that muddied the true scope of protection. Thus, the bill would prohibit anyone from selling or distributing to the public a database that is (1) “a duplicate of another database . . . collected and organized by another person or entity,” and (2) “is sold or distributed in commerce in competition with that other database.”38 The bill then defined a prohibited duplicate as a database that is “substantially the same as such other database, as a result of the extraction of information from such other database.”39
Here, in other words, liability would attach only for a wholesale duplication of a preexisting database that results in a substantially identical end product. However, this basic misappropriation approach became further subject to both expansionist and limiting thrusts. Expanding the potential for liability was a proviso added to the definition of a protectible database that treats “any discrete sections [of a protected database] containing a large number of discrete items of information” as a separably identifiable database entitled to protection in its own right.40 The bill would thus have codified a surprisingly broad prohibition of follow-on applications that make use of discrete segments of preexisting databases, subject to the limitations set out below.
A second protectionist thrust resulted from the lack of any duration clause whatsoever, with the prohibition against wholesale duplication—subject to limitations set out below—conceivably lasting forever. This perpetual threat of liability would have attached to wholesale duplication of even a discrete segment of a preexisting database, if the other criteria for liability were met.
These powerfully protective provisions, put into H.R. 1858 at an early stage to weaken support for H.R. 354, were offset to some degree by other express limitations on liability and by a codified set of misuse standards to help regulate licensing. To understand these further limitations, one should recall that liability even for wholesale duplication of all, or a discrete segment, of a protected database would not attach unless the unauthorized copy were sold or distributed in commerce and “in competition with” the protected database.41 The term “in competition with,” when used in connection with a sale or distribution to the public, was then defined to mean that the unauthorized duplication “displaces substantial sales or licenses likely to accrue from the original database” and
“significantly threatens . . . [the first-comer’s] opportunity to recover a reasonable return on the investment” in the duplicated database.42 Both prongs had to be met before liability would attach.
It follows that even a wholesale duplication that was not commercially exploited or that did not substantially decrease expected revenues (as might occur from, for example, nonprofit scientific research activities) could presumably have escaped liability in appropriate circumstances. Similarly, a follow-on commercial product that made use of data from a protected database might have escaped liability if it were sold in a distant market segment or required substantial independent investment.
H.R. 1858 then further reduced the potential scope of liability by imposing a set of well-defined exceptions and by limiting enforcement to actions brought by the Federal Trade Commission.43 There were express exceptions comparable to those under H.R. 354 for news reporting, law enforcement activities, intelligence agencies, online stockbrokers, and online service providers.44 There was also an express exception for nonprofit scientific, educational, or research activities,45 in case any such uses were thought to escape other definitions that limit liability to unauthorized uses in competition with the first-comer. Still other provisions clarified that the protection of government-generated data or of legal materials in value-adding embodiments would remain contingent upon arrangements that facilitate continued public access to the original data sets or materials.46 A blanket exclusion of protection for “any individual idea, fact, procedure, system, method of operation, concept, principle or discovery” wisely attempted to provide a line of demarcation with patent law and to ward off unintended protectionist consequences in this direction.47
Another important set of safeguards emerged from the drafters’ real concerns about potential misuses of even this so-called “minimalist” form of protection. These concerns were expressed in a provision that expressly denied liability in any case where the protected party “misuses the protection” that H.R. 1858 would afford. A related provision then elaborated a detailed list of standards that courts could use as guidelines to determine whether an instance of misuse had occurred.48 These guidelines or standards would have greatly clarified the line between acceptable and unacceptable licensing conditions, and if enacted, they could have made a major contribution to the doctrine of misuse as applied to the licensing of other intellectual property rights as well.
In summary, the underlying purpose of H.R. 1858 was to prohibit wholesale duplication of a database as a form of unfair competition. It thus set out to create a minimalist liability rule that would prohibit market-destructive conduct rather than an exclusive property right as such, and in this sense, it initially posed a strong contrast to H.R. 354. Over time, however, different iterations of the bill, designed to win supporters away from H.R. 354, made H.R. 1858 surprisingly protectionist—especially in view of its de facto derivative work right.