2
Incentives and Disincentives Affecting the Availability and Use of Scientific and Technical Databases

As noted in Chapter 1, scientific and technical (S&T) databases are not just a by-product of research, but also an essential foundation on which progress in science builds. Increasingly, too, databases are a research tool that can be sold or licensed and used as the input for new products, the creation of customized derivative databases, and innovations to broaden the scope and increase the pace of discovery. The incentives and disincentives to provide databases reflect all these uses, as well as financial motives. This chapter points out the divergent objectives of organizations that produce and distribute S&T databases, and it outlines some of the economic factors affecting access to such data. In addition, it elaborates on the committee's conclusion that any new legislation that would change the status quo must take into account how it would alter the incentives to produce both original and derivative databases, how it would affect the dissemination and use of databases (especially regarding whether access would be exclusive or unrestricted, particularly for the scientific community), and what the unintended consequences might be for scientific inquiry and other public-interest uses.

DIVERGENT OBJECTIVES OF ORGANIZATIONS THAT PRODUCE AND DISTRIBUTE SCIENTIFIC AND TECHNICAL DATABASES

Original and derivative S&T databases are produced by government, not-for-profit/academic, and for-profit organizations (see Table 1.1 in Chapter 1 for selected examples). Whereas individual researchers typically are motivated by curiosity, a desire to contribute to the knowledge base, and an opportunity to



Copyright © National Academy of Sciences. All rights reserved.




A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases

influence the thinking of others, their employers also have in mind a combination of organizational mission and funding considerations. Table 2.1 summarizes typical objectives of government, not-for-profit/academic, and for-profit organizations in producing and disseminating S&T databases and the different weight each places on mission versus financial considerations.

In government and not-for-profit research organizations, including universities, basic research institutes, and national laboratories, the advancement of knowledge as an intrinsic good and in the service of national goals motivates the production and distribution of S&T databases; exploiting data for financial gain is subordinate to fulfilling public-interest objectives. The data products of not-for-profit and government organizations are judged primarily by criteria that are not directly profit related, such as their value to end users, their potential value in advancing a field, their ability to enhance the status of an institution and its research or educational capabilities, and similar objectives typically associated with public-interest or public good activities related, for example, to improving knowledge of disease factors or interdependencies within ecosystems.

Of course, not all not-for-profit institutions behave in this generalized way. At one end of the spectrum are organizations that, especially if they are fully subsidized, distribute their data freely on the Internet without any effort at cost recovery. Many individual researchers or academics certainly operate this way. At the other end are not-for-profit institutions that seek to maximize the revenues from their S&T databases, subject to the constraints of their tax-exempt status, to finance future R&D and database development in order to remain at the forefront of their respective fields.

Most not-for-profits, however, fall somewhere in the middle in trying to reconcile their public-interest mission, on the one hand, with the need to generate sufficient operating revenues, on the other. Universities present a good example of this dichotomy, with the trend in recent years toward greater cost recovery¹ and greater attention to the protection and exploitation of their intellectual property.²

In contrast, the for-profit sector seeks mainly to generate profit for management and shareholders. Of course, market success also depends on creating value for users; otherwise, the data products would not be successful. High value can translate to high prices, and such pricing inevitably restricts access. Nevertheless, there are exceptions to the rule here as well, since not all for-profit entities attempt to charge as much as they could for their proprietary databases, perhaps instead using such databases as a marketing tool for other products or services, or choosing revenue-generating methods such as advertising as an alternative to charging users for access to data. Such exceptions, however, usually are seen in non-scientific database markets that have large user clienteles.³

1 For a discussion of the trend in academic institutions to protect their research results as intellectual property, see Kenneth W. Dam (1998), "Intellectual Property and the Academic Enterprise," John M. Olin Law & Economics Working Paper No. 68 (2d Series), University of Chicago Law School.

2 Intellectual Property Task Force (1999), "Intellectual Property and New Media Technologies: A Framework for Policy Development and AAU Institutions," Association of American Universities, May 13, Washington, D.C., 31 pp., available online at <www.tulane.edu/~aau/AAUPolicy.html>.

3 See generally Computer Science and Telecommunications Board, National Research Council (2000), The Digital Dilemma: Intellectual Property in the Information Age, National Academy Press, Washington, D.C., in press.

TABLE 2.1 Typical Objectives of Organizations That Produce and Disseminate S&T Databases

Primary motivations
  Federal Government: Serve national goals, including promoting societal well-being and supporting basic research and other public good interests
  Not-for-Profit/Academic: Fulfill mission, including furthering research, education, creation of knowledge, and discovery; remain economically viable
  For-Profit: Achieve corporate objectives, including profit making and growth, and ensure shareholder and customer satisfaction

Goals of S&T data collection and database development
  Federal Government: Support agency mission; undertake basic research as a basis for economic growth and productivity and for public well-being
  Not-for-Profit/Academic: Advance knowledge by conducting new research and by validating and building on the research of others; educate future researchers; contribute to basis for producing social benefits; build reputation and status of researchers and their institutions
  For-Profit: Support development of new or improved products or services; develop databases for direct sale or lease as products or as services in support of other products or services

Goals of S&T database distribution
  Federal Government: Maximize the downstream benefits of basic research; promote availability and use of research results in both public and private sectors
  Not-for-Profit/Academic: Encourage open sharing of ideas; enable existing data to be reused for discovery of new knowledge; invite review and validation of research results; facilitate use of research results for product development by S&T community and commercial concerns; recover costs or generate revenue in support of mission
  For-Profit: Disseminate data to protect competitive advantage when databases are used for development of other products or are themselves products or services; disseminate via sale or license to generate revenue, enhance customer base and market position, gain competitive advantage, achieve profits, or recover costs

Access to data
  Federal Government: Open exchange of information encouraged by federal policy
  Not-for-Profit/Academic: Open, with data and ideas shared after results have been published
  For-Profit: Internal and confidential, or available/marketed externally at a cost set by the organization

Interest in protecting the databases produced
  Federal Government: Very low; any restrictions generally seen as a problem, with few benefits
  Not-for-Profit/Academic: Moderate; ranges from very low (for fully subsidized databases) to moderate (when cost recovery is necessary) to high (when data are a source of revenue required to support mission)
  For-Profit: Very high; databases regarded as investments to be protected whether they are used in product development or are themselves products or services to be sold

NOTE: This table provides broad generalizations regarding the organizations of the three sectors. The committee recognizes that many exceptions and nuances exist, as discussed in this chapter.

SCIENTIFIC AND TECHNICAL DATABASE COSTS, PRICING, AND ACCESS

Despite their differences in mission and motivation, organizations in all three sectors are constrained by financial responsibilities: federal government agencies must justify their expenditures to Congress; not-for-profit entities must make their organizations at least viable (with any excess of income over expenses reinvested in the organization); and commercial firms must answer to shareholders. All organizations therefore give careful consideration both to the costs associated with database production and distribution and to the potential for generating revenue.

Production and Distribution Costs

The costs of databases can be categorized as production costs and distribution costs. Database production includes both data collection and database preparation.

The cost of data collection can be very high and varies with the size, complexity, and difficulty of obtaining the data. It includes the costs of the observational or experimental instrumentation, deployment and maintenance of that instrumentation for the lifetime of the data collection project, and related infrastructure costs such as initial data storage. Data acquisition costs typically represent the bulk of the costs of a database. In major data collection projects such as those involving remote sensing satellites (e.g., National Oceanic and Atmospheric Administration geostationary or polar-orbiting spacecraft), or in large physics experiments requiring specialized facilities (e.g., high-energy particle accelerators), the costs can easily be in the hundreds of millions, or even billions, of dollars.

Database preparation costs are those associated with preparing a database for dissemination, including ensuring data quality and accuracy, and enhancing the utility of the data for users. The cost of these efforts can vary from modest to large, depending on the nature of the database and its intended use. For example, the cost of the labor for abstracting or indexing databases can make their production expensive.

Distribution costs include the cost of making, sending, and billing for physical copies; any additional transaction costs such as those for licensing and related administrative activities; and user-specific costs such as those for database maintenance and specialized handling.

Distribution costs tend to be much lower than database production costs, particularly if the Internet is the medium of dissemination and little effort is devoted to marketing or user assistance. In some cases, the distribution costs are simply the costs of copying.

Producing databases for online distribution is more a matter of potential market opportunity and incentives than of potential cost savings. Customers have come to expect online distribution, particularly in the S&T data market, in which nearly all users are now connected to networks and are technically sophisticated. For such users, online access can improve a database's accessibility, functionality, and utility. From the producer's perspective, it opens new market opportunities or broadens existing ones, but it is more likely to shift costs than to reduce them.

The concern over adequate protection of a rights holder's database products is exacerbated in the online milieu by factors such as the possibility of unauthorized access, duplication of content, and mass redistribution. As discussed in Chapter 3, however, making the information available online also can reduce the opportunities for misappropriation of a database by preventing the user from accessing all the underlying data.
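The cost structure described in this section (a large one-time production cost and a small distribution cost for each additional user) can be illustrated with a short calculation. The following is only a minimal sketch, assuming a single fixed production cost and a constant per-user distribution cost; the function name and all dollar figures are hypothetical and are not taken from the committee's data.

```python
def average_cost_per_user(production_cost, distribution_cost_per_user, users):
    """Average cost per user when a one-time production cost is spread
    over all users and each user adds a constant distribution cost."""
    if users <= 0:
        raise ValueError("users must be positive")
    return production_cost / users + distribution_cost_per_user


# Hypothetical figures, chosen only for illustration.
PRODUCTION_COST = 5_000_000.0      # data collection plus database preparation
DISTRIBUTION_COST_PER_USER = 2.0   # copying and delivery for one more user

for users in (100, 10_000, 1_000_000):
    avg = average_cost_per_user(
        PRODUCTION_COST, DISTRIBUTION_COST_PER_USER, users
    )
    # The cost of serving one more user stays constant while the
    # average cost per user falls as the user base grows.
    print(
        f"{users:>9,} users: average cost per user ${avg:,.2f}, "
        f"cost of one more user ${DISTRIBUTION_COST_PER_USER:,.2f}"
    )
```

Under these assumptions the average cost per user approaches the per-user distribution cost as the number of users grows, which is why production costs, rather than distribution costs, dominate decisions about how a database is financed.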

Pricing and Access

Because for most S&T databases the production costs are high relative to the subsequent distribution costs, there is a trade-off between efficient access and cost recovery. An economist would define an efficient-access price for a database as one that is equal to the cost of making the database accessible to one additional (the marginal) user. A higher price inefficiently excludes potential users, whether the data activity is in the public or the private sector. The users' welfare could be enhanced without increasing the burden on taxpayers if users were allowed to buy access to the data at the marginal cost of dissemination.⁴

An efficient-access price will almost never generate revenue that matches the database production costs.⁵ Instead, the database must be subsidized in some way. Cost recovery has an equity justification, namely that the database is "paid for" by the users instead of being subsidized out of general revenue. It also subjects the database to a (weak) market test of whether the value to users exceeds the cost. If not, then revenue cannot exceed costs.

Efficient-access pricing is only feasible in conjunction with some other source of revenue or subsidy, such as taxpayer support (in the case of federal agencies) or endowments or industry or government contract support (in the case of not-for-profits). Federal agencies are required by law to provide efficient access, limiting charges to no more than the incremental cost of data dissemination, and not including the average cost of producing the database.⁶ Frequently, the federal agency simply charges for the cost of reproduction and distribution, which can be zero if the database is distributed online.⁷

4 Such a price is efficient whether the database is sold or licensed to end users or those using the data in derivative products. Competition among vendors in the derivative market will keep the price low, thus transferring much of the benefit of the underlying data to the consumer. In both cases, efficient access could be preserved with higher revenue if the provider could distinguish users with a high willingness to pay for access and use from those with a low willingness to pay.

5 For example, in remote sensing systems the development and maintenance costs of the system are very high per user, but the cost of dissemination is trivial in comparison, especially when done online. Information goods such as databases share the essential feature of public goods, namely that use of the good is nonrivalrous. This means that after the first user is served, the marginal cost of allowing access by each additional user is minimal, and the average cost per user is declining (there are economies of scale). Competitive theory does not extend to pure public goods, and the marketplace will not deliver them efficiently, if at all. In order to cover costs, the price must exceed the cost of supplying the marginal user.

6 See Office of Management and Budget (1993), Circular A-130, "Management of Federal Information Resources," U.S. Government Printing Office, Washington, D.C., regarding federal government information dissemination practices and policies, which were codified in the Paperwork Reduction Act of 1995, P.L. 104-13, which amended 44 United States Code Chapter 35, effective October 1, 1995.

7 For a discussion of pricing publicly funded S&T data, see National Research Council (1997), Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, Washington, D.C., pp. 124-126. On the other hand, providing access can be more costly than it seems, and users
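The trade-off between efficient access and cost recovery described under Pricing and Access can be made concrete with a small numerical sketch. Everything below is an assumption introduced for illustration: the linear demand curve, the function names, and all figures are hypothetical and do not come from the committee's analysis.

```python
def users_at_price(price, potential_users, max_willingness_to_pay):
    """Users who buy access at `price`, assuming (hypothetically) that
    willingness to pay is spread evenly between zero and a maximum."""
    if price >= max_willingness_to_pay:
        return 0
    share = (max_willingness_to_pay - max(price, 0.0)) / max_willingness_to_pay
    return round(potential_users * share)


def subsidy_needed(price, users, production_cost, marginal_cost):
    """Shortfall that must come from taxpayers, endowments, or contracts.
    A negative value means revenue exceeds total cost."""
    total_cost = production_cost + users * marginal_cost
    revenue = users * price
    return total_cost - revenue


# Hypothetical figures, chosen only for illustration.
PRODUCTION_COST = 400_000.0
MARGINAL_COST = 2.0          # cost of serving one additional user
POTENTIAL_USERS = 10_000
MAX_WTP = 200.0              # highest willingness to pay among users

for label, price in (("efficient-access", MARGINAL_COST), ("cost-recovery", 101.0)):
    users = users_at_price(price, POTENTIAL_USERS, MAX_WTP)
    gap = subsidy_needed(price, users, PRODUCTION_COST, MARGINAL_COST)
    print(f"{label:>16} price ${price:>6.2f}: {users:>5} users, subsidy needed ${gap:,.2f}")
```

Under these assumed numbers, pricing at marginal cost serves the most users but leaves the entire production cost to be subsidized, while the higher price covers costs only by excluding users whose willingness to pay lies between the marginal cost and the price charged.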

In contrast to government agencies and some not-for-profit entities, profit-motivated firms must, at a minimum, recover their costs. Of course, for-profit entities seek to generate additional income, sometimes using sophisticated pricing strategies, such as segmenting the market with differentiated products and varying the price according to volume, convenience of delivery, customer support services, documentation, scope of subject matter, geographic coverage, and so on.⁸ (See Box 2.1 for examples of marketing models for private-sector S&T data products and services.) Unless commercial providers are much more efficient than government providers, the profit motive leads to higher prices for access than a government agency would ideally charge under its mandate to provide wide access.

Whether a commercial database will be made available to public users (such as university laboratories) on better terms than to commercial firms depends on whether there is competition between those two sectors. If the database contains information whose value to commercial buyers is reduced by academic use, then the vendor will not sell it at preferential rates to academics.

Stronger Statutory Protection and the Incentives for Investment

There is a natural link between cost recovery and protection of databases.⁹ If databases can be copied without any legal constraints or otherwise freely acquired by users or competitors, then the rights holders will not recover their costs and hence will have no incentive to produce databases.¹⁰ Stronger statutory protection might help prevent unauthorized copying, particularly in unprotected digital formats, and thus promote cost recovery and improve profit margins. In this way it could provide incentives for the creation of new databases in the private sector, especially by commercial entities.

However, although this motivation sounds compelling, it should be tempered by the realization, elaborated in Chapter 3, that databases already benefit from many forms of protection that permit rights holders to recover their costs. And even when balanced, new legislation could have unintended, negative consequences that need to be avoided, as discussed in Chapter 4. Moreover, the participants in the committee's January 1999 workshop did not report any instances in which the current lack of statutory protection had dissuaded them from investing in promising new projects.

While most of the not-for-profit and commercial-sector participants noted that their organizations' data had been copied on occasion, such copying was written off as an unavoidable loss and part of doing business (similar to shoplifting in retail stores). Commercial-sector participants noted that in repeated instances of suspected infringement, the practice of issuing "cease and desist" letters normally was effective, but when it was not, they felt appropriately obliged to terminate the infringer's further access (i.e., to their password-protected online databases).¹¹ In no case, however, had copying of their data prevented the companies represented at the workshop from earning a reasonable profit, over and above full cost recovery.¹²

7 (continued) must sometimes invoke the Freedom of Information Act to obtain data from the federal government. One interpretation is that the cost of providing the information includes technicians' time, which cannot be disentangled from other activities and is hard to pass on to the user as a cost of dissemination.

8 See National Research Council (1997), Bits of Power, note 7, p. 125.

9 The trade-off between access and cost recovery is expressed by Richard Gilbert in Chapter 4 of the committee's online workshop Proceedings as a trade-off between access by users and protection of rights holders or vendors, where protection refers to market exclusivity that comes from intellectual property rights. See National Research Council (1999), Proceedings of the Workshop on Promoting Access to Scientific and Technical Data for the Public Interest: An Assessment of Policy Options, National Academy Press, Washington, D.C., <http://www.nap.edu>.

10 Some databases, such as consumer-oriented catalogs, are a way to sell products. These databases are not the committee's concern, however, because firms already have every incentive to provide them and they do not require statutory protection. The committee's concern here is with databases for which the pricing for access is the only source of revenue, e.g., most S&T databases.

Mounting Pressures on Government Producers and Distributors of Scientific and Technical Databases for Cost Recovery

Historically, the generation of primary S&T data, in such diverse fields as meteorology, astronomy, and high-energy physics and in the public census, has been funded by governments either directly through specific contracts or indirectly through the sponsorship of academic research. In the United States, the resulting databases of the federal government have been placed in the public domain and have provided the basis for subsequent research.

In many cases in which the government produces databases, it distributes the raw or partially processed data as a public good and lets not-for-profit organizations and commercial firms customize the data for special market segments or individual users. This achieves a better balance between requirements for cost recovery and the advantages to the public of efficient-access pricing. Under this approach, taxpayers subsidize the collection and preservation of the raw or partially processed data, but users pay the entire additional cost of customizing the databases, which are prepared and sold mostly by commercial firms.

There are problems with both systems of finance, however. As mentioned above, subsidies avoid the market test of whether the willingness to pay for the data exceeds their cost. Commercial provision at higher prices must face this market test, but the higher prices inefficiently restrict access. In the evidence collected by the committee, there was no suggestion that federal agencies were collecting too much raw data, so it is reasonable to price the raw data for efficient access, relying on taxpayer subsidies to the extent possible. Of course, taxing for general revenue also involves some inefficiency.¹³

However, because of mounting budgetary pressures generally, and increasing costs of public information management specifically, government database providers worldwide have recently been coming under pressure to recover all costs, rather than only the cost of distribution. This pressure toward full cost recovery results both from the rising costs associated with the rapidly expanding rate of data collection and from foreign pressures. European governments, for example, are turning increasingly to a full cost recovery approach for their S&T database production and dissemination activities. The E.U. Database Directive puts considerable pressure on non-E.U. countries, including the United States, to do the same.¹⁴

Although existing law in the United States precludes adoption of such restrictions for government data, the enactment of a strong new database protection statute for private-sector databases could stimulate further interest in privatizing U.S. government database dissemination activities. By making databases more profitable, new protectionist legislation could shift responsibility for their creation from the public sector to the private sector. The social harm of such a shift would be an increase in the price for access, especially for highly specialized databases (such as some S&T databases) with a comparatively small market. A possible social benefit would be that the private sector would be subjected to a weak market test of whether the value of the databases exceeded their cost. As noted above, however, most data collected by public agencies are raw data whose full potential value has not yet been realized, and in many cases, user-oriented transformations of the data are already in the hands of the private and nonprofit organizations that might have a better sense of the user market.

Pressures for cost recovery also arise because the benefits that accrue to consumers and the broader society under the efficient-access pricing model are harder for legislators and administration policy makers to see (and measure) than those that accrue as reduced tax burdens under the cost-recovery model, or those that accrue as increased profits under the commercial model. Thus, science agencies in the United States are increasingly turning to the private sector, both not-for-profit and commercial entities, to outsource government S&T database dissemination activities, or even to purchase data from commercial suppliers. For example, in order to promote private-sector investment and development of space technologies and applications, the Commercial Space Act of 1998 encourages NASA, an agency engaged to a substantial degree in basic research activities, to purchase space and Earth science data products and services from the private sector, and to treat data as a commercial commodity under federal procurement regulations. The potential negative effects of this trend are discussed in some detail in Chapter 4.

11 Several participants reported that pirated editions of their databases were being sold in developing countries. This kind of wholesale copying is illegal under domestic and international copyright and unfair competition laws. The ongoing failure of other nations to respect and enforce existing intellectual property law is largely a concern of new World Trade Organization rules known as the TRIPS agreement (see note 5 in Chapter 3). Increased worldwide protection of databases would require a new treaty, and its effectiveness could depend on its integration into the TRIPS agreement.

12 An extensive background report prepared in advance of the workshop also failed to uncover any negative consequences for companies. See Stephen M. Maurer, Appendix C, in the committee's online Proceedings, note 9.

13 Charging an access price higher than the efficient-access price can be thought of as funding the database through an excise tax. However, basic principles of optimal taxation suggest that income taxes are less distorting than isolated excise taxes. See, for example, Richard Tresch (1981), Public Finance: A Normative Theory, Irwin-Dorsey Limited (Georgetown, Ontario), p. 320, who says: "Income taxes are held in high regard by many public sector economists .... (They are) seen as being reasonably efficient, based on large empirical literature which indicates that the supply of labor and capital are both extremely price inelastic."

14 See Stephen M. Maurer (1999), "Raw Knowledge: Protecting Technical Databases for Science and Industry," Appendix C in the online workshop Proceedings, note 9.

Box 2.1 Marketing Models for Private-Sector S&T Data Products and Services

Traditional Marketing Methods

Many of the business plans for creating and distributing databases are similar to those used in the software industry. Referred to here as "traditional" methods, they are usually tailored to the perceived market's size and wealth:

Mass-market products. The typical commercial database directory lists thousands of low-cost, mass-market databases with prices ranging from a few hundred to tens of thousands of dollars each. This approach extends to scientific and engineering products. Examples include POISINDEX (a medical database that links many thousands of poisons to treatment protocols and is aimed at emergency room personnel), Science and Technology Network (STN) International (online databases of the physical and mechanical properties of thousands of materials), and the Institute for Scientific Information, Inc. (see Table 1.1 in Chapter 1).

Specialty-market products. Many database producers concentrate on developing products aimed at relatively small, high-value markets. Examples from Table 1.1 include Molecular Applications Group (software for storing, mining, and visualizing genomic data) and TASC (custom weather data for aviation, agribusiness, and power companies). Not surprisingly, such products tend to be costly; indeed, large pharmaceutical and biotechnology firms may pay millions and even tens of millions of dollars per year in licensing fees for access. Sales contracts normally include detailed confidentiality provisions.

Custom products. Large pharmaceutical houses sometimes ask bioinformatic companies to provide exclusive access to a particular database. For example, in 1994 SmithKline Beecham agreed to pay $125 million to Human Genome Sciences, Inc. for exclusive rights to one such proprietary database containing EST (gene fragment) information.¹

New Marketing Methods

In addition to taking a traditional approach to marketing databases, vendors have also devised effective alternatives for selling their products. The following sampling of new methods is provided by way of illustration:

Bundling. Product-linked databases are frequently sold as part of a package that includes other products. For example, medical and scientific instrument makers may bundle their products with relevant nuclear physics data. Databases are also included in the price of some proprietary search tools.

Use of market makers. Many companies create elaborate databases to help users find and use their products. For example, semiconductor chip manufacturers commonly prepare elaborate "cookbooks," "libraries," and design tools to help engineers use their products. Similarly, some online bibliographic services are made available to consumers at little or no cost. When a user's research is successful, such services offer to sell reprints at costs ranging from $10 to $20 per article.

Migration of old products to new media. Some publishers have been able to create new digitized versions of databases as an outgrowth of existing print products. Electronic versions of journals and other print-based resources are probably the classic example. Current vendors include Elsevier Science (Science Direct), John Wiley & Sons, Springer-Verlag, and Academic Press. Extensions of the concept, which would link traditional articles to online data sets, are already being developed and implemented.

1 Jon Cohen (1997), "The Genomics Gamble," Science, Vol. 275, February 7, pp. 767-772.

SOURCE: Commissioned paper by Stephen M. Maurer, "Raw Knowledge: Protecting Technical Databases for Science and Industry," Appendix C in National Research Council (1999), Proceedings of the Workshop on Promoting Access to Scientific and Technical Data for the Public Interest: An Assessment of Policy Options, National Academy Press, Washington, D.C., <http://www.nap.edu>.