National Academies Press: OpenBook

A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases (1999)

Chapter: 1 Importance and Use of Scientific and Technical Databases

Suggested Citation:"1 Importance and Use of Scientific and Technical Databases." National Research Council. 1999. A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases. Washington, DC: The National Academies Press. doi: 10.17226/9692.

1
Importance and Use of Scientific and Technical Databases

Modern technology has propelled us into the information age, making it possible to generate and record vast quantities of new data.1 Advances in computing and communications technologies and the development of digital networks have revolutionized the manner in which data are stored, communicated, and manipulated. Databases, and the uses to which they can be put, have become increasingly valuable commodities.

The now-common practice of downloading material from online databases has made it easy for researchers and other users to acquire data, which frequently have been produced with considerable investments of time, money, and other resources. Government agencies and most government contractors or grantees in the United States (though not in many other countries) usually make their data, produced at taxpayer expense, available at no cost or for the cost of reproduction and dissemination. For-profit and not-for-profit database producers (other than most government contractors and grantees) typically charge for access to and use of their data through subscriptions, licensing agreements, and individual sales.

Currently many for-profit and not-for-profit database producers are concerned about the possibility that significant portions of their databases will be copied or used in substantial part by others to create "new" derivative databases. If an identical or substantially similar database is then either redisseminated broadly or sold and used in direct competition with the original rights holder's database, the rights holder's revenues will be undermined, or in extreme cases,

1  

Box 1.1 provides definitions of data and of several other key terms used in this report.


Box 1.1 Definitions of Key Terms Used in This Report

Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors. A data element is the smallest unit of information to which reference is made. This report is concerned primarily with digital data, although a large portion of raw data is recorded as analog data, which also can be digitized. For purposes of this report the terms data and facts are treated interchangeably, as is the case in legal contexts.

Data in a database may be characterized as predominantly word oriented (e.g., as in a text, bibliography, directory, dictionary), numeric (e.g., properties, statistics, experimental values), image (e.g., fixed or moving video, such as a film of microbes under magnification or time-lapse photography of a flower opening), or sound (e.g., a sound recording of a tornado or a fire). Word oriented, numeric, image, and sound databases are processed by different types of software (text or word processing, data processing, image processing, and sound processing).

Data can also be referred to as raw, processed, or verified. Raw data consist of original observations, such as those collected by satellite and beamed back to Earth, or initial experimental results, such as laboratory test data. After they are collected, raw data can be processed or refined in many different ways. Processing usually makes data more usable, ordered, or simplified, thus increasing their intelligibility. Verified data are data whose quality and accuracy have been assured. For experimental results, verification signifies that the data have been shown to be reproducible in a test or experiment that repeats the original. For observational data, verification means that the data have been compared with other data whose quality is known or that the instrument with which they were obtained has been properly calibrated and tested.

Digital data may be processed or stored on various types of media, including magnetic (RAM, hard drive, diskettes, tapes) and optical (CD-ROM, DVD) media. Data can be made accessible either through portable media or, increasingly, online.

A database is a collection of related data and information—generally numeric, word oriented, sound, and/or image—organized to permit search and retrieval or processing and reorganizing. A data set is a collection of similar and related data records or data points. Many databases are a resource from which specific data points, facts, or textual information are extracted for use in building a derivative database or data product. A derivative database, also called a value-added or transformative database, is built from one or more preexisting database(s) and frequently includes extractions from multiple databases, as well as original data.
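The relationship between preexisting databases and a derivative database can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Record` type, the `extract` helper, the two source databases, and the placeholder values are hypothetical and do not come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str      # hypothetical identifier for one data point
    value: float  # placeholder value, for illustration only
    source: str   # where the data point came from


def extract(database, source_name, wanted_keys):
    """Pull selected data points out of a preexisting database."""
    return [
        Record(key, value, source_name)
        for key, value in database.items()
        if key in wanted_keys
    ]


# Two hypothetical preexisting databases (placeholder values).
db_a = {"property_1": 1.0, "property_2": 2.0}
db_b = {"property_3": 3.0, "property_4": 4.0}

# Original data contributed by the producer of the derivative database.
original = [Record("property_5", 5.0, "own measurement")]

# The derivative database combines extractions from both sources
# with the producer's own original data.
derivative = (
    extract(db_a, "db_a", {"property_1"})
    + extract(db_b, "db_b", {"property_3"})
    + original
)

for record in derivative:
    print(record)
```

Keeping a `source` field on each record is one simple way to retain the provenance of extracted data points; it is a design choice of this sketch, not a practice described in the report.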

A database producer acquires data in raw, reduced, or otherwise processed form (either directly, through experimentation or observation, or indirectly, from one or more organizations or preexisting databases) for inclusion in a database that the database producer is generating. Such database creators, sometimes known as database publishers or originators but referred to in this report as database producers, traditionally hold the intellectual property rights in the databases.


In general, database production covers all aspects of preparation, processing, and maintenance; development of software for search, retrieval, and manipulation; and documentation of the software and database features and functions prior to distribution of the database by a vendor. Among the wide variety of functions encompassed by database production, in addition to data acquisition, are data reduction (where needed), formatting, enhancing, expanding, merging with other data or data records, categorizing, classifying, indexing, abstracting, tagging, flagging, coding, sorting/rearranging, putting into tabular form, creating visual representations, updating, and putting into searchable and retrievable form for use and manipulation by users.

A database vendor (variously known as a distributor, disseminator, provider, or, mostly in Europe and the United Kingdom, online host) sells, leases, or licenses digitized versions of a database on optical disks (e.g., CD-ROM, DVD), floppy disks, or tapes, or as downloadable complete databases. Many databases, particularly textual ones, are also based on or provided as hard-copy paper publications. A database producer organization may also serve as a database vendor if it both produces a database and provides online access directly to users or sells, leases, or licenses the database.

For the sake of simplicity, the term database dissemination or distribution as used in this report includes the concept of making databases available online.

The modifier scientific and technical designates the subject matter of the database content in the general areas covered in this report.

the rights holder will be put out of business. Besides being unfair to the rights holder, this actual or potential loss of revenue may create a disincentive to produce and then maintain databases, thus reducing the number of databases available to others. However, preventing database uses by others, or making access and subsequent use more expensive or difficult, may discourage socially useful applications of databases. The question is how to protect rights in databases while ensuring that factual data remain accessible for public-interest and other uses.

This report explores issues in the conundrum posed by the need to properly balance the rights of original database producers or rights holders and the rights of all the downstream users and competitors—with the principal focus on the balance of rights between the database rights holders and public-interest users such as researchers, educators, and librarians. In particular, the Committee for a Study on Promoting Access to Scientific and Technical Data for the Public Interest focuses on scientific and technical (S&T) data (with examples drawn primarily from the physical and biological sciences) as an essential consideration in reasoned attempts to balance competing interests in databases.


To broaden the perspective of and enhance cooperation among the various competing interests, and to help ensure an efficient and effective outcome for all, the committee examines the following basic elements in the larger issue at hand:

  • Salient characteristics and the importance of S&T databases produced and used in research;

  • Impacts of computer technology on the production, distribution, and use of S&T databases;

  • Motivations of the various sectors involved in S&T research and the dissemination and use of research results;

  • Economic issues and incentives that influence the production, distribution, and use of S&T databases, and how these activities are interrelated;

  • Mechanisms currently in place for protecting these economic incentives; and

  • New legislation currently under consideration that would affect the production, dissemination, and use of S&T databases in a variety of ways.

To ensure the most successful outcome in the current debate over rights in databases, any new action must take account of and balance the legitimate interests of the various stakeholders, and must reflect awareness of how the broad public interest can best be served.

SCIENTIFIC AND TECHNICAL DATA AND THE CREATION OF NEW KNOWLEDGE

Factual data are both an essential resource for and a valuable output from scientific research. It is through the formation, communication, and use of facts and ideas that scientists conduct research. Throughout the history of science, new findings and ideas have been recorded and used as the basis for further scientific advances and for educating students.

Now, as a result of the near-complete digitization of data collection, manipulation, and dissemination over the past 30 years, almost every aspect of the natural world, human activity, and indeed every life form can be observed and captured in an electronic database.2 There is barely a sector of the economy that is not significantly engaged in the creation and exploitation of digital databases, and there are many—such as insurance, banking, or direct marketing—that are completely database dependent.

Certainly scientific and engineering research is no exception in its growing reliance on the creation and exploitation of electronic databases. The genetic sequence of each living organism is a natural database, transforming biological research and applications over the past decade into a data-dependent enterprise and giving rise to the rapidly growing field of bioinformatics. Myriad data collection platforms, recording and storing information about our physical universe at an ever-increasing rate, are now integral to the study and understanding of the natural environment, from small ecological subsystems to planet-scale geophysical processes and beyond. Similarly, the engineering disciplines continually create databases about our constructed environment and new technical processes, which are endlessly updated and refined to fuel our technological progress and innovation system.

2  

See Paul F. Uhlir (1995), "From Spacecraft to Statecraft: The Role of Earth Observation Satellites in the Development and Verification of International Environmental Protection Agreements," GIS Law, Vol. 2, p. 1.

Basic scientific research drives most of the world's progress in the natural and social sciences. Basic, or fundamental, research may be defined as research that leads to new understanding of how nature works and how its many facets are interconnected.3 Society uses the fruits of such research to expand the world's base of knowledge and applies that knowledge in myriad ways to create wealth and to enhance the public welfare.

New scientific understanding and its applications are yielding benefits such as the following:

  • Improved diagnosis, pharmaceuticals, and treatments in medicine;

  • Better and higher-yield food production in agriculture;

  • New and improved materials for fabrication of manufactured objects, building materials, packaging, and special applications such as microelectronics;

  • Faster, cheaper, and safer transportation and communication;

  • Better means for energy production;

  • Improved ability to forecast environmental conditions and to manage natural resources; and

  • More powerful ways to explore all aspects of our universe, ranging from the finest subnuclear scale to the boundaries of the universe, and encompassing living organisms in all their variety.4

SCIENTIFIC AND TECHNICAL DATABASES AS A RESOURCE: THE CURRENT CONTEXT

The committee's January 1999 Workshop on Promoting Access to Scientific and Technical Data for the Public Interest: An Assessment of Policy Options,5 included presentations on and discussions of data activities in twelve selected organizations representing three broad sectors (government, not-for-profit, and commercial). The sample activities illustrated some of the depth and range of uses for S&T databases today (Table 1.1 provides a summary) and indicated also the complexity of the often overlapping relationships and interests of database users and producers.

3  

See John A. Armstrong (1993), "Is Basic Research a Luxury Our Society Can No Longer Afford?" Karl Taylor Compton Lecture, Massachusetts Institute of Technology, October 13.

4  

National Research Council (1997), Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, Washington, D.C., p. 18.

5  

See online, National Research Council (1999), Proceedings of the Workshop on Promoting Access to Scientific and Technical Data for the Public Interest: An Assessment of Policy Options, National Academy Press, Washington, D.C., <http://www.nap.edu>.

The discussion below outlines basic aspects of current data activities, including collection and production of S&T data and databases, dissemination, and use, and it describes the roles that the three sectors play in the overall process. In contrasting past and current practices, it indicates how ongoing technological advances have contributed to increased capabilities for obtaining and using S&T data. This description, which provides essential background for the remainder of this report, draws on examples from the four general discipline areas—geographic and environmental, genomic, chemical and chemical engineering, and meteorological research and applications—focused on in the workshop.

Collection of Original Data and Production of New Databases

Sources of Primary Data and Uses

The process of scientific inquiry typically has begun with the formulation of a working hypothesis, based usually on limited observation and data, followed by experimentation designed to test the hypothesis. The experimentation results in the accumulation of new data used to confirm or refute the original hypothesis. Understanding of the natural and physical world has been advanced by researchers building on a growing base of knowledge that is continually being refined, tested, and augmented in the long-established approach to scientific inquiry known as the scientific method.

With the advent of digital technologies has come a dramatic increase in the pace and volume of data acquisition. Ongoing rapid advances in electronic technologies for computing and communications, experimentation, and observation ranging from high-frequency direct sampling to multispectral remote sensing have enabled dramatic increases in the quantities of data generated about the natural world at scales from the microcosm to the macrocosm. For instance, the volume of data on weather and climate stored in the National Climatic Data Center has increased 750-fold in the past two decades (Box 1.2). A pharmaceutical company that 5 years ago could characterize 100,000 compounds per year can now handle a million compounds in a week.
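To put the two pharmaceutical throughput figures on a common yearly basis, a quick calculation helps. The sketch below assumes a 52-week year, which the text does not state, and its variable names are illustrative.

```python
# Figures quoted in the paragraph above.
compounds_per_year_before = 100_000   # throughput 5 years earlier
compounds_per_week_now = 1_000_000    # current throughput

WEEKS_PER_YEAR = 52  # assumption: screening runs every week of the year

# Convert the current figure to a yearly rate, then compare.
compounds_per_year_now = compounds_per_week_now * WEEKS_PER_YEAR
speedup = compounds_per_year_now / compounds_per_year_before

print(f"Current yearly throughput: {compounds_per_year_now:,} compounds")
print(f"Increase over 5 years: about {speedup:.0f}-fold")
```

Under that assumption, the change corresponds to an increase of roughly 500-fold in yearly throughput.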

Although some of these data represent actual measurements, large quantities of data also are being generated through numerical simulations performed on supercomputers. Collection of new data is becoming increasingly automated as recording devices and instrumentation become more sophisticated and rapid. Moreover, many older paper-based data sets, such as historical U.S. Weather


TABLE 1.1 Examples of Different Types of S&T Database Activities Discussed in the January 1999 Workshop

Geographic and Environmental

U.S. Geological Survey (USGS) (Government)

  • Information and tools provided: geographic data (maps and map products); data from other programs (biologic, geologic, hydrologic)

  • Data sources: USGS, other federal agencies, state and local governments, not-for-profit researchers, partnerships with the private sector

  • Users: USGS, other government agencies, commercial database providers and value adders, researchers, and the general public

  • Dissemination modes: maps in hard copy (paper, plastic, film) and digital form; distributed by the agency directly and through partnerships with the private sector and the not-for-profit sector

Long-Term Ecological Research (LTER) Network Office (Not-for-Profit)

  • Information and tools provided: site description database, integrated climate database, remotely sensed ecological data

  • Data sources: ecological researchers at distributed sites belonging to the LTER network

  • Users: researchers

  • Dissemination modes: Internet, some tape and CD-ROM for portability

GeoSystems Global Corp. (Commercial)

  • Information and tools provided: digital maps, MapQuest Web site, mapping services

  • Data sources: from the public domain, government-produced maps (federal, state, local), digital geographic data, remotely sensed imagery; from other sources, commercial and other countries' maps, digital data, remotely sensed imagery, other published sources

  • Users: commercial clients (large companies and consumers)

  • Dissemination modes: maps in hard copy and digital form; software products distributed via retail channels (CD-ROM) and directly to corporate customers; mapping services distributed via Internet

Genomic

National Center for Biotechnology Information (Government)

  • Information and tools provided: GenBank (DNA and protein sequence data); other genomic mapping databases; 3D protein structure database; bibliographic databases; software tools

  • Data sources: direct contributions from scientists; access to other databases from government, not-for-profit, other country sources

  • Users: research scientists in academic, government, commercial organizations

  • Dissemination modes: Internet access via Web servers and File Transfer Protocol (FTP)

Center for Bioinformatics, University of Pennsylvania (Not-for-Profit)

  • Information and tools provided: specialized biological databases; software tools for integration of distributed heterogeneous databases

  • Data sources: proprietary and public-domain experimental data from academic researchers; manual processing and encoding of data from published literature; online molecular and cellular biology and genomic databases

  • Users: research scientists in academic, government, commercial organizations (U.S. and abroad)

  • Dissemination modes: Internet access; source code distributed directly

Molecular Applications Group (Commercial)

  • Information and tools provided: software for storing, mining, and visualizing genomic data; databases derived from public and private data and proprietary software

  • Data sources: >150 online database sites, public and proprietary

  • Users: research scientists in academic, government, commercial organizations (U.S. and abroad)

  • Dissemination modes: some software products downloaded from the Web; others require on-site expert installation


Chemical and Chemical Engineering

National Institute of Standards and Technology (NIST) Physical and Chemical Properties Division (Government)

  • Information and tools provided: specialized chemistry and chemical engineering databases (extensively evaluated and documented)

  • Data sources: experimental results from published literature; experiments done specifically for data acquisition; published data evaluations; supplementary data deposits

  • Users: researchers in academic, government, commercial organizations (some databases used primarily by industrial users)

  • Dissemination modes: variety of forms (hard copy publication, CD-ROM or floppy disk, Internet access); NIST distributes directly or via agreements with secondary distributors

Chemical Abstracts Service, American Chemical Society (Not-for-Profit)

  • Information and tools provided: Chemical Abstracts (bibliographic database); Registry (registry of chemical substances); software access tools

  • Data sources: journals, patents, books, proceedings, dissertations

  • Users: researchers in academic, government, commercial organizations; patent examiners; students

  • Dissemination modes: electronic access, hard copy, CD-ROM

Institute for Scientific Information (Commercial)

  • Information and tools provided: bibliographic databases (citation indexes, tables of contents); information services; linkages to publishers' full text databases

  • Data sources: journals, books, proceedings (print and electronic format)

  • Users: academic, government lab, and corporate libraries; researchers in academic, government, commercial organizations

  • Dissemination modes: diskette, CD-ROM, FTP files, Internet access, hard copy

Meteorological

National Climatic Data Center (Government)

  • Information and tools provided: climatological summaries from National Weather Service stations; historical long-term climatic databases

  • Data sources: National Weather Service, World Meteorological Organization, NASA, bilateral agreements with other countries

  • Users: individuals, commercial clients, government agencies, engineering uses

  • Dissemination modes: hard copy, microfiche, magnetic tape, disks, CD-ROM, FTP, Internet

Unidata Program, University Corporation for Atmospheric Research (Not-for-Profit)

  • Information and tools provided: quasi-real-time atmospheric and related data; case study data sets; software tools

  • Data sources: public (National Weather Service, National Environmental Data Service); private (network of lightning sensors, sensors in commercial aircraft)

  • Users: academic departments

  • Dissemination modes: Internet

TASC (Commercial)

  • Information and tools provided: real-time weather information

  • Data sources: public (National Weather Service, direct downlink from U.S. and international weather satellites, other observational sources)

  • Users: news media (broadcast and cable TV), aviation, energy and power, agribusiness

  • Dissemination modes: public and private data communication networks (satellite broadcasting services and Internet)

NOTE: Although the subject matter of this study included all S&T databases, the committee was able to choose only representative examples for discussion and analysis in the report. For instance, specific examples from the social sciences or the space sciences, among other disciplines, were not included.


Box 1.2 Example of Large-Scale Data Collection Activity by the Federal Government

Statistics from just one discipline in the natural sciences, atmospheric physics, illustrate the explosive growth in the size of some digital scientific and technical databases. The National Climatic Data Center (NCDC) is responsible for storing national, as well as some global, weather and climatic information. Once, most of these data came from human observations of the current state of the weather using simple and straightforward instrumentation, including such commonplace devices as thermometers, barometers, wind vanes, and rain gauges. The comparatively recent deployment of satellites, sophisticated Doppler radars, lightning detection networks, automatic surface-observing platforms, and heavily instrumented buoys in the marine environment, all linked together through broadband, high-speed communication systems, has increased the types and volumes of data collected. The NCDC's storage requirements for these data have increased concomitantly by many orders of magnitude. In the period between 1980, when some of the high-resolution data were just beginning to be recorded, and 1994, when much of the Doppler radar and lightning data had yet to be generated, the volume of data stored at the NCDC increased from approximately 1 terabyte to 230 terabytes. By 1999, the NCDC's data holdings had grown to 750 terabytes, and they are projected to expand to more than 20 petabytes by 2014. These data are archived indefinitely and made available to the public.

SOURCE: Information provided by Gerald Barton, National Oceanic and Atmospheric Administration, Washington, D.C.
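The holdings quoted in Box 1.2 imply compound annual growth rates that can be worked out directly. The sketch below is illustrative only: it assumes 1 petabyte = 1,000 terabytes, treats "more than 20 petabytes" as exactly 20 petabytes, and uses hypothetical function and variable names.

```python
def annual_growth_rate(start_tb, end_tb, start_year, end_year):
    """Compound annual growth rate between two holdings figures."""
    years = end_year - start_year
    return (end_tb / start_tb) ** (1 / years) - 1


# (year, holdings in terabytes) as quoted in Box 1.2.
# The 2014 value is a projection; 20 PB is taken as 20,000 TB (assumption).
holdings_tb = [(1980, 1.0), (1994, 230.0), (1999, 750.0), (2014, 20_000.0)]

growth = {}
for (y0, v0), (y1, v1) in zip(holdings_tb, holdings_tb[1:]):
    growth[(y0, y1)] = annual_growth_rate(v0, v1, y0, y1)
    print(f"{y0}-{y1}: {v1 / v0:,.1f}-fold, "
          f"about {growth[(y0, y1)]:.0%} per year")
```

On these assumptions the implied rate is highest in the 1980 to 1994 period (close to 50 percent per year) and falls to roughly 25 percent per year in the later periods, even though the absolute volumes added each year keep rising.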

Bureau observational records or U.S. census data, are being digitized and organized into electronically accessible databases. This shift from a data-poor to a data-rich research and education environment is occurring through the activities of a host of government agencies, universities, and other research establishments, both public and private, nationally and internationally, in diverse research disciplines.

In many cases data are being collected not to answer specific scientific questions, but rather to describe various physical and biological phenomena in ever-increasing detail. This broad-based acquisition of data, coupled with data mining and knowledge discovery6 and the broad review and analysis of information stored in large databases, is anticipated to reveal trends or patterns or to lead

6  

Data mining and knowledge discovery are related, frequently confused terms, as are data, information, and knowledge. In the context of electronic databases, the data stored therein remain as data until they are extracted (mined) and recompiled (put in a context), at which point they become information. After "information is developed into a collection of related inferences, the data, now


to discoveries that will serve as a source of new hypotheses. The increasing use of databases as a research tool, whether in pursuit of new information or clues to unexpected relationships as starting points for conducting fundamental research, or for developing new commercial applications,7 relies on the production and availability of such databases as an initial step in the process.

In recent decades, and in most disciplines today, the federal government and federally funded research have played the major role in generating primary S&T data. Substantial amounts and varieties of data are created by thousands of federal government grantees doing basic research, either individually or in teams, and most often at universities and other not-for-profit research institutions. The National Science Foundation and the National Institutes of Health, which in FY 1999 funded over $2.6 billion8 and $11.8 billion9 in extramural grants, respectively, provide the bulk of support for these efforts. However, other federal departments and independent agencies also have significant research grant programs that involve the collection of research data and the production of associated databases outside the direct control of the government.

In FY 1998, the federal government spent approximately $19.5 billion on intramural and extramural basic research and almost $50 billion on applied research.10 A substantial fraction of that funding was devoted to the creation of primary data used for fundamental research, education, and other public-interest purposes. Among the current major observational data research programs are NASA's Earth Observing System and numerous space science missions.11 The Human Genome Project of the National Institutes of Health is another large-scale, data-intensive research effort.12 Large experimental facilities dedicated to

   

information, become knowledge." The automated process of evaluating data and finding relationships is data mining, and that of extracting information, especially predicted relationships, or discovering previously unknown patterns among data is knowledge discovery. Numeric databases are more amenable to data mining than textual databases. See Walter J. Trybula (1997), "Data Mining and Knowledge Discovery," pp. 197-229, Annual Review of Information Science and Technology, Vol. 32, Martha E. Williams, ed., published for the American Society for Information Sciences by Information Today, Inc., Medford, N.J.

7  

For a discussion of different types of scientific data and their distinguishing characteristics, see National Research Council (1997), Bits of Power, note 4, pp. 49-57.

8  

Office of Management and Budget (1999), "Budget of the United States Government, Fiscal Year 2000—Appendix," U.S. Government Printing Office, Washington, D.C., p. 1062.

9  

Office of Management and Budget (1999), note 8, p. 441.

10  

Intersociety Working Group (1999), "Table I-11. Total U.S. R&D 1996-1998," in Research and Development FY 2000, American Association for the Advancement of Science, Washington, D.C., p. 71.

11  

See the NASA Web site at <www.nasa.gov> for a description of these major projects under the Earth Sciences Mission home page at <http://www.earth.nasa.gov/missions/index.html> and the Space Science Missions home page at <http://spacescience.nasa.gov/missions/index.html>, respectively.

12  

See the NIH Web site at <www.nih.gov> and the National Human Genome Research Institute home page at <http://www.nhgri.nih.gov/HGP>.


enabling advances in fundamental physics are operated by the Department of Energy13 or by universities or not-for-profit organizations under contract to one or more federal government agencies.

The government also collects large amounts of data for operational, non-research applications, such as daily weather forecasting, public health and safety, and other public-interest government functions. Many of the resulting databases, such as those developed from observations made by meteorological satellites and ground-based NEXRAD radars operated by the National Oceanic and Atmospheric Administration, or the geological, hydrological, and ecological data collected by the U.S. Geological Survey in response to Department of the Interior mandates,14 have multiple uses, as well as value for both immediate and long-term research.

Additional extensive data are collected continuously at the state and local government levels, principally in support of public government functions, such as the provision of local health, education, and welfare services, or the regulation of various economic activities. These databases also provide a wealth of factual and statistical information for social science researchers, as well as for historians.

While original data collection activities in the United States, especially for research and educational purposes, are carried out largely under government auspices, a significant amount of basic research is also funded outside government by both not-for-profit and commercial institutions. In 1998, nongovernmental sources spent approximately $15 billion on basic research,15 some of which was used to produce and analyze new S&T databases. In addition, most large federal government research projects and programs involve one or more foreign government agencies, often with significant international participation of researchers.16 Large-scale research in areas such as climate trends, marine biology, and space science requires international cooperation in the collection, production, and dissemination of observational data. The effectiveness of such cooperation depends on, among other things, agreement on laws and policies for sharing and using those data in different countries.

Significant Aspects of Database Production

The rate of scientific progress depends not only on the collection of new data, but also on the quality of the data collected, their ease of use, and the dissemination of information about the database. Considerable attention must be

13  

See the DOE Web site at <www.doe.gov> and the Office of High Energy and Nuclear Physics home page at <http://www.er.doe.gov/production/henp/henp.html>.

14  

See Table 1.1 and the committee's online Proceedings, note 5, at Chapter 3, "Characteristics of Scientific and Technical Databases."

15  

Intersociety Working Group (1999), note 10, p. 71.

16  

See National Research Council (1997), Bits of Power, note 4, pp. 58-61.


given to all those activities necessary to organize raw or disparate data into databases for broader use. These functions and methods typically include digitally processing the data into successively more highly refined and usable products; organizing the data into a database with appropriate structure, format, presentation, and documentation; creating the necessary accompanying analytical support software; providing adequate quality assurance and quality control; announcing the availability of the database; and arranging for secure near-term storage and eventual deposit in an archive that preserves the database and enables continued access.17 As databases become ever larger and more complex, effective database production methods become increasingly important and constitute a significant component of the overall cost of the database.

The production of S&T databases requires at least some involvement by those responsible for collecting the original data. Typically, those closest to the collection of the data have the greatest expertise and interest in organizing them into a database whose contents are both available to and readily usable by others. Furthermore, the highly technical and frequently esoteric nature of S&T databases is likely to require that the original data collectors (or project scientists) participate in at least the initial stages of organizing, documenting, and reviewing the quality of data in the database. Involvement by the original data collectors in managing that part of the database production process decreases the probability that unusable or inaccurate databases will result, reduces the need for subsequent attempts to rescue or complete such data sets, and saves time and expense overall.

The level of processing and related database production activities is a significant factor in defining the ultimate utility (and legal protection) of a digital data collection. It is the original unprocessed, or minimally processed, data that are usually the most difficult to understand or use by anyone other than the originator of those data, or an expert in that particular area. With every successive level of processing, organization, and documentation, the data tend to become more comprehensible and easier to use by the nonexpert. As a database is prepared for more widespread use with the addition of more creative elements, it also tends to become more copyrightable as well as more generally marketable. In the case of observational sciences, it is the raw, noncopyrightable data that are typically of greatest long-term value to basic research (see "The Uniqueness of Many S&T Databases," below). Increased or new protection for noncopyrightable databases previously in the public domain could therefore have a disproportionate impact on the heretofore unrestricted access to and use of raw data sets for basic research and education.

Although the production of many S&T databases is performed by, or with

17  

See generally, National Research Council (1995), Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving Our Nation's Scientific Resources, National Academy Press, Washington, D.C.


the active participation of, the originating researchers, it also is common for third parties to be involved in an aspect of database production referred to as "value adding." Because the comprehensive production of very large or complex databases can be quite expensive, organizations that collect data, especially in government, are increasingly "outsourcing" database production and subsequent distribution to third parties in an effort to contain costs. In such instances, the raw, or minimally processed, data are provided to a private-sector vendor expressly contracted by government to add value to the data and produce a database in a commercially marketable format to meet broad user requirements. However, since most federal government databases are openly available and in the public domain, adding value to them may be undertaken as an initiative by entrepreneurs who see a business opportunity in such activities, without any formal contractual arrangement with the government data source. (For examples of such third-party providers, see the summary in Table 1.1 and the committee's online workshop Proceedings.)

In the context of this report, the most significant aspect of these third-party, value-adding arrangements is that they almost always involve the transfer of public or publicly funded data and databases to private-sector proprietary database producers and vendors. To the extent that these transfers are done on an exclusive basis and the original government databases are not maintained or otherwise made publicly available, the result is a concomitant decrease in the public availability of S&T data.

Perspective on Number of Databases Produced—Some Statistics

According to one set of recently compiled statistics,18 over the period from 1975 through 1998 the number of all databases grew by a factor of 38 (from 301

18  

The statistics provided in this section were all compiled by Martha E. Williams (1998), "State of Databases Today: 1999," in Gale Directory of Databases, L. Kumar, ed., Gale Research, Farmington Hills, MI. Dr. Williams notes the following caveats:

There are undoubtedly large numbers of government and private databases that are not included in these numbers. The government seems not to have a systematic way of making the existence of numeric databases known so that they could be identified and described in database directories where the public could learn about them. The compiler of these statistics estimates that there are tens of thousands of such numeric databases and that the numbers of records contained therein is in the petabyte range. Many such databases are in the hands of individual researchers, some of whom would be reluctant to fill out questionnaires or who consider the data to be of interest only to a small number of known colleagues. The statistics reported herein relate to publicly available databases where the producer wants to make the data publicly known versus databases that are available to the public in theory only. Databases that would require a Freedom of Information Act request or need-to-know are not included in these statistics.


to 11,339), the number of database producers increased by a factor of 18 (from 200 to 3,686), and the number of vendors grew by a factor of 23 (from 105 to 2,459). In 1975 the 301 identified databases contained about 52 million records, whereas in 1998 the 11,339 tallied databases held nearly 12.05 billion records, a 231-fold increase in the number of records.
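The growth factors quoted in this paragraph follow directly from the 1975 and 1998 counts. As a quick arithmetic check, the short Python sketch below recomputes them. Only the figures come from the statistics cited in the text; the variable and function names are illustrative assumptions.

```python
# Counts reported in the text for 1975 and 1998 (compiled by Williams, 1998).
counts = {
    "databases": (301, 11_339),
    "producers": (200, 3_686),
    "vendors": (105, 2_459),
    "records": (52e6, 12.05e9),  # about 52 million vs. nearly 12.05 billion
}


def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


# Growth factor for each category, keyed by category name.
factors = {name: growth_factor(a, b) for name, (a, b) in counts.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}-fold")
```

Rounded to whole numbers, the first three ratios match the factors of 38, 18, and 23 given in the text. The record-count ratio comes out slightly above 231, which the text reports as a 231-fold increase.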

Although in today's digitized information world databases are produced on all continents, the percentage of all types of databases produced in the United States continues to represent the lion's share of the global output. In 1998, of the 11,339 databases that were identified, 63% were produced in the United States. In 1975, of the 301 publicly available, computer-readable databases worldwide, 59% were U.S. databases. From 1985 to 1993, the ratio of U.S. to non-U.S. databases remained at about 2:1. From 1994 on, production of non-U.S. databases has accelerated somewhat, so that in 1998 the ratio of the number of U.S. to non-U.S. databases was about 3:2. The average size of U.S. databases in terms of the number of records they contained was larger than that of the non-U.S. databases. As noted above, however, most U.S. government and academic databases are not represented in these figures.

In the source quoted here, database statistics were compiled in eight major subject categories—business, health/life/medical sciences, humanities, law, multi-disciplinary, news/general, science/technology/engineering, and social sciences. If the health/life/medical sciences category is combined with science/technology/engineering, that general scientific and technical category had the largest number of databases (28%) in 1998, followed by business (26%), news/general (15%), and law (11%), with the remaining three categories accounting for the other 20%.19

The Uniqueness of Many S&T Databases

A key characteristic of original S&T databases is that many of them are the only one of their particular kind, available only from a single source, which has significant economic and legal implications, as discussed in subsequent chapters of this report. For example, many S&T databases describe physical phenomena or transitory events that have been rendered unique by the passage of time. Measurements of a snowstorm obtained with a single radar observation, or a statistical compilation of some key socioeconomic characteristics such as income levels collected by a state agency, cannot be recaptured after the original event. The vast majority of observational data sets of the natural world, as well as all unique historical records, can never again be recreated independently and are thus available only as originally obtained, frequently from a single source. Other S&T databases are de facto unique because the cost of obtaining the data was

19  

Williams (1998), "State of Databases Today," note 18, p. xxvi.


extremely high. This is the case with very large facilities for physical experiments or space-based observatories.

Even when data similar but not identical to original research results or observations are available for use in non-technical applications, scientists and engineers will likely not find an inexact replica of a database a suitable substitute if it does not meet certain specifications for a particular experiment or analysis. For example, two infrared sensors with similar spatial and spectral characteristics on different satellites collecting observations of Earth may provide relatively interchangeable data products for the non-expert consumer, but for a researcher, the absence of one spectral band can make all the difference in whether a certain type of research can be performed. Thus a database generally deemed adequate as a substitute in the mass consumer market very likely will not be usable for many research or education purposes.

Dissemination of Scientific and Technical Data and the Issue of Access

S&T data traditionally were disseminated in paper form in journal articles, textbooks, reference books, and abstracting and indexing publications. As data have become available in electronic form, they have been distributed via magnetic tape and, more recently, optical media such as CD-ROM or DVD. The growing use of the Internet has revolutionized dissemination by allowing most databases to be made available globally in electronic form. Digitization and the potential for instant, low-cost global communication have opened tremendous new opportunities for the dissemination and utilization of S&T databases and other forms of information, but also have led to a blurring of the traditional roles and relationships of database producers, vendors, and users of those databases in the government, not-for-profit, and commercial sectors. In fact, virtually anyone who obtains access to a digital database can instantly become a worldwide disseminator, whether legally or illegally.20

Two of the most important mechanisms for the dissemination of public and publicly funded databases have been government data centers and public libraries. Government, or government-funded, data centers have been created in recent decades for dissemination of data obtained in certain programs or research disciplines. Examples of such data centers include the National Center for Biotechnology Information and the National Climatic Data Center (Table 1.1), but many others have been established for almost every field of research.21

20  

Of course, this same development is occurring with other forms of online information and proprietary publications outside the S&T database context, such as with copyrighted digital music and videos. For an extensive discussion of the impact of the Internet on various types of information and related intellectual property rights management, see Computer Science and Telecommunications Board, National Research Council (2000), The Digital Dilemma: Intellectual Property in the Information Age, National Academy Press, Washington, D.C., in press.

Public libraries, whether part of the federal depository library program, university research libraries, or other public libraries or foundations specializing in various S&T or other academic subjects, not only preserve and publicly disseminate government data, but provide general public access for many proprietary S&T databases as well. With ever-increasing costs, however, the libraries' ability to provide this public "safety net" for all published products is diminishing.22

Historically, most federal government S&T data and government-funded research data in the United States have been fully and openly available to the public.23 This has meant that such data are available free or at low cost for academic and commercial research—and indeed any other use—without restrictions and can be incorporated into derivative databases, which can, themselves, be redistributed and incorporated into additional databases. In some instances in which the government contracts for the dissemination of data, however, the rights assigned to the database vendor may place restrictions on the ability of the research and education communities to fully utilize the data. Increasingly, both government and not-for-profit organizations are exploring means to recover database production and distribution costs, or to generate revenue streams in order to support their expensive data activities, thereby making them function in a manner similar to commercial organizations.

The ability to access existing data and to extract and recombine selected portions of them for research or for incorporation into new databases for further distribution and use has become a key part of the scientific process by which new insights are gained and knowledge is advanced. When the ability to access or distribute data on an international basis is required, such exchanges in the public sector depend on various intergovernmental agreements. In contrast, to achieve a suitable return on their investment, private-sector vendors of proprietary databases typically seek to control unauthorized access to and use of their databases. It is at the intersection of public and private interests in data

21  

See National Research Council (1995), Preserving Scientific Data, note 17, and the accompanying Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government: Working Papers, National Academy Press, Washington, D.C., and National Research Council (1997), Bits of Power, note 4, for a description of many of these government S&T data centers.

22  

As the prices of many serial journal subscriptions substantially outpace the rate of inflation, for example, research libraries increasingly need to rely on interlibrary loans to obtain access for their students and professors. See Association of Research Libraries (1999), ARL Statistics: 1997-98, Martha Kyrillidou et al., eds., Association of Research Libraries, Washington, D.C.

23  

As defined in National Research Council (1997), Bits of Power, note 4, p. 15, "full and open" availability of data means that "data and information derived from publicly funded research are made available with as few restrictions as possible, on a nondiscriminatory basis, for no more than the cost of reproduction and distribution."


where the greatest challenges emerge. As an example, Box 1.3 sketches some of the issues and approaches currently being tried.

Use of Scientific and Technical Databases

Before a database is publicly disseminated, its use is limited to those involved in collecting the data or producing the database, and it therefore cannot contribute broadly to the advancement of scientific knowledge, technical progress, economic growth, or other applications beyond those of the immediate group. It is only upon the distribution of a database that its far-reaching research, educational, and other socioeconomic values are realized. One or more researchers applying varying hypotheses, manipulating the data in different ways, or combining elements from disparate databases may produce a diversity of data and information products. The contribution of any of these products to scientific and technical knowledge might well assume a value far greater than the costs of database production and dissemination. The results of a thorough

Box 1.3 Database Production in Competitive Research and the Question of Access

Genomic sequence databases exemplify the tension over rights in data and their uses associated with the development of original databases that have both important fundamental research uses and great potential for applied commercial products. Advances in molecular biology and automated DNA sequencing technology have made possible the rapid sequencing of genomes from a variety of life forms, including human beings. These databases are being produced simultaneously by researchers at government, not-for-profit, and commercial laboratories.

Although the government and not-for-profit genomic database producers may be slower than the commercial sector in compiling gene sequence data on the same organisms, they are striving to create analogous databases in order to provide the results on an open basis as a public good for broad research and other uses. Government and not-for-profit sequence data are collected and integrated into major sequence databases in a cooperative international effort that includes the National Center for Biotechnology Information in the United States,1 the European Molecular Biology Laboratory in the United Kingdom on behalf of the European Union,2 and the DNA Database of Japan.3 These centers not only collect and share the data on a daily basis, but also provide some quality control, documentation, and organization of the data before making the information freely available to the scientific and technical community, typically over the Internet. The Human


Genome Project aims to provide full sequence data for the human genome and to serve as the future reference standard.

Because of the high intrinsic commercial value of human genomic information for the identification of disease markers and therapeutic agents, commercial entities simultaneously seek to be first in generating primary genomic data, which they can license to pharmaceutical or biotechnology companies, or patent, if possible, to gain market advantage.

While the human genome provides the basic blueprint for human life, it is small differences in individual genes that are likely to provide insight into important questions such as variations in disease susceptibility in different populations (for example, why certain groups of people are predisposed to high blood pressure, diabetes, or Alzheimer's disease). These can be studied by comparing the gene sequences of different populations, such as those individuals susceptible to a disease compared to those individuals who are not. Hence, over time, gene sequence databases of a wide variety of discrete populations will be developed supported by a mix of public and private funding.

Recently, for instance, the Icelandic government formed a controversial partnership with a private U.S. firm to develop a database that will contain genetic information on the entire Icelandic population. Icelanders belong to a highly homogeneous gene pool, which will simplify the detection of disease-related genes. The government gave the firm, by statute, an exclusive license to create and operate that database.4

Another recently begun effort involves a consortium of ten U.S. and foreign pharmaceutical companies, together with government and not-for-profit organizations, formed to generate a map of human single-nucleotide polymorphisms,5 which can be thought of as a low-resolution indicator highlighting areas of variability in the genetic code associated with genetic differences between individuals. Although the research is funded in large part by the commercial sector, the results will be made publicly available. In addition to the cost-sharing benefits of this consortium, a major reason for its establishment is the fear that an individual company, or group of companies, could generate scientifically valuable databases and information on a proprietary basis, preventing broad access and capturing a high proportion of the associated intellectual property rights.6

1  

See the National Center for Biotechnology Information Web site at <www.ncbi.gov>.

2  

See the European Molecular Biology Laboratory Web site at <www.embl.uk>.

3  

See the DNA Database of Japan Web site at <www.ddbj.nig.ac.jp>.

4  

See J. Gulcher and K. Stefansson (1999), "An Icelandic Saga on a Centralized Healthcare Database and Democratic Decision Making," Nature Biotechnology, Vol. 17, July, p. 620, and Martin Enserink (1998), "Physicians Wary of Scheme to Pool Icelanders' Genetic Data," Science, Vol. 281, August 14, pp. 890-891.

5  

See Eliot Marshall (1999), "Drug Firms to Create Public Database of Genetic Mutations," Science, Vol. 284, April 16, pp. 406-407.

6  

See Marshall (1999), note 5.


database analysis may reveal a value of the data not apparent in even a detailed examination of the individual elements of the database itself. With the widespread availability of information on the Internet have come abundant opportunities to search for scientific and technical gold in this ore of factual elements. The possibilities for discovery of new insights about the natural world, with both commercial and public-interest value, are extraordinary.

In considering how databases are used, it is important to distinguish between end use and derivative use. End use—accessing a database to verify some fact or perform some job-related or personal task, such as obtaining an example for a work memo—is most typical of public consumer uses. End use does not involve the physical integration of one or more portions of the database into another database in order to create a new information product. A derivative (value-adding or transformative) use (see Box 1.1) builds on a preexisting database and includes at least one, and frequently many more, extractions from one or more databases to create a new database, which can be used for the same, a similar, or an entirely different purpose than the original component database(s).
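The distinction between end use and derivative use can be made concrete with a small sketch. The Python example below uses the standard-library sqlite3 module and two hypothetical in-memory source tables; the table names, columns, and values are all invented for illustration and do not come from any database discussed in this report. It is one possible illustration of the two kinds of use, not a description of how any particular producer works.

```python
import sqlite3

# Two hypothetical source databases, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weather_obs (station TEXT, year INTEGER, temp_c REAL);
    CREATE TABLE stream_obs  (station TEXT, year INTEGER, flow_m3s REAL);
    INSERT INTO weather_obs VALUES ('A', 1997, 11.2), ('A', 1998, 12.0), ('B', 1998, 9.5);
    INSERT INTO stream_obs  VALUES ('A', 1998, 40.1), ('B', 1998, 15.3);
""")

# End use: look up a single fact; nothing is copied into a new product.
end_use = conn.execute(
    "SELECT temp_c FROM weather_obs WHERE station = 'A' AND year = 1998"
).fetchone()[0]

# Derivative use: extract selected portions of both sources into a new
# table, keeping a column that records where each row came from.
conn.execute("""
    CREATE TABLE derived AS
    SELECT 'weather_obs' AS source, station, year, temp_c AS reading
    FROM weather_obs WHERE year = 1998
    UNION ALL
    SELECT 'stream_obs' AS source, station, year, flow_m3s AS reading
    FROM stream_obs WHERE station = 'A'
""")
rows = conn.execute(
    "SELECT source, station, reading FROM derived ORDER BY source, station"
).fetchall()

print(end_use)
print(rows)
```

The lookup leaves the source tables untouched and produces no new database, whereas the second step physically integrates extractions from both sources into a new table that could itself be redistributed.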

Integration of Distributed Data to Broaden Access and Potential for Discovery

In seeking new knowledge, researchers may gather data from widely disparate sources. A significant advantage arising from the abundance of digitized data now accessible through both private and public networks is the potential for linking data in multiple (even thousands of) databases. The ability to link sites on the World Wide Web is one type of integration that could result in more data being available overall to users. Another is the merging of databases of the same or complementary content. It is now possible to maintain a site with continuously verified links to related information sites for use by subscribers or members of a specific group; an example is the Engineering Village of Engineering Information, Inc.24 Yet another type of integration occurs in the connection of distributed databases such that different parts of a single large database may reside on different computers in geographically dispersed locations throughout the country or the world. With a common structure, data can be located in a physically distributed network and accessed as if they were in one database in one computer in one location. The cost can thus be distributed and the value of each contributory database increased. Still other databases are automatically created from other databases. For example, data are routinely mined and collected by "knowbots" and "web crawlers" (software employing artificial intelligence and rule-based selection techniques) on the Internet throughout the world and retrieved for processing and further use. One such data mining activity in the area of biotechnology was described and discussed at the committee's January 1999 workshop (see the Molecular Applications Group's activities summarized in Table 1.1).

24  

See, for example, the Engineering Village of Engineering Information Web site at <www.ei.org/aivillage/village.serve-page?p=4011>.

With a capability to integrate information in multiple databases comes the potential for exploiting relationships identified in the information and developing new knowledge. In many scientific fields, the initial investment by the database rights holder may not produce the greatest value until it is integrated with the investments of others. For example, while protein sequence data are valuable in their own right, their value is greatly enhanced if associated x-ray crystallographic data are also concurrently available. It is possible to use the combined data to understand the way in which protein chains are folded and, in the case of an enzyme, the way in which various nonsequential residues, or even residues on separate protein chains, combine to form an active site.
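The kind of integration described above, in which records from separately produced databases are associated through a shared identifier, can be sketched in a few lines of Python. Everything in the example is hypothetical: the identifiers, sequences, and resolution values are placeholders rather than real entries, and the function name is an assumption made for illustration.

```python
# Hypothetical records keyed by a shared protein identifier.
# The values are placeholders, not real data.
sequences = {
    "P001": "MKTAYIAKQR",
    "P002": "GSHMLEDPVA",
    "P003": "MADEEKLPPG",
}
structures = {
    "P001": {"method": "x-ray", "resolution_angstrom": 1.8},
    "P003": {"method": "x-ray", "resolution_angstrom": 2.4},
}


def link_records(seqs, structs):
    """Combine the two sources for every identifier present in both."""
    linked = {}
    for protein_id in sorted(seqs.keys() & structs.keys()):
        linked[protein_id] = {"sequence": seqs[protein_id], **structs[protein_id]}
    return linked


linked = link_records(sequences, structures)

# Identifiers with a sequence but no structural record in this sketch.
unmatched = sorted(sequences.keys() - structures.keys())

print(linked)
print(unmatched)
```

Only the entries present in both sources gain the combined view. An entry held in one database alone stays unmatched, which illustrates why the value of each contribution depends on access to the others.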

Derivative Databases and New Data-Driven Research and Capabilities

The ethos in research is that science builds on science. The creation of derivative databases not only enables incremental advances in the knowledge base, but also can contribute to major new findings, particularly when existing data are combined with new or entirely different data. The importance for research and related educational activities of producing new derivative databases cannot be overemphasized.25 The vast increase in the creation of digital databases in recent decades, together with the ability to make them broadly and instantaneously available, has resulted in entire new fields of data-driven research.

For example, the study of biological systems has been transformed radically in the past 20 years from an experimental research endeavor conducted in laboratories to one that relies heavily on computing and on access to and further refinement of globally linked databases.26 Indeed, one of the fastest growing disciplines is bioinformatics, a computer-based approach to biological research. New technologies, such as DNA microarrays and high-throughput sequencing machines, are producing a deluge of data. A challenge to biology in the coming decades will be to convert these data into knowledge.27

25

As noted by Vinton Cerf, senior vice president at MCI WorldCom, Inc. ("ACM Awards Keynote," Association for Computing Machinery, New York City, May 15, 1999):

Scientific databases are proving to be non-linear accelerators of research in specific fields such as biology, astronomy, meteorology, space physics, chemistry, economics, epidemiology, environmental studies and a wealth of other fields. The non-linearity comes about because as each research adds more material to the database, the information is placed in juxtaposition with all other items in the system, exhibiting the same kind of non-linear impact that placing computers in a common network has had, in accordance with Metcalfe's Law (which says that the value of the network grows as the square number of devices in the net). Cerf's Law says that shared databases grow in value in accordance with the number of combinations of data items in the database. When the hundreds of thousands of databases on the Internet and other networks are accessible remotely and can be reached in parallel, and when the partial results can be combined and searched anew, the value of these data can grow dramatically.
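The scaling claims in the Cerf quotation (note 25) can be made concrete with a short numerical sketch. The Python below is illustrative only and is not drawn from the report or from Cerf's address: the function names are hypothetical, and reading "the number of combinations of data items" as either pairwise combinations or all non-empty subsets is an assumption, since the quotation does not say which is meant.

```python
from math import comb


def metcalfe_value(devices: int) -> int:
    """Metcalfe's Law as quoted: value grows as the square of the number of devices."""
    return devices ** 2


def pairwise_combinations(items: int) -> int:
    """Assumed reading 1 of 'combinations': distinct pairs of data items, n(n-1)/2."""
    return comb(items, 2)


def all_combinations(items: int) -> int:
    """Assumed reading 2 of 'combinations': all non-empty subsets of data items, 2^n - 1."""
    return 2 ** items - 1


if __name__ == "__main__":
    # Print how each measure grows as the number of devices or data items increases.
    for n in (10, 20, 40):
        print(n, metcalfe_value(n), pairwise_combinations(n), all_combinations(n))
```

Under the pairwise reading, value grows quadratically, much as in Metcalfe's Law; under the all-subsets reading, it grows exponentially. Either way, each added item contributes more than the one before it, which is the non-linearity the quotation describes.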

The availability of global remote-sensing satellite observations, coupled with other airborne and in situ observational capabilities, has given rise to a new field of environmental research, Earth system science, which integrates the study of the physical and biological processes of our planet at various scales. The large meteorological databases obtained from government satellites, ground-based radar, and other data collection systems pose a challenge similar to that mentioned above for biology, but also already have yielded a remarkable range of commercial and non-commercial value. Dissemination of the atmospheric observations in real time or near-real time for "nowcasts" and daily weather forecasts has very high commercial value, which is captured by third-party distributors. Use of these atmospheric observations to develop numerical models that predict the weather accurately, hours or days in advance, adds value in terms of safety and economic benefits to society that are not readily quantifiable. While the economic value of these data can be gauged by the profits of private-sector distributors, how does one measure the value of the lives and property saved by timely and accurate hurricane forecasts and tornado warnings? Once the immediate and most lucrative commercial value is exploited, the resulting data continue to have significant commercial and public-interest uses indefinitely. For instance, these data enable basic research on severe weather and long-term climate trends and provide various retrospective applications for industry. The original databases are archived and made available by the National Climatic Data Center (see Box 1.2). Derivative databases and data products are distributed under various arrangements by both commercial and not-for-profit entities like the Unidata Program of the University Corporation for Atmospheric Research (see Table 1.1 and the online workshop Proceedings).

Geographic information systems that integrate myriad sources of data provide an opportunity for new insights about the natural and constructed environment, greatly enhancing our knowledge of where we live and how we affect our physical environment. Important applications include environmental management, urban planning, route planning and navigation, emergency preparedness and response, land-use regulation, and enhancement of agricultural productivity, among many others.28

26

See generally, Working Group on Biomedical Computing (1999), "The Biomedical Information Science Initiative," National Institutes of Health, June 3, available online at <www.nih.gov/welcome/director/060399.htm>.

27

See Sylvia Spengler and Manfred D. Zorn (1999), "Handling Data Sets in Biology," Lawrence Berkeley National Laboratory Colloquium, Washington, D.C.

Finally, databases used by researchers and educators also frequently are produced and disseminated primarily for other purposes. For example, a physical scientist studying the complex relationships among geology, hydrology, and biology as they relate to the preservation of species diversity likely would draw on numerous digital and hard copy databases originally gathered for other purposes. A social scientist studying the characteristics and patterns of urban crime or the spread of communicable diseases likely would do the same. For many scientists, the ability to supplement existing databases with further data collection in a seamless web of old and new data is basic to meeting the needs of their specific investigations.

Text Databases and Online Publication

Another type of S&T database not yet discussed, but that is used extensively by the research community, consists primarily of text with data summarized or added as examples. These databases may consist of primary literature (as in the case of full text databases of journal articles) or secondary literature (as in the case of bibliographic reference databases). Traditionally, this text has been available in print form, with publishers providing peer review, professional editing, indexing and formatting, and other services, including marketing and distribution. Increasingly this information is being provided as text databases with the publishers also providing the systems that allow access to these databases. These value-adding or information repackaging functions are performed by both not-for-profit and for-profit organizations. For example, the not-for-profit American Association for the Advancement of Science, a scientific society, produces a database containing the full text of articles from Science magazine, including enhancements to the content that do not appear in the print version.29 Similarly, the for-profit publisher Elsevier Science produces Science Direct, a database containing the full text of its journal articles. Bibliographic reference databases are also produced by government, not-for-profit, and for-profit organizations, such as the National Library of Medicine, Chemical Abstracts Service, and the Institute for Scientific Information, respectively (see Table 1.1 and the online workshop Proceedings30). Where full text databases include associated data collections, physical and legal possession of the data collections may be retained by the originator or may pass to the publisher.

28

National Academy of Public Administration (1998), Geographic Information for the 21st Century: Building a Strategy for the Nation, National Academy of Public Administration, Washington, D.C.

29

See Science online at <http://www.sciencemag.org/>.

30

See the committee's online Proceedings, note 5, at Chapter 3, "Characteristics of Scientific and Technical Databases."

As S&T data and results are increasingly digitized and made available online, publishers are seeking access to and inclusion of the underlying data collections on which published articles are based. The intent is not only to provide greater validity and support for published research articles, but also to make their online publications more interesting and useful to the S&T customer base. The ability to link to the underlying databases instantaneously and at different levels of detail adds an entirely new and exciting dimension to scientific publishing and to the potential for new research, but also raises the question of who will have the rights to exploiting those data.

THE CHALLENGE OF EFFECTIVELY BALANCING PRIVATE RIGHTS AND THE PUBLIC INTEREST IN SCIENTIFIC AND TECHNICAL DATABASES

The general advancement of knowledge independent of its eventual societal benefits is a goal of basic research. Nevertheless, an endless array of examples demonstrates how the creation of new knowledge, building on the existing base of understanding and information developed by researchers, has enabled broad and important socioeconomic benefits for the nation as a whole. Our society appreciates that knowledge itself is intrinsically valuable and important, and our success in the world market for advanced technology products and services attests to the direct economic benefits of the resulting applications. It is for these reasons that government funds basic research and related data activities as a public good.31,32 Yet it is precisely these activities that are at risk of being hindered, if not in some instances stopped, by proposed major changes to the legal protections of factual databases.

31  

As Lester Thurow points out: "A successful knowledge-based economy requires large public investments in education, infrastructure, and research and development." He goes on to say that private returns are apt to be more certain if one is looking for an extension of existing knowledge rather than for a major breakthrough; thus private firms tend to concentrate their money on the development end of the R&D process. Time lags are shorter, and in the business world speed is everything. Because of this proclivity in the private sector, government should focus its spending on the long-tailed projects for advancing basic knowledge. This is where private firms won't invest, but it is precisely where the breakthroughs that generate business opportunities are made.

(Lester C. Thurow (1999), "Building Wealth: The New Rules for Individuals, Companies, and Nations," Atlantic Monthly, June, p. 64.)

32  

For a discussion of public goods in the context of basic scientific research and related data activities, see National Research Council (1997), Bits of Power, note 4, pp. 111-114. This issue is discussed in greater detail in Chapters 3 and 4.

Legislative efforts are currently under way in the United States, the European Union, and the World Intellectual Property Organization to greatly enhance the legal protection of proprietary databases. These new legal approaches threaten to compromise traditional and customary access to and use of S&T data for public-interest endeavors, including not-for-profit research, education, and general library uses. At the same time, there are legitimate concerns by the rights holders in databases regarding unauthorized and uncompensated uses of their data products, including at times the wholesale commercial misappropriation of proprietary databases.

Because of the complex web of interdependent relationships among public-sector and private-sector database producers, disseminators, and users, any action to increase the rights of persons in one category likely will compromise the rights of the persons in the other categories, with far-reaching and potentially negative consequences. Of course, it is in the common interest of both database rights holders and users—and of society in general—to achieve a workable balance among the respective interests so that all legitimate rights remain reasonably protected.

New legal approaches, such as the European Union's 1996 Directive on the Legal Protection of Databases, and other legal initiatives now being considered in the United States at the federal and state level, are threatening to compromise public access to scientific and technical data available through computerized databases. Lawmakers are struggling to strike an appropriate balance between the rights of database rights holders, who are concerned about possible commercial misappropriation of their products, and public-interest users of the data such as researchers, educators, and libraries.

A Question of Balance examines this balancing act. The committee concludes that because database rights holders already enjoy significant legal, technical, and market-based protections, the need for statutory protection has not been sufficiently substantiated. Nevertheless, although the committee opposes the creation of any strong new protective measures, it recognizes that some additional limits against wholesale misappropriation of databases may be necessary. In particular, a new, properly scoped and focused U.S. statute might provide a reasonable alternative to the European Union's highly protectionistic database directive. Such legislation could then serve as a legal model for an international treaty in this area. The book recommends a number of guiding principles for such possible legislation, as well as related policy actions for the administration.
