Summary

In today's technological world, sustaining science as a source of new knowledge and innovation has become as important to modern society as maintaining the nation's capabilities in manufacturing, trade, and defense. The extent to which public funding in the developed world supports science is testimony to society's recognition that basic as well as applied research must be carried out to advance the public interest.

Science itself is a living enterprise. With few exceptions, acquisition of scientific knowledge is a cumulative process that depends on researchers' continuing ability to collect and share data. This capability has been strengthened by the advent of information technology, which is supplying powerful new tools and enabling new styles of working. However, far-reaching changes involving complex technical, economic, and legal issues also have begun to alter the conditions for exchange of data among scientists, especially across national boundaries.

To help understand the impact of such changes and to learn what actions are needed to ensure full and open exchange of scientific data1 worldwide among researchers in the natural sciences, the Committee on Issues in the Transborder Flow of Scientific Data undertook a study responding to the following charge:

  • Outline the needs for access to data in the major research areas of current scientific interest that fall within the scope of CODATA—the physical, astronomical, geological, and biological sciences.
  • Characterize the legal, economic, policy, and technical factors and trends that have an influence—whether favorable or negative—on access to data by the scientific community.


The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




  • Identify and analyze the barriers to international access to scientific data that may be expected to have the most adverse impact in discipline areas within CODATA's purview, with emphasis on factors common to all the disciplines.
  • Recommend to the sponsors of the study approaches that could help overcome barriers to access in the international context.

This study addresses issues in effective access to data in numerical, symbolic, and image forms by scientists for scientific research purposes, rather than to bibliographic or purely textual information. The focus is on digital rather than analog data, since practically all scientific data are now collected and stored digitally, and most older data are being transferred to digitized electronic formats. The scope of inquiry also is limited to data in the natural sciences, which is the principal subject-matter focus of CODATA.2

Because the sponsors of the study are U.S. federal government science agencies, the committee has emphasized those trends, issues, and barriers that have an impact on international access to data collected and used in publicly funded, basic research programs—that is, scientific research conducted as a public good. Despite this emphasis, the committee took into account the continua between fundamental and applied research, between raw data and processed information, and between public and private uses of scientific data. Indeed, the most vexing public policy issues facing the international scientific community in the exchange of data involve defining the appropriate balance of divergent interests.

Underlying the committee's approach, however, and informing its conclusions and recommendations, is the principle that full and open exchange of scientific data—the "bits of power" on which the health of the scientific enterprise depends—is vital for advancing the nation's progress and for maximizing the social benefits that accrue from science worldwide.

COMPLEX DEVELOPMENTS AFFECTING ACCESS TO SCIENTIFIC DATA

Recent Trends and Emerging Concerns

Freedom of inquiry, the full and open availability of scientific data on an international basis, and the open publication of results are cornerstones of basic research that U.S. law and tradition have long upheld. For many decades, the United States has been a leader in the collection and dissemination of scientific data, and in the discovery and creation of new knowledge. By sharing and exchanging data with the international community and by openly publishing the results of research, all countries, including the United States, have benefited. Today, however, many rapid changes portend significant consequences, some possibly adverse, for the conduct of basic research in the natural sciences.

In broad terms, the challenges of greatest import for full and open global sharing of scientific data are those associated with two quite recent trends:

  • The need for scientists to adapt to conducting research with data that come in rapidly increasing quantities, varieties, and modes of dissemination, frequently for purposes far more interdisciplinary than in the past; and
  • The worldwide trend toward imposition of increasing economic and legal restrictions on access to scientific data gained from publicly funded research.

The former obliges scientists to reexamine how they carry out their calling. The latter impels the scientific community to become more involved in understanding the significance of public policies and legislative activities that can have a profound impact on their work.

Chief among recent developments affecting access to scientific data is the widespread use of powerful new technologies for data acquisition, storage, and communication, as well as their inevitable consequence, the rapidly growing quantity of data that scientists are generating, preserving, and distributing. Moreover, because of increasingly diverse applications for the results of scientific research, these data are becoming ever more useful and valuable in many sectors outside the specific areas of research that generate them. Finding ways to distribute such information to all who want it—equitably, reliably, and in keeping with the principle of full and open exchange as a sine qua non of progress in science—is the greatest challenge this committee identified while conducting its study.

Although scientific interchange was an important stimulus for development of the Internet and initially represented one of its greatest uses, commercial activities and entertainment now far surpass scientific use of the network and may be expected to dominate policymaking for the electronic exchange of information. This development raises questions about the scientific community's continuing capability to utilize what has clearly become a beneficial and versatile tool for scientific exchange and interaction. The economic framework for a global information system and legal models for dealing with conflicting interests are increasingly influenced by stakeholders who have no long-term responsibilities for, or concern about, sustaining publicly funded scientific inquiry. Simultaneously, the government science agencies expected to assume long-term responsibilities for sustaining scientific inquiry are questioning their capacity to continue to invest at traditional levels in the creation, preservation, and dissemination of scientific data.

Issues in Information Technology

Some technical trends and developments have had a significant, largely positive impact on the management and international exchange of scientific data. These include

  • the steadily decreasing cost of computing and communication;
  • greatly enhanced capabilities for collecting scientific data, for example, from remote sensors;
  • increasing exploitation of broadband networks and capabilities for transmission of video data over networks;
  • the advent of digital wireless communication;
  • increasing support for collaborative work by long-distance communication;
  • growing capabilities for natural language processing;
  • increasing recognition of the importance of standards in data structures and in networked communication;
  • growing acceptance of the need for cooperation in monitoring and controlling network activity; and
  • increasing use of intranets.

Associated with advances in, and increasing reliance on, information tools and infrastructure are a number of problems that present barriers to access, including

  • the growing congestion of the Internet and consequent constraints on scientific communication and research;
  • the storage and distribution of data that are inadequately described or indexed for significant numbers of potential users;
  • the rapid obsolescence of electronic information-processing tools and storage media;
  • the vulnerability of electronic networks and data repositories to accidental or deliberate damage; and
  • the growing competition for use of currently limited network resources.

Another difficulty—the current lack of adequate access to scientific data in developing countries—nevertheless has the potential to improve quickly.

Data Issues in the Natural Sciences

The natural sciences—including the physical, astronomical, geological, and biological sciences—face a number of trends, opportunities, and challenges affecting researchers' capabilities for sharing data. The most obvious involves dealing with the exponentially growing volume of accumulating scientific data, which now, as a result of expanding computational power, also includes elaborate simulations that often incorporate animation as well as quantitative information.
With the end of the Cold War has come declassification of some data that are now providing many new opportunities for researchers, particularly in the Earth sciences. In addition, because of the breadth and scale of major interdisciplinary, global-scale research efforts such as the International Geosphere-Biosphere Programme, the Human Genome Project, and the Hubble Space Telescope project, data from individual disciplines have become important to understanding and progress in other fields. Making data available, comprehensible, and useful across disciplinary boundaries has become a far greater imperative than before these projects existed.

This task, however, is complicated by the fact that scientific data do not constitute a uniform, easily accessible body of information. For example, scientific data may be categorized in many ways: by form or coding (numeric, symbolic, still image, animation, or other); by content; by means of generation; by level of quality and complexity; by the source of support for the data-accumulating activity; by time and space, in the case of observational, geospatial records; and by the institutional structures through which the data are distributed and stored. Certain of these characteristics, such as level of quality (including degree of review and certification) and institutional origin, have given rise to additional complications associated with the increasingly pervasive electronic distribution of scientific data.

Some data issues are more discipline specific. Perennial problems affecting access to data in the observational sciences, for example, include gaps in quality control, incompatibility of data streams, inadequate documentation of data sets, and difficulty in meeting the requirements for long-term retention of data. In the biological sciences, the variety of attributes and qualifiers included with each observation and differences in terminology and usage put a heavy burden on any supplier of data to identify and specify the character of the data precisely enough to prevent misinterpretation. In the laboratory physical sciences, as in many other branches, fragmentation of data into numerous, autonomous, and often incompatible databases with different formats and levels of quality is a chronic problem.

Putting scientific data to use rapidly in sectors outside the immediate discipline of origin poses additional challenges to the longer-term effort to provide full and open access. In the observational environmental sciences, for instance, massive archives and reliable institutional memory are necessary to keep the data accessible and intelligible. Simultaneously, however, data also must be available to meet the public's need for warnings of natural hazards and disasters and for commercial use by the private sector. In addition, availability of data can be affected by governmental concerns related to national security, foreign policy, and international trade. Newly adopted or proposed restrictions on previously open and unrestricted data have caused particular concern in the Earth science communities, for example.
Another significant concern regarding full and open access to scientific data is related to commercialization of electronic publication and electronic databases. Science operates according to a "market" of its own, one that has rules and values different from those of commercial markets. While protection of intellectual property may concern a scientist who is writing a textbook, that same scientist, publishing a paper in a scientific journal, is motivated by the desire to propagate ideas, with the expectation of full and open access to the results. To commercial publishers (including many professional societies), protection of intellectual property means protection of the rights to reproduce and distribute printed material. To scientists, protection of intellectual property usually signifies assurance of proper attribution and credit for ideas and achievements. Generally, scientists are more concerned that their work be read and used than that it be protected against unauthorized copying. These conflicting viewpoints pose challenging problems for science and the rest of society. Current discussions are seeking a balance between protecting publicly supported activities that advance the public welfare and strengthening individual rights to intellectual property.

Associated with the internationalization of scientific data collection and use has been the growth of data centers—dedicated, stable institutions supporting collaborative data sharing across international boundaries and providing verification, documentation, archiving, and dissemination of large, accumulating data sets. The scientific community is increasingly dependent on these data centers—on their skills in data management and distribution and on their capacity to support international scientific efforts.

Finally, an important concern in global access to scientific data is the need to improve capabilities for electronic communication by researchers working in developing countries. A two-way communication capability is needed: scientists in developing countries, like scientists everywhere, generate data that are just as important to science as the data they acquire. Finding ways to help less developed nations acquire affordable electronic network services is an effort that can and should be undertaken by concerned national and international organizations with the help of the telecommunications sector.

The constraints caused by inequalities among nations in access to scientific data are especially damaging to those sciences concerned with inherently international issues, such as food production, biodiversity, the prevention and cure of communicable diseases, global climate change, and other Earth system processes. Each of these sciences requires the generation of globally compatible, accessible, and usable data sets related to terrestrial ecosystems, the physical environment, and human activities. Collaboration among members of the scientific communities in every nation, rich and poor, in developing global observational data sets and in ensuring the subsequent full and open availability of those data is imperative; its importance cannot be emphasized too strongly.
Economic Aspects of Scientific Data

As the quantities and uses of scientific data have expanded, and as nations' discretionary budgets have become increasingly constrained, some governments have begun to privatize activities previously delivered by the public sector and have sold some products and services on a commercial basis—including the generation and distribution of scientific data. This development has stimulated fears that scientific data may become priced beyond the means of the scientific communities, even in the more developed countries, despite the fact that the conduct of basic scientific research, like other government activities related to public health and safety, serves the public welfare and thus is appropriately supported by government funding.

Although economists may initially see privatization as a positive development for science, careful analysis suggests that a market model different from that of ordinary commerce is more appropriate for scientific activity for several reasons. First, the conduct of some scientific research is itself tightly tied to the collection, maintenance, and distribution of the data generated by that research. In particular, in the observational sciences, whose databases can be massive, separating the gathering, archiving, and maintenance of data from their distribution is likely to be more costly and inefficient than keeping them integrated. Second, the contributors of scientific data, particularly in basic research, are frequently also the consumers of such data, and nonmonetized exchange of data may be most efficient in such cases. Third, in many situations, the market for scientific data is not large enough to support more than a single commercial supplier, if that. Finally, most basic research is necessarily funded from public sources. Privatizing the distribution of those data would mean that the funds now provided in grants to institutions supplying data would be channeled instead (if such funds were still available) to grants to individual scientists as users of data. Such funds in small grants to individuals are likely to be vulnerable to even the slightest budgetary pressure, thus potentially compromising the long-term health of science. Direct appropriation or block grant support to institutions with broad responsibilities for data management, preservation, and distribution, while not assured of continuity, is typically more stable and secure and is fortified by institutional memory that recognizes and supports the continued utility of archived data.

At issue now is whether or when the government should remove itself entirely as a distributor of scientific data. (There is no question here regarding the continued support by government of data generation; it is a part of the process of doing basic research that falls outside the charge of this study.) Largely because of the possibility of monopoly control and the potential threat to the principle of full and open availability of data, the government should not remove itself as a primary distributor of the scientific data that its funding has produced, without adequate safeguards as discussed below.3

The concern that privatization, accompanied by high prices and legal restrictions, would limit scientists' access to data needed for their work is paralleled by a similarly serious concern among economists about the possibilities for unrestricted monopolization, particularly by any party whose objectives do not include advancing the public interest. Whether they are private or governmental in nature, profit-making monopolies would endanger science, whereas privatization structured so as to encourage competition in supplying value-added data to multiple user communities could well represent good public policy.

Any pricing policies that bear on the availability of scientific data should reflect this information's characteristics as a public good—a resource that is both nondepletable (cannot be diminished by repeated use) and indivisible and nonexcludable (once having been supplied to some, cannot easily be denied to others). Because there is no social cost from repeated use, price differentiation may be justified in many situations, to ensure that the needs of the scientific community are met. Pricing of government-funded data in a differentiated system should ensure that data are available at no cost to those who provide them or otherwise contribute substantively to any given data set; for others, including commercial users, prices for data should cover the costs of serving those users. Because there is a cost associated with repeated distribution, marginal pricing has been the policy in many of the sciences. It allocates the smallest nonzero cost to users and thus is consistent with the principle of full and open exchange of data.
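The differentiated pricing rule described in the preceding paragraph can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not a scheme taken from the report: the user classes ("contributor", "researcher", "commercial"), the field names, and the function price_for are all hypothetical, and the split between a marginal distribution cost and an extra service cost is one possible reading of "prices for data should cover the costs of serving those users."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequest:
    """One request for a government-funded data set (hypothetical model)."""
    user_class: str              # assumed labels: "contributor", "researcher", "commercial"
    marginal_cost: float         # cost of reproducing and distributing this one copy
    service_cost: float = 0.0    # assumed extra cost of serving this kind of user


def price_for(request: DataRequest) -> float:
    """Return the price charged under the assumed differentiated scheme."""
    if request.marginal_cost < 0 or request.service_cost < 0:
        raise ValueError("costs must be non-negative")
    if request.user_class == "contributor":
        # Those who provide or substantively contribute data pay nothing.
        return 0.0
    if request.user_class == "researcher":
        # Marginal pricing: only the cost of reproduction and distribution.
        return request.marginal_cost
    if request.user_class == "commercial":
        # Price covers the full cost of serving this user.
        return request.marginal_cost + request.service_cost
    raise ValueError(f"unknown user class: {request.user_class!r}")
```

In this sketch no class of user is charged more than the cost of serving it, which is the property the paragraph above ties to full and open exchange.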

Internet congestion, a growing problem for transnational exchange of scientific data, has obvious economic aspects and will be resolved only if participating nations and network providers work together. For the scientific community, a partial solution may involve the creation of separate intranets.

Intellectual Property Rights in Data: Legal Constraints on Full and Open Access?

The emergence of a new intellectual property rights model that protects the contents of electronic databases as well as those in print has the potential to significantly affect the international flow of scientific data. The problem has reached a crux with the current attempts, national and international, to establish a legal framework that threatens to subordinate the needs of data users working in the public interest to the desires of those seeking protection of investments in creating and maintaining databases. Unfortunately, and until very recently, the input into this legislative process at all levels by the scientific and educational communities has been all but nonexistent. Sustained action by those sectors is needed to avert possible restrictions on the full and open exchange of scientific data.

The U.S. Constitution articulates the legal protection of technological inventions and of literary and artistic works through the patent and copyright systems, which attempt to balance incentives to create against the public interest in free competition. Any publicly disclosed technology or information that does not meet the eligibility requirements for protection under U.S. patent and copyright laws becomes public domain matter that anyone can appropriate freely. Moreover, the special needs of libraries, educators, and researchers for access to the copyrighted literature have been recognized under the concept of fair use.4 But this traditional balancing of private and public rights has become more complex in the information age.
Many information goods with commercial value, notably the contents of most electronic databases, are not eligible for patent or copyright protection, and database producers consequently face the threat of rapid duplication by free-riding competitors who do not contribute to the costs of collecting, managing, or disseminating the relevant data. In its 1991 decision in Feist Publications, Inc. v. Rural Telephone Service Co.,5 the U.S. Supreme Court raised the threshold of eligibility for copyright protection, requiring significant original and creative authorship in the selection and arrangement of contents and not simply industrious compiling efforts. Earlier, the Commission of the European Communities (CEC) had started to develop a new protection framework for databases to encourage their commercialization in Europe. This culminated in the formal adoption of a new European Directive on Databases by the CEC in March 1996, which reflected influences by the Feist decision, as well as other concerns in Europe. In May 1996, legislation similar to the final European Directive, but even more protective, was introduced in the U.S. House of Representatives (H.R. 3531), and in August 1996, a proposal almost identical to the proposed U.S. legislation was placed before a Diplomatic Conference under the auspices of the World Intellectual Property Organization (WIPO) with a view to adopting a new protocol to the Berne Convention that would protect non-copyrightable databases in a tailor-made legal regime. Action on this proposal has been postponed until later in 1997.

Scientific data already largely compiled and distributed in electronic form constitute one of many types of data and information that will be affected by the legal framework now evolving in response to conflicting needs. Although new forms of legal protection may be needed to attract private investment to finance the creation and maintenance of electronic databases, including those for use in science and technology, current European and U.S. initiatives would confer a monopoly on database developers far broader and stronger than is needed to avert market failure. The pending legislation would create exclusive, monopolistic property rights of virtually unlimited duration, but without public policy limitations. If adopted in their current form, these legal proposals could jeopardize basic scientific research and education, eliminate competition in the markets for value-added products and services, and raise existing thresholds to entry into insuperable legal barriers to entry. If put into practice, such measures could restrict the full and open access to data on which scientists and educators have depended.

Neither the already adopted European Directive on Databases nor the proposed WIPO protocol and pending U.S. legislation would provide adequate fair use safeguards that recognize the needs of the scientific and educational communities for unrestricted access to data at affordable prices. They take little or no cognizance of the public-good character of scientific data for research and educational purposes. More generally, such an approach ignores the contribution of basic science to the ability of U.S. firms to predominate in markets for technology and information goods. Despite a general consensus on the need for sustained levels of investment in research and development, the proposed database laws could change the status quo—without anyone's wanting it to happen—by elevating the price of the one raw material to which U.S. researchers have always had ready access. If less available scientific information were to translate to fewer applications of economic importance, the end result would be a loss of U.S. technological competitiveness in an integrated world market.

It is therefore essential to retain a "fair use" zone in cyberspace and in other media to protect the strong public interest in ensuring that certain uses and certain users, including the scientific and educational communities, are neither priced out of the market nor forced to cut back the basic research that has played a crucial role as a public good in the economic and technological growth of the United States. The pending legislative proposals, which the committee considers to be precipitous and radical attempts to alter the terms and conditions under which scientific data may be accessed and used on a worldwide basis, have the potential to do severe damage to the scientific enterprise. The scientific community and its defenders must step in quickly to insist on further, open debate before these changes reach implementation.

RECOMMENDATIONS

General Guideline

Based on its deliberations and understanding of the issues involved, the committee believes that the following overarching principle should guide all policy decisions concerning the management and international exchange of scientific data in the natural sciences:

The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research. The public-good interests in the full and open access to and use of scientific data need to be balanced against legitimate concerns for the protection of national security, individual privacy, and intellectual property.

Recommendations on Data Issues in the Natural Sciences

Governmental science agencies and intergovernmental organizations should adopt as a fundamental operating principle the full and open exchange of scientific data. By "full and open exchange" the committee means that the data and information derived from publicly funded research are made available with as few restrictions as possible, on a nondiscriminatory basis, for no more than the cost of reproduction and distribution.

The International Council of Scientific Unions (ICSU), together with the scientific Specialized Agencies of the United Nations, the Organization for Economic Co-operation and Development Megascience Forum, and the national science agencies and professional societies of member countries, should consider developing a distributed international network of data centers.
Such a network should draw on the strengths of successful examples of international data exchange activities as described in Appendix C of this report, including, in particular, the ICSU World Data Centers, and become a prominent part of the global information infrastructure that has been proposed by the "Group of Seven" nations. To facilitate the international dissemination and interdisciplinary use of scientific data, all public scientific data activities, including the network of data centers, should plan for and commit to providing human and financial resources sufficient for carrying out the following functions:

  • Involve experts from the relevant disciplines, together with information resource managers and technical specialists, in the active management and preservation of the data;
  • Develop and maintain up-to-date, comprehensive, on-line directories of data sources and protocols for access;
  • Provide documentation (metadata) adequate to ensure that each data set can be properly used and understood, with special attention given to making the data usable by individuals outside the core discipline area. This problem is particularly acute within the biological sciences, in which imprecision and variations in taxonomic definitions and nomenclature pose significant barriers to communication, even among the biological subdisciplines. The committee suggests that the CODATA Commission on Standardized Terminology for Access to Biological Data Banks be enhanced into a true international consultative body and that similar mechanisms be developed for other disciplines, as needed;
  • Incorporate advances in technology to facilitate access to and use of scientific data, while overcoming incompatibilities in formats, media, and other technical attributes through vigorous coordination and standardization efforts;
  • Institute effective programs of quality control and peer review of data sets; and
  • Digitize all key historical data sets and ensure that every important condition for the long-term retention of data be met, including the adoption of appropriate retention and purging criteria and the timely transfer of all data sets to new media to prevent their deterioration or obsolescence.

The ICSU and other professional scientific societies should encourage the study of, and publication of peer-reviewed papers on, effective data management and preservation practices, as well as promote the teaching of those practices in all institutions of higher learning.

All scientists conducting publicly funded research should make their data available immediately, or following a reasonable period of time for proprietary use. The maximum length of any proprietary period should be expressly established by the particular scientific communities, and compliance should be monitored subsequently by the funding agency.

As a corollary to recommendation 2.a above, publicly funded scientific databases should be maintained either directly or under subcontract by the government science agencies with the requisite discipline mission and need.

In the United States, the Office of Science and Technology Policy should develop an overall policy for the long-term retention of scientific data, including a contingency plan for protecting those data that may become threatened with the loss of their institutional home.6

With regard to improving access to scientific data in developing countries, the committee makes the following recommendations:

  • International development organizations, together with professional societies, should provide targeted training programs for scientists in the use of computers, with emphasis on the management of digital data in specific disciplines.
  • Foreign aid agencies should (i) make available to individual scientists in developing countries more direct, peer-reviewed grants that include support for access to data, and (ii) facilitate the involvement of scientists in such nations in their own countries' capacity-building initiatives, research policy decisions, and national database construction efforts.
  • Scientists in developing countries should be encouraged to organize to promote the policy of full and open access to scientific data in their own countries, as well as to make their data available internationally.
  • The ICSU, together with funding agencies and nongovernmental bodies, should strengthen its efforts to assist developing countries in undertaking their own scientific studies and encourage scientists engaged in such studies to take active roles in the international scientific community, where their efforts can be appreciated and used. Legal and procedural protocols must be developed to provide for fair and equitable sharing of any resulting intellectual property.
  • Until affordable and ubiquitous electronic network services are available, national and international scientific societies and foreign aid agencies should establish or improve their existing efforts to send extra stocks of scientific publications to libraries and research institutions in developing countries that need them.

Finally, the ICSU, together with the principal national and international scientific organizations mentioned in Recommendation 2 above, should convene a series of major international meetings to initiate meaningful action on these recommendations.

Recommendations on Issues in Information Technology

The principal scientific societies and the Internet Engineering Task Force (IETF) should begin a long-term planning effort to assess the carrying capacity and distribution capability of the Internet, using projections of storage and transmission capacity and of demand and taking into account the next generation of Internet protocols. Scientific societies should encourage their publication committees to maintain contact with the IETF and keep their members abreast of advances in technologies useful for scientific information management. One option that scientific societies and government science agencies should evaluate is the creation of dedicated international science networks, such as the Internet II now being developed.

To improve the technical organization and management of scientific data, the scientific community, through the government science agencies, professional societies, and the actions of individual scientists, should do the following:

  • Work with the information and computer science communities to increase their involvement in scientific information management;
  • Support computer science research in database technology, particularly to strengthen standards for self-describing data representations, efficient storage of large data sets, and integration of standards for configuration management;
  • Improve science education and the reward system in the area of scientific data management; and
  • Encourage the funding of data compilation and evaluation projects, and of data rescue efforts for important data sets in transient or obsolete forms, especially by scientists in developing countries.

U.S. government science agencies, working with their counterparts in other nations, should improve data authentication and apply security safeguards more vigorously. They also should continue funding for research and development in information technologies that are important to the pursuit of science.

A consortium of intergovernmental and nongovernmental organizations, including the International Telecommunications Union, the World Bank, the Specialized Agencies of the United Nations, the International Council of Scientific Unions, and other concerned bodies, should mount a global effort to reduce telecommunications tariffs to scientists in developing countries through differential pricing or direct subsidy.
Foreign aid to developing countries in the form of computers, computer networks, and associated software, coupled with the training and resources necessary to operate and maintain those technologies, should be given high priority, on the basis of the potential for long-term socioeconomic returns. The communication systems must have adequate carrying capacity to meet growing demand.

Recommendations Regarding Economic Aspects of Scientific Data

The committee recommends that the economic aspects of facilities for storage and distribution of scientific data generated by publicly funded research be evaluated according to the following criteria:

  • Does the scientific research depend on a substantial public investment in one or more facilities that generate the data of interest? If so, the data distribution facilities are most likely to benefit by being vertically integrated with the observational or experimental facilities themselves.
  • Does the (non-facilities-based) distributed scientific research involve coordination among researchers, possibly in different countries? If so, then data distribution becomes a means of communication among contributing scientists, and for this community, the price of the data alone should be zero. If the distributor subsequently adds value to the data, then the price should be no higher than the marginal cost of adding value.7
  • Is the community of users roughly the same as the community of contributors? If so, then data distribution should be priced at zero (or at marginal cost, if value is added). If there are many users who are not contributors, such as commercial customers, then some form of price discrimination to ensure zero or low prices to contributing scientific users, with possibly higher prices to others, may be appropriate.
  • Is the user community large enough to support more than one data distributor? If so, then privatization of data distribution may be a viable policy option. If not, then privatization should occur only if the contractual arrangements are adequately protective of the needs of the scientific community.

Necessary (but not necessarily sufficient) conditions for privatization to be desirable are as follows:

  • The distribution of data can be separated easily from their generation.
  • The scientific data set is used by others beyond the research community.
  • It is easy to price discriminate/product differentiate between scientific users and other users, and it is easy for the government to contractually mandate low prices to scientific users for government-funded data.
  • Privatization will not result in the unrestricted monopoly provision of the data.
The appropriate price ceiling for nonscientific users of scientific data generated through government research is incremental cost, as defined in the section titled "Pricing Publicly Funded Scientific Data" in Chapter 4. The price of scientific data to the contributing scientific community should be zero, or at most marginal cost.

Recommendations Regarding Legal Developments Affecting Access to Data

The new proposals supporting an overly protectionist property rights regime for the contents of databases and for on-line transmissions of data and other scientific information have reached an advanced stage of legislative consideration at both the national and the international levels. The committee believes that these legislative changes do not reflect adequate consideration of the potential negative impacts on scientific research and education and that they have been proposed for implementation at an unnecessarily precipitous pace. The committee therefore recommends that the Office of Science and Technology Policy, leaders from the science agencies and professional societies, and all those concerned with sustaining the health of the scientific enterprise should immediately take the following actions:

  • Present to all relevant legislative forums the principle of full and open exchange of scientific data resulting from publicly funded research, and clarify the importance of sustaining such exchange to the nation's future whenever these forums consider laws that would apply to exchange of scientific data.
  • Demand that national and international legislative processes now in progress slow to a rational pace, and that the deliberations become more public to allow the scientific and educational communities to present their views and concerns to lawmakers.
  • Advocate the incorporation of equivalents of "fair use" as part of any regulatory structure applying to databases as such, or to on-line storage and transmission of data and other scientific information. As a corollary, ensure that the public-good aspects of scientific data are preserved and promoted in laws and regulations governing intellectual property on the Internet and in any future electronic networked environments.
  • Work with Congress and the official U.S. representatives to the World Trade Organization and the World Intellectual Property Organization to ensure that the nation's interests in maintaining preeminence in science and technology are not undermined.
  • Pursue these issues not only within the United States, but also internationally through international scientific organizations and U.S. foreign-policy channels as they deal with trade and other agreements affecting intellectual property protection.

NOTES

1. By "full and open exchange" the committee means that data and information derived from publicly funded research are made available with as few restrictions as possible, on a nondiscriminatory basis, for no more than the cost of reproduction and distribution. This definition is adapted from a basic tenet regarding availability of scientific data in global change research. See "Policy Statements on Data Management for Global Change Research" (July 1991), Office of Science and Technology Policy, DOE/EP-0001P, Washington, D.C., and National Research Council, Committee on Geophysical and Environmental Data (1995), On the Full and Open Exchange of Scientific Data, National Academy Press, Washington, D.C., p. 2.

2. Throughout this report, the term "scientific data" refers to data in the natural sciences.

3. The Landsat privatization effort, described in Chapter 4, is one example of unrestricted monopolistic data distribution under which the scientific community suffered loss of access. Nevertheless there may be situations in which the scientific community would benefit if a body of data were distributed either by a competitive set of private firms or by a single adequately constrained private source.

4. Before the electronic era, copyright evolved as a protection for authors and their assignees; under copyright, a document could be reproduced only with the approval of the copyright holder, under whatever terms that person chose. Copying machines made possible, even easy, violations of this protection. A doctrine of "fair use" then evolved to allow very limited copying by scholarly, educational, scientific, and other not-for-profit users, but not by any who would make commercial use of the copies. The fair use doctrine has become a principal protection of the right of the public, and thus of the scientific community, to have ready, low-cost access to copyrighted material; its economic and cultural justification rests on the nature of information as a public good that benefits users.

5. Feist Publications, Inc. v. Rural Telephone Service Co., 111 S. Ct. 1282 (1991).

6. See the recommendations in National Research Council (1995), Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources, National Academy Press, Washington, D.C.

7. By "adding value" in this case is meant any transformation of the data beyond that necessary for scientific research that increases the value of the information for some or all potential users of the data.