Basic scientific research fuels most of our nation's—and the world's—progress in science.1 Society uses the fruits of such research to expand the world's base of knowledge and applies that knowledge in myriad ways to create new wealth and to enhance the public welfare (see Box 1.1). Yet few people understand how scientific advances have made possible the ongoing improvements that are basic to the daily lives of everyone. Fewer still are aware of what it takes to achieve advances in science, or know that the scientific enterprise is becoming increasingly international in character.
Freedom of inquiry, the full and open availability of scientific data on an international basis, and the open publication of results are cornerstones of basic research that U.S. law and tradition have long upheld. For many decades, the United States has been a leader in the collection and dissemination of scientific data, and in the discovery and creation of new knowledge. By sharing and exchanging data with the international community and by openly publishing the results of research, all countries, including the United States, have benefited. In this century's dramatic growth of scientific knowledge—an expansion motivated by a combination of forces including military, commercial, public benefit (especially health), and purely intellectual—a necessary component has been the wide availability of scientific information, ranging from minimally processed data to cutting-edge research articles in newly developing fields. This information has been assembled as a matter of public responsibility by the individuals and institutions of the scientific community, largely with the support of public funding.
Box 1.1 Examples of Benefits Derived from Scientific Research
Data are the building blocks of scientific knowledge and the seeds of discovery. Activities in the recording, analysis, and dissemination of data are motivated today by the same forces that have impelled humans for thousands of years: curiosity to understand the natural world; desire to pass that understanding to succeeding generations; self-aggrandizement; and personal or national power.2 Data challenge us to develop new concepts, theories, and models to make sense of the patterns we see in them. They provide the quantitative basis for testing and confirming theories and for translating new knowledge into useful applications for the benefit of society. The assembled record of scientific data is both a history of events in the natural world and a record of human accomplishment.3 The international availability of these scientific data for fundamental research on a full and open basis and issues associated with ensuring global access are the primary concerns of this report.
Technological advances in recent years have led to an exponential increase in the amount of data collected, stored, and transmitted. New, ever more sophisticated sensors record observations on objects ranging from the smallest particles of matter to the largest objects in our known universe. It is now commonplace to control such large instruments as telescopes remotely, during the observation of an event, from a point hundreds or thousands of miles from the instrument. Satellites in orbit around Earth provide us with electrooptical observations, collecting billions of bits of data about our planet on a daily basis. Powerful machines unravel genomes to reveal the genetic code of life and help us decipher the secrets of heredity. In addition, rapid advances in computing, data processing and storage, and, most recently, in global telecommunications have given us the power to communicate and share the information produced by these remarkable observational and experimental tools, almost as quickly as it is generated. The exponential accumulation of these electronic data (these bits of power) and our expanding capacity to manipulate them are in turn changing the nature of scientific inquiry and its application to the great challenges facing mankind.
As in the past, generating data in the natural sciences is only the first step in the process of creating, organizing, and applying knowledge. Other elements of this endeavor include discovery of new principles, integration of information across disciplines, dissemination by formal and informal education, and application by many sectors of society. Today, however, larger interdisciplinary research efforts such as the International Geosphere-Biosphere Programme,4 the Human Genome Project,5 and other international "megascience" research programs6 are creating new frameworks of knowledge not only about the universe and what constitutes it, but also about living organisms, human behavior, and their mutual interaction. In addition, traditional disciplinary research continues in field studies, the laboratories of individual scientists, and at large joint facilities.
Increasingly, all forms of research involve both formal and informal international scientist-to-scientist contact and exchanges of data. This increase in international collaboration stems partly from changing political and economic conditions and partly from the growing availability of electronic communication. Whether carried out on a large scale under cooperative agreements or less formally among individual researchers, these collaborations have become integral to the search for scientific understanding. Their success, as well as progress in achieving the public benefits of science, depends on the full and open availability of scientific data.
CHARGE AND SCOPE OF STUDY
The purpose of this report is to describe and develop new insights into the trends, issues, and problems that are shaping the transnational exchange of scientific data. Specifically, the Committee on Issues in the Transborder Flow of Scientific Data was charged with the following tasks:
- Outline the needs for access to data in the major research areas of current scientific interest that fall within the scope of CODATA—the physical, astronomical, geological, and biological sciences.
- Characterize the legal, economic, policy, and technical factors and trends that have an influence—whether favorable or negative—on access to data by the scientific community.
- Identify and analyze the barriers to international access to scientific data that may be expected to have the most adverse impact in discipline areas within CODATA's purview, with emphasis on factors common to all the disciplines.
- Recommend to the sponsors of the study approaches that could help overcome barriers to access in the international context.
Perhaps the most obvious aspect of this charge is its wide scope. The broad nature of the committee's inquiry precluded a comprehensive analysis of all the issues and trends in all the disciplines and across all geographic areas. Moreover, many activities beyond the sphere of science impinge on the transnational exchange of scientific data, a fact that required the committee to establish practical limits on its treatment of these topics.
This report focuses primarily on issues pertaining to scientists' effective access to data in numerical, symbolic, and image form, rather than bibliographic or purely textual data, for research in the natural sciences. However, the committee is acutely aware that distinctions among these categories of data are fading. Most of the discussion concerns digital rather than analog data, since practically all scientific data are now collected and stored digitally and most older data are being transferred to digitized electronic formats.
With regard to the needs for data in the physical, astronomical, geological, and biological sciences, the report incorporates by reference the more detailed and thorough analyses of research strategies produced in recent years by the National Research Council for the various natural sciences. The importance of data for fundamental research across these disciplines is described in a summary overview at the beginning of Chapter 3 and is highlighted in various examples throughout the report.
Because the sponsors of the study are U.S. federal government science agencies, the committee has emphasized trends, issues, and barriers that have an impact on international access to data collected and used in the context of publicly funded, basic research programs. Despite this emphasis, the committee took into account the continua between fundamental and applied research, between raw data and processed information, and between public and private uses of scientific data. Indeed, the most vexing public policy issues facing the international scientific community involve defining the appropriate balance of competing interests.
In addressing the international aspects of data exchange, the committee conducted a widely disseminated informal inquiry to develop at least an anecdotal sense of what data issues trouble the international scientific community today.7 The issues were broadly divided into those affecting the economically most developed nations, defined as the countries belonging to the Organization for Economic Co-operation and Development (OECD), and those confronting the developing countries. The committee recognizes, however, that the developing countries encompass a wide spectrum of economic and technical capacities; illustrative examples of major issues are provided with reference to specific regions, countries, or institutions.
Finally, in its deliberations the committee discovered certain matters that were central to the subject but were not explicitly included in its charge. Although expressly requested to provide its advice to the agencies that supported this study, the committee became aware that many of the issues and barriers pertinent to global access to scientific data could only be addressed collectively by the world's international scientific community in concert with a broad range of national and international governmental and nongovernmental bodies. Therefore the committee considered it necessary to make several recommendations of broader scope in keeping with these concerns.
UNDERLYING ASSUMPTIONS AND CONCERNS
Several assumptions underlie the committee's work. The first is that international collaboration enhances scientists' capacity to better understand the natural world and thus strengthens the science base that is a source of important benefits to society. From this assumption, the rest follow.
Science is one of the most internationally cooperative of activities. Today, the improving means and ease of communication and travel, as well as their decreasing costs, have made transnational interactions a normal, daily part of carrying out scientific research. National boundaries are invisible in scientists' daily interactions, whether they are engaged in face-to-face discussion in a single laboratory or across great distances by electronic mail. Joint multinational authorship is common, and many funding institutions have encouraged such efforts. With the end of the Cold War, international collaboration can be expected to increase further.
The handling of scientific information, one of the results of such collaboration, has also changed dramatically. In fact, the distinction between "data" and "information" has itself become blurred. Data now include not only numerical data, but also symbolic data and images, and, for many scientists, textual data. Much of this convergence is the consequence of powerful electronic capabilities affecting the acquisition, storage, and exchange of scientific data. Primary data collected by a detector now frequently go directly into a computer for storage and processing before the person who generates the data ever sees them. In such an experiment, how are "primary" data to be defined? Fortunately, the integration of electronic methods has occurred so naturally that this question is unimportant to the working scientist, who might ask instead, "Is this the best way to collect and analyze the data?"
The storage and exchange of scientific data have been more problematic than their collection. Even within the particular communities that generate and initially use data, their storage and dissemination traditionally have required attention and expense. Moreover, many scientific data have value outside the community of origin. Entire institutions have evolved to provide data services, among them public and private data centers. Electronic media have changed the means, costs, capacities, and time scales associated with the handling of scientific data. Some of these changes are already well established, some are evolving, and some are still only imagined. What is certain is that change will continue and that our management and use of scientific data in 10 years will differ from current practice.
Another of the committee's assumptions is a corollary of the first and reflects what the committee believes is virtually a consensus of the global scientific community: that the most valued goal of scientists is that other scientists should learn of their work and use it. The common interests of all scientists, of science, and indeed of society in general thus are best served by as full and open an exchange of scientific information as possible, consistent with the preservation of scientists' capacity to continue their investigations. This assumption can sometimes put scientists at odds with other sectors of society, as discussion and examples in this report illustrate. Because the scientific community is not the only sector with an interest in the handling of scientific data and information, scientists need to remain involved in the current policy debate that will affect the prospects for continuing open, global access to scientific data.
This study has been motivated by a concern for ensuring the continuing strength of the scientific enterprise as a source of international well-being and progress; hence the analysis and recommendations reflect that motivation. The extent to which the committee's recommendations are adopted may require balancing this motivation against the motivations of others, whose objectives are not necessarily the same.
The chapters that follow (a) describe the information technology tools and capabilities that are transforming the handling and use of scientific data, and some of the principal impacts on data exchange arising from these technological developments; (b) summarize the underlying factors in international scientific data exchange, how scientists use data, and what data issues confront them as they carry out their research; (c) examine the economic aspects of data obtained from publicly funded research; and (d) analyze the conflicts arising from information technology's impact on the domain of intellectual property law that regulates scientists' access to data. Technical terms and acronyms are defined, and examples of successful data exchange activities given, in the appendixes.
J.H. Westbrook (1992), "A History of Data Recording, Analysis, and Dissemination," pp. 430-460 in Data for Discovery: Proceedings of the Twelfth International CODATA Conference, P. Glaeser, ed., Begell House, New York.
National Research Council (1995), Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources, National Academy Press, Washington, D.C.
See the International Geosphere-Biosphere Programme's World Wide Web site at <http://www.igbp.kva.se/index.html>. Note: In keeping with the subject and message of this report, the reader will find, in addition to references to texts and personal communications, many references to sites on the World Wide Web. Most of these are uniform resource locators, or URLs; a few are uniform resource names, or URNs. Although the validity of all of these Web addresses was determined at the time of publication, the reader is cautioned that URLs sometimes change, and that one of the shortcomings of the current state of electronic communication is inadequate tracking capability to lead someone from an old to a new URL when the address changes. The replacement of URLs by URNs is a likely solution to this problem in the coming years, but it has not yet happened.
For additional information about the Human Genome Project, see the National Human Genome Research Institute's Web site at <http://www.nhgri.nih.gov/HGP>.
Organization for Economic Co-operation and Development (OECD) Megascience Forum (1993), Megascience and Its Background, OECD, Paris, France.