1 Introduction
Standing at the intersection of past and future, we humans are fascinated with the events of yesteryear and intrigued with what tomorrow will bring. Our prehistoric ancestors began the process of recording aspects of the environment that were important to them (Marshack, 1985; Boorstin, 1992). Today we are curious about many more worlds, ranging from those of atomic size to those of cosmic scale. With instruments on Earth and in space, we seek to capture views of reality that will help us understand nature and our relationship to it.
Scientific data reflect both the organization and the chaos of the natural world. They stimulate us to develop concepts, theories, and models to make sense of the patterns they represent. The resulting abstractions are the product of scientific endeavor, the goal being to develop the formal and systematic ideas that constitute the understanding of relationships between causes and consequences and perhaps may enable prediction of future sequences of events.
Because scientists transform data from the material world into ideas, the observations of objects and processes in the physical world are the stimuli of scientific thought. Data are thus the seeds of scientific ideas.
Science generally works by proceeding from data to understanding through a process of organizing the data and analyzing their implications. The following definitions, adapted from Setting Priorities for Space Research: Opportunities and Imperatives (NRC, 1992a), indicate how the process works:
· Data are numerical quantities or other factual attributes derived from observation, experiment, or calculation.
· Information is a collection of data and associated explanations, interpretations, or other textual material concerning a particular object, event, or process.
· Knowledge is information organized, synthesized, or summarized to enhance comprehension, awareness, or understanding.
· Understanding is the possession of a clear and complete idea of the nature, significance, or explanation of something; it is the power to render experience intelligible by ordering particulars under broad concepts.
This process is cyclical. New data confirm or refute existing theories and stimulate new understanding, which generates new and deeper questions that often need entirely new sets of observations to begin the process of answering them. New understanding also leads to increased technological capability, and that in turn makes new observations possible and again allows us to contemplate more sophisticated questions. Thus observations and scientific progress are intertwined; data from the physical world ensure that science is founded on reality as we try to answer the unending "how" and "why" questions that are part of being human. The answers become understanding that enables us to develop schemes for predicting or not being surprised by future events. And understanding, we hope, ultimately leads to wisdom about our interactions with the world around us.
IMPERATIVES FOR PRESERVING DATA ON OUR PHYSICAL UNIVERSE
The scientific reasons for preserving data derive from the fact that observations, knowledge, and understanding are cumulative. Thus we believe that the more complete the record, the more we can extract from it.
Many observations about the natural world are a record of events that will never be repeated exactly. Examples include observations of an atmospheric storm, a deep ocean current, a volcanic eruption, and the energy emitted by a supernova. Once lost, such records can never be replaced.
Observed data provide a baseline for determining rates of change and for computing the frequency of occurrence of unusual events. The longer the record, the greater our confidence in the conclusions we draw from it. Our traditional observational records have portrayed frozen instants of reality. If preserved, they will continue to provide insights, but if neglected, they will melt away.
A data record is also worth preserving because it may have more than one life. As scientific ideas advance, new concepts emerge, in the same or entirely different disciplines, from study of observations that led earlier to different kinds of insights. New computing technologies for storing and analyzing data enhance the possibilities for finding or verifying new perspectives through reanalysis of existing data records.
Thus, the relative importance of data, both current and historical, can change dramatically, often in entirely unanticipated directions. Reanalysis of data, even in the distant future, may therefore bring new understanding, raising the value of those data above what we assigned to them when they were archived.
Finally, the substantial investments made to acquire data records usually justify their preservation. The cost of preservation will almost always be small in comparison with the cost of observation. Because we cannot predict which data will yield the most scientific benefit in years ahead, the data we discard today may be the data that would have been invaluable tomorrow.
The assembled record of observational data thus has dual value: it is simultaneously a history of events in the natural world and a record of human accomplishment. The history of the physical world is an essential part of our accumulating knowledge, and the underlying data form a significant part of that heritage. The data also portray a history of our scientific and technological development. With appropriate explanatory documentation, often referred to as metadata, the data demonstrate the increasing sophistication of our attempts to understand our natural surroundings and the technological capabilities we apply to the task.
Preserved for study by future generations, the data will speak across the years about what we tried to do, where we succeeded, and where we failed. With increasing capabilities for analyzing and conceptualizing patterns in data, those who follow may find in our archived data important clues that we could not or did not see. At the same time, our descendants will be grateful that we preserved a sufficiently long history of their world that they can make important decisions about their own future.
There are numerous socioeconomic reasons, in addition to the compelling scientific and historical motivations, for the long-term retention of observational, as well as certain types of experimental, data. For example, historical climate data have had well-documented uses in a broad range of applications in manufacturing, energy, agriculture, transportation, communications, engineering, construction, insurance, and entertainment (OTA, 1994). Such applications are common for other types of observational data on the Earth's environment. Experimental data in the physical sciences also have many industrial and other practical uses. Additional examples of the long-term uses of the various physical science data are provided in the next chapter.
A NEW FUTURE FOR SCIENTIFIC DATA
The collections of scientific data acquired with government and private support are the foundation for our understanding of the physical world and for our capabilities to predict changes in that world. In the years ahead, the volumes of those collections of data will increase dramatically. They will stimulate advances in our scientific understanding and in our applications of that understanding to pursue important national goals. The scientific data in federal, state, and private databases thus constitute a critical national resource, one whose value increases as the data become more readily and broadly available.
Today, we can foresee the possibility of using the national resource of scientific data more advantageously than ever before, as technological advances open new vistas for managing and accessing scientific information. Growing computational power enables new approaches to the analysis, management, and application of data. Advances in data storage technologies make the long-term retention of virtually all data both feasible and affordable. The existence of the Internet and of the emerging National Information Infrastructure (NII) enables unprecedented nationwide sharing and application of data that reside in appropriately configured databases. Automatic search procedures, file transfer capabilities, and the accelerating use of the World Wide Web functions on the Internet illustrate the power of the contemporary technology.
It is important to note that these enabling technologies have emerged in a short time span; equally rapid advances can be anticipated in the years ahead, which will further facilitate the search for and access to the nation's data resources.
Our new power to store and distribute data and information is changing the way we work and think. However, the communities involved in the creation, retention, and use of scientific data about the physical world are not optimally organized. They commonly work toward disparate goals, are not well connected, and do not take full advantage of technological and conceptual advances in data management and communication. An entirely new approach to the long-term preservation of scientific data is now both feasible and essential. It must take advantage of advancing technology and of distributed communications and management structures to empower both the creators and the users of such data.
This study identifies the major issues regarding existing efforts to archive and use data in the physical sciences, establishes retention criteria and appraisal guidelines for those data, reviews important technological advances and related opportunities, and proposes a new strategy to ensure access to the data by future generations.