These questions are important because of the nature of scientific understanding—it depends upon accumulated knowledge—and of the ways in which scientific understanding and technological knowledge have become woven into the fabric of our society and civilization.

What Data Should Be Preserved?

The areas of science being considered by this panel (physics, chemistry, and materials sciences) are laboratory physical sciences. Data from these fields differ from data in fields being considered by other panels in this study because many of them stem from experiments that, in principle, could be reproduced. However, closer examination reveals that it is simply not feasible to reproduce many data sets because the samples, apparatus, or expertise that led to them cannot be reproduced at an acceptable cost. Furthermore, some types of data cannot be reproduced at any cost. This leads immediately to the major criterion to be used in this report: if the data will be important to science in the future,

the primary criterion for determining whether a laboratory science data set is a candidate for long-term preservation is whether or not it is feasible to reproduce it.

The discussion and examples in this panel report will illustrate the factors that make data sets impossible, or impractical, to reproduce.

Who Should Save the Data?

Data should be saved by organizations, and in formats, that make them maximally available to the primary user, the scientific and technical community. In most cases, this means primary responsibility will continue to be held by the technical libraries, government agencies, and professional societies that currently archive and make accessible scientific and technical data, records, and publications. The system for generating, preserving, disseminating, and accessing scientific information is evolving (some would say too rapidly, others not rapidly enough) as data storage, scientific communications, and data acquisition are revolutionized by electronics and computers. But the system is not broken. There is little evidence that the organizations that have been meeting the needs of the scientific community will be unable to do so in the near future. Thus, we see primary responsibility for preserving electronically stored scientific data remaining with those currently preserving and supplying scientific information. There is, however, an enhanced role for NARA to play: in preserving electronically stored scientific information for access outside the scientific-technical community; in helping facilitate access to electronically stored scientific records by providing (or cooperating in the provision of) locator information; and in being ready to step in and preserve records when they might otherwise be lost.

Contents of This Report

Following this introduction, the panel's report begins, in Section 2, by considering some of the characteristics of scientific data generated and used in physics, chemistry, and materials sciences, particularly characteristics that distinguish data in these fields from data in fields covered by other panels in this study. Section 3 describes general issues and requirements critical to the subsequent use of data from physics, chemistry, and materials sciences. Section 4 describes six example databases. The examples illustrate data that are not feasible to reproduce because they are from a no-longer-reproducible “mega-experiment,” represent tremendous accumulated effort, or are measurements on unique samples. Section 5 summarizes criteria to be used in deciding what data are worth preserving for a very long time. The criteria are modeled upon the processes that have been developed and used over the past 300 years by the scientific community in preserving printed records. Section 6 turns to the panel's suggestions of how NARA might add to the nation's ability to preserve and utilize irreplaceable scientific data. The panel's conclusions and recommendations are summarized in Section 7. The documents consulted by the panel are listed in the bibliography at the end of this report.

2 CHARACTERISTICS OF DATA IN PHYSICS, CHEMISTRY, AND MATERIALS SCIENCES

The primary purpose of data generated or stored by scientists and engineers in physics, chemistry, and materials sciences was, and in the great majority of cases will continue to be, to provide specific information to other


