from the effort put into collecting or processing them (e.g., extensive critical compilations). Archivists should be able to apply these criteria of retention value with a modest amount of outside input.
When determining whether scientific data should be preserved, the panel suggests a few simple questions (analogous to those addressed before a scientific paper is published in an archival scientific journal):
Do the originators or current holders of the data think the data are of sufficient long-term value to be preserved as part of the archived scientific record?
Have they demonstrated this by annotating the data and by providing a written description sufficient for others to use the data set; i.e., have they made the effort to provide the necessary metadata? As noted earlier, this step is analogous to preparing research results for publication.
Have the data and metadata been certified by peer review or the appropriate equivalent? This review would attest to both the quality and value of the data set.
If the data are not preserved, but are needed in the future, could they be reproduced at reasonable cost?
If the organization considering preservation does not have the in-house expertise to answer these questions, it can easily request recommendations from outside referees, as is done when refereeing technical articles and research proposals.
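As an illustration only, the four questions above could be recorded as a simple checklist for each data set. The sketch below is not a procedure defined by the panel: the names `RetentionScreen` and `open_concerns` are hypothetical, and the code deliberately makes no retain-or-discard decision, since the text does not say how the answers should be weighed against one another. It only lists the points that would weaken the case for preservation, which an archivist or outside referee could then review.

```python
from dataclasses import dataclass


@dataclass
class RetentionScreen:
    """Answers to the four screening questions for one data set (hypothetical structure)."""
    holders_judge_long_term_value: bool    # Do originators or holders think the data merit preservation?
    metadata_sufficient_for_reuse: bool    # Are annotation and written description adequate for others?
    certified_by_peer_review: bool         # Have data and metadata been reviewed (or the equivalent)?
    reproducible_at_reasonable_cost: bool  # Could the data be regenerated cheaply if lost?


def open_concerns(screen: RetentionScreen) -> list[str]:
    """Return the points that weaken the case for preservation.

    No overall decision is computed; weighing the answers is left to
    the archivist or to outside referees.
    """
    concerns = []
    if not screen.holders_judge_long_term_value:
        concerns.append("holders do not consider the data of long-term value")
    if not screen.metadata_sufficient_for_reuse:
        concerns.append("metadata are insufficient for others to use the data")
    if not screen.certified_by_peer_review:
        concerns.append("data and metadata have not been certified by review")
    if screen.reproducible_at_reasonable_cost:
        concerns.append("data could be reproduced at reasonable cost if needed")
    return concerns


if __name__ == "__main__":
    example = RetentionScreen(
        holders_judge_long_term_value=True,
        metadata_sufficient_for_reuse=False,
        certified_by_peer_review=True,
        reproducible_at_reasonable_cost=False,
    )
    for concern in open_concerns(example):
        print("-", concern)
```

An empty list means that none of the four questions raises an objection to preservation; a non-empty list is a prompt for further review, not a rejection.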
The above questions are purposely modeled on those that have proven useful to the scientific community in preserving printed records over the past 300 years. The panel believes that a change in medium requires adapting and modifying the methods that have worked for science, not throwing them out and starting anew. Most of the data sets that contain accumulated knowledge about natural phenomena and measured properties will continue to be useful to and maintained by the scientific community for a long time. Their value will diminish only if and when theory advances sufficiently that such items as material properties can be computed rapidly, as needed, from first principles. As already noted, the scientific community's mechanisms for maintaining, updating, and disseminating such data are, on the whole, quite satisfactory. However, two areas are of particular importance for electronically stored data sets and deserve considerable attention: (1) the establishment of an effective locator system, so that a potential user can quickly determine whether the data sought exist, where they are, and how to access them; and (2) the ability to identify and preserve data sets (meeting the above criteria for preservation) that might otherwise be lost because the responsible institution is no longer able or willing to maintain them.
As discussed in Section 1, the panel's answer to the question of who should save scientific data is that these data should, whenever feasible, be saved by those institutions best equipped to make them accessible to the primary users of the data, the scientific community. In most cases, this means primary responsibility will continue to be held by technical libraries, government agencies, and professional societies that currently archive and make accessible scientific and technical data, records, and publications.
In addition, in response to the third question raised in Section 1, the panel suggests that there is a helpful, enhanced role for NARA to play in the preservation of data from physics, chemistry, and materials sciences. NARA might provide:
A repository of last resort;
A focus for interagency cooperation and communication;
A locator system;
Collaborative standards; and
Education and assistance with preservation and archiving.
To meet these needs, NARA will need to modify some of its practices regarding:
The definition of archivable (and perhaps of “secondary user”) for scientific data;
Data formats and storage media;
Customer orientation as opposed to rule promulgation; and
Distributed versus centralized record holdings.