There is a need for a more general locator function, or “directory of directories.” Some sort of broad locator is needed even though it is not clear that NARA is the most appropriate government agency to perform this function. The National Technical Information Service (NTIS) in the Department of Commerce may be a more appropriate candidate. During the course of this study, NTIS expanded its FedWorld bulletin board gateway into such a locator (Jones, 1993). In addition, the Office of Management and Budget has organized an interagency effort to develop a Government Information Locator Service.
Nevertheless, there is a need for a NARA-maintained directory of archival data within its own system. This should include archival records maintained by other government agencies that are recognized as part of a distributed archival system overseen broadly by NARA.
The preceding sections of this report have emphasized two needs: (1) scientific data should be preserved if they will be of value to future scientists and cannot practically be reproduced; and (2) the single major criterion in how they are preserved is that they must be usable.
Other aspects to consider are the possible broadening of the definition of data to include different data types (graphics, images, multimedia, audio), formats (word processing languages, optical disks, and others that violate one or more of the NARA records restrictions), and interlinkages (such as interactions between environmental, physical, and chemical data and environmental modeling, or the way that chemical thermodynamic data can have very widespread ramifications).
Turning to some of the more technical, as well as policy, aspects of an ideal archiving system, the panel concurs with, and endorses, the recommendations made by the National Academy of Public Administration in its 1991 report (NAPA, 1991). That report contains 13 detailed recommendations for NARA in its executive summary, 11 of which are pertinent to scientific data and are reproduced in Appendix B of this volume of panel reports.
The overall goal of an ideal archiving system is to “protect the interests of the next generation of researchers” (Thibodeau, 1993). To help protect those interests, the panel suggests that NARA improve its capabilities to be the archivist of last resort for scientific and engineering data.
What data should be preserved for the “long term”? The primary criterion for determining whether a laboratory science data set is a candidate for long-term preservation is whether or not it is feasible to reproduce it.
Federal data sets in the laboratory sciences that are candidates for long-term preservation can be classified into three generic types:
(1) Massive records and data from an original experiment, particularly a “mega-experiment,” that is reproducible in principle but that has no realistic chance of being replicated.
(2) Critically evaluated compilations of data from a large number of original sources that represent tremendous accumulated effort.
(3) Unique, perhaps time- and environment-dependent, engineering data collected at federal facilities or as part of a government project (which may or may not ever be completed), much of which never reaches the published literature.
In summary, data can have long-term retention value either because of the difficulty of reproducing them (e.g., nuclear test data, materials property data) or because of the effort put into collecting and processing them (e.g., extensive critical compilations). Archivists should be able to apply these criteria of retention value with a modest amount of outside input.
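As an illustration only, the two retention criteria summarized above could be expressed as a simple screening check. The sketch below is hypothetical: the `DataSet` record, its field names, and the `retention_reasons` function are assumptions introduced here for clarity, not part of any NARA procedure, and an actual appraisal would rest on the judgment of archivists and outside experts rather than on two yes/no flags.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RetentionReason(Enum):
    """Reasons for long-term retention, paraphrasing the summary above."""
    HARD_TO_REPRODUCE = "difficult or infeasible to reproduce"
    COSTLY_COMPILATION = "large effort invested in collection and processing"


@dataclass
class DataSet:
    """Hypothetical appraisal record; the field names are assumptions."""
    name: str
    feasible_to_reproduce: bool    # could the data realistically be regenerated?
    is_critical_compilation: bool  # critically evaluated from many sources?


def retention_reasons(ds: DataSet) -> List[RetentionReason]:
    """Return every retention criterion that the data set satisfies."""
    reasons = []
    if not ds.feasible_to_reproduce:
        reasons.append(RetentionReason.HARD_TO_REPRODUCE)
    if ds.is_critical_compilation:
        reasons.append(RetentionReason.COSTLY_COMPILATION)
    return reasons


def is_candidate(ds: DataSet) -> bool:
    """A data set is a candidate if at least one criterion applies."""
    return bool(retention_reasons(ds))


# Illustrative inputs only; the flag values are assumed for the example.
nuclear_tests = DataSet("nuclear test data", False, False)
compilation = DataSet("critical compilation", True, True)
routine = DataSet("routine repeatable measurement", True, False)
```

Under these assumed flags, `is_candidate(nuclear_tests)` and `is_candidate(compilation)` are true, while `is_candidate(routine)` is false. Returning the list of reasons rather than a bare yes or no keeps a record of why a data set was flagged.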
When determining whether scientific data should be preserved, the panel suggests asking a few simple questions (analogous to those addressed before a scientific paper is published in an archival scientific journal):
Do the originators or current holders of the data think the data are of sufficient long-term value to be preserved as part of the archived scientific record?
Have they demonstrated this by annotating the data and by providing a written description sufficient for others to use the data set; i.e., have they made the effort to provide the necessary metadata? This step is analogous to preparing research results for publication.