notched uniaxial specimens, including notch-yield ratio; results of plane-strain fracture toughness tests including degree to which all 13 criteria of validity were satisfied. Detailed data are quite complete for individual tests, and unique in that degree of validity of each individual test result is clearly shown, in accordance with ASTM Test Method E-399. There are about 30,000 test data sets, each representing about 50 individual pieces of information. All data are raw test results.

The data are used to judge average and statistical distribution of toughness of aluminum alloys for critical applications, and to estimate the relationships among different test indices and their relative value for assessing fracture resistance.

Panel comments: Because the cost-effectiveness of making such data commercially available is marginal, this massive amount of information representing many commercial and military structures may someday be lost unless it is archived by the government.

These examples and the earlier discussion illustrate a number of aspects of the burgeoning data of physics, chemistry, and materials sciences stored in electronic form. The data are very diverse in type and content; scientific data are not only sets of numbers. Considered as data in the information science sense, most of the data sets are small. Finally, there are many ways in which valuable, or useful, data in the reproducible laboratory sciences can be nonreproducible, and, if not preserved, irretrievable.

NARA's Laboratory Data Holdings

While picking illustrative examples and considering the relation of the above examples to the National Archives and Records Administration, the panel asked about the physical sciences data in NARA's Center for Electronic Records. Of the thousands of holdings in the Center, very few are related to the physical sciences and apparently none contain scientific data of the kind represented by the above examples. NARA's policy has been to acquire laboratory data only if they are of high historical value, under the presumption that the data would be repeatable. The decision whether or not to archive has not been influenced greatly by how often the data are or will be accessed. NARA's current electronic holdings of laboratory science records are limited to such items as a National Register of Scientific and Technical Personnel; a 1971 survey of scientists and engineers; and records of the investigation of the “Challenger” accident (NARA, 1992b). Paper holdings are somewhat more extensive and include such items as historical records from the Lawrence Berkeley Laboratory, including scientific notes from Nobelists McMillan, Alvarez, and Calvin and from some major high energy physics experiments; computations, calibrations, and comparisons on weights and measures from the National Bureau of Standards; reactor and reactor safety records from Argonne National Laboratory; and environmental contamination and toxic substance exposure records from the Atomic Energy Commission's Idaho Operations Office. The holdings illustrate that the agency is not yet playing a significant role as a repository for scientific data, least of all for electronically recorded data.

A study performed in 1991 for NARA by the National Academy of Public Administration (NAPA) examined a large number of databases for possible consideration for accessioning by NARA (NAPA, 1991). When the panel went over the approximately 300 databases listed in Appendix C of the NAPA report, it found six titles that might be similar to the kind being considered here: USNRC Nuclear Materials Management and Safeguard System (item 183); DOD Naval Environmental Protection Support Service Data Files (item 289); EPA Environmental Monitoring and Assessment Program (item 369); NSF Academic Research Equipment in Selected Science/Engineering (item 385); USNRC Reactor Safety Data Bank (item 185); and DOE Test Data Database (item 935).

The fact that NARA does not currently play a role in the preservation of scientific data sets does not mean that it has no role to play. However, to take on this responsibility, NARA will have to operate somewhat differently than it has in the areas it has served well in the past. The remainder of the panel's report deals with the issues of what to save, who should save it, the challenges and opportunities that arise because of involvement by multiple institutions, and how NARA might modify its methods if it is to serve future generations most effectively as a preserver of scientific data. Just as the traditional means of transmitting and storing scientific information (technical journals and libraries) will have to change to adapt to new mechanisms of doing science and of scientific information transfer, so will the approaches to archiving and mechanisms of long-term preservation.

5 DATA RETENTION AND RECORD PRESERVATION CRITERIA

What data are worth keeping for a long time? Data can have long-term retention value either because of the difficulty of reproducing them (e.g., nuclear test data, data from obsolete accelerators, materials property data) or


