For many years, atmospheric scientists have voiced concern that they will not be able to obtain and use long time-series data sets collected by government agencies. However, most members of the scientific community are neither particularly well trained nor well motivated to ensure the long-term preservation and usefulness of the data they themselves collect. Further, scientists in other specialties, engineers, planners, emergency managers, lawyers, historians, and other members of the public all have retrospective uses for atmospheric data and related records.
When one thinks of “federal records,” one usually imagines records associated with transactions of business by federal agencies. However, by law, federal records can include records “made or received by an agency . . . preserved or appropriate for preservation . . . because of the informational value of the data in them.” Nevertheless, agencies do not always consider their scientific data, or the scientific data resulting from research they fund, to be federal records. The panel believes that federal agencies should be more inclusive in designating scientific data, whether in their possession or created with their funds, as federal records and in ensuring the long-term survival of those data.
For some, the term “archiving” conjures up images of records gathering dust in a safe place. That is not the image the panel wishes to convey. Preservation without easy access is of little value. Further, it is difficult to justify the effort and cost needed to maintain a set of records if they are never accessed, though such access may not occur until some distant future date. In developing its recommendations, the panel has concentrated on the actions necessary for records to be useful after extended periods of time.
This new look at the long-term preservation of scientific information has been requested by NARA, NOAA, and NASA as they consider taking a more active and responsive role in the protection and management of scientific and technical records, especially digital records that are machine readable. As the volume of new data continues to grow rapidly and some of the older records continue to decay or lose supporting information, the panel commends the federal agencies for their increased interest in the issue of long-term archiving of scientific and technical records.
Retaining records for long periods of time, on the order of decades to centuries, is not an effortless process. The physical media on which records are stored—be they paper, magnetic, optical, or some other type—degrade over time. This degradation makes necessary the migration of the information. For records to be useful long after they are prepared, and by other than those who prepared them, the full provenance and associated “metadata” must be available to those later users; preparation of this descriptive information requires significant effort. Physical storage space is needed to house the accumulating records; this space must be secure and climate controlled to retard degradation. All of these requirements lay claim to human and financial resources.
The retention problem is being exacerbated by the rapidly increasing collection of atmospheric data by federal agencies—truly a data explosion. Depending on how one defines a single data set, the atmospheric sciences have perhaps 2000 to 6000 identifiable data sets. Throughout the 1960s and 1970s, atmospheric data available for archiving were accumulating at a rate of about 2 terabytes/year. In the 1980s, expanded operational weather satellite systems pushed the rate to nearly 15 terabytes/year. In the 1990s, as new operational weather radar and satellite systems are installed, the atmospheric data-collection rate is expected to reach 120 terabytes/year. Present expectations are that this acceleration of data rates will slow so that collection rates will be “only” about 150 terabytes/year by 2005.
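The growth implied by these figures can be checked with simple arithmetic. The following is a minimal sketch that uses only the four rates quoted above; the period labels and the function name `growth_factors` are illustrative assumptions, not part of the panel's analysis.

```python
# Collection rates quoted in the text, in terabytes per year.
# The period labels are illustrative; the 1990s and 2005 values are expectations.
rates_tb_per_year = [
    ("1960s-1970s", 2),
    ("1980s", 15),
    ("1990s (expected)", 120),
    ("2005 (expected)", 150),
]


def growth_factors(rates):
    """Return (earlier, later, ratio) for each consecutive pair of periods."""
    return [
        (a_label, b_label, b_rate / a_rate)
        for (a_label, a_rate), (b_label, b_rate) in zip(rates, rates[1:])
    ]


for earlier, later, ratio in growth_factors(rates_tb_per_year):
    print(f"{earlier} -> {later}: x{ratio:.2f}")
```

The successive ratios are 7.5, 8, and 1.25: the rate grows by roughly a factor of eight in each of the first two steps and by only a quarter in the last, which is consistent with the expectation that the acceleration will slow.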
The claim to resources for records retention leads NARA, NOAA, and all archivists to ask which records are worthy of retention and which might be discarded. The panel has attempted to address this vexing question. It also attempted to address what procedures are necessary for saved records to have value in the distant future and what institutional arrangements are likely to encourage such procedures.
This is not the first study to examine issues in the archiving of scientific data. The panel endorses many assessments and recommendations made by other recent (1976-1992) studies and reports, and concurs with the following results of previous efforts:
The Climatological Data Users Workshop, 27-28 April 1976, Asheville, North Carolina, recommended the establishment of a Scientific Advisory Panel to advise NOAA on data needs, formats, and retention periods. (This was finally done in 1991.)
The Climate Data Management Workshop, 8-11 May 1979, Harpers Ferry, West Virginia, recommended:
“undertak[ing] a program to determine and record the metadata of station histories and local geography of observing sites;”
developing more complete information on the spatial coverage and resolution of available data sets; and,