can find out about all of the component data sets if interested. Some of the data sets for GALE are small subsets (in time and space) of much larger data sets that extend for many years. A user who needs to find the larger data set should not be hampered by having too many catalog “hits” to small component data sets associated with various projects.

Archives should not be viewed as data cemeteries, with only rare and dwindling visits after deposition of data. Access needs to be part of archive construction. The panel recommends that NARA, NOAA, NCAR and other agencies, such as NASA and DOE, coordinate an efficient and effective plan for mutual access to and dissemination of joint and disparate weather, climate, and other atmospheric data holdings to users and researchers within and outside these (and other) agencies. The panel can envisage a future with transparent on-line access to most scientific data holdings of the federal government, regardless of which agency has physical custody.

Motivation and Incentives

Preparing data for archiving and maintaining archives are viewed by most scientists as activities that are neither very exciting nor well rewarded. The benefits for a job well done are largely gained by others after those preparing the archive have moved on to other things, and the difficulties resulting from failures often occur far in the future.

The current trends toward use of data by people other than those who collected them, together with the increased interest in long-term changes in geophysical processes, have been improving the status of those who prepare and manage data sets. Nevertheless, it is likely that data gathering will always be more attractive than data preservation (NRC, 1995).

6 SUMMARY

The increased attention that long time series of data are receiving in the earth sciences in general, and in the atmospheric sciences in particular, is a very encouraging trend. This increased attention is creating a consensus on the importance of archiving issues and the long-term retention of scientific records. Nevertheless, most members of the scientific community still have little experience in thinking about data retention on very long time scales.

Scientific data and the technical records needed for the interpretation of data are sufficiently different from other federal records that they merit special treatment. Agencies do not always consider their scientific data, or the scientific data resulting from research they fund, to be federal records. The panel suggests that federal agencies should be more inclusive in designating scientific data, either in their possession or created with their funds, as federal records and in ensuring the long-term survival of those data. There is still a need for the federal government to articulate clearly an integrated national policy covering its obligations and limitations in the retention and archiving of weather, climate, and other atmospheric data. For all environmental data networks and scientific projects funded with federal extramural grants, the funding agencies should evaluate, on a case-by-case basis, whether the resulting data would be useful as a national resource. If so, the funding contract should require the submission of the resulting data to a federal agency for archiving as federal records.

A great impediment to the preservation of data is that archiving often requires large amounts of preparation and packaging of the data, rather than just a physical transfer. To ease the archiving process for projects that involve the collection of scientific data, all agencies (especially NOAA for projects involving the atmospheric sciences) should integrate considerations of data storage, metadata preparation, data set indexing, information retrieval, and data archiving into the initial planning. For data sets to be useful in the future, it is essential that they be accompanied by full documentation of the instrumental calibration and precision, site exposures, quality control, compaction techniques, and related information; this information is the metadata for a data set. The metadata file accompanying a data set must contain enough information so that the data can be understood and used effectively after all of the experts who created or worked with them are gone.

Archiving will be much easier if agencies prepare data initially so that they can be archived internally or passed on to NARA without significant additional processing later. Agencies may even discover that it is easier for them to use the data operationally if these additional preparation efforts are made when the data are first stored. In particular, metadata and associated manuals should travel with the data sets in the same medium. In cases where low-level data are not archived in their most convenient form, additional files containing descriptions of algorithms or samples of computer software may need to be bundled with the data.


