samples for very large data sets. However, the panel did not have the resources to fully examine these issues, and recommends that a focused study be undertaken to evaluate the tradeoffs and problems of archiving, for the long term, only a sample of our largest data sets in the atmospheric sciences, with an eye toward finding acceptable sampling techniques.
The databases at NCDC represent the largest collection of historical weather information in the United States. They have undergone many changes in formats, storage media, data management, and handling and processing procedures, often without adequate funding or priority. NCDC has pursued various data rescue, archival, and systems modernization efforts with limited success. Preparing data sets for archiving and caring for them, however, is expensive.
In recent years, there have been significant accomplishments in data management, archival procedures, data conversion, and data availability. Much, however, remains to be done. Specifically:
there are vast holdings at risk (with neither compression nor backup);
many holdings are neither properly inventoried nor prioritized;
computer systems at NCDC lag behind the current state of the art and are overloaded, with no adequate replacement plan;
many of the data collection systems are inefficient, antiquated, and incompatible with one another;
backup tapes, for a portion of the data, are in close proximity to the operations, placing those data at risk; and
systems planning for data protection has lagged.
At one time the USAF had nearly 15 years of thunderstorm summary data from the Soviet Union. These data were totally lost when NCDC was unable to accept them due to resource constraints. Most of these problems are a direct result of decades of inadequate funding and have been solved only partially by the dedication and heroic efforts of highly motivated NCDC staff.
Though NCDC is the active archive for NOAA weather data, it does not have all the NOAA atmospheric data. It has a substantial fraction of the operational NOAA data sets, and a much smaller fraction of the research data sets generated by NOAA funding. It is barely keeping up with the quantity of information pouring into the Center. The panel questions whether it is a good idea for NCDC to make efforts to physically obtain yet more databases without funding specifically allocated to the acquisition and maintenance of each of those data sets. Otherwise, its very limited resources would be diluted even further. However, the panel can think of no better home for most U.S. atmospheric data.
A great difficulty in the funding of archives is deciding who pays for long-term retention and access. The users who will benefit most from the archive may not yet anticipate their own need, or may not yet even be born. Initial planning for the collection of data should therefore include initial budgeting for data management and preservation. For example, DOE's ARM program estimates that about 15 percent of its total budget will go to data management and archiving. EOSDIS will claim a similar share of the EOS funds. This amount is significantly more than many projects might need, but the panel supports the concept.
Often there is the attitude that if an agency accepts or archives data at all, full support with extensive resources must be made available. NARA and NOAA should explore providing a range of levels of support with different levels of storage media quality and different levels of access convenience for data sets, decided deliberately rather than by default depending on short-term budget fluctuations. As has already been noted, the best technology is not always appropriate.
The panel reminds the agencies, as have many prior studies, that maintaining archives should be among agencies' highest priorities because the costs of maintaining and archiving data are small compared with the original costs of data collection and preparation. On the whole, putting more money into data management usually yields a much greater return on investment than collecting more data. On a broad level, the federal government should examine priorities for additional funding of data management and archiving.
NARA has the general responsibility for overseeing the disposition of all federal records and retaining all federal records worthy of indefinite archiving. This is a daunting responsibility, given the magnitude of the federal