including engineers, lawyers, and statisticians. The private sector is also a major collector and source of pertinent data, to the extent that private sector entities are willing to release data from proprietary claims and place them in the public domain.

A representative spectrum of geoscience data sources and types includes:

agronomic and soil reserves

metallic and nonmetallic mineral resources

paleontology

marine geology

fuels/energy resources—oil, coal, geothermal, gas, drilling records, etc.

hydrology—ground water, surface water, stream flow, water quality

tectonics—structural geology

mapping on and below the surface, land classification

physical properties—elemental composition, thermodynamic properties, conductivity, magnetic properties, etc.

glaciology—snow, ice

earthquake prediction and studies

volcanic and landslide hazards

Geoscience data relate to the Earth's surface and below; they describe the things that are found on and in the Earth (minerals, oil) and natural phenomena (earthquakes, floods), and they include descriptive material (maps). All of these data share a common feature: a locational aspect expressed in a three-dimensional (X, Y, Z) coordinate system, whether in space, on the Earth, or below the surface, and they are likely to have a temporal coordinate as well. Essentially, a common locational code or structure provides the basis for indexing, identifying, evaluating, and synthesizing the widely diverse information involved.
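As an illustration only, the idea of a common locational code can be sketched in a few lines of Python. The record layout, field names, units, and sample values below are hypothetical and are not drawn from any of the databases discussed in this report; the sketch simply shows how records from unrelated disciplines could be retrieved together once each carries X, Y, Z and, where available, time coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[float, float]


@dataclass(frozen=True)
class GeoRecord:
    """One observation keyed by location and, optionally, time (hypothetical layout)."""
    theme: str          # discipline, e.g. "hydrology"
    x: float            # longitude in decimal degrees (assumed convention)
    y: float            # latitude in decimal degrees (assumed convention)
    z: float            # elevation in metres; negative values lie below the surface
    t: Optional[int]    # year of observation, or None when the record is undated
    description: str


def _inside(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def select(records: List[GeoRecord], x_range: Range, y_range: Range,
           z_range: Range, t_range: Optional[Range] = None) -> List[GeoRecord]:
    """Return the records that fall inside the given box.

    When t_range is supplied, undated records are excluded.
    """
    hits = []
    for r in records:
        if not (_inside(r.x, x_range) and _inside(r.y, y_range)
                and _inside(r.z, z_range)):
            continue
        if t_range is not None and (r.t is None or not _inside(r.t, t_range)):
            continue
        hits.append(r)
    return hits


# Made-up sample records from four different disciplines.
records = [
    GeoRecord("hydrology", -105.0, 40.0, 1600.0, 1985, "stream-flow gauge reading"),
    GeoRecord("fuels", -104.5, 40.2, -2500.0, 1972, "drilling record"),
    GeoRecord("seismology", -118.2, 34.1, -8000.0, 1994, "earthquake hypocenter"),
    GeoRecord("mapping", -104.8, 40.1, 1550.0, None, "land classification map sheet"),
]

# Everything in one region, regardless of discipline or date.
region = select(records, (-106.0, -104.0), (39.5, 40.5), (-3000.0, 2000.0))

# The same region, restricted to observations dated 1980 through 1990.
region_1980s = select(records, (-106.0, -104.0), (39.5, 40.5),
                      (-3000.0, 2000.0), t_range=(1980, 1990))

print([r.theme for r in region])
print([r.theme for r in region_1980s])
```

The linear scan keeps the sketch short. A real archive would presumably rely on a spatial index and an agreed coordinate reference system, neither of which is specified in the text above.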

The remainder of this report is organized according to the statement of task provided by the steering committee. Section 2 describes some representative examples of databases in the geosciences and identifies some lessons learned that are applicable to long-term archiving. Section 3 addresses the retention criteria and related issues, supported by references to the examples discussed in the previous section. Section 4 briefly discusses the major policy considerations associated with archiving of data in the geosciences. The report concludes with a summary of the most important recommendations.

2 DATABASE EXAMPLES

Four geoscience database examples are described in this section: the Landsat archive at the U.S. Geological Survey (USGS) Earth Resources Observing System (EROS) Data Center; the Water Data Storage and Retrieval System/National Water Information System-II (WATSTORE/NWIS-II) operated by the USGS; seismic data held by several federal agencies and other institutions; and the National Snow and Ice Data Center (NSIDC) supported by the National Oceanic and Atmospheric Administration (NOAA).

These data collections are illustrative of the scope of earth science databases. Moreover, many of the issues raised by these examples reflect the concerns of those involved in storing, utilizing, and archiving geoscience data.

The Landsat data archive is of great value as a continuing record of changes on the surface of the Earth. To illustrate, in the mid-1980s the EROS Data Center decided that it could no longer afford to keep all of the data and intended to discard those from the first few years of collection. When the related user community was notified, there was almost unanimous agreement that nothing should be destroyed. This response strongly indicated how much users value these data and their continuity.

The WATSTORE/NWIS-II is the largest hydrologic database in the United States. It is the basic source of such information and it is continually being expanded. The database is structured so that it is easily accessible to primary users. The metadata are appropriate for a broad range of users, not only the primary researchers. The volume of the WATSTORE/NWIS-II will continue to grow over time, and its rate of growth is also expected to increase. The database is of enduring value because analyses of the nation's water supply will always require access to all of the historical data.

The seismic data example describes a broad range of research and operations applications. It shows that the diverse seismic databases are physically stored in a highly distributed manner. The accumulation of these data over the next several years will exceed all present holdings. It further illustrates a variety of dynamic databases in which



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.