the generally complicated nature of geoscience data and their use, it is unlikely that a nonexpert could get much value from the archive without expert help, at least for a significant fraction of the data that need to be kept in perpetuity.
If NARA were to become the repository for a substantial number of geoscience data sets, in-house personnel with adequate scientific backgrounds or experience in the various disciplines would be required to help a broad range of users find the right data and access them, and perhaps to provide initial instruction in their use. The documentation and metadata adequate for the primary research community are often insufficient to make geoscience data understandable or fully usable by nonexperts.
Thus, if NARA were to be a major scientific data repository, it would have to maintain a staff of scientifically knowledgeable people. Otherwise, the archived data would have significantly diminished value. NARA could also make agreements with the originating agencies to supply expertise as needed.
A much more practical way for NARA to assure the long-term viability and accessibility of geoscience data would be to designate the existing data centers that meet NARA standards as “Affiliated Archives” of NARA. These arrangements could be formalized through Memoranda of Understanding.
In summary, whether NARA archives scientific data in-house or at a remote location, it will clearly have to maintain some staff with appropriate scientific capability. Further, it will have to work closely with the various agencies to help them improve their documentation and provide adequate documentation for all data.
There are cases in which the dissemination of data may be expressly limited. These include data concerning government leasing of land and drilling permits, data provided in part by private (commercial) sources, and key statistical data involved in fuel or agricultural production estimates. Issues of special consideration include Native American lands and national security. Any such restrictions need to be considered on a case-by-case basis when the data are being archived.
The panel is encouraged by several recent attempts to better coordinate and manage data in the earth sciences. Two notable examples are the National Spatial Data Infrastructure, coordinated by the Federal Geographic Data Committee, and the Global Change Data and Information System, coordinated by the Interagency Working Group on Data Management for Global Change.
Nevertheless, there is no comprehensive plan or policy in the federal government relating to digital databases, information systems, or archives, even though they pervade our entire society. This is especially true for scientific and technical data across the physical sciences. There has been a concomitant reluctance on the part of government to state any broad policy relating to these areas. Although various groups of scientists and professionals have attempted to encourage government action, they have not yet been successful. Funding for databases, information systems, and archiving is notoriously the first to be cut whenever there are budget constraints, which appear to be perennial. Many issues arise, as indicated in the previous chapter. While the resolution of these issues may be difficult, one point appears clear: some type of comprehensive government policy needs to be developed.
The panel's report is concerned specifically with the problems of long-term archiving of geoscience data by the government and the role of NARA. On the one hand, NARA would benefit from a clarification of its mandate relating to digital scientific and technical data in terms of its statutory, administrative, custodial, and institutional responsibilities. On the other hand, lacking a clearer, stated mandate, NARA could develop a coherent program of its own, taking due account of its physical, personnel, and funding limitations. Such a program, properly promulgated and carried out, could respond well to the needs of the scientific community, the government, and the nation.
Two general possibilities exist for a physical archive. One is a centralized installation containing adequate computer and storage facilities, along with appropriate software and in-house scientific expertise. The other is a distributed archive; that is, geographically dispersed locations with characteristics similar to those above, but with network connections for remote access. In the distributed model, the optimal locations for data are where the principal collectors and subject matter specialists reside.
In both cases, we are assuming an up-to-date, automated master directory that is remotely accessible for query through a network. The automated directory should contain sufficient data to allow a user accessing the system to