Budget and administrative policies should not be used to justify abandoning a data set instead of migrating it to newer storage media. In particular, older storage equipment often becomes difficult and expensive to maintain, while the purchase of newer technology may meet budgetary, administrative, and organizational resistance, compounded by a lack of adequately trained personnel. The proper solution is not to abandon data or allow access to deteriorate, but rather to modify organizational policies to encourage migration to lower-cost media as they become available. At a time when electronic data storage capacities per unit cost double annually, planned migration to new media types should be a central part of planning for data storage.
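As a rough illustration of the arithmetic behind this argument, the sketch below assumes that capacity per unit cost doubles exactly once a year, so that the media cost of holding a fixed volume of data halves annually. The function name and the five-year horizon are hypothetical, chosen only for illustration, and real cost trends are less regular than this.

```python
def relative_storage_cost(years: int) -> float:
    """Media cost of storing a fixed data volume, relative to year 0.

    Assumes capacity per unit cost doubles every year (an idealization
    of the trend described in the text).
    """
    return 0.5 ** years


# Print the relative cost for each of the first five years.
for year in range(6):
    print(year, relative_storage_cost(year))
```

Under that assumption, the media cost for the same data set falls to about 3 percent of its original value after five years, which is why deferring migration rarely saves money.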
The planning, implementation, and maintenance of an effective mechanism for long-term archiving of observational data sets must address three critical issues: storage management, accessibility, and assessability. Storage management focuses on various aspects of archiving, including the reliable storage of data for long periods of time, the transfer of data from old to new storage technology, physical data distribution to accommodate institutional policies regarding custodianship or the physical limitations of an institution, and retrieval performance requirements. Accessibility concerns the capabilities that provide a model of interaction and a mechanism for accepting input from a user on information needs; that locate all data relevant to those needs; and that retrieve, package, and deliver the needed data to the user. Assessability permits the user to clearly determine the significance, relevance, and quality of the data. This section defines a generalized framework for the minimal documentation of observational data that is necessary to ensure adequate accessibility and assessability.
Metadata are generally considered to be the information necessary for someone who is not previously acquainted with a data set to make full and accurate use of that data set. At a minimum, the metadata associated with a data set must provide a consistent framework that accomplishes the following objectives:
permits assessment of the applicability of a data set to the question or problem at hand;
supports the assessment of the quality and accuracy of the data set;
provides all necessary information to permit a user to access or physically read the values in a data set;
permits the assignment of correct physical units to the values;
supports the translation of logical concepts and terminology between communities; and
supports the exchange of data stored in differing physical formats.
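One way to picture these objectives is as fields in a metadata record that travels with the data set. The sketch below is only an illustration: the class, its field names, and the sample values are hypothetical and do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record; all field names are illustrative."""
    # Applicability: what was observed, where, and when.
    description: str = ""
    # Quality and accuracy: calibration, known errors, processing history.
    quality_notes: str = ""
    # Access: how to physically read the values (format, layout, encoding).
    file_format: str = ""
    # Physical units for each variable, e.g. {"temperature": "K"}.
    units: dict = field(default_factory=dict)
    # Terminology: mapping of local terms to those used by other communities.
    vocabulary: dict = field(default_factory=dict)
    # Exchange: other physical formats the data can be exported to.
    exchange_formats: list = field(default_factory=list)


def missing_fields(record: DatasetMetadata) -> list:
    """Return the names of the fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Made-up example: only the description and units have been supplied.
record = DatasetMetadata(
    description="Hourly sea-surface temperature",
    units={"sst": "K"},
)
print(missing_fields(record))
```

A check of this kind could be run when a data set is submitted to an archive, so that gaps in the documentation are caught while the people who collected the data are still available to fill them.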
The problem of supplying adequate metadata is receiving increased attention in the context of scientific data management. For example, global climate change research, along with general environmental concerns, has ignited interest in a more interdisciplinary and long-term approach to conducting science. Interdisciplinary collaboration requires more effective sharing of data and information among individual researchers, programs, institutions, and communities that may operate under different paradigms of knowledge organization or use different terminology for similar concepts. Further, long-term research requires that researchers be able to access and compare data sets that were created by past researchers and collected in different contexts with different technologies. Therefore, to support the interdisciplinary sharing and long-term usefulness of observational data, adequate metadata must be linked inextricably to the data.
Interdisciplinary sharing and long-term usefulness of observational data sets are important goals for any organization involved in the distribution or archiving of scientific data. Such organizations must become increasingly concerned with the provision of high-quality metadata, without which the usefulness of the associated observational data will be severely compromised. Existing information retrieval and data management technologies already provide the scientific and archiving communities with the technical means to a satisfactory solution; the problems that exist in this area are more the consequence of the human tendency to ignore the value of metadata during the collection and production of the data. Yet it is precisely during collection and production that metadata are easiest to produce. The research