Comprehensive and consistent metadata must be maintained with all long-term observational data sets in order to support effective maintenance, access, and usage. Therefore, effective long-term retention of observational data sets requires an underlying compatibility among data management activities at the point of origin, data collection performed during the conduct of research funded by the agencies, discipline-specific data centers, and general archive centers. A mechanism must be defined to coordinate these activities over long periods of time while allowing for autonomous control of technology assimilation and data management approaches. Policies, procedures, and technology must be put into place and coordinated across all phases of the observational data life cycle.
The metadata requirements imposed by long-term retention significantly affect the data management activities at the point of origin and at short-term archive centers. This impact results from the need for complete and consistent metadata to support future primary and secondary uses of the observational data sets. Although the metadata requirements of primary users and data originators are not as comprehensive as those of secondary users, it is the primary user group that must bear the burden of attaching the full documentation. Deferring full documentation to a later stage of the data life cycle would impose prohibitive costs and, in some cases, may be impossible.
A fully specified information model of metadata requirements will provide the baseline for intelligent and effective data management standards. However, this will not be enough unless the standards are enforced. Agencies funding data collection activities as part of scientific research must play a key role in ensuring the implementation of the standards. NARA needs to work closely with the federal agencies to communicate the dependence of eventual long-term archiving on effective data management from the point of origin. In many cases, existing research support agreements from these agencies do require grant-supported data to be submitted to a federal repository, such as the NODC. This agreement structure could enhance long-term retention efforts if it were better implemented, financially supported, and enforced.
The institutions that host multiple researchers also can play a key role by providing enhanced data management services as an infrastructure function to all affiliated researchers and users. Research institutions should also be involved in developing and externally promoting improved information technology, as well as data and metadata standards, to improve inter-institutional data exchange and cooperation. Such activities could offset any added burden on individual researchers resulting from increased enforcement of data and metadata standards by funding agencies.
Experience with the continuous management of observational data sets has shown that, although the underlying organization of the data may remain fairly constant, even small changes in technology, such as new software versions or schema evolution, require active planning for eventual long-term archiving throughout the life cycle of the data sets.
Successful data management centers and archives have the following characteristics:
They are “close” to the data originators. Thus, they learn what data are being collected, they know how the technology to measure specific ocean properties has evolved, data originators trust them with their data, and they actively bring data into the archive.
They are “close” to their users, both in physical proximity and in intellectual training. Therefore, they understand their users’ needs for data access and respond in a timely manner to rapidly changing priorities.
They take advantage of evolving computer technologies to minimize the cost of their activity relative to the amount of information held. This requires continuous assessment of market offerings and continuous training of personnel.
They “exercise” their data holdings regularly. Usually in partnership with researchers, they compare, analyze, summarize, and grid the data. This sustains interest in the holdings and ensures intimate knowledge of their current status.
They not only receive data but also can promptly deliver any of their holdings.