Summary

The National Research Council (NRC) impaneled this committee to provide high-level advice on how to archive and provide access to the broad range of environmental data collected by the National Oceanic and Atmospheric Administration (NOAA) and its partners. The data managed by NOAA provide a broad range of benefits to society, but rapid increases in data volumes, as well as demand for these data, have created a first-order data management challenge. NOAA has asked for principles and guidelines to help them identify the observations, model output, and other environmental information that must be preserved in perpetuity and made readily accessible, as opposed to data with more limited storage lifetime and accessibility requirements.

This report offers nine general principles for effective environmental data management, listed below, along with a number of guidelines and examples that describe and illustrate how these principles could and should be applied at NOAA. This guidance is based on the following categories of information reviewed: NOAA’s legal and administrative mandates; the data archiving requirements and access capabilities required by NOAA’s mission objectives; the findings and recommendations from many previous reports; the documents and feedback received from NOAA, its agency partners, and its data users regarding current and planned data management activities; and the knowledge and experience of the members of this committee.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





ENVIRONMENTAL DATA MANAGEMENT AT NOAA

PRINCIPLES

1. Environmental data should be archived and made accessible.
2. Data-generating activities should include adequate resources to support end-to-end data management.
3. Environmental data management activities should recognize user needs.
4. Effective interagency and international partnerships are essential.
5. Metadata are essential for data management.
6. Data and metadata require expert stewardship.
7. A formal, ongoing process, with broad community input, is needed to decide what data to archive and what data not to archive.
8. An effective data archive should provide for discovery, access, and integration.
9. Effective data management requires a formal, ongoing planning process.

Although the principles are numbered for convenience, all nine should be regarded as equally important for effective environmental data management. The explanations that follow include key guidelines that explain how these principles could and should be applied to improve the effectiveness, reliability, and utility of NOAA’s data management activities. The numbers in brackets indicate the chapter(s) of the report where each principle is discussed.

PRINCIPLE #1: Environmental data should be archived and made accessible. [3, 5, 6]

In the view of this committee and many other groups, full and open access to data should be a fundamental tenet at all US federal agencies, including NOAA. The environmental data¹ collected by NOAA and its partners constitute an invaluable resource that should be securely archived and made broadly accessible so that a diverse group of users can conduct the analyses and generate the products necessary to describe, understand, and predict changes in the Earth’s environment. Although it is impossible to save everything, the goal of NOAA’s data management enterprise should be to ensure that the broadest possible collection of environmental data is archived and made discoverable and accessible to the widest possible range of users. Doing so will maximize the value of collected data and guarantee that they are available to meet the nation’s current and future economic, social, and environmental needs.

¹ Throughout this report, the term “environmental data” is used broadly to indicate all types of Earth System observations (including physical samples as well as in situ and remotely sensed data), model output, and synthesized products derived from these data.

PRINCIPLE #2: Data-generating activities should include adequate resources to support end-to-end data management. [3, 7]

At the outset of any activity that will generate environmental data, end-to-end data management should be planned and budgeted. It is also essential to plan and secure adequate funding for the management of environmental data streams that were collected or generated without sufficient resources reserved for long-term data management.

Data management planning should begin when plans are first made to acquire a new sensor system or other source of environmental data and continue until the archived data are no longer useful. Plans for data management should explicitly address data storage, preservation, and stewardship responsibilities, and sufficient funds should be reserved to acquire, process, store, maintain, update, and provide access to the data for extended periods of time. These activities all require funding for personnel as well as for hardware. If additional resources can be secured, they should be used to bring additional environmental data into NOAA’s archives, to improve the quality and usefulness of the data already in the archive, and to provide more effective access to both new and existing data sets.

PRINCIPLE #3: Environmental data management activities should recognize user needs. [3, 4, 5, 6, 7]

The success of any data management enterprise is judged by its usefulness to current and future users. However, for environmental data, both user needs and the data themselves are constantly evolving. Thus, all environmental data management activities, including decisions regarding archiving and access of specific data sets as well as the development of the enterprise-wide data management plan, should incorporate substantial and ongoing user input. Ideally, this input should include both informal feedback from a broad range of users, for instance through user logs, and more formal forms of feedback such as external advisory groups and stakeholder panels. User involvement is critical to make certain that essential data are properly archived and that data access activities are as practical, efficient, and cost-effective as possible. Ongoing stakeholder involvement in the planning process is also essential to ensure adequate integration and coordination across the broad spectrum of NOAA’s data management activities.

PRINCIPLE #4: Effective interagency and international partnerships are essential. [3, 5]

Any environmental data management planning process or system needs to include substantial coordination and agreement among the relevant federal agencies and international partners in order to achieve maximum cost-effectiveness and to ensure proper data stewardship. Different agencies have different missions with respect to collecting, archiving, and providing access to different types of data, and these missions sometimes overlap or leave gaps in responsibility for critical data sets; for instance, the transition from research to operations has been a longstanding challenge. Although no agency can or should be expected to assume complete responsibility for all types of environmental data, NOAA should consider taking a leadership role in archiving and providing access to a broad range of environmental data, including data not traditionally regarded as falling under its operational mission. At the very least, NOAA should improve its relationship with several important partners to guarantee that all essential environmental data are archived and made available to users. Leadership on data management issues is important if NOAA wants to continue being regarded as the lead federal agency for Earth System research, environmental prediction, and other important data-driven activities such as climate services.

PRINCIPLE #5: Metadata are essential for data management. [3, 4, 5, 6]

To ensure the enhancement of knowledge for scientific and societal benefit, metadata that adequately document and describe archived environmental data should be created and preserved. Whenever possible, NOAA’s metadata protocols should be made compatible with international standards and should be expanded, in coordination with other federal agencies and international entities, to ensure proper preservation and stewardship of data and to facilitate data discovery, access, and integration. Metadata ideally should include not only descriptive information about the data and how they were originally collected or produced, but also information on the quality and appropriate uses of the data, how they have been managed and processed, and the relationship of the data to data in other collections. Some of this information could be gleaned from user feedback logs, which should be considered for inclusion with the standard metadata associated with each data set.

PRINCIPLE #6: Data and metadata require expert stewardship. [4]

Scientific data stewardship encompasses all activities that preserve and improve the information content, accessibility, and usability of data and their associated metadata. These activities include maintaining a scalable and reliable infrastructure to support long-term access and preservation, preserving data access and archive integrity during media migration and software evolution, providing effective data support services and tools for users, and enhancing data and metadata by adding information that is established throughout the data life cycle. By adding a data management framework that maintains and improves the archive and assures access and understanding for users, the stewardship function supplements and enhances data collection, spans data archiving and access to enable discovery and integration, and facilitates the realization of societal benefits.

Scientific data stewardship, with assigned organizational responsibility, should be applied to all environmental data sets managed by NOAA and to their associated metadata to make sure this information is preserved, remains continually accessible, and can be improved as future discoveries build understanding and knowledge. In particular, each data set should have an identified steward who understands the data and is responsible for working with data providers and data users to maintain the archive, assess and improve the data and associated metadata, and make certain that data access systems meet user needs.

PRINCIPLE #7: A formal, ongoing process, with broad community input, is needed to decide what data to archive and what data not to archive. [5]

NOAA needs to establish a high-level, enterprise-wide approach to decide what data to include in their archives. It is not possible to save everything, so at some point certain data will need to be designated for reduced archiving and/or access requirements. Original observations, being irreplaceable, are the most important type of data to preserve. Candidates for reduced archiving requirements include data that are obsolete, redundant, or clearly have only short-term uses. Because it is impossible to predict all future applications of a data set, however, it is always preferable to reduce accessibility but preserve the data in some form than to consider disposal when resources for data management are limited. For data that can be reproduced or replicated, such as multiple versions of reprocessed data or model output, it may sometimes be more cost-effective to regenerate the data on demand than to archive all versions of the data. In all cases, the decision to archive data (or not to archive data) should follow established procedures, should be driven by scientific and societal benefits, and should explicitly incorporate broad community engagement and coordination with other agencies.

PRINCIPLE #8: An effective data archive should provide for discovery, access, and integration. [6]

To the maximum extent possible, users should have full, open, and timely access to all of NOAA’s environmental data and metadata; data and metadata should be made easily discoverable to a broad range of users; and the data management system should be designed to allow the integrated exploitation of data from multiple sources within and outside of NOAA to answer environmental questions and support NOAA’s mission. Discoverability and integration depend critically on metadata to guide search results and on compatible data types and structures, while accessibility is predicated on minimizing administrative, technical, and systematic barriers. In addition, all aspects of data access must take the needs of users into account. Since environmental data are generally managed in a distributed environment and user capabilities and needs vary widely, different access mechanisms are needed to accommodate different data presentations and user requirements. In general, all aspects of environmental data access would benefit from an enterprise-wide effort to improve and expand existing data access portals, increase the linkages between different portals, and create new portals targeted at particular applications or user groups. Search tools and other discovery-enhancing features could also be improved at many data access points, and an ongoing process is needed to evaluate user needs, capture user feedback, and use this feedback to build metadata and improve data access tools.

PRINCIPLE #9: Effective data management requires a formal, ongoing planning process. [7]

NOAA faces a formidable data management challenge due to rapidly growing data volumes and data access demands, a diverse spectrum of data types, a heterogeneous population of users and potential users asking increasingly inter- and multidisciplinary environmental questions, and an environment of rapidly evolving technologies and constant budgetary pressures. NOAA also needs to continue to ensure the consistency and continuity of numerous data sets in order to meet its legal, mission, and administrative requirements and to fulfill its interagency and international agreements. However, the scale and complexity of NOAA’s data archiving and access requirements have reached the point where the ad hoc, discipline-specific, instrument-oriented data management systems and decision-making processes relied on in the past are unlikely to be able to keep up with future demands.

To improve the coordination, integration, flexibility, adaptability, and cost-effectiveness of its future data management activities, NOAA should establish and codify an enterprise-wide data management plan that explicitly incorporates all the principles in this report and emphasizes the role of environmental data in supporting NOAA’s mission, vision, and goal statements. This plan should formally delineate individual and shared responsibilities for the archiving, stewardship, and access of all environmental data sets that fall under NOAA’s purview. Creating a complete, publicly available inventory of NOAA’s data would be a logical first step. The data management plan should also include a formal, ongoing planning process—developed and integrated across NOAA with substantial user involvement and coordination with other agencies—to prioritize data management activities and to make sure the system keeps pace with changes in observing systems, models, data storage and access technologies, scientific understanding, user demands, and available resources.

Fortunately, NOAA has the opportunity to build on a number of successful data management activities that are already providing reliable data archiving, stewardship, and access capabilities for many user communities, as well as several prototype projects designed to improve data discovery, utilization, and integration. Recent noteworthy accomplishments include the Global Earth Observation Integrated Data Environment (GEO-IDE) concept, the formation of the Data Access and Archiving Requirements (DAAR) Working Group, and the implementation of the Comprehensive Large-Array [data] Stewardship System (CLASS).
All these activities will need to be expanded, integrated under a formal enterprise-wide data management plan, and accompanied by broad ongoing support in order to create a truly integrated and cost-effective system that meets user needs as well as all legal, administrative, and mission requirements.

NOAA would be well advised to continue moving toward a federated but integrated “system-of-systems” to manage its environmental data. Such an approach would leverage existing capabilities and facilitate the inclusion of users and other stakeholders in the planning process, while still providing the top-level support and strategic guidance required to improve integration, coordination, and cost-effectiveness. This approach would also mirror and support the evolution toward a system-of-systems concept in the observational network, namely, the Global Earth Observing System of Systems (GEOSS). Implementation will be a difficult and challenging process, particularly in light of projected data volumes and present resource allocations, but it is critical for NOAA to establish and maintain a coordinated, enterprise-wide data management plan to ensure that it continues to meet its legal and mission requirements as well as user needs.

———

Understanding and predicting environmental change are likely to remain critically important activities throughout the 21st century. Hence, it is critical to archive and provide access to the environmental data needed to describe, understand, and predict changes in the Earth System and the impacts of these changes on ecosystems, society, and other elements of the system. NOAA is clearly dedicated to providing excellent service for its many user communities, and the agency is to be commended for soliciting external advice to make sure it continues to provide effective stewardship of important national assets. We hope that NOAA and its partners find the guidance offered in this report useful as they continue to improve their data archiving, data stewardship, and data access capabilities.