2
Context

The challenges of data archiving and access have been discussed in many settings for a number of years. Previous reports from the National Research Council, in particular, have helped set the stage for this report by discussing data collection, management, archiving, and dissemination from various perspectives. For example, the NRC has advised the U.S. National Archives and Records Administration and other federal agencies on the long-term retention of scientific and technical data, particularly in electronic formats (NRC, 1995). A more recent report, Climate Data Records from Environmental Satellites (NRC, 2004a), focuses more specifically on generating, analyzing, and archiving the records that are most useful for understanding climate variability and change. Although it focuses almost exclusively on climate data, this latter report offers many insights that apply to archiving other kinds of environmental and geospatial data, including the identification of 14 essential elements of successful climate data record generation programs. Many of its findings and recommendations could also be applied more broadly to environmental and geospatial data from diverse sources with a simple change of wording, such as this paragraph from the executive summary:

Underlying many of the elements of success is early attention to data stewardship, management, access and dissemination policies, and the actual practices implemented. Because a successful [climate data record] program will ultimately require reprocessing, datasets used in their creation, such as metadata, should be preserved indefinitely in formats that promote easy access. The ultimate legacy of long-term [climate data record] programs is the data left to the next generation, and the cost of data management and archiving must be considered as an integral part of every [climate data record] program.

Another relevant report, Government Data Centers: Meeting Increasing Demands (NRC, 2003a), summarizes a workshop exploring how the increasing volume and number of data sets, coupled with greater demands from more diverse users, are making it difficult for data centers to maintain records of environmental change. The report focuses on technological approaches that could enhance the ability of environmental data centers to deal with these challenges and improve the ability of users to find and use information held in data centers. The NRC has also provided more focused reviews of NOAA’s National Geophysical Data Center (NRC, 2003b), NOAA’s National Climatic Data Center (NRC, 1993), and NASA’s Distributed Active Archive Centers (NRC, 1999), which offer a variety of insights on how best to archive and provide access to environmental and geospatial data.

Perhaps the most relevant and still timely supporting NRC report is Utilization of Operational Environmental Satellite Data: Ensuring Readiness for 2010 and Beyond (NRC, 2004b), which offers findings and recommendations aimed at defining specific approaches to resolving the potential overload faced by two agencies, NOAA and NASA, responsible for satellite data. The report focuses on the end-to-end utilization of environmental satellite data by characterizing the links from the sources of raw data to the end requirements of various user groups. The “Utilization Report” is an important foundation document because it addresses three still-challenging areas: (1) the value of and need for environmental satellite data, (2) the distribution of environmental satellite data, and (3) data access and utilization; it also includes still-pertinent findings and recommendations that will be considered and updated in this committee’s final report. Some of the findings are particularly relevant in setting the stage for this report, including (in brief):

Improved and continuous access to environmental satellite data is of the highest priority for an increasingly broad and diverse range of users.

The national and individual user requirements for multiyear climate system data sets from operational satellites are placing special demands on current and future data archiving and utilization systems.

The Comprehensive Large Array-data Stewardship System (CLASS) is being designed[5] by NOAA to catalog, archive, and disseminate NOAA environmental satellite data. Given the magnitude of this effort (and considering the growing volume, types, and complexity of environmental satellite data; the increasingly large and diverse user base; and expectations for wider and more effective use of data), NOAA needs to have a comprehensive understanding of the full scope of the technical requirements for data cataloging, archiving, and dissemination and needs to ensure that implementation is based on that knowledge. Key to successful implementation of a strong system that will serve operational users and the nation well are detailed planning, proactive follow-through, and NOAA’s incorporation of lessons learned from previously developed, similarly scaled initiatives with similar system requirements.

Data from diverse satellite platforms and for different environmental variables must often be retrieved from different sources, and these retrievals often yield data sets in different formats with different resolution and gridding. The multiple steps currently required to retrieve and manipulate environmental satellite data sets are an impediment to their use.

Early and ongoing cooperation and dialog among users, developers of satellite remote sensing hardware and software, and U.S. and international research and operational satellite data providers is essential for the rapid and successful utilization of environmental satellite data.

Many of the greatest environmental satellite data utilization success stories have a common theme: the treatment of research and operations as a continuum, with a relentless team focus on excellence with the freedom to continuously improve and evolve.

A number of other reports were of particular importance in helping this committee begin its work. Global Change Science Requirements for Long-Term Archiving: Report of the Workshop, October 28-30, 1998 (USGCRP, 1999), provides a good example of preliminary high-level guidance on long-term archiving of Earth observation data and derived products, lessons learned from current and past experiences, and the guiding principles and essential functions necessary for any program to be successful. The report Recommendation for Space Data System Standards: Reference Model for an Open Archival Information System (Consultative Committee for Space Data Systems, 2002) is an internationally developed set of recommendations about how open archival information systems should be structured. Among other tasks, the “OAIS” document defines a model system that:

Provides a framework for the understanding and increased awareness of archival concepts needed for long-term digital information preservation and access.

Provides the concepts needed by non-archival organizations to be effective participants in the preservation process.

Provides a framework, including terminology and concepts, for describing and comparing architectures and operations of existing and future archives.

Provides a basis for comparing the data models of digital information preserved by archives and for discussing how data models and the underlying information may change over time.

Provides a foundation that may be expanded by other efforts to cover long-term preservation of information that is not in digital form.

Expands consensus on the elements and processes for long-term digital information preservation and access.

The most recent report from the NRC’s Committee on Earth Science and Applications from Space (NRC, 2005) has also highlighted the critical need for the archiving of, access to, and stewardship of climate data records, focusing on satellites, as stated in one of its recommendations:

The committee recommends that NOAA, working with the Climate Change Science Program and the international Group on Earth Observations, create a climate data and information system to meet the challenge of ensuring the production, distribution, and stewardship of high-accuracy climate records from NPOESS and other relevant observational platforms.

Similarly, the National Science Board at the request of the National Science Foundation has also reported on the importance of long-term archives in “Long-lived Digital Data Collections: Enabling Research and Education for the 21st Century” (NSF, 2005). That report recognizes the need for a broad dialogue among agencies that collect data, and that a clear technical and financial strategy, along with supporting data policies, is required to preserve valuable data resources.

Data archiving and access are also identified as critical components in the multi-agency Data Management and Communications (DMAC) Subsystem of the Integrated Ocean Observing System (IOOS) plan (Hankin, S. and the DMAC Steering Committee, 2005). As implementation of IOOS/DMAC takes shape, with accountability for data preservation, two things are obvious: data archiving and access are required components of end-to-end data systems, and they are a necessary focus for all agencies that collect data.

Other relevant reports pertaining to the challenges of data archiving include the International Council for Science (ICSU) Report of the CSPR Assessment Panel on Scientific Data and Information (ICSU, 2004) and the Final Report from the Workshop on Research Challenges in Digital Archiving and Long-Term Preservation (NSF, 2003). Together these documents and many others, often available on the websites of facilities responsible for data management,[6] make it clear that principles and guidelines for archiving environmental and geospatial data need not be developed from scratch. There is a wealth of information available that waits to be focused on NOAA’s upcoming challenge: how to be a wise steward of such a broad range of data, given both the reality of limited resources and the enormous growth in data volume that is both inevitable and invaluable.

The importance of long-term data collection, the need to archive environmental and geospatial data for maximal societal benefit, and the recognition of the significant costs involved are all issues the Earth System science and data management communities have long discussed, albeit not always with a focus on the specific challenges faced by NOAA. In this interim report, the committee is thus attempting to synthesize and apply concepts that have gained broad acceptance in the data management community as preliminary principles and guidelines for archiving environmental and geospatial data in particular.

[5] CLASS is still being developed; the committee will address this effort more explicitly in its final report.
[6] e.g., World Data Center System, http://www.ngdc.noaa.gov/wdc/

Preliminary Principles and Guidelines for Archiving Environmental and Geospatial Data at NOAA: Interim Report
The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.