2
Background

The first part of this chapter summarizes current data management activities at NOAA as well as the various requirements, mandates, and national and international agreements that are most relevant to these activities. It also describes how rapid increases in data volume and data diversity are straining existing resources. The following section summarizes findings and recommendations from a number of previous reports that address various aspects of environmental data management. This background material forms the basis for this committee’s understanding of current data management practices at NOAA and other federal agencies, which leads directly to the principles and guidelines provided in the chapters that follow.

DATA MANAGEMENT AT NOAA

Historically, NOAA’s data management activities were driven primarily by the agency’s meteorological, oceanographic, and geophysical operational mission requirements. In recent years, the agency has taken on an increasing role in collecting, archiving, and providing stewardship and access for a broad range of environmental and geospatial data, including data collected internationally and by other agencies for a wide variety of purposes. Currently, NOAA’s National Environmental Satellite, Data, and Information Service (NESDIS) operates three National Data Centers—the National Climatic Data Center (NCDC), the National Geophysical Data Center (NGDC), and the National Oceanographic Data




Center (NODC)—along with a number of smaller “centers of data.” The National Data Centers are large repositories whose primary functions are to archive, disseminate, and provide stewardship for the data that fall under their purview. “Centers of data” provide specialized data or expertise not readily available from the large data centers (Mock, 2001); they range from well-established archive and access points for specific environmental parameters (for example, the National Snow and Ice Data Center) down to small facilities that maintain certain retrospective records for research or operational applications. For example, the Great Lakes Environmental Research Laboratory holds several specialized data sets, such as hydrology and hydraulics data for the Great Lakes, that are used in research activities and made available to external users.[1] Each data center and center of data represents an important component of NOAA’s data management enterprise, and collectively they are responsible for “acquiring, integrating, managing, disseminating, and archiving environmental and geospatial data and information obtained from worldwide sources to support NOAA’s mission.”[2]

As discussed in the paragraphs that follow, some of NOAA’s data management activities are codified in the form of legislative mandates, administrative orders, or agreements with other entities, but these documents typically do not spell out specific requirements and responsibilities for individual data sets or derived products, and in many cases they are not accompanied by dedicated funding to accomplish the required activities. Additionally, little formal guidance is available to help data managers decide on the appropriate level of archiving and access to apply to different kinds of data, such as model output or multiple versions of reprocessed satellite data. A significant fraction of NOAA’s data are thus collected, archived, and disseminated on an ad hoc basis using limited discretionary funds. The dedication and resourcefulness of NOAA personnel have thus far allowed this approach to work, but the ever-increasing complexity of environmental data coupled with the anticipated explosion in data volumes over the next decade have created a situation where NOAA may not be able to provide the data archiving, stewardship, and access services required to realize the full societal benefits promised by current and future data-generating activities. The background provided in the remainder of this section explains the scope of the data management challenge currently faced by NOAA and describes some of the preliminary steps that the agency has taken to meet this challenge.

[1] http://www.grerl.noaa.gov/data/.
[2] NOAA Administrative Order 212-15.

Requirements, Mandates, and Agreements

Federal legislation contains numerous mandates requiring NOAA to archive data. Some of these laws were originally established for entities that subsequently became parts of NOAA. For example, the Federal Records Act of 1950 established the National Weather Records Center, which later became the NCDC, as the official depository of all U.S. weather records. The importance of understanding climate change has led to additional legislation focusing specifically on climate-relevant data. The National Climate Program Act of 1978 authorized “management and active dissemination of climatological data,”[3] which led to a corresponding increase of the scope of the data managed by NCDC. The full scope of activities performed by NOAA’s three National Data Centers is described in detail in the U.S. Code of Federal Regulations.[4]

In the broader context of global change research, the Global Change Research Act of 1990 requires “an early and continuing commitment to the establishment, maintenance, global measurements, establishing worldwide observations . . . and related data and information systems.”[5] The Climate Change Science Program (CCSP), which bridges all government agencies involved in climate change research, including NOAA, includes the following guidance in its Strategic Plan (CCSP, 2003): “Preservation of all data needed for long-term global change research is required. For each and every global change data parameter, there should be at least one explicitly designated archive.” However, it should be noted that the CCSP Strategic Plan contains no mention of data management. Other pieces of legislation implicitly require archiving through mandates to improve monitoring, analyses, and forecasts for the atmosphere, the ocean, and the land surface (for example, the Ocean and Coastal Observation System Act of 2005 and the Weather Service Organic Act[6]).

In addition, a variety of legislation applies to federal records management at all federal agencies. For instance, the Federal Data Quality Legislation (Act) of 2001 states that “government must assure the quality of the information disseminated.”[7] Similarly, all data maintained for legal purposes or in the national interest must be archived using the stringent standards of the National Archives and Records Administration. This requirement applies to data held at all Federal Records Centers, including the three NOAA National Data Centers (but not the smaller centers of

[3] 15 U.S.C. CH29 P.L. 95–357; see also Appendix C.
[4] CFR Title 15, Chapter IX, Part 950.
[5] P.L. 101-606(11/16/90) 104 Stat. 3096–3104.
[6] 15 U.S.C. 313 et seq.
[7] P.L. 106-554 Section 515.

data). Other relevant laws include the Paperwork Reduction Act of 1995,[8] the Freedom of Information Act,[9] various U.S. National Archives and Records Administration (NARA) Records Management Regulations,[10] and other U.S. laws.[11] NOAA’s environmental and geospatial data must also be maintained in accordance with applicable Office of Management and Budget (OMB) regulations,[12] as well as with Federal Geographic Data Committee (FGDC) approved data standards. All these regulations and pieces of legislation influence data management activities at NOAA and need to be kept in mind when considering any changes to the agency’s data management enterprise.

There are also internal mandates and directives regarding NOAA’s environmental data management activities. NOAA’s 2006 Annual Guidance Memorandum articulates the need for “Integrated data assimilation and management: archived, interoperable, accessible, and readily usable observations and data products.” NOAA Administrative Order (NAO) 212-15 is the most relevant document for guiding environmental data management across the agency. It contains an assortment of important and encouraging mandates such as “NOAA data management planning will include end-to-end data stewardship”; that NOAA program managers should “ensure that during the initial planning of new programs NOAA Line and Staff Office requirements for new data are identified”; and that “data are considered, and are to be treated as, corporate assets.” However, the requirements and mandates provided in this and other documents are often not very specific, leaving data managers with considerable flexibility, but also little guidance, for determining which data sets to archive. For example, NAO 212-15 specifies that centers of data should transfer their data holdings to one of the three National Data Centers “when continued storage at the Center of Data is no longer appropriate,” but the order offers no further explanation of what constitutes an appropriate transition.

NOAA has established working relationships with several other federal agencies to address certain data management issues. For example, NGDC archives marine seismic reflection data originally collected by the U.S. Geological Survey (USGS), while NCDC shares data distribution responsibilities for the data generated by the Defense Meteorological Satellite Program (DMSP), which is operated by NOAA on behalf of the Department of Defense. In terms of data retention and archiving, all of

[8] 44 U.S.C. 3501 et seq.
[9] 5 U.S.C. § 552, as amended in 2002; http://www.usdoj.gov/oip/foiastat.htm.
[10] 36 C.F.R. 1220–1238.
[11] For example, 44 U.S.C. 3101–3107.
[12] For example, OMB Circulars A-16 and A-130.

NOAA’s National Data Centers are considered Agency Record Centers and are therefore required to follow NARA’s stringent archiving standards and to provide disposition schedules for their records. For weather and climate data, NCDC and NARA have signed agreements designating NCDC as the operational archive. As discussed in Study on the Long-term Retention of Selected Scientific and Technical Records of the Federal Government: Working Papers (NRC, 1995a), “NCDC has its special statutory authority because of the great importance of weather and climate records, the great length of time for which those records remain important, [and] the particular expertise with respect to those records found within NOAA.” These same considerations also apply to NGDC and NODC.

Other federal agencies have data collection programs and data management responsibilities that overlap to varying extents with activities at NOAA. For example, the Data and Sample Policy at the National Science Foundation’s (NSF’s) Division of Ocean Sciences requires its principal investigators to submit all collected environmental data to designated National Data Centers, four out of five of which are operated by NOAA. USGS archives and distributes terrestrial global data sets derived from NOAA’s Advanced Very High Resolution Radiometer (AVHRR) satellite. One of the most notable environmental data archiving activities of the past decade has been the evolution of NASA’s Earth Observing System Data and Information System (EOSDIS), which was developed over many years, and at great expense, to serve the needs of NASA’s Earth Observing System. The successes, failures, and “lessons learned” from the development of EOSDIS have been explored in a number of reports (e.g., NRC, 2004a). One of the most serious challenges for EOSDIS, which was so severe that it led to several mission delays (Asrar, 1998), was that the information technology and personnel available at the time had difficulty dealing with the volume and complexity of the data; this problem was exacerbated by data processing requirements and heavy user demands. Fortunately, technological advances have made some of these particular challenges less daunting today than they were when EOSDIS was originally being planned and implemented, although other issues remain. For example, EOSDIS was originally conceived as a short-term archive, so the responsibility for long-term management of EOSDIS data remains a subject of major concern and discussion (Hrastar, 2003).

Archiving and providing access to satellite data, which tend to be especially voluminous and complex, has been a particularly challenging area for interagency cooperation. NOAA and NASA signed memorandums of understanding (MOUs) in 1989 and 1992 stating that the two agencies will “exchange near real time data”; that NASA will fund “active short-term archives” and “transfer to NOAA responsibility for active long-term archiving . . . under plans to be developed under this

MOU”; and that NASA’s “access to and use of non-real time archive data and products under NOAA’s responsibility will be in accordance with legislative mandates.”[13] Unfortunately, as discussed in further detail in Chapters 3 and 5, these agreements appear to have fallen short of original expectations, due in part to a lack of agreement regarding the types of data that fall under each agency’s mission but also to ongoing difficulties in securing adequate resources for archiving and providing access to large amounts of data for extended periods of time. It should also be noted that data management responsibilities are just one of the issues that have impeded the transition from research to operations, as discussed in the next section.

The United States is also a signatory to a number of international agreements relating to climate and global change that either explicitly require NOAA to archive certain data or call for actions that can be carried out only if reliable data archives are in place. Several World Data Centers operating in the United States under the International Council for Science are housed within NOAA. For example, the World Data Center for Paleoclimatology is operated by NCDC and collocated with NGDC in Boulder, Colorado. World Data Centers operate under rules that require sustained archiving and convenient access to data. Other international agreements, both global and regional, are also predicated on the existence of archived data sets to perform analyses and assessments. For example, the Global Climate Observing System’s (GCOS’s) Second Adequacy Report and the GCOS Implementation Plan were both requested and accepted by the Conference of the Parties to the United Nations Framework Convention on Climate Change, of which the United States is a signatory, to assess and improve global observations of climate change. The Committee on Earth Observation Satellites (CEOS), of which NOAA is a member, provided a coordinated response by space agencies for mitigation of the inadequacies identified by GCOS; these included several actions with respect to NOAA sensors and missions (CEOS, 2006).

More recently, the United States has joined many other countries in the formation of the Earth Observation Summit and the subsequent Group on Earth Observations (GEO). The United States has played a leading role within the GEO framework in designing the Global Earth Observation System of Systems (GEOSS). Its 10-Year Implementation Plan, adopted in February 2005, includes the task “To exchange, disseminate, and archive shared data, metadata, and products.” The Global Earth Observation Integrated Data Environment (GEO-IDE) concept of operations (NOAA, 2006c), discussed in further detail later in this chapter, specifically notes that “NOAA must be able to successfully integrate information from all of

[13] NOAA Solicitation No. EA133E-07-RP-0041, issued March 27, 2007.

its goal areas and exchange data with partners in the national US-Global Earth Observation System (US-GEO) and the international Global Earth Observation System of Systems (GEOSS).” These agreements place additional burdens on NOAA’s data centers, but they also offer opportunities to collaborate with other countries in order to increase the realized benefits of archived data.

Expanding Data Volumes

In the NOAA 2007 budget request,[14] it is noted that “Collectively, the three national data centers acquire over one petabyte (10^15 bytes) of new data annually, provide access to an archive exceeding 3.5 petabytes, and support over 100 million worldwide [individual data requests] per year, providing data transfers to over two million customers.” NOAA’s data archive and access volumes are growing rapidly, as demonstrated by the NCDC data access statistics shown in Figure 2-1. This explosive growth is expected to continue over the next decade and beyond, even with the recent de-scoping of the National Polar Orbiting Environmental Satellite System (NPOESS) and Geostationary Operational Environmental Satellite (GOES) programs (NASA-NOAA, 2007), as illustrated in Figure 2-2.

Improvements in data storage technology, data processing speeds, and other technological advances (e.g., data compression techniques) will help NOAA accommodate the anticipated increase in data volumes, assuming that data management budgets and historical rates of technological change either stay the same or increase. It is less clear, however, if improvements in internet bandwidth will be able to keep pace with increasing data access demands, especially since NOAA’s user base is expanding. The challenges associated with providing effective data integration strategies, discovery tools, and support services for such a large and diverse volume of data are also likely to persist. Together, these challenges will make it difficult for NOAA’s data centers to continue to meet their mission requirements, much less expand their archival and access capabilities, under current and projected funding levels.

The archive volume growth shown in Figure 2-2 is driven mainly by a few large but relatively homogeneous data streams, such as satellite observations, model output, and radar data. Simply archiving and providing access to these “large-array” data streams poses a significant data management challenge, but this challenge is magnified when considered in the context of NOAA’s other data management activities. For example, many of the nation’s most pressing environmental problems require

[14] NOAA 2007 budget request “blue book,” available at http://www.corporateservices.noaa.gov/~nbo/07bluebook_highlights.html.

integrated access to multiple data sets, so the large-array data need to be made available to users in formats compatible with corresponding data from medium- and small-volume data sets. Data also need to be made available at varying levels of complexity and resolution to meet the requirements of a user base with a wide range of technical and scientific sophistication.

[Figure 2-1 plot, titled “Data Delivered to Users each Month - NCDC”: vertical axis in terabytes (0 to 44); horizontal axis by month from January 2003 to January 2007; series NEXRAD (radar), CLASS (currently satellite), NOMADS (models), CDO (climate in situ), and TOTAL.]

FIGURE 2-1 Monthly data downloads from the National Climatic Data Center (NCDC), in terabytes, for the period January 2003–March 2007. The total is shown as brown dots and a line, and the four component categories are NEXRAD, or NEXt-generation RADar data (blue); CLASS, the Comprehensive Large-Array [data] Stewardship System, which currently archives only satellite data (pink); NOMADS, or National Operations Model Archive Distribution System (yellow); and CDO, or in situ climate data available through Climate Data Online (light blue). (SOURCE: Thomas Karl, NCDC, personal communication.)

Data diversity is a further challenge. NOAA’s consolidated observation requirements include more than two thousand diverse variables, ranging from hyperspectral satellite imagery to the stomach contents of fish, and these data come from a broad range of platforms, including (but not limited to) satellites, fixed and mobile radars, research aircraft, buoys, ships of opportunity, and mesoscale networks. NOAA also generates and collects a large and rapidly growing volume of reanalysis data and model output, as well as multiple versions of various reprocessed observational data, each of which raises its own data management issues (see Chapter 5). In addition, NOAA produces a wide variety of informational and other

FIGURE 2-2 NOAA NESDIS data archive volume projections, including backup copies, in petabytes. (SOURCE: Updated March 2007 from NOAA, 2003.)

products (see Appendix C), many of which need to be archived and made accessible for a variety of applications.

All the different data streams managed by NOAA are important for answering environmental questions, and each is associated with unique data management challenges. The data centers have evolved in part to address these unique challenges. Each center serves an important and diverse user community and has developed data management strategies designed to best meet the needs of its customers. The data centers also have a long history of providing excellent, discipline-specific user support. For example, NCDC has worked with a consortium of government agencies, under the auspices of the American Society of Civil Engineers (ASCE), to produce a national climatology of ice thickness due to freezing rain. Ice thickness (and therefore weight) is a key engineering design consideration in the construction of many structures that are subject to outdoor weather. The resulting design values of ice thickness, computed on the basis of an extreme value analysis, were included in ASCE Standard 7-05, Minimum Design Loads for Buildings and Other Structures (ASCE, 2006).

The value of NOAA’s current data archiving and access activities cannot be overemphasized. However, data management activities across NOAA could be better coordinated and integrated to help future users address increasingly inter- and multi-disciplinary environmental problems. For example, the solution of environmental problems in coastal or estuarine areas often requires the integration of diverse terrestrial, hydrological, biological, and physical oceanic observations as well as atmospheric data. Similarly, understanding the response of permafrost areas to global warming will require the integration of atmospheric observations with surface and subsurface cryospheric properties as well as surface hydrological and ecological observations. Chapter 7 provides additional examples and further discussion of how the coordination and integration of NOAA’s data management activities could be improved.

Addressing the Data Challenge

The diverse nature and rapidly growing volume of NOAA’s data holdings, in an environment of increasing user demands and flat or declining overall agency funding, represent a formidable data management challenge. NOAA clearly recognizes the magnitude of this challenge and has already taken several significant steps to address the ability of its data centers to handle the current and future needs of their users.

The NOAA Observing Systems Council (NOSC) serves as the principal advisory body to the NOAA Administrator and is the focal point for the agency’s observing system activities and interests. An additional purpose

of NOSC is to coordinate data management activities across NOAA.[15] The NOAA Data Management Committee (DMC) is the NOSC agent for coordinating the development and implementation of data management policy across NOAA. The DMC has broad latitude and authority to coordinate data management activities and make data management decisions for NOAA.[16] One of the primary objectives of this report is to provide the NOSC and DMC with the information they need to integrate sound data management principles with the NOAA Observing Systems Architecture (NOSA) and throughout the entire NOAA data management enterprise.

NOAA has submitted several reports to Congress that assess its data management activities and describe its future plans (NOAA, 2003; NOAA, 2005; NOAA, 2006b). The most recent of these reports, NOAA’s Environmental Data Management: Integrating the Pieces (NOAA, 2006b), provides a resource needs assessment for each of NOAA’s mission goals and individual programs across a broad range of end-to-end data management functions. These reports have raised the level of awareness, both inside and outside of NOAA, of the challenges faced by the agency. While examining these reports reveals that considerable progress has been made over the past several years, the 2006 report concludes that “NOAA [still] faces ongoing challenges in managing increasing diversity and volumes of data, enabling data integration, and addressing real-time dissemination demands.”

NOAA has also started to recognize the value of seeking external help when making important data management decisions. In addition to this report, which was requested by NOAA to provide high-level principles and guidelines that it can use as it continues to develop its data management plans, NOAA has formed a Data Archive and Access Requirements (DAAR) Working Group, organized under its Science Advisory Board, to provide ongoing advice on data management activities. The DAAR Working Group is tasked to prioritize the data sets and products that NOAA should archive and to provide specific recommendations for data access, using the principles and guidelines enumerated in this report, along with other materials, to aid their decision process (see Appendix D). NOAA has also attempted to communicate more effectively with its user communities via Web surveys and annual user workshops. The continuation and expansion of these types of activities are essential to ensure that NOAA achieves its vision of “an informed society that uses a comprehensive understanding of the role of the oceans, coasts and atmosphere in the global ecosystem to make the best social and economic decisions.”

[15] http://www.nosc.noaa.gov/purpose.html.
[16] http://www.nosc.noaa.gov/dmc/dmc_tor.html.

GEO-IDE and CLASS

NOAA’s GEO-IDE is envisioned as a “system-of-systems” that will improve the exchange and integration of data between NOAA and its partners and thus provide easier and more cost-effective access to their diverse environmental data holdings by users (Figure 2-3). This framework parallels the international effort, spearheaded by NOAA, to build and maintain GEOSS, a federated but coordinated system of systems for global observations (Group on Earth Observations, 2005). The goals of GEO-IDE, as outlined in the GEO-IDE Concept of Operations (NOAA, 2006c), are to (1) “take full advantage of the opportunities presented by internet technology to make access to environmental data and information as easy and effective as access to digital documents over the Web is today,” and (2) “improve efficiency and reduce costs by bridging the barriers between existing, independent ‘stovepipe’ systems and integrating the data management activities of all NOAA programs, while avoiding a fully centralized approach.” The GEO-IDE concept of operations specifically advocates a federated approach to achieve these goals and

FIGURE 2-3 A vision for GEO-IDE. (SOURCE: Lautenbacher, 2006.)

emphasizes the importance of using thoroughly documented and supported standards.

The vision for GEO-IDE articulated in the Concept of Operations document is similarly ambitious. It includes provisions for access (“friendly and flexible mechanisms to locate and access data and data products”); integration (“products in multiple formats and communication protocols”); standards (“utilization of current information technology standards where they are mature, and best practices where accepted standards are still evolving”); user needs (“satisfying the diverse requirements of operations, research, monitoring, and archives”); and user feedback (“a continuous, vigorous outreach process to identify and remedy difficulties encountered by any users”). NOAA has also developed a draft implementation plan for GEO-IDE (NOAA, 2006b) that addresses, among other areas, management and leadership needs, standards (and processes to arrive at standards), forward-looking technology, and a vision for user access.

The vision, goals, and overall concept of GEO-IDE appear to be both well conceived and, for the most part, compatible with the principles and guidelines offered in the following chapters, particularly those that pertain to data discovery, access, and integration. However, GEO-IDE remains very much in the conceptual phase, as can be inferred from the first two steps of the implementation plan, which are to “establish the project management structure” and to “secure funding.” It should also be noted that the “hood” in Figure 2-3 represents a significant challenge due to the diversity of the incoming data, the projected increases in future data volumes, and the difficulty in making these data discoverable, accessible, and understandable to a broad range of users. In addition, the GEO-IDE planning documents were developed primarily by a single person, and only one full-time employee is currently allocated for implementation. Thus, GEO-IDE appears to be an important and well-thought-out project that could eventually lead to the effective, integrated, end-to-end data management system demanded by NOAA’s mission, but the project is significantly under-resourced given the scope of the effort.

CLASS currently provides storage and online access to NOAA’s large-array satellite data and satellite-derived products, but CLASS is planned to evolve into a “unified enterprise data access system that centralizes NOAA’s numerous data systems” and provides “long-term, secure storage and access to data, information, and metadata of NOAA’s archived assets” (NOAA, 2006d). Hence, CLASS will first need to handle the high-volume data streams depicted in Figure 2-2, but it will have the potential to eventually handle a broader spectrum of NOAA’s environmental data. Rather than a static, top-down approach, the current development plans for CLASS (NOAA, 2006d) call for “an open system architecture

OCR for page 12
 BACKGROUND whose design will easily accommodate interoperability with existing and new systems as they come on line.” CLASS is also expected to “support NOAA’s archive and science data stewardship missions by providing the IT [information technology] portion of an archive.” Hence, CLASS will provide the hardware, software, and related support services needed to store and provide access to incoming data, while stewardship respon- sibilities presumably remain the province of experts at NOAA’s data centers. The vision for CLASS has evolved rapidly since the formation of this committee from a holistic, top-down system providing a variety of archive, stewardship, and access functions toward a more limited, feder- ated system that will primarily provide archiving and access services for the high-volume data streams shown in Figure 2-2. The current plans for CLASS (NOAA, 2006a, 2006c, 2006d) appear to be consistent with the goals and vision of GEO-IDE and with many, but not all, of the principles and guidelines provided in this report. For example, it is not clear how data sets will be evaluated for inclusion into CLASS as the system evolves; the principles introduced in the next chapter, and explored in further detail in the chapters that follow, demand that these decisions should be made on a case-by-case basis, with input from users. The extent to which the data handled by CLASS can be effectively integrated with other envi- ronmental data streams, including those collected by other federal agen- cies, also remains to be seen. Finally, the huge data volumes projected to be handled by CLASS will require substantial, ongoing resources to support the hardware and personnel needed to provide reliable archive and access capabilities. 
In addition to the challenge of scaling, the collective activities required to move from the successful, but not well-coordinated or integrated, user-support capabilities at the various data centers toward the integrated, mission-focused applications encapsulated by the GEO-IDE vision will require both outside guidance and focused funding. These requirements have proven elusive in the past. Given the rapid evolution of GEO-IDE and the many changes in the scope of CLASS that have taken place, the required integration of GEO-IDE, CLASS, and other data management efforts does not appear to be very well coordinated across NOAA. A related concern is that the rapid evolution of both GEO-IDE and, especially, CLASS might cause the scientific focus of these projects to become lost within the acquisition processes. Improved mechanisms will be needed to ensure that data management activities receive sufficient resources and that they are effectively coordinated across all elements of NOAA’s data management enterprise. NOAA will also need to solicit and incorporate input from disciplinary experts both inside and outside of the agency, as well as other potential users, in order to make sure that the integrated data management system effectively translates environmental data into the societal benefits required to meet NOAA’s mission requirements through the GEO-IDE framework.

PREVIOUS WORK

The importance of long-term data collection, data stewardship, the need to archive and provide access to environmental data for societal benefit, the importance of consulting with users when making data management decisions, and the recognition of the significant costs and other challenges involved in these activities are all issues that the scientific and data management communities have discussed for a number of years and for a variety of reasons. Previous reports from the National Research Council (NRC) and other entities have helped set the stage for this report by documenting and analyzing data management practices from various perspectives and by providing findings and recommendations that can be readily applied to the environmental data streams and data management activities at NOAA in particular. For example, Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation’s Scientific Information Resources (NRC, 1995b) provided a wide range of advice to NARA and other federal agencies on the long-term retention of scientific and technical data, particularly in electronic formats. This advice included fundamental principles for retaining scientific information, many of which have direct analogs in this report, as well as an extensive discussion of general retention criteria for many different types of data.

A more recent report, Climate Data Records from Environmental Satellites (NRC, 2004a), focuses more specifically on generating, analyzing, and archiving the records that are most useful for understanding climate variability and change.
That report, while focusing almost exclusively on climate data, offers many findings and recommendations that could be applied more broadly to environmental data from diverse sources with a simple change of wording from “climate data record” to “environmental data management,” such as this paragraph from the executive summary:

Underlying many of the elements of success is early attention to data stewardship, management, access and dissemination policies, and the actual practices implemented. Because a successful climate data record program will ultimately require reprocessing, data sets used in their creation, such as metadata, should be preserved indefinitely in formats that promote easy access. The ultimate legacy of long-term climate data record programs is the data left to the next generation, and the cost of data management and archiving must be considered as an integral part of every climate data record program. [Emphasis added]

Two recent reports from the NRC’s Committee on Earth Science and Applications from Space have highlighted the critical issues facing the U.S. environmental satellite program. The overall conclusion of that committee’s interim report, Earth Science and Applications from Space: Urgent Needs and Opportunities to Serve the Nation (NRC, 2006), is that budgetary pressures have severely altered the scope of future environmental satellite missions, particularly NPOESS, to the point where the overall continuity and security of entire classes of space-based environmental observations are at risk of collapse. However, that committee also offered the following recommendation regarding data management at NOAA:

The committee recommends that NOAA, working with the Climate Change Science Program and the international Group on Earth Observations, create a climate data and information system to meet the challenge of ensuring the production, distribution, and stewardship of high-accuracy climate records from NPOESS and other relevant observational platforms.

Awareness about the threatened satellite systems has expanded beyond the agencies and is now the focus of discussions throughout academia and the U.S. Congress. The same committee’s final report, Earth Science and Applications from Space: National Imperatives for the Next Decade and Beyond (NRC, 2007), proposes a series of satellite missions designed to preserve the continuity of essential space-based observations and to meet the most essential national needs in a cost-effective manner. The extent to which the recommendations of this “Decadal Survey” will be embraced by the current administration and relevant federal agencies remains unclear. It should be noted, however, that data access improvements that bring current satellite data archives to bear more effectively on problems with demonstrable societal benefits would support the argument to maintain an adequate space-based environmental observing system.
The future evolution of U.S. satellite observations will no doubt have a significant impact on future archiving and access requirements at NOAA, NASA, and other federal agencies. However, even if the increases in the volume of incoming satellite data are not as large as originally projected, NOAA still faces an acute data management challenge. In fact, the de-scoping of NPOESS may represent a window of opportunity to make fundamental improvements to existing observation and information systems.

Several reports specific to government data centers, including those at NOAA, have also been published. The report Government Data Centers: Meeting Increasing Demands (NRC, 2003a) summarizes a workshop exploring how the increasing volume and number of data sets, coupled with greater demands from more diverse users, are making it difficult for data centers at a number of federal agencies to maintain records of environmental change. The report focuses on technological approaches that could enhance the ability of environmental data centers to deal with these challenges and to improve the ability of users to find and use the information held in these centers. The NRC has also provided more focused reviews of NOAA’s NGDC (NRC, 2003b), NOAA’s NCDC (NRC, 1994), and NASA’s Distributed Active Archive Centers (NRC, 1998), which offer a variety of insights on how to best archive and provide access to environmental data. NRC (1998) recommends that “to function optimally, the Data Archive and Access Centers need to be intimately involved with the scientific community they serve. The Data Archive and Access Centers should deliberately pursue and improve routine, daily interactions with active scientists who use their data holdings.” NRC (2003b) states that “NGDC should develop an integrated approach to the stewardship of environmental data and operate in such a way that shareable services and functions (for example, database management, software development) serve all NGDC disciplines” and “improve scientific involvement of center personnel with the data sets by recruiting scientists to work with the data, establishing a vigorous program of external visiting scientists, and/or creating strong partnerships with other agencies, industry, and academia to supplement staff expertise.”

Because the rapid increase in satellite data volumes is one of the most important forcing factors for NOAA’s data management planning, a relevant and timely NRC report is Utilization of Operational Environmental Satellite Data: Ensuring Readiness for 2010 and Beyond (2004b), which offers findings and recommendations aimed at defining specific approaches to resolving the potential overload faced by two agencies, NOAA and NASA, responsible for satellite data.
The report focuses on the end-to-end utilization of environmental satellite data by characterizing the links from the sources of raw data to the end requirements of various user groups. It is an important foundation document because it addresses three areas that still present challenges: (1) the value of and need for environmental satellite data; (2) satellite data distribution and storage; and (3) satellite data access and utilization. Some of its findings are particularly relevant in the context of this report, including (in brief):

• Improved and continuous access to environmental satellite data is of the highest priority for an increasingly broad and diverse range of users.
• The national and individual user requirements for multiyear climate system data sets from operational satellites are placing special demands on current and future data archiving and utilization systems.
• Data from diverse satellite platforms and for different environmental variables must often be retrieved from different sources, and these retrievals often yield data sets with different formats, resolutions, and/or grid properties. The multiple steps currently required to retrieve and manipulate environmental satellite data sets are an impediment to their use.
• Early and ongoing cooperation and dialog among users, developers of satellite remote sensing hardware and software, and U.S. and international research and operational satellite data providers are essential for the rapid and successful utilization of environmental satellite data. Many of the satellite data utilization success stories have a common theme: the treatment of research and operations as a continuum, with a relentless team focus on excellence with the freedom to continuously improve and evolve.

Several other NRC reports have focused on the challenges, many of which remain unresolved, associated with the transition from research to operations in satellite data. Among those reports are From Research to Operations in Weather Satellites and Numerical Weather Prediction: Crossing the Valley of Death (NRC, 2000) and Satellite Observations of the Earth’s Environment: Accelerating the Transition of Research to Operations (NRC, 2003c). There are still many benefits to be gained by promoting the effective transition of research satellite data to operational usage, and one way these benefits could be more fully realized is through more effective data access and integration. Satellite data can seldom stand accurately on their own; other in situ observations are critical to provide “ground truth” to calibrate and supplement space-based observations. Since NOAA is obliged to be a steward for both types of data, it is therefore well positioned to realize the potential benefits of improved data access and integration for a broad user community.

A number of other reports are of particular importance.
Global Change Science Requirements for Long-Term Archiving (USGCRP, 1999) provides high-level guidance on long-term archiving of Earth observation data and derived products, including the guiding principles and essential functionality necessary for any long-term data archive to be successful, along with lessons learned from current and past experiences. Many of the principles offered in that report have direct analogs in the principles offered in this report, and the report also inspired some of the more detailed definitions and guidelines offered in the chapters that follow.

The report Recommendation for Space Data System Standards: Reference Model for an Open Archival Information System (Consultative Committee for Space Data Systems, 2002) is an internationally developed set of recommendations about how open archival information systems should be structured. Among other tasks, the “OAIS” document defines a model system that

• provides a framework for the understanding and increased awareness of archival concepts needed for long-term digital information preservation and access;
• provides the concepts needed by non-archival organizations to be effective participants in the preservation process;
• provides a framework, including terminology and concepts, for describing and comparing architectures and operations of existing and future archives;
• provides a basis for comparing the data models of digital information preserved by archives and for discussing how data models and the underlying information may change over time;
• provides a foundation that may be expanded by other efforts to cover long-term preservation of information that is not in digital form; and
• expands consensus on the elements and processes for long-term digital information preservation and access.

Similarly, the National Science Board, at the request of NSF, has reported on the importance of long-term archives in Long-lived Digital Data Collections: Enabling Research and Education for the 21st Century (NSB, 2005). Here there is recognition of the need for formal and established data policies and procedures, a clear technical and financial strategy, and a broad dialogue among agencies that collect and archive data. Data archiving and access are also identified as critical components in the multiagency Data Management and Communications (DMAC) subsystem of the Integrated Ocean Observing System (IOOS) plan.17 As implementation of IOOS/DMAC takes shape, two things are obvious: data archiving and access capabilities are required components in all end-to-end data management systems; and all agencies that collect data need to ensure these capabilities through a formalized data management plan.
Other relevant reports pertaining to the challenges of data archiving include the International Council for Science (ICSU) Report of the CSPR Assessment Panel on Scientific Data and Information (ICSU, 2004) and the Final Report from the Workshop on Research Challenges in Digital Archiving and Long-Term Preservation (NSF, 2003).

Together, these documents and many others, including those available on the Web sites of facilities responsible for data management, provide a solid foundation on which to develop principles and guidelines for archiving and providing access to environmental data at NOAA. In the present report, the concepts that have gained broad acceptance in the data management community have been synthesized down to the most essential elements and then applied to NOAA’s current data management challenge. The challenges faced by NOAA include not only how to be a wise steward of such a broad range of data managed at its three National Data Centers and centers of data, but also how to meet its growing data archive and data access demands with limited resources. The principles and guidelines offered in the chapters that follow are designed to be applicable across the broadest possible range of circumstances, yet with attention to the practical considerations that should always be kept in mind when managing such an important national asset.

17 http://dmac.ocean.us/dacsc/imp_plan.jsp.