1
Introduction

BACKGROUND

There can be no question that the National Aeronautics and Space Administration's (NASA's) Earth Observing System (EOS)—and its implementation into the Mission to Planet Earth (MTPE), now renamed the Earth Science Enterprise (ESE)—is the most ambitious effort ever attempted to study our planet and its set of complex interlocking systems. Although the original concept has evolved substantially over the years, the overall scientific goal endures, and "earth system science" (see NASA, 1988) has gained the respectability of a bona fide discipline.

One of the major components of the enterprise has been, and remains to this day, the EOS Data and Information System (EOSDIS). EOSDIS itself is an ambitious endeavor, not only because it must deal with data volumes and acquisition rates that are several orders of magnitude larger than ever before in the earth sciences, but also because its stated goal is to permit seamless access to multidisciplinary data in a timely manner. This has never been attempted before on such a scale, and the hope is that EOSDIS will facilitate research that would not be contemplated otherwise. Providing access to EOSDIS data is the job of the Distributed Active Archive Centers (DAACs). Although the DAAC system per se is not the largest component of EOSDIS in terms of cost, it holds a critical position in the overall architecture of the system, first because the user interacts with the DAACs and second because the burden of maintaining operational access to large and complex multidisciplinary data sets befalls them most directly.

Ever since they were chosen by NASA, the DAACs have drawn both praise
and criticism for their ability to serve their users. In some cases, the DAACs were criticized by implication as a result of reviews of EOS or EOSDIS. However, the DAACs have never been systematically assessed as components of a comprehensive data and information system. NASA has initiated such an assessment through a "recertification" process, which is being conducted in two stages. The first stage, an external peer review of the DAACs, has been conducted under the auspices of the National Research Council's (NRC's) Committee on Geophysical and Environmental Data (CGED). This report gives the results of that peer review, the first that the DAACs have ever undergone. The second stage of the recertification process will be conducted by a NASA panel, which will evaluate the results of the peer review in the context of NASA's programmatic and budgetary priorities. NASA management will then decide whether to recertify, place on probation, or close individual DAACs.

Each DAAC manages a different kind of scientific data (atmospheric, oceanic, solid-earth, polar, or biospheric) and serves a unique blend of user communities. Because no single committee has the appropriate composition to review all the centers, seven review panels were established to conduct the site visits. The DAACs reviewed are located at the Goddard Space Flight Center (GSFC), Langley Research Center (LaRC), Earth Resources Observation Systems (EROS) Data Center (EDC), Alaska Synthetic Aperture Radar (SAR) Facility (ASF), Jet Propulsion Laboratory (JPL; Physical Oceanography DAAC [PO.DAAC]), National Snow and Ice Data Center (NSIDC), and the Oak Ridge National Laboratory (ORNL). (At NASA's request, the Socio-Economic DAAC [SEDAC] located at the Consortium for International Earth Science Information Network [CIESIN] was not reviewed.)
The CGED established the criteria for review and provided the panels with similar agendas and briefing materials so that the site visits would be conducted as uniformly as possible. A description of the study process is given in the Preface to this report.

In this report the CGED does not recommend whether or not any DAACs should be closed or placed on probation. Rather, based on the criteria for review listed in Appendix B, the committee and its panels commend the DAACs' successes and identify issues that require greater attention, with the overall goal of improving the DAACs' ability to serve their users. EOS is a science program that is designed to serve scientists. Thus, the CGED review focuses primarily on how well the DAACs serve the scientific community and secondarily on other types of users. Finally, because the DAACs exist as part of a system rather than as independent entities, the committee also addresses overarching issues regarding the DAACs as components of EOSDIS. (In keeping with its charge, the committee evaluated the DAACs' performance against their mission, but not against alternative ways of achieving the same goals. Similarly, the committee did not review EOSDIS as a whole, or the role of EOSDIS in global change research. The latter is addressed in NRC, 1998a.)

This report was written at a time when technical difficulties, budgetary pressures, technological advances, and new management approaches were fast changing the face of EOSDIS. Nonetheless, the committee believes that the report provides a baseline from which the progress of individual DAACs or the DAAC system as a whole can be measured.

ROLE OF THE DAACS IN EOSDIS

EOSDIS was designed to perform a variety of functions, from spacecraft command and control, to data acquisition, processing, distribution, and archive. Linking these disparate functions are the hardware and software that comprise the EOSDIS Core System (ECS). The architecture of EOSDIS, as defined by NASA, is illustrated schematically in Figure 1.1. In general, data acquired from spacecraft will be captured and processed by the ECS contractor to Level 0 (see Table 1.1), then transferred to the DAACs. Some DAACs also receive in situ data, usually from principal investigators of field experiments or process studies. Such data will be used to calibrate and validate the satellite measurements and gain a more complete understanding of the phenomena being studied. Using algorithms developed by the science and instrument teams, the DAACs and/or science teams will process the Level 0 data into Level 1 and higher standard data products. Some of the data will be processed in near real time, although this is not an EOSDIS requirement, given the complexity of many of the algorithms. Typically, the higher-level products will be processed by the science teams. The data products will then be disseminated by the DAACs, which will also provide support services to users and archive the data and data products.

The DAAC System

The DAACs' partners in EOSDIS include the ECS contractor, science and instrument teams, and the Earth Science Data and Information System (ESDIS) Project.
Their primary roles, which are discussed in more detail below, are as follows: the ECS contractor builds the information system to capture and process data and to link the DAACs together; instrument and science teams develop algorithms for processing data and generate data products; DAACs process and disseminate data and provide user services; and the ESDIS Project sets the requirements for the information system and coordinates the DAAC system. An organization chart showing the reporting lines within EOSDIS is given in Figure 1.2. The DAACs themselves form a loose consortium for solving problems in

FIGURE 1.1. EOSDIS architecture, from flight operations and data acquisition (TDRSS = Tracking and Data Relay Satellite System) to data processing and dissemination. NASA has divided the DAACs into centers handling remote sensing data (ASF, EDC, GSFC, JPL [PO.DAAC], LaRC, and NSIDC DAACs) and centers handling in situ and other types of data (ORNL DAAC and SEDAC). SOURCE: NASA Headquarters.

TABLE 1.1. Data Set Processing Levels

Data Level | Description
Level 0 | Reconstructed unprocessed instrument or payload data at full resolution; any and all communications artifacts (e.g., synchronization frames, communications headers) removed
Level 1A | Reconstructed unprocessed instrument data at full resolution, time referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (i.e., platform ephemeris) computed and appended, but not applied, to the Level 0 data
Level 1B | Level 1A data that have been processed to sensor units (not all instruments will have a Level 1B equivalent)
Level 1C | TRMM-specific for quality content of Level 1B precipitation radar and ground validation data
Level 2 | Derived geophysical variables at the same resolution and location as the Level 1 source data
Level 3 | Variables mapped on uniform space-time grid scales, usually with some completeness and consistency
Level 4 | Model output or results from analyses of lower-level data (i.e., variables derived from multiple measurements)

NOTE: TRMM = Tropical Rainfall Measuring Mission.
SOURCE: Asrar and Greenstone (1995).

common. For example, the DAACs worked collectively to implement Version 0 of EOSDIS, which permits users to browse the holdings of all the DAACs, although not in a seamless or transparent manner. Instrument interdependencies, which require data products to be transferred between and distributed by several DAACs (e.g., Moderate Resolution Imaging Spectroradiometer [MODIS] products will be transferred among three DAACs), also draw the DAACs together. The current consortium of DAACs includes eight discipline centers, although the SEDAC has never been fully integrated into the group. Each DAAC has a User Working Group whose membership is tailored to the mission and objectives of the DAAC.
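The processing levels in Table 1.1 form an ordered hierarchy, from raw instrument data to model output. As a rough illustration only, that hierarchy might be represented in Python as below. The names `Level`, `DESCRIPTIONS`, and `is_standard_product` are hypothetical and are not part of EOSDIS or the ECS. The descriptions are abbreviated paraphrases of the table, and the rank given to Level 1C is an assumption based on its name.

```python
from enum import IntEnum


class Level(IntEnum):
    """Processing levels from Table 1.1, ordered from least to most processed.

    The integer values are arbitrary ranks chosen for this sketch; placing
    Level 1C between 1B and 2 is an assumption, not something the table states.
    """
    L0 = 0
    L1A = 1
    L1B = 2
    L1C = 3  # TRMM-specific level
    L2 = 4
    L3 = 5
    L4 = 6


# Abbreviated descriptions paraphrased from Table 1.1.
DESCRIPTIONS = {
    Level.L0: "unprocessed instrument data, communications artifacts removed",
    Level.L1A: "time-referenced data with calibration coefficients appended",
    Level.L1B: "Level 1A data processed to sensor units",
    Level.L1C: "TRMM-specific quality content for Level 1B radar data",
    Level.L2: "derived geophysical variables at Level 1 resolution",
    Level.L3: "variables mapped on uniform space-time grids",
    Level.L4: "model output or analyses of lower-level data",
}


def is_standard_product(level: Level) -> bool:
    """The text treats Level 1 and higher as standard data products."""
    return level >= Level.L1A


if __name__ == "__main__":
    for level in Level:
        kind = "standard product" if is_standard_product(level) else "input data"
        print(f"{level.name}: {DESCRIPTIONS[level]} ({kind})")
```

Using an ordered enumeration makes comparisons such as "Level 3 is more processed than Level 1B" a simple integer comparison, which is the only property of the table this sketch relies on.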
The DAACs, their host institutions, and their scientific specialties are listed in Table 1.2. Most of the DAACs were created from preexisting data operations. As a result of this heritage, each DAAC has its own information system for managing data and serving users, as well as its own holdings, some of which go back decades. The DAACs' responsibilities within EOSDIS also vary, which has led to large differences in size, budget, and numbers of personnel among them. For example, the DAACs primarily responsible for managing data from the AM-1 platform and other near-term missions (GSFC, LaRC, and EDC) tend to have the largest staff

FIGURE 1.2. Reporting lines within EOSDIS as of December 1998. The program executives at NASA Headquarters are advocates for various programs at Goddard, Langley, and JPL, but do not provide oversight.

TABLE 1.2. NASA's Distributed Active Archive Centers

DAAC | Host Institution | Scientific Specialty
ASF DAAC (Alaska SAR Facility) | University of Alaska | Sea ice, polar processes
EDC DAAC (EROS Data Center) | U.S. Geological Survey | Land processes
GSFC DAAC (Goddard Space Flight Center) | NASA | Upper atmosphere, atmospheric dynamics, global biosphere, hydrologic processes
LaRC DAAC (Langley Research Center) | NASA | Radiation budget, aerosols, tropospheric chemistry
NSIDC DAAC (National Snow and Ice Data Center) | University of Colorado | Snow and ice, cryosphere
ORNL DAAC (Oak Ridge National Laboratory) | Department of Energy | Biogeochemical fluxes and processes
PO.DAAC (Jet Propulsion Laboratory) | NASA-Caltech | Ocean circulation, air-sea interaction
SEDAC (CIESIN) | Columbia University | Socio-economic data and applications

and the most ECS-supplied hardware, software, and personnel (Table 1.3). DAACs that will receive data mainly from later missions (e.g., PO.DAAC and NSIDC) tend to have smaller numbers of personnel and lower budgets. (The NSIDC DAAC will not receive Level 0 data from the AM-1 platform, but will receive Level 2 AM-1 data pertinent to the cryosphere from the GSFC DAAC.) Table 1.4 lists the instruments, what they measure, and the anticipated launch dates of the EOS missions. The ORNL DAAC, which manages field-based observations rather than satellite measurements, is the smallest of the DAACs.

The ECS contractor (currently Raytheon Systems Corporation, formerly Hughes Information Technology Company) is responsible for designing an information system for DAAC functions ranging from ingest to archive. It has supplied the AM-1 DAACs (GSFC, LaRC, EDC, and NSIDC) with equipment, early versions of toolkits, software, subsystems, and personnel for installing and maintaining the ECS. The other DAACs are scheduled to receive more complete versions of the ECS software several years from now.
The DAACs played an advisory role while the ECS was being designed, but had no authority to change the functionality, implementation, or architecture of the system. Similarly, the DAACs have little control over how many ECS contractors are assigned to the DAAC to install the system or what tasks they will perform. Now that early versions of the software have been released, the DAACs

TABLE 1.3. DAACs at a Glance

DAAC | Number of Unique Users, 1997 (a) | Average Annual Budget ($M), FY 1994–2002 (b) | Number of Staff, FY 1998 (c) | Current Holdings (TB) | AM-1 Instruments
ASF | 400 | DAAC: 6.0; JPL (d): 6.6; ECS: 0.6 | DAAC: 65; ECS: 1 | 110 | None (foreign spacecraft: ERS-1,2; JERS-1; RADARSAT)
EDC | 1,156 | DAAC: 5.6; ECS: 5.0 | DAAC: 72; ECS: 38 | 9 | Landsat 7, ASTER, MODIS
GSFC | 12,216 | DAAC: 5.1; ECS: 10.2 | DAAC: 74; ECS: 40 | 4 | TRMM, MODIS
LaRC | 804 | DAAC: 7.2; ECS: 3.6 | DAAC: 84; ECS: 6 | 3 | CERES/TRMM, CERES/AM-1, MISR, MOPITT
NSIDC | 506 | DAAC: 3.1; ECS: 1.1 | DAAC: 27; ECS: 6 | 1 | MODIS
ORNL | 1,143 | DAAC: 2.2; ECS: 0.1 | DAAC: 14; ECS: 1 | 6-7 GB | None (data are mostly field based)
PO.DAAC | 15,527 | DAAC: 4.6; ECS: 0.8 | DAAC: 28; ECS: 1 | 15 | None (US-foreign collaborative missions: AMSR, SeaWinds, Jason)

NOTE: AMSR = Advanced Microwave Scanning Radiometer; ASTER = Advanced Spaceborne Thermal Emission and Reflection Radiometer; CERES = Clouds and the Earth's Radiant Energy System; ERS = European Remote Sensing Satellite; JERS = Japanese Earth Remote-Sensing Satellite; MISR = Multi-angle Imaging Spectroradiometer; MOPITT = Measurements of Pollution in the Troposphere; TRMM = Tropical Rainfall Measuring Mission.
(a) Includes only users who received data.
(b) DAAC, ECS, and JPL portions of the total budget are managed under separate contracts.
(c) DAAC includes DAAC staff and civil servants.
(d) Includes some non-DAAC data acquisition expenses.

are responsible for remaining informed about the contents of the releases, testing the system in an operational setting, and identifying bugs and missing capabilities. One or two ECS liaisons are assigned to each DAAC to facilitate the necessary two-way communication.

Instrument and science teams are responsible for developing algorithms for processing data from a particular instrument. The team either processes the data itself or transfers the algorithms to the DAAC for product generation. The DAAC then disseminates the data products and provides user services. This arrangement requires good communication between the instrument teams, science teams, and the DAACs, not only to ensure that high-quality products are produced, but also to ensure that the DAACs are sufficiently knowledgeable about the products to provide a high level of user services. Each EOS instrument (and, for that matter, each remote sensing instrument flown by NASA) is developed by an instrument team working in concert with at least one science team. Thus, each DAAC will have to interact with several teams, depending on the complexity of the instrument and the number of instruments collecting data at any given time.

The ESDIS Project, which is staffed by NASA engineers, computer programmers, and experts in budget and contract management, has several roles in the DAAC system. First, it is responsible for establishing the performance requirements to which the DAACs and the ECS contractor must adhere. The performance requirements were developed in consultation with the DAACs, the EOSDIS Panel, and the Mission to Planet Earth Program Office (now part of the Earth Science Enterprise). For the DAACs, the performance requirements focus on user service, data products, metadata, and processing. Second, ESDIS provides funding to the DAACs and evaluates requests for additional funds to create DAAC-unique extensions to the ECS. Such extensions permit the DAACs to support the specialized needs of their user communities. Similarly, requests for additional reprocessing or processing of nonstandard data products are evaluated by ESDIS. (The funding for the DAACs is managed separately from the ECS contract.) Third, ESDIS coordinates interaction between the DAACs and the ECS contractor. In the early stages of ECS development, questions or suggestions from the DAACs to the ECS contractor were routed through ESDIS.
Now that the ECS is becoming operational, however, communications between the DAACs and the ECS contractor are more direct. Finally, ESDIS is responsible for the system-wide management and coordination of EOSDIS.

Alaska SAR Facility DAAC: A Unique Arrangement

The satellites contributing data to the ASF DAAC were launched several years ago. To manage the data, an information system had to be developed separately from the ECS, which was still in the design stages. The Jet Propulsion Laboratory was awarded the contract to develop processing, distribution, and archive capabilities for the ASF DAAC. (The JPL developers are separate from the PO.DAAC, which is also located at JPL.) In this sense, JPL plays the same role for the ASF DAAC as the ECS contractor plays for the other DAACs. JPL also serves as the instrument team for the ASF DAAC because it provides the processing algorithms. Within ESDIS, a separate office has been established to oversee the ASF DAAC and the JPL developers. A more complete description of this relationship and its impact on DAAC operations is given in Chapter 6 (ASF DAAC).

TABLE 1.4. Future EOS and Related Missions

Satellite | Acronym | Measurement | Launch Date (a)

EOS AM-1 Platform
• Clouds and the Earth's Radiant Energy System | CERES/AM-1 | Measures the Earth's radiation budget and atmospheric radiation from the top of the atmosphere to the surface | March 1999
• Multi-angle Imaging Spectroradiometer | MISR | Determines planetary and surface albedo and aerosol and vegetation properties | March 1999
• Moderate Resolution Imaging Spectroradiometer | MODIS/AM-1 | Measures biological and physical processes occurring on the surface of the Earth, in the oceans, and in the lower atmosphere | March 1999
• Advanced Spaceborne Thermal Emission and Reflection Radiometer | ASTER | Provides high-resolution images of the land surface, water, ice and clouds | March 1999
• Measurements of Pollution in the Troposphere | MOPITT | Measures thermal emissions to determine tropospheric CO profile and total columns of CO and CH4 | March 1999

EOS PM-1 Platform
• Atmospheric Infrared Sounder | AIRS | Measures the Earth's atmosphere and surface to obtain temperature and moisture profiles | December 2000
• Advanced Microwave Sounding Unit | AMSU | Provides temperature and water vapor profiles for worldwide weather forecasting | December 2000
• Clouds and the Earth's Radiant Energy System | CERES/PM-1 | Measures the Earth's radiation budget and atmospheric radiation from the top of the atmosphere to the surface | December 2000
• Moderate Resolution Imaging Spectroradiometer | MODIS/PM-1 | Measures biological and physical processes occurring on the surface of the Earth, in the oceans, and in the lower atmosphere | December 2000
• Humidity Sounder, Brazil | HSB | Measures atmospheric humidity to improve weather prediction | December 2000
• Advanced Microwave Scanning Radiometer | AMSR | Observes atmospheric, land, oceanic, and cryospheric parameters, including precipitation, sea surface temperatures, and ice concentrations | December 2000

EOS Chemistry
• Microwave Limb Sounder | MLS | Measures microwave thermal emission from the limb of the Earth's atmosphere to determine vertical profiles of atmospheric gases, temperature and pressure | December 2002
• Tropospheric Emission Spectrometer | TES | Measures the state of the Earth's troposphere and the interaction of ozone with other chemicals | December 2002
• High-Resolution Dynamics Limb Sounder | HIRDLS | Sounds the upper troposphere, stratosphere, and mesosphere to determine temperature, concentrations of oxides and aerosols, and the locations of polar stratospheric clouds and cloudtops | December 2002

Selected Complementary Missions
• Tropical Rainfall Measuring Mission | TRMM | Monitors tropical rainfall and the associated release of energy that helps to power the global atmospheric circulation | November 1997
• Land Remote Sensing Satellite-7 | Landsat-7 | Provides the longest continuous record of the Earth's continental surfaces | February 1999
• Stratospheric Aerosol and Gas Experiment | SAGE III | Measures stratospheric aerosols, nitrogen dioxide, water vapor, and ozone | July 1999
• Active Cavity Radiometer Irradiance Monitor | ACRIM | Measures the Sun's total output of optical energy from ultraviolet to infrared wavelengths | October 1999
• Ocean Altimetry Mission (with France) | Jason-1 | Determines the circulation of the ocean and the distribution of sea surface topography | April 2000
• Earth Probe Total Ozone Mapping Spectrometer | TOMS | Maps the spatial distribution of total ozone, measures sulfur dioxide to detect volcanic eruptions, and measures the albedo of the Earth's atmosphere | August 2000
• SeaWinds (with Japan) | SeaWinds | Measures wind vector fields near the sea surface | November 2000
• RADARSAT (Canada) | RADARSAT-2 | Measures topography and surface roughness, regardless of weather conditions | November 2001
• Geoscience Laser Altimeter System | GLAS | Detects surface height changes in the ice sheets | July 2002
• Solar Stellar Irradiance Comparison Experiment | SOLSTICE | Provides long-term, accurate measurements of the solar ultraviolet (UV) and far ultraviolet (FUV) radiation | December 2002

(a) Schedule as of July 30, 1998.
SOURCES: Asrar and Greenstone (1995); http://www.earth.nasa.gov/missions/spacecraft.html.

EVOLUTION OF EOSDIS

A number of excellent histories of the EOS program and EOSDIS have been written (e.g., NRC, 1995b, 1998a). The following discussion focuses on the history of EOSDIS as it relates to the DAACs.

Original EOSDIS Concept

NASA's original plans called for the creation of a single DAAC to process, disseminate, and archive data from the entire EOS program, with the goal of creating "one-stop shopping" for researchers interested in studying the Earth as a system. However, NASA advisory groups, such as the EOSDIS Panel, objected that this arrangement would place too much responsibility in the hands of a single center, and recommended that data management functions be collocated with the relevant scientific expertise. A model for establishing geographically distributed discipline centers was first described in a 1986 report of the National Research Council's Committee on Data Management and Computation. The committee envisioned a set of active database sites, which would receive regular scientific use and guidance by associated scientists in the corresponding discipline (NRC, 1986). Such sites differ from data centers in that they exist for a fixed period of time (the period when the data are being used intensively for research) and are thus not responsible for long-term maintenance of the data (see Chapter 2, "DAAC Versus Data Center"). NASA adopted the model, and by the early 1990s, eight DAACs had been established by program leaders at NASA Headquarters, and the ninth, SEDAC, was established by congressional fiat shortly afterward. (The DAAC at Marshall Space Flight Center was closed due to budgetary pressures in 1997.)

In a multi-DAAC system, one-stop shopping meant that users would be able to access the system through any DAAC, search all the EOSDIS holdings, and obtain the relevant data, regardless of where it resided.
The use of common formats (e.g., Hierarchical Data Format [HDF]-EOS) and standards across the DAAC system would permit users to integrate data of different types with a wide range of temporal and spatial scales.

To test these concepts, the DAACs participated in two key prototype exercises: the Pathfinder Program and EOSDIS Version 0. Pathfinder data sets were developed by science teams to support global change research and to gain experience in reprocessing and transferring massive data sets in the pre-EOS era. Because Pathfinder products incorporate data from many disciplines (land, ocean, and atmosphere), sources (NASA, the National Oceanic and Atmospheric Administration [NOAA], U.S. Geological Survey [USGS], and Environmental Protection Agency [EPA]), and spatial and temporal scales, the program also illustrated some of the difficulties inherent in integrating disparate data types.

Version 0 was developed largely from existing hardware and software at the DAACs. It was designed to provide an early operational capability and to test selected EOSDIS tools and services. However, Version 0 had limited ingest, processing, and archive capabilities. These capabilities were to be provided in a comprehensive new information system, the ECS, which would replace Version 0. A contract to develop the ECS was awarded to a single contractor (Hughes Information Technology Company) in 1993.

Early peer reviews found that the system being developed by the ECS contractor would likely be too rigid to permit users to manipulate the data in new ways or to evolve to meet new user needs (e.g., NRC, 1994). A 1994 NRC report concluded that the DAACs were in a good position to understand the needs of their users and should therefore become intimately involved in the development of the ECS. Their involvement would help ensure that the information system supported the scientific community for which it was built (NRC, 1994). This recommendation was never implemented.

Federation and Recertification

Although ESDIS and the ECS contractor took steps to make the information system less centralized and more flexible, rapid technological changes called into question the original EOSDIS paradigm. In particular, the growth of the World Wide Web (WWW) and the widespread availability of powerful desktop computers made it possible for individuals to manipulate and store large data sets for the first time. (The scale of the data management problem is discussed in Box 1.1.) As a result, many traditional data management tasks no longer have to be performed by DAACs or data centers, and a more truly distributed system for EOSDIS could be created. With this goal in mind, an NRC committee recommended that certain DAAC functions, such as product generation, publication, and user services, be transferred to a federation of partners selected competitively from academia, industry, and government (NRC, 1995a).
Given the imminence of the AM-1 launch, the committee also recommended that NASA federate EOSDIS in stages, beginning with an initial limited set of pilot projects (NRC, 1996). In 1998, NASA initiated a prototype federation with participation by three types of Earth Science Information Partners (ESIPs). The Type 2 and Type 3 ESIPs (see Box 1.2) were selected through a competitive process to create products and offer services not currently provided by EOSDIS (see NRC, 1998b); the DAACs represent Type 1 ESIPs. The responsibilities of the ESIPs are described in Box 1.2. The issues likely to be faced by the ESIPs in creating the prototype federation were examined at a 1998 NRC workshop. The report from that workshop examined the federation concept; compared governance models from a diversity of federated structures—libraries, international organizations, industry, and academia; and offered some lessons for managing scientific data in an ESE federation (NRC, 1998b). A new NRC report about to be released (NRC, 1998a)

OCR for page 9
Review of NASA'S Distributed Active Archive Centers BOX 1.1. Scale of the EOSDIS Data Management Problem This report uses a number of terms relating to the size of EOSDIS data sets. These terms are relative and reflect the committee's view of the manageability of data at the time of writing. A major concern in early EOSDIS planning was the sheer size of the data streams that must be processed routinely and the availability of adequate computer power and communications bandwidth to handle them. As the technology and installed infrastructure have improved, the emphasis has shifted to distributed operations and the interface to scientific users, in particular for the science teams overseeing the development and production of standard products. However, it remains true that the data volumes from the AM platform will be unprecedented, and approaches that are normal for a desktop workstation may not be applicable on the scale of EOSDIS operations. What is "routine" is very much a shifting target, but discussions of strategy must reflect realistically the orders of magnitude involved. Based on the survey of users (see Appendix D), a typical user request for data from a DAAC is as follows: small: <10 Mbyte, typical: 10 Mbyte to 1 Gbyte, or large: 1–100 Gbyte. These distribution limits are determined primarily by ease of transmission over the Internet and by standard capabilities on workstations and personal computers. One gigabyte of data can be fitted onto two CD-ROMs. Nevertheless, any assessment of such scaling issues should be cognizant of the fact that current Internet bandwidth is doubling every three months or so. From the perspective of a scientific data center, data sets could be characterized as: small: 1 Tbyte, large: 100 Tbyte, or very large: 1 Petabyte. These limits are determined primarily by the availability of mass storage systems and associated data management software. One Petabyte would require a stack of CD-ROMs several kilometers high! 
Of course, the effort required to handle a data set effectively depends on many factors in addition to its size, such as the complexity of its structure, patterns of use, and user understanding of its content.
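The orders of magnitude in this box can be checked with a little arithmetic. The following Python sketch is illustrative only: the CD-ROM capacity (650 Mbyte) and disc thickness (1.2 mm) are assumed values that do not come from the report, decimal prefixes are assumed, the function names are hypothetical, and the handling of class boundaries (for example, a request of exactly 1 Gbyte) is a choice made here for illustration.

```python
# Assumed values, not taken from the report: a CD-ROM holds about 650 Mbyte
# and a bare disc is about 1.2 mm thick. Decimal prefixes are assumed.
MBYTE = 10**6
GBYTE = 10**9
TBYTE = 10**12
PBYTE = 10**15
CD_CAPACITY_BYTES = 650 * MBYTE
CD_THICKNESS_MM = 1.2


def classify_request(size_bytes):
    """Return the user-request class from Box 1.1 for a request size in bytes."""
    if size_bytes < 10 * MBYTE:
        return "small"
    if size_bytes <= 1 * GBYTE:
        return "typical"
    if size_bytes <= 100 * GBYTE:
        return "large"
    return "beyond the listed classes"


def cd_count(size_bytes):
    """Number of CD-ROMs needed to hold size_bytes (integer ceiling division)."""
    return -(-size_bytes // CD_CAPACITY_BYTES)


def cd_stack_height_km(size_bytes):
    """Height in kilometers of a stack of bare discs holding size_bytes."""
    return cd_count(size_bytes) * CD_THICKNESS_MM / 1_000_000  # mm to km


if __name__ == "__main__":
    print(classify_request(500 * MBYTE))
    print(cd_count(GBYTE))
    print(round(cd_stack_height_km(PBYTE), 2))
```

Under these assumptions, one Gbyte needs two discs and one Petabyte needs roughly 1.5 million bare discs, a stack a little under 2 km high. A different assumed capacity or thickness changes the figure but not the order of magnitude.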

BOX 1.2. Responsibilities of the Prototype Federation Partners

Type 1 ESIPs. These ESIPs are responsible for standard data and information products whose production, publishing or distribution, and associated user services require emphasis on reliability and adherence to schedules. Type 1 ESIPs include DAACs and science teams for specific instruments.

Type 2 ESIPs. These ESIPs are responsible for producing innovative science information products and services, which primarily serve the global change and earth science communities. Type 2 ESIPs include science teams and global change scientists.

Type 3 ESIPs. These ESIPs are responsible for providing innovative, practical applications of earth science data to a broad range of users beyond the global change research community. Type 3 ESIPs include science teachers, college earth science students, policy analysts, interested public, research scientists working outside their discipline, and for-profit businesses.

SOURCE: Modified from NRC (1998b).

will provide additional recommendations on refining the ESE federation model and the responsibilities of the partners. If the prototype is successful, NASA plans to phase in development of an Earth Science Enterprise federation through a series of competitions focusing on production, publication, and user services, following launch of the AM-1 platform and other near-term missions. Additional DAAC-type activities may also be competed. Only recertified DAACs will be permitted to compete for these functions and, if successful, become partners in the ESE federation. Consequently, a DAAC recertification process may accompany each competition.

Recent Developments

As the previous discussion illustrates, plans for implementing EOSDIS have undergone considerable change over the past five years. Further change, particularly in the EOSDIS Core System, is likely to occur as the system approaches operational readiness.
Delays in the ECS led NASA's 1997 Biennial Review Committee to recommend stronger managerial oversight of the ECS, the creation of backup plans, and a reduction in the data requirements for EOSDIS (Independent External Review Panel, 1997). A subsequent demonstration of the ECS showed that full functionality of the system would likely not be achieved in time for launch of the first EOS satellite, the Tropical Rainfall Measuring Mission (TRMM). Consequently, NASA decided that the DAACs would still plan to generate Level 1 products beginning 90 days after launch, but that production of higher-level products would be delayed. Only 25% of the standard Level 2, 3, and 4 products would be produced the first year, reaching 100% in the fourth year after launch. This strategy became known as the 25-50-75 scenario. NASA also solicited backup plans from the DAACs and science teams for each instrument. The backup systems of the GSFC and LaRC DAACs are already being used to manage data from TRMM, which was launched in November 1997.

Meanwhile, NASA directed the ECS contractor to focus system development on preparing for the AM-1 data streams. A failure in the flight operations segment of the ECS (under subcontract to a different developer) in April 1998 delayed launch of the AM-1 platform and gave the ECS contractor at least six more months to prepare. By July 1998, however, it became clear that these efforts would not be sufficient, and NASA is now considering using backup plans more extensively. The backup systems developed by the DAACs rely on Version 0 and their own home-grown information systems to process the data. On the other hand, backup plans developed by the science and instrument teams mostly call for the data to be processed at the Science Computing Facilities, where the algorithms were developed and where the interdisciplinary science teams will eventually analyze the data. Which plans are implemented will be decided on an instrument-by-instrument basis, but in either case, the ECS would be used only for data distribution and archive.
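The 25-50-75 production ramp described earlier in this section can be written out as a simple schedule. This Python sketch assumes that the scenario's name gives the percentages for the first three years after launch and that production reaches 100% in the fourth year, as the text states. The function name is hypothetical, and the sketch does not describe any NASA planning tool.

```python
def standard_product_percent(year_after_launch):
    """Percent of standard Level 2, 3, and 4 products produced in a given year.

    Assumption: the ramp rises by 25 percentage points per year (25, 50, 75)
    and holds at 100 from the fourth year onward.
    """
    if year_after_launch < 1:
        raise ValueError("year_after_launch must be 1 or greater")
    return min(25 * year_after_launch, 100)


if __name__ == "__main__":
    for year in range(1, 6):
        print(year, standard_product_percent(year))
```

Level 1 products are outside this ramp: the text states that the DAACs would still plan to generate them beginning 90 days after launch.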
ORGANIZATION OF REPORT

In this report the Committee on Geophysical and Environmental Data examines the DAAC system as it was configured from October 1997 to September 1998, and those elements of EOSDIS that pertain directly to the DAACs. Its assessment, presented in Chapter 2, draws on the reports of the DAAC review panels, interviews with system developers and NASA management, and a survey of DAAC users.

The individual DAAC reports were written by specialized site visit panels and are presented without modification by the CGED. They are given as Chapters 3 through 9, and they appear in the order in which the DAACs were visited. Each report is based on a site visit, subsequent e-mail discussions with DAAC personnel, and the personal experiences of panel members with the center. The site visit agenda followed by all the panels is given in Appendix A. For consistency the DAAC reports follow a similar format. The chapter sections reflect the criteria for review, which are divided into five categories: holdings, users, technology, management, and the relationship between the DAAC and other components of the Earth Science Enterprise. The criteria for review and suggested measures of performance appear in Appendix B. In each chapter, an abstract provides an overall assessment of the DAAC and identifies the panel's key recommendation(s). Throughout the report, suggestions are made for improving the effectiveness of the DAACs, but only the most important are phrased as recommendations to NASA, ESDIS, or the DAAC.

To prepare the panels for their discussion of the relationship between the DAACs and the Earth Science Enterprise, the CGED interviewed ESDIS in advance of the site visits. Questions prepared by the CGED and the formal written responses of ESDIS are given in Appendix C. The questions were based on the strategic management plan for the DAAC system, which is prepared yearly by the DAACs and ESDIS, and focus on the following topics: (1) ESDIS expectations of the DAACs and (2) DAAC expectations of ESDIS. In addition, Appendix C includes the ESDIS response to issues raised by informal advance teams who visited the DAACs and by previous NRC reports.

The committee solicited input from the broader community by conducting a user survey. The survey was e-mailed to the Investigator Working Group list, which includes nearly 1,000 individuals associated with EOSDIS and the Earth Science Enterprise, including instrument and science teams, program managers, and NASA advisory panels. It was subsequently forwarded to other users in the United States and abroad. The results of the survey can be found in Appendix D. Nearly 400 users responded, including scientists, educators, and the general public. The survey was not rigorously controlled; therefore the committee did not perform a statistical analysis of the results. Nevertheless, the patterns of responses shown in Appendix D illustrate the range of experiences that different user groups have had with the DAACs.

Finally, an acronym list, which defines the many organizations, satellite missions, and science projects discussed in this report, appears at the end of the report.
