3
Goddard Space Flight Center DAAC

Panel Membership

J.-BERNARD MINSTER, Chair, Scripps Institution of Oceanography, La Jolla, California

FERRIS WEBSTER, Vice Chair, University of Delaware, Lewes

SYDNEY LEVITUS, NOAA National Oceanographic Data Center, Silver Spring, Maryland

RICHARD S. LINDZEN, Massachusetts Institute of Technology, Cambridge

TERENCE R. SMITH, University of California, Santa Barbara

JOHN R.G. TOWNSHEND, University of Maryland, College Park

ABSTRACT

The Goddard Space Flight Center (GSFC) DAAC is the largest of the EOSDIS DAACs. It manages a variety of data sets related to climate, the biosphere, and the upper atmosphere, and it will also process, disseminate, and archive data from the flagship EOS instrument, the Moderate Resolution Imaging Spectroradiometer (MODIS). The DAAC understands its role and is doing a good job with its current data sets. However, the large data volumes and complex algorithms of the MODIS data stream present a significant challenge to the DAAC, and the panel's main recommendation is that the DAAC continue to focus its efforts on preparing for the AM-1 platform, and particularly the MODIS instrument.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Review of NASA's Distributed Active Archive Centers

INTRODUCTION

The GSFC DAAC was created in 1993 to archive and distribute data related to climate change, atmospheric dynamics, the global biosphere, hydrology, and upper atmospheric chemistry (Box 3.1). Its roots are in the NASA Climate Data System and the Pilot Land Data System. The first data sets archived by the DAAC included data collected by the Total Ozone Mapping Spectrometer (TOMS) and the Nimbus-7 Coastal Zone Color Scanner (CZCS). Today the DAAC manages data sets from a variety of missions and experiments, supports the Goddard Data Assimilation Office, and also manages some of the hydrology holdings of the Marshall Space Flight Center DAAC, which was closed in 1997.

With a staff of 114 and current holdings of 4 TB, the GSFC DAAC is one of the largest DAACs in the EOSDIS system. In the EOS AM-1 era, DAAC holdings will increase in size by a factor of 500 (Box 3.1). The Sea-Viewing Wide-Field-of-View Sensor (SeaWiFS) and the Tropical Rainfall Measuring Mission (TRMM), both of which have already been launched, will produce 65 TB of data, and MODIS, which is scheduled for launch in early 1999, will produce nearly 2,000 TB. To prepare for these large data streams, the DAAC is staffing up: approximately 40 EOSDIS Core System (ECS) contractors have been added to process MODIS data, and about 12 permanent staff have been added to manage DAAC operations. The average budget for the DAAC, which covers DAAC personnel and functions, civil servants, ECS contractors, and ECS-supplied hardware, is about $15 million per year.

Managing the enormous MODIS data stream poses daunting managerial and technological challenges for the GSFC DAAC. Of most concern is whether the information system, particularly the ingest system, can be scaled up to accommodate increasing loads (see "Technology," below).
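The volume figures above can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope illustration under stated assumptions, not a DAAC planning tool: it assumes decimal units (1 TB = 10^12 bytes) and a link running continuously at its full nominal rate with no protocol overhead. It uses the 150-Mb/sec OC3 rate quoted under "Connection to the World," plus the nominal 622-Mb/sec OC12 rate for comparison. The function names are hypothetical.

```python
# Back-of-the-envelope check of the GSFC DAAC volume figures.
# Assumptions (not from the report): decimal units (1 TB = 1e12 bytes),
# a link running continuously at its nominal rate, no protocol overhead.

HERITAGE_TB = 4          # current holdings (Box 3.1)
MODIS_TB = 2000          # "nearly 2,000 TB" expected from MODIS
SECONDS_PER_DAY = 86_400


def growth_factor(future_tb: float, current_tb: float) -> float:
    """Ratio of future archive size to current archive size."""
    return future_tb / current_tb


def transfer_days(volume_tb: float, link_mbps: float) -> float:
    """Days needed to push volume_tb through a link of link_mbps megabits/s."""
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"Growth factor: {growth_factor(MODIS_TB, HERITAGE_TB):.0f}x")
    # 150 Mb/s is the OC3 rate quoted in the report; 622 Mb/s is nominal OC12.
    for rate_mbps in (150, 622):
        days = transfer_days(MODIS_TB, rate_mbps)
        print(f"{MODIS_TB} TB at {rate_mbps} Mb/s: {days:,.0f} days "
              f"({days / 365.25:.1f} years)")
```

Under these assumptions the growth factor comes out at 500, matching the text. Moving the full MODIS volume over the quoted OC3 line would take roughly 1,235 days (about 3.4 years). MODIS data accumulate over the mission rather than arriving all at once, so this is only an indication of scale, but it is consistent with the panel's concern about network capacity for MODIS-era loads.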
To prepare for the new data streams, the DAAC will start "day-in-the-life" exercises and operations rehearsals several months before launch. As of June 1998, the ECS was still not ready for day-in-the-life exercises, although it had been sufficient for testing the science algorithms. Delays in the launch of the EOS satellites will provide additional preparation time.

The Panel to Review the GSFC DAAC held its formal site visit on October 20–21, 1997. To ensure that its report and recommendations reflect recent developments, several panel members visited the DAAC again in June 1998. This report is based on findings from both visits and on e-mail discussions with DAAC managers in July and September 1998.

BOX 3.1. Vital Statistics of the GSFC DAAC

History. The GSFC DAAC was created in 1993 out of the NASA Climate Data System and the Pilot Land Data System. Its holdings go back to 1978.

Host Institution. NASA Goddard Space Flight Center in Greenbelt, Maryland.

Disciplines Served. Atmospheric science and hydrology; data are available on the chemistry of the upper atmosphere, the global biosphere, atmospheric dynamics, and climatology.

Mission. To maximize NASA's investment benefits by providing data and services that enable its customers to fully realize the scientific and educational potential of data and information from the Earth Science Enterprise.

Holdings. The DAAC holds 4 TB of heritage data sets and anticipates receiving more than 2,000 TB of data from the AM-1 platform.

Users. There were 12,216 unique users in FY 1997, based on log-in addresses.

Staff. In FY 1998 the DAAC had 74 staff (9 of them civil servants) and 40 ECS contractors.

Budget. Approximately $9.2 million in FY 1998 (including DAAC costs and ECS-provided hardware, software, and personnel), increasing to $17 million in FY 2000.

HOLDINGS

Even before the launch of TRMM and AM-1, the GSFC DAAC was managing and distributing numerous data sets of substantial size. These include in particular the Advanced Very High-Resolution Radiometer (AVHRR) and the Television and Infrared Observation Satellite (TIROS) Operational Vertical Sounder (TOVS) Pathfinder data sets, which have been used extensively by EOS investigators to prepare for the processing of AM-1 data (Box 3.2). These holdings consist primarily of imagery and remotely sensed data and constitute one of the best resources available to date for research on the atmosphere and global climate change. Figure 3.1, for example, illustrates changes in the size of the ozone hole as detected by several remote sensing instruments whose data are managed by the GSFC DAAC.

BOX 3.2. Data Holdings as of January 1998

Total Ozone Mapping Spectrometer (TOMS): data from the Nimbus-7 and Meteor-3 satellites for November 1978 to December 1994.

Upper Atmosphere Research Satellite (UARS): products from nine instruments for September 1991 to present.

Television and Infrared Observation Satellite (TIROS) Operational Vertical Sounder (TOVS): 1-degree-resolution data for 1978 to 1994.

Sea-Viewing Wide-Field-of-View Sensor (SeaWiFS): data now available provide local, regional, and global coverage.

Greenhouse Effect Detection Experiment (GEDEX): global, regional, or local data sets for the 1980s.

International Satellite Land Surface Climatology Project (ISLSCP) Initiative I, Global Data Sets for Land-Atmosphere Models: monthly, monthly-six-hourly, and six-hourly data available globally on a 1-degree grid for 1987 to 1988.

Nimbus-7 Coastal Zone Color Scanner (CZCS): data at 1-km, 4-km, or 20-km resolution for November 1978 to June 1986.

Pathfinder Advanced Very High-Resolution Radiometer (AVHRR): 8-km-resolution data for 1981 to 1994.

Goddard Data Assimilation Office (DAO): 2 × 2.5-degree-resolution data for 1985 to 1993.

Moderate Resolution Imaging Spectroradiometer (MODIS) Airborne Simulator (MAS): data from nine campaigns and other data sets available on tape.

Tropical Ocean Global Atmosphere-Coupled Ocean Atmosphere Response Experiment (TOGA-COARE): data from surface, aircraft, and satellite measurements for November 1992 to February 1993.

Marshall DAAC hydrology data sets.

Interdisciplinary Climatology Data Collection: monthly data for land, oceans, and atmosphere available globally on a 1 × 1-degree grid.

SOURCE: NASA (1998).

FIGURE 3.1. Change in the size of the ozone hole between 1979 and 1998, based on data from the Total Ozone Mapping Spectrometer (TOMS). Vertical lines represent the maximum and minimum from September 7 to October 13. SOURCE: Atmospheric Chemistry and Dynamics Branch of NASA Goddard Space Flight Center.

Metadata

The current metadata "model" is meager, and a much richer metadata model must be developed if the system is to migrate toward a greater degree of content-based access. In particular, the system must support more capable spatial queries if it is to fulfill the goal of providing better access to information in general, and content-based access to and subsetting of information in particular. Geospatial access should be supported for scientific images and data sets. This requires good, generalized gazetteer services and the ability to represent spatial footprints for items in the collections. The combination of these two services, together with the use of standards for representing such geospatial metadata, greatly increases a user's ability to find appropriate information. Support for these services at the GSFC DAAC is still relatively primitive; however, the DAAC could make significant progress in this area by using resources developed elsewhere.

Processing Plans

At the time of the site visit in October 1997, the DAAC was to have been responsible for Level 0 ingest and archive, and for Level 1 through 3 production and archive of MODIS products (see Table 1.1 for a description of processing levels). Because of concerns that the ECS will not be available in time for launch, ESDIS is now considering funding the MODIS science teams to process the products (Level 2 and higher). Similarly, the MODIS instrument team may process the MODIS Level 2 and higher land products and the snow and ice products, which are currently scheduled to be processed at the EDC and NSIDC DAACs, respectively. Until a final decision is made, the GSFC DAAC will continue testing the MODIS Product Generation Executables (PGEs) and integrating them into the processing system.
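The pairing of gazetteer and spatial-footprint services called for under Metadata can be illustrated with a minimal sketch. It assumes, purely for illustration, that each catalog item carries a latitude/longitude bounding box as its footprint and that a gazetteer maps place names to such boxes. The place coordinates are approximate, the granule identifiers and function names are hypothetical, and real footprints (polygons, dateline crossings, polar regions) need more care than the simple box test used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Latitude/longitude bounding box in degrees.

    Simplification: boxes are assumed not to cross the 180-degree meridian.
    """
    south: float
    west: float
    north: float
    east: float

    def intersects(self, other: "BBox") -> bool:
        """True if the two boxes overlap (touching edges count as overlap)."""
        return not (
            other.west > self.east
            or other.east < self.west
            or other.south > self.north
            or other.north < self.south
        )


# Hypothetical gazetteer: place name -> approximate footprint.
GAZETTEER = {
    "chesapeake bay": BBox(south=36.9, west=-77.5, north=39.6, east=-75.6),
}

# Hypothetical catalog: granule identifier -> spatial footprint metadata.
CATALOG = {
    "granule_A": BBox(south=35.0, west=-80.0, north=40.0, east=-74.0),
    "granule_B": BBox(south=-10.0, west=100.0, north=0.0, east=110.0),
}


def find_granules(place: str) -> list[str]:
    """Resolve a place name via the gazetteer, then return the ids of
    catalog items whose footprints intersect that region."""
    region = GAZETTEER[place.lower()]
    return sorted(gid for gid, footprint in CATALOG.items()
                  if footprint.intersects(region))


if __name__ == "__main__":
    print(find_granules("Chesapeake Bay"))
```

Here find_granules("Chesapeake Bay") returns only granule_A. The two-step design (place name to region, region to catalog items) is what standardized footprint metadata makes possible across collections.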
As of June 1998, the DAAC had successfully integrated seven of the 40 PGEs into the ECS data processing system and had run an additional eight PGEs outside the ECS. In addition, a chain of three PGEs has been successfully tested, and longer chain tests are being planned. If, on the other hand, the Level 2 and higher products are processed by the MODIS science team, the GSFC DAAC will have to integrate only three PGEs into the system. If all the processing is done by the science team, the DAAC's role will be reduced to archive, dissemination, and user services.

Reprocessing Strategies

As research using DAAC data progresses, scientists will devise better algorithms and will likely discover errors in the data products. Ultimately, the entire data set will need to be reprocessed ab initio. Consequently, the DAAC must plan to allocate resources for reprocessing. The science teams decide when reprocessing should be done, and the ESDIS Resource Allocation Board and the Science Review Committee arbitrate when several science projects request reprocessing. If the choice is between reprocessing an existing data set and processing data from a new instrument, the DAAC generally gives priority to the new mission. It appeared to the panel that the DAAC considers reprocessing an add-on task rather than an integral part of its data management role. The panel is concerned about the consequences for the user community of the DAAC's decisions on what to reprocess and when.

Recommendation 1. The DAAC should incorporate reprocessing as an integral part of its data management strategy and plan for adequate resources over time for reprocessing needs arising from errors in product generation or from algorithm improvements.

Subsetting Strategies

MODIS files will be so large that many users will not have the hardware, software, or personnel to produce subsets they can work with. Even for TRMM, whose data sets may not be unmanageably large, the DAAC plans to make its data sets easier to obtain and use by preparing canned subsets in simple formats. These products are customized for particular disciplines or types of customers; the DAAC then advertises the packages and tries to get other customer groups to use them. Custom subsets for individuals are too expensive to produce, and the DAAC hopes to develop an on-the-fly subsetting capability to meet their needs. The panel agreed that the production of canned subsets is a good strategy, and it encourages the development of on-the-fly subsetting.

Treatment of Model-Derived Data

When dealing with EOS data, it is crucial to distinguish between data products, which consist simply of the results of measurements, and derived data, which depend on specific model calculations.
The latter include not only the meteorological variables derived by the Goddard Data Assimilation Office (DAO), but also all variables that depend on first guesses from the data assimilation model. The utility of such derived data depends on the accuracy and reliability of the model. At present, the pressure of the AM-1 launch date has all but frozen efforts to correct errors in the DAO's GEOS model so that derived data products can be available immediately on launch. The panel believes that such time pressure might be appropriate for data stemming directly from AM-1 measurements, and is appropriate for activities such as archiving, but is not appropriate for derived products whose utility for climate studies depends on the adequacy of the underlying model. Moreover, in the panel's view, the notion that climate data are needed in real time is a contradiction in terms: climate data must extend over periods long enough to define climate. Under the circumstances, there is no meaningful time pressure to have a model in place regardless of its problems.

For derived data, model performance is a critical component of data quality control, to which the GSFC DAAC has paid inadequate attention. The common practice of model validation, in which one searches for similarities between model output and directly observed data, is insufficient for this purpose; rather, a program of testing and evaluation is needed. This should be of as much concern to the GSFC DAAC as to the DAO. The panel felt that neither the DAAC nor the DAO liaison with the DAAC was sensitive to this issue. As a result, the DAAC seems to be part of a diffusion of responsibility that leaves no one accountable. There should be some mechanism whereby the DAAC accepts data products for dissemination only if they meet strict scientific criteria. So far, there is no evidence of such a mechanism, and the panel encourages the DAAC to set up these procedures as soon as possible.

Long-Term Archive

NASA and NOAA are negotiating a Memorandum of Understanding for the long-term archive of EOS data sets. To help ensure the long-term vitality of the data, NASA has provided some funding to NOAA to prototype an archive. The prototype system is based on the GSFC DAAC's Version 0 system, but the DAAC has no role either in developing the prototype or in ensuring the long-term usability of the data (the so-called 20-year test). Because the DAAC understands its data sets, the panel believes that it should become involved at the earliest possible stage in the crucial process of transferring responsibility for the long-term archive.

USERS

Characterization of the User Community

The DAAC does not have a well-defined user model.
Instead, it divides "customers" into three levels of sophistication: (1) research scientists; (2) application users and college students; and (3) high school teachers and students. The first group (EOS instrument team members, EOS interdisciplinary science team members, and non-EOS investigators) includes both users and providers of data. The DAAC does not seem to have a clear idea of which customers are its highest priority, and no user community feels as if it "owns" the DAAC. Moreover, although the DAAC solicits input from its user communities through surveys and annual User Working Group and "voice-of-the-customer" meetings, there seems to be no systematic process by which the DAAC gains an improved understanding of what its priorities should be. To serve the needs of its user community effectively, the DAAC will have to constantly refine its understanding of the user community's characteristics and needs.

User Working Group

The User Working Group (UWG) membership is weighted heavily toward data providers. In the past, the UWG focused on issues such as guiding the development of the ECS and setting priorities among data sets. Now that most of these issues have been settled, the UWG is trying to define a new role for itself. Some members want the UWG to function like an external review panel that has clout with NASA, and others want it to continue functioning as an advisory panel composed predominantly of data providers. The DAAC director implements most of the UWG's recommendations and also uses the UWG to provide protection for, and endorsement of, the market approach to data management (see "Management," below). The panel felt that less emphasis on the latter and more emphasis on critically reviewing DAAC activities would improve the effectiveness of the UWG.

Interaction with the Scientific Community

Although the DAAC is customer oriented, its relationship with its primary user community, the scientific community, is substantially weaker than it should be. The DAAC is situated within a large research facility, but there is little interaction with Earth Science Enterprise (ESE) scientists at Goddard Space Flight Center or elsewhere. In fact, the DAAC sees the absence of on-site scientists as an advantage because it gives the DAAC independence. Consequently, there appears to be no mechanism for scientists to provide feedback. DAAC staff, including those hired from the Earth Science Division at Goddard, are generally not carrying out research using the data sets, and it is not clear that they maintain a working relationship with those who are.
Hence when users approach the DAAC about the data sets it distributes, it is not clear that they will get appropriate high-level responses about the quality of the data. There seems to be no well-structured process that would allow "complex" inquiries to be passed quickly to those who generated or are using the data sets.

Recommendation 2. Interactions between the DAAC and the science community should be improved. Examples of actions the DAAC might consider include (1) establishing a visiting scientist program, (2) hiring a full-time DAAC scientist, and (3) collocating DAAC staff with Goddard Space Flight Center researchers.

User Services

In 1994, the User Services Group was abolished and the DAAC was reorganized into a Customer Support Group and an Engineering Group. The Customer Support Group, made up of several data support teams, provides end-to-end support for specific scientific disciplines; each team is responsible for data preparation, information management, building the customer base, and interacting with customers for an individual science discipline. The teams have the flexibility to determine how to carry out their responsibilities. The panel was impressed that the data teams are so empowered and that the organizational changes have apparently led to speedier resolution of customer problems.

Foreign Access

The GSFC DAAC is located within a secure facility. Although the Internet allows the DAAC to be visited virtually, foreign visitors face difficulties in obtaining clearance for a physical visit of more than a few days.

Usability of Data Within a Scientific Context

Earth scientists will inevitably need access to NOAA as well as NASA data. Some of the necessary NOAA data are difficult to access, and this will ultimately inhibit the use of NASA data; the two cannot really be decoupled in a scientific context. Fortunately, the volume of NOAA data is small relative to that of EOS data, and it is likely that at least the NOAA data in digital form could be incorporated at modest cost. Even the digitization of existing paper data should be affordable. NASA should consider assisting NOAA in this regard, since the inclusion of these data in the DAAC would add greatly to its value as a climate research resource.

TECHNOLOGY

Strategy

The DAAC's hardware strategy is to stay on the leading edge of the mainstream: it lets others test new technologies and find their bugs, then adopts the technologies that work.
The DAAC manager is always thinking ahead, but no firm plans are made more than six months in advance, which allows the DAAC to retain its flexibility. However, the panel feels that this approach limits the DAAC's ability to deal with certain medium- to long-term issues. The rule of thumb for the doubling of performance in computer technology (the so-called Moore's law) is only about 18 months, so the DAAC will likely have to retool several times over its lifetime. At the same time, the useful life of ECS-provided hardware and software is pegged nominally at five years, making it difficult for a production facility to evolve in response to continuing technological advances. Consequently, the panel feels that it would be prudent for the DAAC technology team to have a more specific strategy for tracking developments in relevant areas of software, hardware, and communications.

Recommendation 3. In order to stay on the leading edge of the mainstream, the DAAC should formulate a long-term strategy for keeping up with and taking advantage of new technologies.

Hardware Availability

The DAAC has developed its own file management system for short-term archiving, Archer, to replace the ECS file management system, Unitree. Archer seems to work well under current loads and has the advantage of being tunable to known access patterns at the DAAC. Locally developed systems, however, can be treated only as temporary solutions because of the continued responsibility to maintain them, upgrade them as needed, and migrate them to new platforms. This will drain resources from what should be the primary mission of the DAAC. Whenever possible, the DAAC should use commercial off-the-shelf (COTS) technology. For example, the ECS-provided archiving devices seem reasonable, and the current archiving plans are moving in the right direction with the STK Timberwolf.

Recommendation 4. In order to avoid being distracted from its primary mission, the DAAC should strive to operate in a standardized environment and rely on industry-supported COTS technology whenever possible.

TRMM Support System

The TRMM satellite has two segments: the TRMM mission and two unrelated instruments (the Clouds and the Earth's Radiant Energy System and the Lightning Imaging Sensor). The GSFC DAAC is responsible for archiving and distributing data for the TRMM mission.
Originally, the TRMM mission was the responsibility of the Marshall Space Flight Center, and the data were to be processed using the EOSDIS Core System. However, the Marshall Space Flight Center DAAC was closed in 1997, and ECS support of TRMM was first delayed and then canceled. Consequently, the GSFC DAAC had to build the TRMM Support System, which contains only those functions that its users need rather than the full functionality of the ECS. The system was built in eight months with only six full-time equivalents (FTEs), and users appear to be happy with the result. The panel suggests that the GSFC DAAC carefully document its experience with TRMM so that other groups can benefit.

Media Versus Web Distribution Strategy

Last year the DAAC completed Web-accessible precomputed subsets and accompanying README documentation for all major DAAC data sets. The DAAC plans to (1) continue adding new data products to its anonymous file transfer protocol (ftp) data collections, (2) upgrade the Web-based documentation of existing data sets, and (3) enhance the functionality of the search-and-order Web interface to allow users to do on-the-fly parameter and regional subsetting, to order and track off-line data and documentation, and to use date-specific functions. The panel feels that it would be prudent to focus user interface implementation almost entirely on Web technology.

Connection to the World

The LAN connections between processing and archive computers were designed to handle very heavy traffic: each channel is rated at 800 Mb/sec, and there are multiple channels. The rest of the system communicates on shared 100-Mb/sec segments. Data distribution to the outside world relies on the NASA Science Internet, over an OC3 (150-Mb/sec) line. Although the current networking, both internal and external, appears to be adequate, there are questions as to whether internal networking, at least, will be adequate for the loads that will arise with MODIS. The internal communication rates should be better balanced and higher than those currently supported; for the DAAC's internal purposes, 150 Mb/sec is slow. The external rate is probably also too low, as many universities active in this area are moving to OC12 lines.

MANAGEMENT

General Philosophy

The DAAC manager believes that data centers driven by requirements are not taking full advantage of the data or serving their customers well.
Consequently, the DAAC manager, Paul Chan, has instituted a customer-oriented "business model" for running the center. The business model approach focuses on increasing the demand for DAAC data while decreasing the effort required of the customer (shown qualitatively in Figure 3.2). By creating an end-to-end user services group, providing access to data products and services on the Web, simplifying data formats, and creating data products of known or potential interest to a broad community, the DAAC has greatly increased its transaction volume and user base. The panel was impressed with the business model approach and its focus on users.

FIGURE 3.2. The GSFC DAAC's business model. SOURCE: Paul Chan, GSFC DAAC.

Implementing the business model required a change in culture at the DAAC. The DAAC manager has overcome the resistance of DAAC staff and the UWG to this approach, and at the time of the site visit, staff morale had improved. Chan delegates responsibility to his staff, and they have responded positively.

Although the DAAC does not have a formal strategic plan document, it has a strong short-term focus on dealing with the AM-1 platform, particularly the large MODIS data sets. Insofar as MODIS is justifiably viewed as the flagship instrument of the AM-1 mission, the panel feels that this focus not only is the correct one but is in fact critical to the success of the EOS program. The recent delays in the launch date have provided a window of opportunity for the DAAC to complete its readiness exercises and to install and test a more flight-ready version of the ECS, thereby mitigating the negative consequences of previous difficulties with the ECS.

Recommendation 5. The DAAC should retain its strong focus on achieving full readiness in time for the AM-1 launch and on being able to secure, archive, and distribute the full MODIS data stream.

Personnel

The GSFC DAAC director has put together an excellent staff. They are professional, motivated, and obviously enthusiastic about their work. They have demonstrated their abilities by successfully assuming responsibility for management of the TRMM data flow. They are open and responsive to criticism and suggestions, which is critically important to their ongoing mission. This responsiveness was apparent at the site visit and is also indicated by the implementation of UWG suggestions.

Tension between operations and development is a classic data center problem, and it is particularly acute for the DAACs because most of the development is being done out-of-house by the ECS contractor. As a result, it is more difficult for the DAAC to shift the emphasis from development to operations as launch approaches. In the past, the DAAC had a poor relationship with the ECS developers, but the arrival at the DAAC of a new ECS liaison, Tom Dopplick, has smoothed tensions between the two organizations. Dopplick works closely and effectively with Chan, and problems are resolved before they reach unmanageable proportions.

Budget

The DAAC's budget grew from approximately $4 million in FY 1994 to $28 million in FY 1997, its peak year (Table 3.1). Although highly variable, the DAAC's average budget is about $15 million per year. ECS hardware, software, and personnel account for nearly 70% of the budget, partly because the DAAC serves as a center for cross-DAAC coordination of the ECS configuration; as a result, the DAAC's hardware and software help support all four AM-1 DAACs.

TABLE 3.1. Total GSFC DAAC Costs (million dollars)(a)

Fiscal Year    | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002
GSFC DAAC      |  3.6 |  3.8 |  3.4 |  6.2 |  6.2 |  6.0 |  6.0 |  6.3 |  4.7
ECS hardware   |  0   |  1.3 |  6.8 | 19.2 |  1.0 |  7.8 |  4.4 |  1.2 |  1.0
ECS software   |  0.7 |  9.5 |  2.8 |  1.8 |  0.4 |  0.5 |  0.4 |  0.3 |  0.3
ECS personnel  |  0   |  0   |  0.4 |  0.9 |  1.5 |  5.4 |  6.2 |  8.4 |  9.0
Total cost     |  4.3 | 14.6 | 13.4 | 28.1 |  9.1 | 19.7 | 17.0 | 16.2 | 15.0

(a) Budget numbers for FY 1994–1997 are actual values; numbers for FY 1998–2002 are projections, as of May 1998.
SOURCE: ESDIS.

The DAAC manager takes pride in doing everything as cheaply as possible.
For example, the GSFC DAAC acquired the hydrology data from the Marshall DAAC without requiring additional resources, and the DAAC spent only one-fifth of the projected development costs on TRMM. In fact, until this year the DAAC had always spent less than its approved budget. The DAAC's best measure of cost-effectiveness, however, is that the unit cost has declined from several hundred dollars per order to $60 per order of data.
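The Budget figures can be cross-checked against Table 3.1. The sketch below simply re-enters the table's rows (the FY 1998–2002 values are the May 1998 projections) and recomputes the yearly totals, the nine-year average, and the combined ECS share. The variable and function names are illustrative, not taken from any ESDIS tool.

```python
# Rows of Table 3.1, in million dollars, for FY 1994 through FY 2002.
YEARS = list(range(1994, 2003))
COSTS = {
    "GSFC DAAC":     [3.6, 3.8, 3.4, 6.2, 6.2, 6.0, 6.0, 6.3, 4.7],
    "ECS hardware":  [0.0, 1.3, 6.8, 19.2, 1.0, 7.8, 4.4, 1.2, 1.0],
    "ECS software":  [0.7, 9.5, 2.8, 1.8, 0.4, 0.5, 0.4, 0.3, 0.3],
    "ECS personnel": [0.0, 0.0, 0.4, 0.9, 1.5, 5.4, 6.2, 8.4, 9.0],
}


def yearly_totals(costs: dict[str, list[float]]) -> list[float]:
    """Sum each fiscal-year column, rounded to the table's 0.1 precision."""
    return [round(sum(column), 1) for column in zip(*costs.values())]


def ecs_share(costs: dict[str, list[float]]) -> float:
    """Fraction of all spending that falls in the ECS rows."""
    ecs = sum(sum(row) for name, row in costs.items() if name.startswith("ECS"))
    everything = sum(sum(row) for row in costs.values())
    return ecs / everything


if __name__ == "__main__":
    totals = yearly_totals(COSTS)
    for year, total in zip(YEARS, totals):
        print(f"FY {year}: {total:.1f}")
    print(f"Average per year: {sum(totals) / len(totals):.1f}")
    print(f"ECS share: {ecs_share(COSTS):.0%}")
```

Run on the table as printed, the recomputed totals match the Total cost row, the nine-year average is about $15.3 million, and the ECS rows account for roughly two-thirds (about 66%) of spending over FY 1994–2002, broadly consistent with the figures cited in the text.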

Contingency Plans

In the event that the ECS is not ready by the AM-1 launch, the instrument teams have been given a choice of which system they will use: Version 0, ECS, a hybrid of the two, or their own home-grown system. In particular, the MODIS instrument team has decided that the MODIS computing facility will be the backup for getting data to the scientists. The DAAC, on the other hand, expects that the ECS will be available on time, but without all the promised functionality. If this happens, the DAAC believes it can develop work-arounds to process MODIS data through the ECS. If the MODIS instrument team's contingency plans are in fact implemented, the DAAC will face a much lighter processing load, may find itself oversized, and will have to revise its plans accordingly.

GSFC DAAC AND THE EARTH SCIENCE ENTERPRISE

Relation to Goddard Space Flight Center

The DAAC is hosted by the Goddard Space Flight Center, which provides office and computer space and pays the salaries of the DAAC's civil servants. The DAAC is one of many facilities at the Goddard Space Flight Center and does not receive special recognition from Goddard management. Indeed, since data management is not a central mission of NASA, the DAAC believes its position within Goddard Space Flight Center is vulnerable. The DAAC's primary contact with Goddard management is Stephen Wharton, chief of the Global Change Data Center, who reports to Vincent Salomonson, director of the Earth Science Division.

Relation to ESDIS

The GSFC DAAC follows the basic roles and responsibilities laid out by ESDIS but then puts extra effort into serving its customers (its business model). ESDIS neither encourages nor discourages the DAAC from assuming these new responsibilities as long as the basic requirements are met.
At the time of the October 1997 site visit, the DAAC perceived ESDIS as having a development focus and thus a philosophical alignment with the ECS contractor. As a result, tensions with the ECS contractor (see below) led to tensions between the DAAC and ESDIS. Subsequent changes in management at ESDIS and delays in the ECS have created a different kind of problem, which was brought to the attention of the panel in June 1998. There is a growing belief among some in the DAAC that ESDIS is no longer able to enforce standards or interoperability among the DAACs. Instead, EOSDIS is becoming balkanized, and it is no longer clear that anyone is in charge of the overall system.

Relation to Other DAACs

In October 1997, the DAAC manager saw the DAACs as being in a friendly competition, although they all followed certain standards and cooperated in some areas, such as driving the direction of the ECS. In addition, the GSFC DAAC worked with the LaRC DAAC on instrument interdependencies and with the EDC DAAC on archiving. Even then, however, Chan described the GSFC DAAC as an independent entity within EOSDIS rather than as part of the system. Chan argued that data interoperability (i.e., standard formats that allow use of the same tools on different data sets) is important but that system interoperability, which is burdensome and impedes evolution, should not be an absolute requirement. In the panel's view, this distinction, together with the emergence of a federated system operating over the Web, poses a serious challenge to the requirement for a uniform ECS architecture and may ultimately call for a profound rethinking and restructuring of EOSDIS.

Relation to the ECS Contractor

The close proximity between the GSFC DAAC and the ECS contractor should, in theory, facilitate a close working relationship between the two organizations. In practice, the DAAC has a poor, but improving, relationship with the ECS contractor. The DAAC has a good relationship with the operations side of the ECS contractor, and the ECS liaison, Dopplick, has greatly eased communication problems between the DAAC and the development side of the ECS contractor by acting as an emissary.

SUMMARY

The GSFC DAAC is a well-equipped, well-run operation with a number of impressive accomplishments, such as successfully assuming the management of TRMM data. The DAAC's successful handling of TRMM, SeaWiFS, and other existing data sets bodes well for its ability to handle the large data sets that will result from EOS missions.
However, scaling up by two orders of magnitude to handle the MODIS data stream is a much greater challenge. MODIS is the flagship instrument of AM-1, and the DAAC's ability to process and disseminate MODIS data and products to users will be a gauge of success for the entire EOS-EOSDIS program. The success of EOSDIS also depends on the ability of the DAACs and/or other data or service providers to work together so that users can integrate disparate data sets from a variety of sources. Although the GSFC DAAC has shared processing responsibilities with the EDC, NSIDC, and LaRC DAACs, it views itself as isolated from, rather than an integral partner in, EOSDIS. Stronger links with the other DAACs would help ensure that EOSDIS functions as a system.

Finally, the panel was pleased to find that the DAAC's focus is on users. The DAAC has a number of creative strategies, such as preparing predefined subsets, for meeting the needs of existing users, and it tries to position itself to meet the needs of new user groups as they emerge. The DAAC has also empowered its data teams, which has led to greater user satisfaction with the center. These measures are designed to increase the demand for DAAC data while decreasing the effort required of users. This is the essence of the DAAC's business model. However, the DAAC's relationship with its primary user community (i.e., scientists) is weak, and the DAAC needs to focus on building relationships with its science teams and the scientific user community, both at Goddard Space Flight Center and elsewhere. Meaningful, ongoing interactions with scientists will help the DAAC understand how its decisions on issues such as reprocessing affect the scientific user community.