

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Ensuring the Climate Record from the NPP and NPOESS Meteorological Satellites

3
Essential Services for Climate Data

This chapter presents the committee's perspective on the requirements for essential data system services for climate research and on how well current plans meet these requirements. In writing this chapter, the committee drew on discussions and presentations at its workshop on February 7-8, 2000; it also referred to related work completed by the NRC and others.1,2,3 While the earlier work focused on broad questions of what should be archived in a climate data archive, the focus of this study is on near-term steps that can be taken to ensure that the climate data record from NPOESS and NPP is preserved and that basic services are provided. Appendix E provides historical information relevant to this chapter.

GUIDING PRINCIPLES

The committee believes adherence to the following principles can help ensure the preservation of the climate record from NPP and NPOESS:

- Accessible and policy-relevant environmental information must be a well-maintained part of our national scientific infrastructure. High-quality data and information on climate change are the foundation on which policy decisions will be made. Collecting, managing, archiving, and distributing environmental data must be given sufficient priority to ensure that the requisite data foundation exists.
- The federal government should (1) provide long-term data stewardship, (2) certify open, flexible standards, and (3) ensure open access to data. The government does not necessarily need to control the implementation of every task and service for a climate data system. Rather, it should undertake those activities and services that cannot be done in a competitive academic or commercial environment.
- Because the analysis of long-term data sets must be supported in an environment of changing technical capability and user requirements, any data system should focus on simplicity and endurance. Complex systems often become point designs: they meet current requirements but cannot incorporate changes in tools or requirements.

- Adaptability and flexibility are essential for any information system if it is to survive in a world of rapidly changing technical capabilities and science requirements. The system should not just react to change but should continually track technology and system performance so that it can respond proactively.
- Experience with actual data and actual users can be acquired by starting to build small end-to-end systems early in the process. EOS data are available now for prototyping new data systems and services for NPP and NPOESS. A waterfall design process, whereby the system is not available for testing and evaluation until it is complete, is nearly certain to fail in a changing world.
- Multiple sources of data and services are needed to support development of climate data records (CDRs). There is no single approach for deriving a particular CDR, such as atmospheric temperature or ozone density. The quality of the CDRs will improve as more research groups work with the various input data sets, and the overall system will be more robust if it does not rely on a monolithic implementation. Fostering open competition for services promotes innovation and new ideas. Access to the raw data will be needed to enable CDR development.
- Science involvement is essential at all stages of development and implementation. Scientific advice and involvement will be necessary for CDR design and production as well as for data system implementation. Having climate data record developers and users assist in the specification, design, building, and testing of the system will help ensure its usefulness to the research community.

COMPONENTS OF A CLIMATE DATA SYSTEM

Developing a system appropriate for climate data records poses unique challenges related to the complexity and timescale of the data. Data sets (and information about the data sets) must be preserved for decades, and data must be archived in a manner that facilitates reanalysis.
In addition to locating and delivering data, a climate data system must accommodate the new CDRs that will be developed as climate research evolves. Although these services appear to be identical to those provided by any data system, the scale of the problem, as well as the frequency of change, is much different from that of the typical satellite mission data system. For example, many NASA missions pass their data to the National Space Science Data Center at the Goddard Space Flight Center with the expectation that use of the data for long-term studies will be infrequent. The NOAA meteorological satellite data are stored in the National Climatic Data Center (NCDC) with the expectation that individual users will request only subsets of the data. Active archives, on the other hand, as described below, are designed for changes in algorithms, services, and user requirements; in general, they are better suited to the needs of casual users of small volumes of data. The following sections discuss an architecture that meets these varying needs, one that relies on a blend of active archives and long-term archives to provide an overall set of climate data services.
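The division of labor described above, with bulk sequential transfers handled by a long-term archive and small, random-access subset requests handled by active archives, can be made concrete with a short sketch. Everything in the code below (the class names, the 1 TB threshold, the routing rule) is a hypothetical illustration for this report's two-tier concept, not a specification from the report itself:

```python
from dataclasses import dataclass

# Conceptual sketch only: the names, threshold, and routing rule below
# are illustrative assumptions, not specifications from the report.

@dataclass
class Request:
    user: str
    volume_tb: float   # requested volume in terabytes
    sequential: bool   # True for bulk streaming transfers, False for subset orders

BULK_THRESHOLD_TB = 1.0  # assumed cutoff; the report gives no specific figure

def route(req: Request) -> str:
    """Route bulk sequential transfers to the LTA and everything else
    (small, random-access subset requests) to an active archive."""
    if req.sequential and req.volume_tb >= BULK_THRESHOLD_TB:
        return "long-term archive"
    return "active archive"

print(route(Request("reprocessing center", 250.0, True)))  # long-term archive
print(route(Request("casual user", 0.02, False)))          # active archive
```

The point of the split is that each tier can then be optimized separately: the LTA for sequential throughput and media longevity, the active archives for search, subsetting, and rapid response.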

Long-Term Archive Services for Climate-Related Data

The committee views a long-term archive (LTA) as more than a static repository for data: an LTA must also allow a broad set of users to examine, retrieve, or copy the data. While users of an archive might simply want to retrieve data, it is more likely that they will require further data processing. Issues related to the refinement or reprocessing of the data are key drivers in determining an appropriate architecture and implementation strategy for an LTA. Certain processing may be done best within the LTA, while other processing could be done in the specialized facilities of the users. The committee's views on the necessary functions of an LTA, which are based on its prior work4 and information gained at the February 7-8, 2000, workshop, may be summarized as follows:

- Ensure the long-term survival of the data by (1) ingesting, storing, migrating media, and cataloging all the data and (2) including all ancillary data and the calibration and characterization of the instruments. History has shown that review and retrospective reprocessing of data are important elements in understanding climate and the effects of natural and anthropogenic changes on climate trends.
- Provide additional information, algorithms, and such higher-level products as can aid in the future reprocessing of the data.
- Distribute data to the large-scale users and primary processors by (1) streaming data for continual and scheduled transfers, (2) allowing the primary user and/or processor to hold the data for distribution, and (3) allowing competitive bidding for these services by universities and other groups. Data-use analysis has shown that a small number of users account for the bulk of data transferred.5 Because data transfer bandwidth is a key design parameter, this effect must be accounted for in developing the LTA.

What Should Be Stored?

The distinctions among science, weather, and climate data are fuzzy at best.
Much of the data can be used for several purposes; the distinction lies more in the requirements for minimum latency and regularity of processing. The value of meteorological data used for weather prediction is ephemeral, and the data must be processed quickly. However, the data may retain their value for other uses. It may be valid to distinguish between operational and research processing for a facility, but with the advent of the NPOESS system and the planned use of the operational system to support more research, that distinction is also becoming less clear. Thus, EOS, NEXRAD,6 NPP, and NPOESS data may all reside in an LTA for "climate." The needs of climate researchers are embodied in the CDR concept. Although the NPP- and NPOESS-derived EDRs may have considerable scientific value, CDRs are far more than a time series of EDRs. Even the EOS data products will require continuous assessment and refinement

as knowledge of both the algorithms and the sensor improves over time. Although the lines may be indistinct, there remain fundamental differences between products that are produced to meet short-term needs and those for which consistency of processing over years to decades is essential.

Management of the LTA

Participants at the committee's February 2000 workshop considered several approaches to managing the LTA. A particular challenge was establishing a balance between centralized oversight and local processing and dissemination. A loosely connected system with specialized elements appears to be the optimal approach: for example, a central LTA supported by individual, active data archive centers that work with specific data streams and data sets. The active data archives could be modeled on NASA's Distributed Active Archive Centers (DAACs) if problems identified in a recent NRC review7 are addressed. Although NOAA and NASA are seen as the responsible agencies, the committee believes there should be a competitive approach to implementing the individual elements. The committee believes NOAA should implement the central LTA, which would be responsible for the long-term preservation of the data. However, other elements could be implemented by a mix of academic, nonprofit, and commercial organizations. Implementation must be incremental to foster evolutionary growth and continual innovation. Such an approach should also result in lower costs.

The National Climatic Data Center and the Long-Term Archive

The NOAA National Climatic Data Center is a potential repository of data to support climate research in the coming NPOESS era.
NCDC is currently assigned the mission of managing the nation's resource of global climatological in situ and remotely sensed data and information to "promote global environmental stewardship; to describe, monitor and assess the climate; and to support efforts to predict changes in the Earth's environment."8 NCDC receives, processes, controls the quality of, archives, and distributes weather- and climate-related data from a variety of sources, including satellites, ground radar, and in situ measurement systems. NCDC currently houses approximately 700 TB of data; by 2010, the total archive at NCDC may exceed 5000 TB (see Box 3.1). Storage and protection of the data are only two of the functions of an LTA. Access to the data and the ability to reprocess the data as scientific understanding improves are also required. The reprocessing requirement is particularly challenging; the committee does not believe that the NOAA National Climatic Data Center currently has sufficient resources to fully address this challenge. Further, it is evident that NCDC or future LTA sites will face even more demanding requirements as NPP and NPOESS data become available. Increased amounts of long-term storage and media migration are already taken into account in NCDC plans to accommodate increased volumes of data. However, the committee finds that additional work is needed on the treatment of several categories of data, including the following:

- Recent data in active use that might be best held in a separate active archive;
- Aging data that are used in periodic reassessments and other studies; and
- Older data that have not been accessed in some stated interval of time.

Previous workshops and studies provide some guidance on these issues; however, the committee recommends consideration of a standing LTA science team to ensure a user voice in the LTA design.

BOX 3.1 Near-Term Projections of Future Needs at NCDC

The technical and financial demands associated with managing the climate-related data anticipated from NPP and NPOESS may be seen by examining operations at the current repository of such data, the NOAA National Climatic Data Center. NCDC has a current annual budget of approximately $20 million, which supports a civil service staff of 175 plus 60 contract personnel. The present total digital archive at NCDC is approximately 700 terabytes (TB), and officials expect approximately 80 TB of data to be added in calendar year 2000. (About three-fourths of this amount is expected to come from ground NEXRAD radars.) NCDC officials told the committee that current funding is inadequate to properly manage this quantity of data. Large increases (about 14 TB annually) in data will come from the Initial Joint Polar System (IJPS) when the European satellite MetOp is launched in 2003. NCDC also expects more data (an increase of 50 percent) from NEXRAD as capabilities are developed to ingest these data via telecommunications from each site. NCDC is currently working with NASA on the long-term archival of EOS data; officials expect by 2003 to be ingesting an amount of EOS data equivalent to the amount of NEXRAD data currently being ingested. Further, it is anticipated that this load will increase by approximately an order of magnitude by the end of the decade. The scheduled launch in 2005 of the NPOESS Preparatory Project satellite will add another 90 TB of data annually.
NPOESS operations, which might begin in 2009, would add another 228 TB of data annually. It is anticipated that by 2005 NCDC will be tasked to ingest and make available more than 1 TB per day, or about 500 TB of data annually, which will increase to about 1000 TB annually by 2010. By that time, the total archive at NCDC is expected to be approximately 5000 TB. NCDC estimates that it will require between $10 million and $15 million of additional funding each year to be able to handle the expected increase in data and information.1 Further, officials note that these estimates assume minimum levels of user services. In particular, there would be no browsing, reprocessing, or subsetting capability for users to interact with the data. Such higher levels of service would require significant additional

expenditures, in the range of $3 million to $5 million annually, according to NCDC officials.

__________________

1 According to officials at NCDC, these additional funds will pay for more personnel and a larger information technology infrastructure, including increased communications capacity, online and near-online storage capability, drives, cabinets, tapes, and software.

SOURCE: Tom Karl, director of NCDC, private communication, July 2000.

Active Archive Services for Climate Data

Making data available to the casual, small-volume user requires a different set of services from those provided by an LTA. These services include rapid access to subsets of the data, sophisticated search tools to locate coincident data, and a focus on providing online search and order fulfillment. They support the production of customized data products and allow climate researchers to test new algorithms. Thus, long-term archive services are only one component of a climate data system, and climate researchers require "active archives" comparable to those of the research satellite missions. Climate research will require an effective integration of these active and long-term archive services, much as it requires an integration of research and operational satellite missions. Active archive services have sometimes driven system design. In EOSDIS, the requirements for meeting the data needs of the casual and small-volume user grew and began to drive the design. The small-volume user can drive costs because the many small data extractions and associated packaging and shipping are personnel-intensive and can require expensive hardware and software. This type of service accounts for the largest number of transactions and involves random access to the data rather than sequential transfers. As the system grows, such service could become a large factor in overall usage.
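The cost dynamics described above follow from a heavily skewed usage distribution: a few bulk users move most of the bytes, while many casual users generate most of the transactions. The numbers in the sketch below are invented for illustration and are not taken from the data-use analysis cited in this chapter:

```python
# Invented usage profile: a handful of bulk users (e.g., reprocessing
# centers) versus many casual subset users. None of these figures come
# from the data-use analysis cited in the text.
bulk_users = 5
bulk_tb_each = 200.0      # TB transferred per bulk user per year (assumed)
casual_users = 10_000
casual_gb_each = 2.0      # GB per casual order (assumed)

bulk_volume = bulk_users * bulk_tb_each               # 1000 TB
casual_volume = casual_users * casual_gb_each / 1024  # about 19.5 TB
total = bulk_volume + casual_volume

print(f"bulk share of volume: {100 * bulk_volume / total:.1f}%")      # ~98%
print(f"casual share of volume: {100 * casual_volume / total:.1f}%")  # ~2%
print(f"casual transactions: {casual_users} vs. bulk transactions: {bulk_users}")
```

Under any assumptions of this shape, bandwidth planning is dominated by the few bulk users while staffing and order-fulfillment costs scale with the many small transactions, which is why the two kinds of service drive the design in different directions.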
The committee believes the seven DAACs in NASA's current system have been reasonably successful in serving their user communities. The DAACs have developed a working system for delivering reasonably large volumes of data to their user communities, largely by responding to evolving science requirements. Based on the experience of the DAACs, as well as the lessons of EOSDIS (see Box 3.2), the committee finds as follows:9

- Where new active archives need to be developed, a rapid development approach should be employed to produce a working system that delivers data to the user community in about a year.
- Active archives can play a vital role in both the development and operation of LTA facilities.
- User representation should be ensured by giving each active archive a user working group made up of representatives selected from the archive's users. The user

working group should be able to propose new services and to endorse or veto changes that the archive staff may suggest.

- The active archives and long-term archive facilities should participate in a federation that would resolve issues of common concern (for example, ensuring that the active archives can back up and restore data in the LTA, and helping the various user communities translate each other's terminology and data structures) and that would encourage tool sharing among the community members.
- Both active archives and long-term archive facilities need to be flexible enough to accommodate changes in the technology they use on timescales of 2 to 5 years.

CONCLUDING OBSERVATION

Finally, the committee notes that the President's budget for fiscal year 2001 includes some $4 million for activities at NCDC related to development of the LTA. Even if this work is fully funded by Congress, it should be recognized that substantially greater investments will be required to develop the LTA and to address the issues raised in this chapter and elsewhere in this report.

BOX 3.2 Lessons from EOSDIS

A number of participants at the committee's February 2000 workshop expressed the view that the EOS Data and Information System (EOSDIS) had "failed" as a result of having been overspecified in design and overcentralized in execution. The system is thought to have been overly specified because it could not take full advantage of very rapid advances in computer and data storage technology, including the software associated with databases. The concern with overcentralization refers to the degree of central control in a system that was highly distributed. Workshop participants argued that the processing and dissemination of a particular data stream are best performed by the people who generate and use the data. NASA implemented EOSDIS as a single contract, and this also contributed to the ongoing difficulties with the system.
Building monolithic systems in an environment where technology is changing rapidly and user requirements are poorly known or changing is an impossible task. For example, the EOSDIS contractor consistently underestimated advances in computing capability and frequently invested in technology that then became obsolete. If NASA had funded contractors for specific functions and relied on a systems integrator to link the components, the system might have been more resilient.

The EOSDIS experience of cost and schedule overruns may also be informative regarding LTA development costs. Cost and development time grow rapidly with the size of a program; therefore, it is important to minimize the size of the program elements to be developed. In addition, because users of the resultant systems may not understand or express their requirements fully, functions and requirements tend to grow during development. One of the presenters at the CES workshop argued that the problem of understanding the user and avoiding requirements creep is best addressed at the level of the individual user; attempting to address these issues from a large central organization has rarely worked in the past.

1 National Research Council (NRC), Board on Sustainable Development, Committee on Global Change Research. 1999. Global Environmental Change: Research Pathways for the Next Decade. Washington, D.C.: National Academy Press.

2 U.S. Global Change Research Program (USGCRP). 1999. Global Change Science Requirements for Long-Term Archiving: Report of the Workshop, October 28-30, 1998. Boulder, Colo., March 1999.

3 H. Jacobowitz, ed. 1997. Climate Measurement Requirements for the National Polar-orbiting Operational Environmental Satellite System (NPOESS), Workshop Report, NOAA. College Park, Md., February 27-29, 1997.

4 National Research Council (NRC), Space Studies Board. 2000. Issues in the Integration of Research and Operational Satellite Systems for Climate Research: I. Science and Design, in press; National Research Council (NRC), Space Studies Board. 2000. Issues in the Integration of Research and Operational Satellite Systems for Climate Research: II. Implementation, in press.

5 B.R. Barkstrom, "Understanding Data Users and Predicting Their Behavior," presented at the committee's February 7-8, 2000, workshop but dated July 16, 1996.
6 The Next Generation Weather Radar system (NEXRAD) comprises approximately 160 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites throughout the United States and at selected overseas locations. Information about NEXRAD is available online via links within the NCDC site at http://www.ncdc.noaa.gov/ol/radar/radarresources.html#WHATIS.

7 National Research Council (NRC), Board on Earth Sciences and Resources. 1999. Review of NASA's Distributed Active Archive Centers. Washington, D.C.: National Academy Press.

8 From the home page of NCDC, available online at http://www.ncdc.noaa.gov/.

9 The committee notes the similarity of these recommendations to recommendations made nearly two decades ago by the National Research Council's Committee on Data Management and

Computation (see National Research Council, Committee on Data Management and Computation (CODMAC). 1982. Data Management and Computation, Volume I: Issues and Recommendations. Washington, D.C.: National Academy Press).