8 The EOS Data and Information System

Does the proposed EOS Data and Information System (EOSDIS) represent the appropriate approach to support this long-term data collection and modeling effort?

The preeminent challenge to global change research is the synthesis of diverse types of information from different sources. EOSDIS is a pioneering effort in this regard: the intended scope of the system far exceeds that of any existing civilian data management system. Nonetheless, relevant experience for developing some aspects of EOSDIS exists in pilot data programs in NASA and in the data programs of other disciplines and agencies. If EOSDIS succeeds, it will be both a key ingredient in the success of the U.S. Global Change Research Program (USGCRP) and a substantial contribution to the field of data management. On the other hand, there is no operational paradigm for the effective management and dissemination of large scientific datasets under which data can be obtained readily and quickly for analysis and research. Nor is there any formula for success in such an endeavor.

The EOS instruments will be examples of advanced scientific and engineering concepts, producing many simultaneous data sets, each with large amounts of data. Significant advances in data management will be required so that these data will be readily available and useful for modeling global change.

To prepare for this challenge, NASA proposes to commit a major fraction of the EOS budget to the Earth-based component, including EOSDIS. NASA should be commended for its early recognition of the importance of
the EOS Data and Information System (EOSDIS) to the success of EOS and the USGCRP. This importance is best summarized in the words of the EOS Science Steering Committee:

"The key to the Eos concept and to its ultimate success in meeting the needs of the Earth science community is the data and information system. This system must be the foundation upon which the rest of the mission is built; it will be the means by which all Eos results are collected and communicated." [pp 25-27 From Pattern to Process: The Strategy of the Earth Observing System]

We agree that investing in the early development of EOSDIS is appropriate and necessary for the long-term success of the EOS data collection, management, and modeling effort. Some previous NASA missions have suffered from depleted budgets before the data processing and scientific analysis phases were done, resulting in a poor scientific payoff. The EOSDIS program is beginning with a healthy commitment to data management and analysis.

Investment does not guarantee success, however. The EOS program will be large and complex, and many potential pitfalls will be faced in the course of its implementation. While the importance and challenge of EOSDIS are understood, it is not equally clear that the route to achieving such a system and the resources and advance planning required to implement, maintain, and evolve it are fully appreciated. The management of very large databases with provisions for indexing, browsing, visualization, and other capabilities is a research issue. Current understanding of how to meet this challenge is not mature. A program of research is needed to guide the evolution of the proposed data management capabilities.

CHARACTERISTICS OF THE SYSTEM

EOSDIS will support a variety of scientific activities.
According to NASA plans, its major functions are: mission planning, scheduling, and control; instrument planning, scheduling, and control; effective resource management; communications; computational facilities to support research; production of standard and specialized data products; and archiving and distribution of data and research results.

NASA currently plans to have EOSDIS operational well before the launch of the first EOS spacecraft. The processing, archiving, and distribution of data are to be functional by 1994. Prior to the first launch, EOSDIS plans are to exploit currently available data, enhance the data acquisition and processing capabilities for ongoing missions, and correct
some long-standing deficiencies in the access to data. Development is to begin immediately, building on the existing infrastructure.

NASA plans call for a network architecture that is open and distributed, capable of evolving with advances in computing and networking. In the terms used by NASA, "EOSDIS must adhere to a flexible, distributed, portable, evolutionary design and operate prototypes in a changing experimental environment." In our view, this is the right approach, but these goals are easier to state than to accomplish. The challenge of achieving them must not be underestimated.

In the development of EOSDIS, NASA has had the benefit of interactions with other federal agencies and the external scientific community. In the former case, the Interagency Working Group on Data Management for Global Change has been a forum for discussing data, distribution, format, access, cataloging, and related topics affecting interagency management of data and their accessibility. Agencies participating include NASA, NOAA, USGS, DOD, DOE, and EPA. In addition, the EOS Investigators Working Group has organized a Science Advisory Panel for EOSDIS that is charged to represent the scientific community associated with EOS in advising NASA on matters related to EOS data production and scientific interfaces.

Evolution

The computer industry has experienced increasingly rapid technological evolution in both components and architectures. EOSDIS must be flexible enough to take advantage of inevitable advances in hardware and software capabilities, particularly in areas such as high-performance computing, data storage media, disk controllers, networking, and data base management. How hardware and software technologies will change over the next 25 years cannot be predicted, and traditional contract specifications for EOSDIS hardware and software written today cannot remain unchanged over the long term.
Consequently, NASA should not attempt to define total system specifications at the outset and then assume that they will not be altered throughout the remainder of the program. First, the evolutionary approach should rely heavily on experiments with prototype elements and include continuing interactions with and testing by members of the research community. Second, EOSDIS should have a system architecture sufficiently flexible to accommodate changes and to implement them in an evolutionary manner. Third, priorities for EOSDIS should be driven and determined by the research, monitoring, and modeling that will have to be carried out in answering fundamental questions about global change. When modifications to original specifications must be made for budgetary, performance, or other reasons, the global change research community should have a major role
in advising on priorities. A broad scientific input will assure that EOSDIS priorities are based on research requirements.

Diversity of Data and Information

The success of EOSDIS, like that of the USGCRP, will ultimately be judged on scientific results rather than by how many bits can be processed for the fewest dollars. To achieve the objectives of the USGCRP, the system must include datasets from a diverse array of space- and ground-based sources, which poses significant challenges to its design.

Data Diversity

The demands of the USGCRP require that data and information from EOS be merged with datasets from a variety of disciplines and sources. EOSDIS must be able to cope with a wide spectrum of data and information types. It cannot be focused simply on the data needs of NASA, or even those of the United States. EOSDIS must provide the capability for accessing and interpreting data and information from many agencies domestically and from a number of other countries.

Many ancillary datasets, which will be needed to exploit the scientific value of EOS data properly, are likely to be collected and held by agencies other than NASA, such as NOAA and USGS. The USGCRP recognizes this need, as do the participating agencies. In particular, NASA, NOAA, NSF and the USGS, as agencies prominent in the collection and dissemination of data, have a special responsibility for working together to assure greatest accessibility to all data and information relevant to understanding global change. Some NOAA, NSF, and USGS centers are already primary repositories of geoscience data, and they have substantial experience and expertise in data archiving that could be of advantage to EOSDIS. The objective should be an integrated national system for processing, distributing, archiving, and retrieving data and information about global change.
Human Interactions

While the needs for data and information by the human interactions component are not yet well defined, the data and information involved with this research will be sufficiently different from those customarily collected and archived in earth remote sensing missions that special attention should be paid to this field. As discussed in Chapter 5, NASA has an important role for ensuring that EOSDIS is responsive to these requirements as they evolve. We therefore encourage NASA to work with others, including both the natural and social sciences research communities, to assure that EOSDIS will contain data useful for human interactions research.
Conversely, EOSDIS could also provide the means for physical scientists to obtain those human interaction datasets that might enhance their scientific studies or that might help define the relevance of their research to sociological, political, and industrial decisions.

A Distributed System

In a project of this scope, there are inevitably divergent views on the proper balance between distributed and centralized responsibilities for data management. The appropriate balance can only be determined through experimentation and experience. A distributed system should be used to take advantage of scientific expertise as well as computational facilities. This basic requirement argues strongly against the development of a centralized system.

Though distributed, EOSDIS should appear to users as a single integrated system. All users want "one-stop shopping" for their data. The technical means are available to build a system that is distributed nationally and even internationally so that the user needs only a single point of entry to access all the data.

EOSDIS planning within NASA seems to be heading in this direction. For example, NASA proposes to establish a network of EOSDIS Distributed Active Archive Centers (DAACs). Seven have been selected to date. Each DAAC would carry out routine production, distribution, and archiving of EOS data and data products. In addition, NASA has proposed that a number of Affiliated DAACs be located "outside of the critical path for EOSDIS." Among other functions, these Affiliated DAACs would provide access to important non-EOS data and services.

While planning documents apparently are still in a state of flux, we strongly endorse the concept of distributed archive centers charged with storing data and data products and making them available to users. We are concerned, however, about two important issues: criteria for selection of the centers and relationships between the DAACs and the Affiliated DAACs.
We were unable to obtain clear descriptions of the criteria for the selection of DAACs or of the selection process. The criteria should be readily identifiable and publicly stated. It might be desirable for a joint working group of NASA personnel and extramural scientists to define the operational and scientific criteria. The broader questions are: What is the total range of responsibilities that can be defined for dealing with data issues before and during the EOS missions? Which activities should be handled at DAACs, which elsewhere, and how are sites for all activities to be selected? Of the seven DAACs named thus far, five are at NASA centers.
NASA's stated objective is to build a national distributed data and information system whose principal aim is scientific understanding of global change. We believe that such a system is likely to benefit from involvement by centers outside NASA, particularly in the academic community.

We recognize that some NASA centers have extensive experience on which to build EOSDIS. It is natural that some will become critical EOSDIS centers. Nevertheless, we believe it is important to establish an objective process that includes peer review before DAACs are named and funded. Such a procedure would optimize the effectiveness of a distributed EOSDIS consistent with the priorities of the USGCRP.

Though some centers outside NASA have been identified in EOSDIS planning documents as Affiliated DAACs, their role and status in the EOS program have not been well established. It is also unclear whether they will receive adequate support. In summary, the entire matter of the DAACs needs study and clarification.

NASA'S DATA POLICY

NASA policy is that all data collected by the EOS program will be archived in EOSDIS, and all EOSDIS data will be made available to the research community at the incremental cost of distribution. EOS data and information will be available to all users; the only distinction among users will involve cost. There is to be no period of exclusive access for any group, including the EOS principal investigators. Where EOS sensors make site-specific observations, EOS will be an "acquire-on-demand" system as opposed to a "process-on-demand" system. The data system is to provide unprecedented ease of access to observations. NASA hopes to have a common data policy for the entire international suite of data.

According to NASA's EOSDIS policy, users who agree to place the results of their investigations in the open literature will pay only the nominal incremental cost of reproducing and delivering the data requested.
In exchange for access at low cost, these users must agree, through the stipulations of a standard "Research Agreement," to make available to the research community the derived data, algorithms, and models at the time of acceptance for publication. Low cost data are to be used only for the researcher's bona fide research purposes. Data may be copied and shared among other researchers provided that they are covered by a Research Agreement or that the researcher who obtained the data is willing to take responsibility for compliance. Commercial users of EOSDIS will be charged market prices through an intermediate vendor.

We welcome NASA's policy of open distribution for research of all EOSDIS data. Since the EOS program will be judged on its scientific results, maximizing the scientific use of the data is the optimal strategy.
Moreover, as a repository for an extensive range of data pertinent to global change, the success of the USGCRP also depends on the openness of EOSDIS.

A number of impediments to accessing the data are currently limiting the effective scientific use of various existing datasets for global change research. In this regard, we would call attention to two impediments, related to data management policies and insufficient resources, that require immediate attention if EOSDIS is to be successful.

Landsat and Other Commercialized Datasets

Current policies that govern the use, distribution, and cost of the Landsat and SPOT data make it difficult for the research community to take advantage of this resource. When purchased from the commercial remote sensing industry, the data are generally too expensive for most research purposes.

Current legislation intended to protect the commercialization of Landsat remote sensing activities unfortunately prevents the inclusion of these important data in EOSDIS unless the government purchases the data for that purpose. In our view, Landsat data are sufficiently important to global change research that means should be found to include them in EOSDIS, whether by revising the Land Remote Sensing Commercialization Act, if necessary, or by paying (again) for the data.

We also believe that it is in the interest of international research to make all environmental data readily available to the global scientific community. Indeed, this is NASA's stated policy in regard to EOSDIS: scientists anywhere in the world with appropriate telecommunications equipment will have access to EOSDIS provided that they adopt the standard research agreement. Only by such a strategy will the usefulness of the data be maximized. Similarly, U.S. scientists should have access to relevant data in foreign archives, and it is important that other nations be encouraged to establish similar data policy assessments.
Preservation of Historic Datasets

Changes in technology and insufficient attention to maintenance of irreplaceable historical datasets currently in various archives threaten to limit their usefulness. In some cases, valuable data may be lost altogether. For example, almost all the NOAA 1-km AVHRR data obtained prior to 1986 have essentially become inaccessible because they are still stored in an outdated system. Early Landsat data at the USGS's EROS Data Center in Sioux Falls, South Dakota, and NOAA geosynchronous satellite data at the University of Wisconsin are at risk of being lost forever because their
storage media are deteriorating.

The success of global change research will depend in part on the availability of a long time series of measurements. The existing archived data are a most important resource in this regard. If lost, they are not replaceable. The USGCRP recognizes the need to preserve historical data and includes some funds for the purpose in the FY 1991 budget. We underscore the urgency of moving all relevant data to more secure storage media and incorporating them into the EOSDIS as soon as possible.

RESEARCH AND PROTOTYPING NEEDS

Plans for EOSDIS emphasize that the system will maintain continuity with current data systems because the current data centers provide a heritage for the design and prototyping of EOSDIS. If this approach is to succeed, those involved with EOSDIS development and implementation must be committed to an evolutionary approach using system prototyping. We are concerned that this commitment has not yet taken hold, and it may be difficult to establish once a contract is written to procure design and implementation services. Contracts normally include precise specifications for deliverables, but in the case of EOSDIS a description of the system and even its performance goals is premature. EOSDIS must be an evolving entity, even over the entire life of EOS; it is not a system that can be designed, implemented, and then left to function unaltered throughout the remainder of the EOS program.

NASA appears to view EOSDIS as an evolving entity, but research will be required to provide the basis for implementing EOSDIS. The proposed structure of EOSDIS, the various entities involved, and their interactions were evolving conceptually even during the brief period that we were conducting our review. We regard this as a healthy sign for the future of EOSDIS. To ensure that the evolution continues, there should be a coordinated plan that incorporates a deliberate program of data system research and prototype development.
Prototyping

In our view, the community of researchers, including those in the fields of high performance computing and data management, is not yet ready to start designing certain aspects of EOSDIS. In many areas prototyping efforts should be under way, with dedicated people directing them. A number of areas where work is needed are listed in Table 1 and discussed in Appendix C. The list is long, but it is difficult to shorten it because the community collectively has not yet reached the level where the magnitude of the problem is understood.
TABLE 1: Elements of EOSDIS Requiring Prototype Development

Data visualization and the user interface
Browsing capability
Data formats and media
Accessibility of data and information
Cataloging
Search and query capabilities
Model and data interaction
Metadata* and data structures
Data reduction algorithms
Networking

* Metadata is defined in Appendix C as information about data, such as documentation.

Questions still to be answered include how to get NASA, other agencies, the DAACs, contractors, and independent scientists and computer engineers working together in defining problem areas; analyzing requirements; and using the results to establish prototype approaches that are likely to be effective; creating designs; working together on interfaces, design, capabilities, and other aspects; bringing the creativity of complementary kinds of expertise together; and exploring alternative approaches that conserve available human and financial resources.

Prototyping should address the challenge of the immense size of EOS datasets, which will dwarf any previous experience. As part of the prototyping activity, some EOSDIS centers should be co-located with institutions with high-performance computing and global modeling capabilities. Such centers, which should have the capability for large temporary data repositories, could well be at research centers but not necessarily at EOSDIS DAACs. The high-performance computing capabilities required to effectively exploit EOS datasets are likely to greatly exceed current expectations. Because of its scale, EOSDIS is likely to break new ground in the use of computing technology in research.

Pathfinder Datasets

EOSDIS plans call for the creation of "Pathfinder Datasets" that will incorporate currently existing data important to global change studies. The data sets include derived data products, chosen on the basis of community consensus. The procedures will include the validation and archiving of derived products.
Among the datasets under consideration are AVHRR,
GOES, TOVS, and others from earlier satellite missions. The selection of the Pathfinder Datasets will be driven by global change science requirements. We endorse this approach.

There is a critical need that proto or pilot datasets be made available to the scientific community. To challenge the system effectively, such sets should cover a range of sizes typical of those to be produced by EOS instruments, for example, from less than half a terabit to significantly greater than a terabit.

INDEPENDENT SCIENTIFIC ADVICE

We strongly believe that cooperative and constructive interaction among scientists in the global change research community, on the one hand, and software design engineers and software implementation programmers, on the other, will be crucial to the success of EOSDIS. In the business community, it would be unthinkable for a software firm to design a large package for management of bank records or airline reservations without steady interaction with users and corporate executives. The former can provide feedback on what works and what does not, and the latter on what the system priorities are. In a project like EOSDIS, individual scientists will be needed to work actively with the system as it develops and to provide feedback. A committee of scientists dedicated to the goals of USGCRP can provide an overview of the top priorities.

Experience gained from software development for serving the needs of scientific communities has pointed strongly to the same conclusion. Major software systems developed without steady involvement of users have generally been failures; those that maintained advisory committees of scientific users were almost guaranteed success (provided the design and programming expertise was present) because potential problems were caught in the design or early development.
To date, NASA has worked effectively with several committees that consist partly or wholly of scientists, and all indications are that this has been a positive and productive interaction. Such cooperation should continue throughout the development of EOSDIS, notwithstanding historical precedent and the fact that scientists and software engineers tend to speak different languages and sometimes have trouble communicating. We strongly emphasize the importance of maintaining a three-way structure in the evolution of EOSDIS, with NASA and other agencies involved in the USGCRP, research and instrument scientists, and the contractors working together as equal partners. Excluding any of those three groups from being involved with software testing and decision making would be a major mistake.
In our view, the EOS Investigators' Science Advisory Panel for EOSDIS has successfully focused the concerns of a broader research community, and has articulated the specific requirements of scientific users in the context of the EOS mission and the USGCRP (see, for instance, the EOSDIS Science Advisory Panel's November 1989 report). EOSDIS planners have been responsive to the advice of the Science Advisory Panel, and as a result of its guidance a major redirection of EOSDIS design strategy has taken place in the last few months.

Because the implementation of EOSDIS poses significant, continuing challenges, we believe that EOSDIS must have an ongoing mechanism for acquiring independent advice from the user community. The EOSDIS Science Advisory Panel should be a long-term advisory element in the agency's planning and implementation process. It should include some scientists who are not EOS investigators but who are active in fields within the range of scientific disciplines involved in global change research as well as in research on data and information management. We therefore recommend that the EOSDIS Science Advisory Panel continue to perform its function throughout the EOSDIS procurement, design, and development cycle to ensure that all the major scientific requirements are effectively met.