The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Appendix C
Prototyping for EOSDIS

This Appendix was prepared by the Panel to Review NASA's Earth Observing System in the Context of the USGCRP.

UNIQUE CHALLENGES OF GLOBAL CHANGE DATA MANAGEMENT

The types of data management now being undertaken are truly without precedent. Any process in the atmosphere or ocean is intricately entwined with numerous other processes, because researchers are studying the subsystems of a single "object," the Earth. Gaining understanding of synoptic changes in the global system, and an ability to predict them, will require detecting and studying numerous interconnections among processes. Many different models are available for organizing a database, such as hierarchical, relational, and networked. Prototyping will be needed to learn how scientists will work with EOSDIS, so that the organizational schemes best suited to the functions of the system's different components can be selected.1

USE OF DATA ARCHIVES AS A RESEARCH LIBRARY

Data collected under EOS and related Earth observing programs will form the research library for scientists trying to answer crucial questions about global change. There is no dispute about the imperative to improve understanding as rapidly as possible.

How are libraries used for research? The experience base lies partly with libraries of printed material; another relevant source is the on-line literature search. In the library, one starts with a card catalog, which has limited but valuable cross-references. An on-line search allows logical combinations of subjects or keywords, which improves the precision of the search. But serious study invariably brings the researcher down to the level of the book index, and to a great deal of old-fashioned browsing. If the books are in stacks with limited or no access, the job becomes increasingly difficult. On-line literature searches provide a somewhat more powerful ability to locate information that conforms to user-specified requirements. Their limitations result partly from the fact that only keywords can be searched.

Metadata, defined in the broad sense as the collection of important information about data, will form the library catalog for global change research. Research will require finding metadata and data through their interrelationships. If this cannot be done, scientists will be thwarted in trying to trace complex causal effects, the understanding of which is the objective of global change research. As with libraries, the more completely the metadata are accessible to the scientist, the more effective the research will be. Effective accessibility must include more than the equivalent of the card catalog. The slowness of finding a comprehensive set of relevant research material in a library, via indexes, tables of contents, and text scanning, will not be acceptable for answering urgent questions about global change. Furthermore, an efficient system will help in keeping up with the high rates at which EOS data will accumulate.

DATABASE REQUIREMENTS FOR SCIENTIFIC METADATA

The performance of existing systems for managing complex compilations of scientific data is not encouraging.
For example, although catalogs of "available" planetary data are published, many potential users tell stories of failing to obtain the data despite determined efforts. The successes, where they exist, can be instructive for EOSDIS. One very effective geoscience data management system, for instance, is available at the National Center for Atmospheric Research. Its success has been due in large part to the development of data management systems, quality controls, and data archives at a scientific center, in consultation and collaboration with scientists. One of the few true prototypes is the system built at the NASA Space Science Center for plate tectonic measurements, which are unusually independent of other geophysical events. Even for that dataset, for which queries might appear relatively predictable, a highly sophisticated, intelligent system of layers (plus a natural language interface to the user) is used to process queries. The levels of complexity introduced in global change research by following interactions among processes are absent from the tectonics dataset, however.

The observation that emerges is that little is known about how scientists would use an EOS data management system, and that it is premature to define one. Much is known about how to manage bank records, airline reservations, and inventories. These involve large numbers of relatively simple and highly predictable transactions, e.g., queries and data operations. Such systems must keep instantaneous track of all changes, such as bank balances and airline seat availability. The requirements for global change research are different. Except for new entries, metadata already entered will change little, so keeping track of the system state on a second-by-second basis will not be needed. But the queries posed will tend to be complex and highly unpredictable. They will be driven by the scientist's mandate: to understand connections between different elements of the system, to understand underlying causes, and to develop an ability to predict. Research is needed, through prototyping, to learn how scientists will work with EOSDIS and to select a system well suited to the functions of global change research.

TIMELY ACCESS TO LARGE DATASETS

The history of dealing with large datasets is also discouraging. Responses to data requests can be slow, and NASA and NOAA datasets are known to be difficult to obtain. Current datasets in both agencies are minuscule compared with those predicted for the EOS archives. Obtaining timely answers to pressing global change issues that may affect society will require performance at a hitherto undreamed-of level. Prototyping of this aspect of data management can be done with datasets already in existence, and much could be learned by developing a system to efficiently locate and deliver data from existing archives.
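As an illustration only, "efficiently locating" data in an archive of time-ordered files can start with something as simple as keeping the file inventory sorted, so that a request touches only the relevant entries. The following Python sketch assumes a hypothetical `Granule` record (an identifier plus start and end times in arbitrary units) and a hypothetical `GranuleIndex` class; neither is part of EOSDIS or of any existing archive. It uses binary search from the standard `bisect` module to find every granule overlapping a requested time window without scanning the whole inventory.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Granule:
    """One archived data file (hypothetical schema): an ID plus its time span."""
    granule_id: str
    start: float  # start time in arbitrary units, e.g. hours since an epoch (assumed)
    end: float    # end time in the same units; assumed end > start


class GranuleIndex:
    """Inventory of granules kept sorted by start time (hypothetical helper).

    Sorting lets a time-window query jump to candidate granules with binary
    search instead of scanning every entry in the archive.
    """

    def __init__(self, granules):
        self._granules = sorted(granules, key=lambda g: g.start)
        self._starts = [g.start for g in self._granules]
        # The longest span bounds how early an overlapping granule can start.
        self._max_span = max((g.end - g.start for g in self._granules), default=0.0)

    def overlapping(self, t0, t1):
        """Return granules whose [start, end) interval overlaps [t0, t1)."""
        lo = bisect_left(self._starts, t0 - self._max_span)  # earliest possible candidate
        hi = bisect_left(self._starts, t1)  # granules starting at or after t1 cannot overlap
        return [g for g in self._granules[lo:hi] if g.end > t0]


index = GranuleIndex([
    Granule("A", 0.0, 6.0),
    Granule("B", 6.0, 12.0),
    Granule("C", 12.0, 18.0),
])
print([g.granule_id for g in index.overlapping(5.0, 13.0)])  # ['A', 'B', 'C']
print([g.granule_id for g in index.overlapping(6.0, 12.0)])  # ['B']
```

A real archive would also need to index spatial coverage, instrument, and processing level. The point of the sketch is only that the cost of a query can be made to depend largely on the size of the requested window rather than on the total size of the archive.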

MISSING AND BAD DATA

There is a continuum of problems with data that must be addressed in any data management system and that can be explored with prototype experiments. Potential problems range from the predictable corrections that must be made to any dataset, through data tagged as "bad" according to some set of criteria, to data that are missing, either because an expected measurement was not made or because "bad" data were eliminated from the dataset. First, complete information about the locations of missing data must be provided, so that users do not discover, only after investing both human and computer time, that a dataset chosen for analysis is unusable.
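The first requirement, telling users where data are missing before they invest time in an analysis, amounts to an inventory report. The Python sketch below assumes that measurements are expected on a regular schedule of integer sample indices and carry a simple "ok"/"bad" quality flag; the `Observation` record and `missing_data_report` function are hypothetical names chosen for illustration. It reports runs of samples that were never recorded separately from samples that are present but flagged, which presumes that flagged data are retained in the dataset rather than eliminated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One measurement (hypothetical schema) with a quality flag."""
    t: int            # sample index on a regular schedule (assumed)
    value: float
    flag: str = "ok"  # "ok" or "bad"; flagged data are kept, not deleted


def missing_data_report(observations, t_start, t_end):
    """Describe unusable data within the expected schedule t_start..t_end-1.

    Returns (gaps, flagged): gaps are (first, last) runs of sample indices
    with no observation at all; flagged are indices whose observation is
    present but tagged "bad".
    """
    present = {obs.t: obs for obs in observations if t_start <= obs.t < t_end}
    gaps, flagged = [], []
    run_start = None
    for t in range(t_start, t_end):
        if t not in present:
            if run_start is None:
                run_start = t  # a new gap begins here
            continue
        if run_start is not None:
            gaps.append((run_start, t - 1))  # close the gap just before t
            run_start = None
        if present[t].flag == "bad":
            flagged.append(t)
    if run_start is not None:
        gaps.append((run_start, t_end - 1))  # gap runs to the end of the window
    return gaps, flagged


obs = [
    Observation(0, 1.2), Observation(1, 1.3),
    Observation(4, 9.9, flag="bad"), Observation(5, 1.1),
]
print(missing_data_report(obs, 0, 8))  # ([(2, 3), (6, 7)], [4])
```

Keeping the two categories apart matters because a flagged sample may still be useful for another purpose, whereas a gap holds nothing to recover.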

Second, decisions must be made about how to handle a segment of data that is "bad" from one point of view but may contain useful information for some other research purpose. Experiments must be done by scientists who use the data, both for research purposes and to check the effectiveness of different approaches to their problems. Experimentation must then be done on how to integrate the chosen method into an overall data management scenario.

Another concern is data integrity. Scientists should begin using data as quickly as possible after they are obtained, because standard error-checking algorithms may not identify data that look usable but do not make sense physically, for example because of a malfunctioning instrument. Solutions may involve getting data on-line rapidly (which alone does not guarantee that they will be used quickly) and developing sample analysis programs with more sophisticated algorithms that will reveal subtle but systematic nonsense errors.

VISUAL BROWSING

Clearly, browsing is already an important element of data management. Less clear is how, in the future, it will be possible to use this technique as a skimming tool for any selection of data. Locating data to browse will therefore rely on the tools discussed earlier under data archives and metadata for research. Prototyping of browsing must include both workstation visualization tools and the full range of data management tools that make it possible to find data likely to be of interest.

ACCESS BY MANY

Prototypes should reflect the situation that will obtain in the EOSDIS era; that is, like the analogous library, they should provide easy access to everyone with a need. While those involved can be expected to determine the design of the prototypes, anyone who needs access to the prototypical systems should be allowed it, within reasonable financial constraints.
Only in this manner can the prototypes be tested for their effectiveness in serving the needs of the broader scientific community. Lessons learned from such tests will be the real products of prototype development that must be incorporated into the design of EOSDIS.