Suggested Citation: "The Present System." National Research Council. 1991. Solving the Global Change Puzzle: A U.S. Strategy for Managing Data and Information. Washington, DC: The National Academies Press. doi: 10.17226/18584.

3. The Present System

For more than a century, earth scientists have made national and international arrangements for the management and interchange of data. These arrangements have evolved in patterns driven by the needs of the user community and can serve as a base for meeting global change research needs. The present system has many strengths and should be more fully exploited for global change research before new data management elements are created.

In recent years the need for effective data management in the environmental sciences has increased tremendously because of scientific requirements and technological advances in observing capabilities. However, the data management system has not kept pace with these advances, largely because data management for secondary users has low priority in the funding and execution of scientific research. The system is no longer adequate for current scientific activities, let alone for the unprecedented challenges posed by a global change research program. The following sections summarize some major issues that must be addressed if the U.S. Global Change Research Program (USGCRP) is to build effectively on the present system of data management.

Will and Commitment

A major challenge facing the USGCRP is developing the collective will and commitment to manage data and information properly. The critical problems of setting up a data and information system are often perceived as technical. They are not; they are policy problems. There are indeed some important technical problems to be overcome in creating the system, but these are not as serious as the policy issues.

The financial and moral support of data management is inadequate. Federal agencies should support data management with adequate funds. The scientific community should support data management by taking part in its development and operation.

The enthusiasm of scientists for the data management part of research programs is tepid. As a result, federal agencies do not feel pressure to act, and budgets suffer. The funding allocated by the agencies for data storage and dissemination has generally been the weakest part of their research budgets. Factors contributing to these budget inadequacies include the frequent need to cover cost overruns in other parts of a project, unrealistically low estimates of data management costs made in order to sell a project, and ignorance of or apathy about the resources needed for data archiving and preservation. Most scientists would rather see budgets directed to their perceived "real research" than to data management. Thus, federal managers who fund data and information management inadequately are seldom called to account. Data management is "everyone's second priority." If a system for the USGCRP is to succeed, it must be shown that data management is valuable.

Because of this lack of enthusiasm, and the lack of a policy for handling scientific data, data facilities in the federal government have suffered a general decline in funding relative to other research endeavors. The result has often been decreased efficiency of data processing and decreased data availability. Existing centers offer a potentially unique combination of long-term continuity, general accessibility, data management expertise, and user orientation, all of which are required to broaden the use of the data. The data management system for global change must build on the existing national centers. Thus, the present underfunding of data centers in particular, and of data management in general, must be remedied.

Data Centers

The current U.S. national data management system involves relatively large national data centers, specialty data centers not necessarily charged with long-term preservation, and many large collections at academic centers. The holdings of most of these centers are available to users at large. In addition, there are many project data centers at federal and academic laboratories, where the data are often available only to a small number of users associated with the project.

The system has a number of strengths. Data, in general, are not willfully destroyed or discarded. The variety of holding sites means that data are held where there is some measure of local expertise on the particular data class. A fraction of the holdings are used by scientists and reworked into quality-controlled, multisource datasets that are returned to the data centers. There is a widespread determination to create an effective system for global change research. The Interagency Working Group on Data Management for Global Change (IWG), created in 1987, is an ad hoc voluntary group with senior representatives from the federal agencies involved with data management for global change. The purpose of the IWG is to coordinate interagency data and information management. The group is currently developing a coherent interagency plan for handling global change research data.

The system also has weaknesses. They include the lack of links, both managerial and operational, among its many components. As a result, it is not possible to find the holdings of one data center systematically by contacting another, and a project data center may not transfer all its expertise, documentation, and data to a national data center before terminating. There is a lack of interagency agreements and policy regarding the preservation and enhancement of data. A scientist has difficulty securing funding to rework data in a way that would improve their quality and make them useful to more than a group with narrowly defined scientific interests. Funding at national data centers has generally decreased in relative terms, while the numbers of datasets and users have increased.
It is difficult to fund "technology conversions" at national data centers, in which data held on, say, a thousand decaying low-density tapes could be copied onto ten new media such as optical disks or helical-scan magnetic tapes. Science users have little say in or control over national data center operations. The data centers sometimes have difficulty obtaining usable data and documentation from scientists after a reasonable period of privileged use.

Data centers exist for many disciplines. For example, within the National Oceanic and Atmospheric Administration (NOAA), a set of environmental data centers is operated with a mandate for national support. The National Environmental Satellite, Data and Information Service (NESDIS) operates three national data centers: the National Climatic Data Center (NCDC) in Asheville, North Carolina; the National Oceanographic Data Center (NODC) in Washington, D.C.; and the National Geophysical Data Center (NGDC), with the associated National Snow and Ice Data Center (NSIDC), in Boulder, Colorado. Other federal agencies have also established data center functions relevant to their particular missions. Examples include the National Aeronautics and Space Administration's (NASA) National Space Science Data Center (NSSDC) in Greenbelt, Maryland; the U.S. Geological Survey's Earth Resources Observation System (EROS) Data Center in Sioux Falls, South Dakota; and NASA's Ocean Data System (NODS) in Pasadena, California. There are other, smaller data centers operated or supported by the Departments of the Interior, Energy, Defense, and Agriculture. The National Center for Atmospheric Research (NCAR) in Boulder, Colorado, maintains a Data Support Group. This list of centers is not exhaustive. Though they are not always identified as such, these centers are, in effect, de facto national centers for their scientific specialties.

In the geosciences, an international network of World Data Centers (WDCs) has been in operation for more than 30 years. The WDC system, in which U.S. data centers play a major role, is an excellent foundation for the international exchange of global change research data. However, the WDC system does not yet include data from several disciplines (e.g., biological, socioeconomic, and atmospheric chemistry data) that are critical to understanding global change.
Recently, some data centers have been establishing stronger links with the research community. One noteworthy example is the link between the NODC and the Scripps Institution of Oceanography. The Joint Environmental Data Analysis Center (JEDA) combines scientific use and quality control of oceanographic datasets at Scripps with archiving and distribution at the NODC.

With the possible exception of the NOAA/NESDIS centers, the national centers have loose and informal linkages. Rapid computer connections between data centers, or between centers and researchers, are sometimes minimal or nonexistent. Except for the NOAA/NESDIS centers, there is no formal coordination among data center managers. It is therefore not surprising that the holdings of the various centers have evolved in disciplinary patterns without strong interdisciplinary links. However, an interdisciplinary approach will be important in meeting the goals of the USGCRP.

The centers hold their data in a variety of forms, including high-resolution digital values in computer files, photographic images, handmade drawings, numerical tables, analog recordings, and microfilm. Data types also differ by discipline; seismic and meteorological data, for example, are quite different in character, especially in view of meteorologists' extensive use of objectively analyzed gridded fields at regular time intervals. The centers tend to serve a relatively narrow clientele that is generally familiar and comfortable with the major data types and formats of the discipline.

Some important fields that will be involved in global change research do not have established data centers. For example, there are no recognized national data centers for ecological, biological, and geological data. These gaps must be filled if an effective global change data management system is to be established.

Effectiveness of the Centers

We begin with two comments:

1. The national data centers are staffed by dedicated data management professionals who have been limited by decades of second-priority budget status.
2. Members of the Committee on Geophysical Data have not recently had the opportunity to visit all of the centers. (More site visits and reviews are planned, at the request of the operating agencies.)
The comments that follow are based on records of many visits in earlier years, on presentations by data center personnel, on documentation provided by the centers, and on personal experience in working with the data centers.

The national data centers are essential. However, it is the committee's opinion that the existing structure of the centers will not serve all the purposes of the interdisciplinary USGCRP. In addition to discipline-oriented data services, centers need to provide issue-oriented information services. Existing centers or new information analysis centers must provide information and data that meet the needs of interdisciplinary global change research, rather than continuing simply to treat data on a discipline basis.

Data center funding has been a significant problem. The impacts of funding limitations have been particularly severe at the data centers operated by NOAA. Some centers are technologically behind, some perhaps by decades. Because some centers lack the technical means to cope with the increasing size of recent datasets, some satellite and hard-copy data have been effectively lost during the past several years.

Some centers are perceived as doing a better job than others of providing data and information for research purposes. Two data centers subjectively viewed by the science community as particularly successful are the Carbon Dioxide Information Analysis Center (CDIAC) of the Department of Energy in Oak Ridge, Tennessee, and NCAR's Data Support Group in Boulder, Colorado. Both have a strong linkage between data center activities and the scientific community. Dataset users have been involved in the development of the centers and provide ongoing feedback. Both centers operate under special conditions that do not apply to many national data centers. Though both have limited value for global change research, we cite them as examples that could be used in creating a new system.

The CDIAC at Oak Ridge National Laboratory sees itself as an issue-oriented information analysis center, in contrast to the discipline-oriented data centers. The scientific issue it supports is research into environmental consequences, such as the greenhouse effect, that are potentially related to carbon dioxide. CDIAC is funded by the Department of Energy.
It is located at a national laboratory, actively conducts research into many CO2-related issues, and is run by a staff that includes both data managers and scientists. Unlike those of the national data centers, CDIAC's data holdings are not voluminous; they are chosen as the datasets considered most essential to the pursuit of CO2 research goals. Datasets and information distributed by CDIAC go through extensive quality assurance procedures, often with the help of researchers in the field. The datasets are documented with the information that someone unfamiliar with their generation would need to interpret them. Typical documentation describes the limitations of the data and comes bound with reprints of pertinent journal articles and reports. CDIAC takes an active role in developing needed data and information. It fills a useful niche but cannot serve as the only model for a data center. It does not operate under the constraints of most of the national data centers, nor does it charge for any of its services. It does not comprehensively archive datasets in a given discipline; rather, it works with the user community to develop, quality-assure, and document only those datasets selected as most important to its issue-oriented mission.

NCAR's Data Support Group has served as a model of a successful disciplinary data system. This group is considerably smaller (four to six full-time personnel) than most data centers. It has benefited, perhaps uniquely, from the computational facilities at NCAR, from the expertise and dedication of its small staff, and from the support of the NCAR and university community of atmospheric scientists. The group has succeeded not only in distributing datasets to a broad range of users but also in providing, usually informally, documentation and information on data availability. The NCAR holdings include a wide variety of raw and instantaneous station reports, time-averaged (e.g., monthly) station data, satellite-derived fields, operational analyses from various centers, value-added analyses (e.g., hemispheric surface temperature grids spanning a century or more), ocean surface datasets, and output from global climate models. From the USGCRP perspective, NCAR must be viewed strictly as a disciplinary center. It has, to a large extent, been free to choose its holdings, although these choices often respond to user needs. NCAR's Data Support Group functions in an environment surrounded by users.
The group has kept its approach simple. It uses high-tech solutions only when necessary. These factors, which have contributed to the success of the NCAR system, can be regarded as luxuries that may not be available to all the system components of data management for global change. Nevertheless, the NCAR example shows that a data management system can work. It may provide useful input to the design of a much more complex system for global change.

Finding Data and Information

Not all significant public-domain data are in national repositories. This situation has developed for a variety of reasons, including economic policy, research efficiency, and the personalities of principal investigators. Moreover, the secondary user is often unaware of the range of data available through the data centers and may even be unaware of the centers' existence. Many scientists face major obstacles in finding out what pertinent data are available; in some situations, obtaining the data may be a practical impossibility. In other cases, inadequate dataset documentation makes personal contact with the primary user imperative. All centers holding data acquired through federal funding need to provide well-documented information about the extent of their holdings and the accessibility of the data. Furthermore, data interchange between agencies will become a major issue as global change programs require data from ever-wider sources.

The lack of interoperability between data directories is a serious deficiency of the current system. It must be addressed if the USGCRP is to draw on existing and future data holdings. Users must have an easy way to search an on-line system for specific datasets of interest. In addition, a directory with information about data centers must be available. A national data catalog system, the Global Change Master Directory, sponsored by the IWG and based on a similar NASA system, is under development. Nearly 2,000 datasets are already described in the system. In addition, a user searching the national master catalog can be automatically transferred to search more detailed catalogs and inventories at individual data centers.

User Participation Issues

Data Submission

The scientific research community does not participate uniformly in data management. This creates a many-sided problem of accessibility. Valuable datasets remain in the custody of individual research groups or even individual investigators, who consequently take on the data-handling task. Some of these datasets are widely known and accessible; others are not.

There are many reasons why individual scientists fail to provide data to the data centers. Among these are a desire for exclusive access to data, reluctance to divert time and effort away from research to clean up a messy research dataset, and unawareness of the existence of appropriate repositories or of the importance of depositing data in them. How long recently collected data should remain under the exclusive control of the scientist(s) who collected them is an issue being addressed by the federal agencies involved in the global change program, and it is discussed further in the next chapter of this report.

In general, the global change research community is not sufficiently aware of the importance of ensuring the availability of environmental data. Until this changes, potentially valuable datasets will continue to be lost as primary users retire, relocate, or move on to other projects.

Quality Assurance

Many data centers provide little or no quality assurance of their data holdings. The assumption is that the individual researcher has paid adequate attention to assuring the accuracy of the data. This is often not the case. Scientific use of data is the best road to their quality assurance. Many data centers have little or no in-house scientific research and therefore are not in a good position to provide effective quality assurance. Sometimes the data management system unwittingly corrupts datasets.

The working scientist sometimes treats data supplied by a data center with ambivalence. While healthy skepticism on the part of the working scientist is a desirable part of the data exchange process, the present system suffers from an unacceptable lack of dataset credibility.
Data management policies need to include more specific arrangements for data validation, so that data preserved for the future are of adequate quality. Defining adequate quality and establishing standards are, fundamentally, responsibilities of the scientific community. However, applying these standards is a joint responsibility of scientists and data system managers.

Documentation

Data are frequently separated from information about the data. This is an unfortunate by-product of the explosion of digital data, and of techniques for handling such data, over recent decades. Documentation can permit users to judge the reliability or value of a data product for a particular application. The same is true for original data, in terms of calibration, quality control flags, and station histories. Such documentation should therefore be an inseparable part of the data. Examples of useful documentation include information about the algorithms used for a derived product, quality control procedures, comparisons with independent measurements, and reviews of the dataset by outside experts.

Dataset Evaluation

For global change research purposes, few criteria have been developed to guide the evaluation, retention, and purging of datasets. These activities involve assigning priorities and establishing thresholds of importance for archiving and retention. Evaluations of individual datasets are important for determining which existing datasets may be most useful for global change research. The data centers should play important roles in this function. By monitoring distribution and obtaining feedback from users of datasets, data center personnel should be able to compile a consensus of the user community on the quality of a particular dataset. At the very least, the feedback obtained by the data centers can serve as input to groups of experts assigned to make recommendations about dataset retention or purging. If the data compilation is an ongoing effort, this feedback can also benefit the scientists who originally provided the data. Although this activity represents a key step in linking present and future data for studies of global change, it is lacking at many data centers. Data centers can play such a role most effectively if scientific expertise exists at the centers. With few exceptions, the data centers do not now have adequate scientific expertise to perform this function.
