
Solving the Global Change Puzzle: A U.S. Strategy for Managing Data and Information. (1991)

Chapter: A Data Management Strategy for Global Change

Suggested Citation:"A Data Management Strategy for Global Change." National Research Council. 1991. Solving the Global Change Puzzle: A U.S. Strategy for Managing Data and Information.. Washington, DC: The National Academies Press. doi: 10.17226/18584.

4. A Data Management Strategy for Global Change

The U.S. Global Change Research Program (USGCRP) requires a strategy to meet its data and information needs. This chapter discusses such a strategy and makes recommendations for achieving a new approach.

Scientific Involvement

This report, like others before it (e.g., NRC, Committee on Data Management and Computation, 1982, the "CODMAC report"), identifies the lack of scientific involvement as a problem in data management. Scientists may be involved with data in fundamental ways. One scientist may "make" data (by collecting, controlling quality, and processing), while another may "use" data from the first scientist and/or from an operational source. In the process the investigator may find a problem with the data.

In a free, competitive marketplace, the choice of one supplier's goods over another's is the mechanism by which users reward good quality, efficiency, and low cost. But a scientific data activity cannot be judged by such economic yardsticks. In view of the limits of science budgets, any costs significantly higher than the cost of reproduction are self-defeating. A mechanism by which users can be assured of quality, efficiency, and low cost must be found.

Involving a sufficiently large sample of knowledgeable users in setting the funding priorities for data activities may provide that mechanism. The process might require that each data activity submit a 3-year proposal to be reviewed by data management peers and by the scientists who supply data to, and request data from, that activity. A standing oversight committee, akin to those that review academic departments every few years, could advise on longer-term plans and on data acquisition or deletion decisions. Members of the scientific advisory groups would work as advocates for the system with their scientific colleagues.

Some scientists are knowledgeable about certain data types and their applications. Data activities should collaborate with these individuals to exercise the data, to reorganize and document them, and to control their quality.

Any successful business understands its customers' needs, knows its product line, and chooses its location carefully. By analogy, a data and information management system should follow some of the same principles. For example, an oceanographic data center would benefit from being physically adjacent to an oceanographic research activity. In spite of the increasing electronic transfer of information, proximity is important for understanding needs and sharing experiences. Data center personnel must know how scientists work with the data and why they choose certain datasets over others. Visiting scientist programs at data centers, and visiting data center personnel programs at research institutions, should be part of the activity.

Scientific participation and authoritative oversight may be the key to creating and maintaining an effective global change information system. However, achieving effective scientific involvement will not be easy. Unfortunately, much of the scientific community is not aware of the need to be involved in developing any unified global change data and information management approach. This attitude is part of a pattern: data management has long been considered a secondary aspect of research. Since data and information together will be such a critical element in global change research, a change of attitude is essential. Fortunately, there are many signs that this change is taking place, both among research scientists and within sponsoring agencies.

Active researchers must be participants in the process.
They should define needs and create the framework for a data and information system to meet those needs. They should help establish procedures and data centers. It will not be enough for them simply to assent to what a group of data technologists is creating. They must be involved.

There must be incentives for researchers to be involved. For example, the system must respond to scientists' needs. It must be perceived as the optimal way to do research with data. A simple feedback process will have a beneficial effect. For instance, data centers should involve researchers working with the datasets in the development of the data system. When advice is sought and listened to, there is an incentive for involvement.

Not only should incentives be created, but existing disincentives should be removed. User fees for scientific use of data that exceed the minimal cost of reproduction are one such disincentive. The nature of global problems requires access to large datasets. If their cost makes such access prohibitive, exploratory research will be obstructed. The global change data system should have the lowest possible user fee structure. Data should be free wherever possible.

As one example, Landsat data are currently so expensive that they are generally beyond the reach of the research community. Furthermore, only data from selected areas are acquired and archived regularly. For most areas of the world, a Landsat scene will not be acquired unless a user requests it. This is a major problem because we cannot always identify what data will be needed in the future for studies of change detection.

There may be innovative ways to recognize data contributions. The Committee on Geophysical Data has discussed the possibility of creating a CD-ROM "journal," or of refereeing and referencing dataset contributions, as is done with scientific contributions. However, there is a strong community reluctance to recognize the preparation of a scientific dataset as being equivalent to the preparation of a scientific paper. There are encouraging signs that this attitude is changing: the Journal of Geophysical Research (Space Physics) has begun soliciting "brief data reports" of datasets that have been submitted to national data centers. Referees are asked to use the protocol to access the data and to comment on the data's relevance and quality.

Creating a New System

A new system for data and information management begins with the establishment of objectives.
What datasets are needed to describe global change? What are the highest priorities? Setting objectives and priorities goes beyond data management. The fundamental scientific design of the USGCRP should include setting objectives for the data and information that will be needed. These objectives must be set with data management input into the scientific process.

Any new system should build on existing elements. Thus, although new elements will be needed, we must make sure that the existing components work appropriately. Then we can move on to create the more complex system. This represents a challenge: can we make the existing elements work? As has already been noted, making them work is not a technical challenge but one of will and resources.

Design for Evolution

New systems must be designed for evolution. History shows that data storage systems, concepts, formats, and computing capability have changed on time scales of 2 to 5 years. It is unlikely that a data system defined today will remain constant for the next decade. The design problem is to develop systems flexible enough that data, and information about the data, can be readily saved for the next 100 years or longer despite these changes. Methods have already been implemented that permit relatively easy migration of data from one medium to another. Systems must be defined to accommodate diversity (e.g., a format designed 10 years earlier). Metadata must be structured so that they can be readily used by future as well as present systems.

Agencies must accept the responsibility of providing for the stewardship of the data they generate. Data management should be considered at the outset of every project, explicitly defined, and adequately budgeted for the life of the project. Arrangements should be made for the long-term archiving of the data.

Demonstrate Success

A new system should demonstrate success through practical prototypes. Confidence will be built by proving accuracy, by showing competence, and by producing some early products of value. This can be done by beginning feasible pilot projects of high importance.
The Master Directory project sponsored by the Interagency Working Group on Data Management for Global Change (IWG) is an excellent example. (This project is creating a high-level directory of datasets related to global change that are held by many agencies and institutions. Its success depends on information standards shared among centers and on operational computer network links.)

Data centers should be created for those disciplines important to global change research that are not included in the network of national data centers. This can be achieved by establishing new centers or by expanding the purview and resources of existing centers. A network of discipline-oriented data centers is a necessary component of the system to support global change research. However, because of the extraordinary information requirements that the global change program will place on data management elements, a new mechanism to handle the data will be necessary to augment the existing system.

Establish Information Analysis Services

Information analysis services for global change research should be established, either at existing data centers or as new information analysis centers (IACs). IACs should be issue oriented, complementing the existing data centers, which are principally discipline oriented. They should adopt a much broader concept of information management than traditional data centers. They should support a broad user community, from individual researcher to agency policymaker.

The information analysis services should extend, not replace, the existing system of data centers. In fact, it has been argued that only by maintaining a strong disciplinary component in most data centers will the requisite expertise be available for quality control. The provision of information analysis services to complement the discipline-based services should be a key component of the information management strategy for the USGCRP.

Apply Appropriate Technology

New data and information systems should apply appropriate technology.
This requires steering a course between over-engineering and under-utilization of technical advances. We face a future with an ever-increasing number of scientists and engineers sensing the environment and using the data, ever-increasing spatial and temporal coverage of the sensors, and an ever-increasing number of channels on each platform. Coping with the data will require flexible computer-based solutions to store, find, and retrieve usable portions of the data. Technologically appropriate solutions will likely not be permanent. New datasets that do not fit the current computer solution will appear, prompting a need for new algorithms. It is impossible to give specific guidance here for solving these problems. However, we must ensure that they are effectively dealt with in a future data management plan.

Centralize the Data Directories

Locating data, both nationally and internationally, will be helped by the establishment of a centralized data directory. This directory will contain information about information. It will be created by a joint effort, initially by the national data centers and eventually with the help of international data centers. The data directory should be electronically accessible and user friendly. It should provide as much information about datasets as possible, including location; access policies and procedures; and information about the data's completeness, accuracy, general usefulness, documentation, and limitations. There should be no charge to use the directory beyond the telephone or electronic mail cost of connecting to it.

Upgrade Standards for Quality Assurance and Documentation

The quality assurance and documentation standards of datasets important to global change research must be upgraded. Quality assurance and documentation should be at the heart of a data management system supporting the global change program. Only after extensive testing by independent reviewers should important global research datasets be considered accurate.
Depending on the data involved, this effort can be extensive and is often the single most expensive step in processing a dataset for distribution. This process, analogous to independent peer review of a journal article, maximizes the integrity of the information. It is necessary for the USGCRP, because future research and policy decisions will rest in part on these important datasets.

Data documentation must pass the well-known "20-year test." That is, will someone 20 years from now, not familiar with the data or how they were obtained, be able to find datasets of interest and then fully understand and use the data solely with the aid of the documentation archived with the dataset? This is a tough test, yet one that must be passed for many of the data collections if long-term global environmental programs are to be successful.

Documentation must do more than describe the values represented in each field and the format information needed to read the data tape. It must fully document the dataset from all possible points of view. At a minimum, dataset documentation should contain:

• Identification of contributors
• Background information
• Scope and purpose of the program
• Data collection procedures
• Station history
• Description of instrumentation
• Definition of calibrations applied
• Full variable definitions
• Definition of calculated variables
• Description of adjustments
• Quality assurance at the center
• Modifications made at the center
• Limitations of the data
• Systematic and random errors
• Data transport verification statistics
• Full or sample listings
• Input/output routines on the transport medium
• References

Complete data documentation is a crucial part of data processing and, along with quality assurance, will be the heart of successful data management activities for global change.

Encourage Data Management Research

Data management has developed in large part through the efforts of its practitioners. To meet the needs of global change research, a major research effort directed to environmental data management, involving computer scientists, data managers, librarians, archivists, and user scientists, is needed over the next decade. Issues to address include the following:

• The design of an indexing system tied to the nature of the database.
• Provisions for browsing through complex datasets from a distant workstation.
• Increased capability for visualizing databases.
• Principles for using compressed data in databases.
• Guidelines for the length of time data should be kept in "live" databases.
• Software design for the management of large databases.
• Application of expert systems to the management of large databases.

Important to the discussion here is NASA's plan for an elaborate system to handle data generated by the Earth Observing System (EOS). The EOS Data and Information System (EOSDIS) is being planned in parallel with the EOS flight mission to ensure that scientists will be involved not only in the analysis of the EOS measurements but also in the storage and archiving of EOS data. EOSDIS is the largest single component of the federal data and information management system being created for the USGCRP and, as such, is important to this discussion.

EOSDIS represents a major step in data and information management design in that it will handle extremely large databases. It has received advice from many groups and individuals, such as the NRC Panel to Review NASA's Earth Observing System in the Context of the USGCRP (NRC, Committee on Global Change, 1990b) and NASA's Science Advisory Panel for EOS Data and Information (NASA, 1989). An excellent report by Dutton (1989) proposes EOSDIS design concepts. Although an analysis of a system as complex as EOSDIS is beyond the scope of this report, the groundwork laid by EOSDIS will be significant for the development of a broader national data and information management system for global change data.

A Federal Data Policy

The USGCRP will depend on scientists sharing their data with each other. Ensuring the timely submission of data to national centers requires a policy. That policy must recognize the needs of principal investigators to protect their intellectual investment, and it must encourage their continued efforts to collect useful global change data.

The Ocean Sciences Division of the National Science Foundation (NSF) has long been a leader in maintaining a data policy. Researchers supported by the division must agree at the outset to share their data within a reasonable time. When the Committee on Geophysical Data (CGD) began its review of global change data and information management, it regarded the text of the then-current NSF Ocean Data Policy as a model of what should be done for all U.S. researchers involved in global change research.

The IWG has developed a Policy Statement on Data Management for Global Change. The participating agencies have endorsed the policy as part of the work of the Committee on Earth and Environmental Sciences.

A National Information System

A system to meet the data and information needs of the USGCRP should evolve step by step, based on scientific requirements. Until scientific needs are clear, it is unwise to develop elaborate technical plans. The CGD has included a possible implementation approach in Chapter 5, along the lines of the strategy suggested in Chapter 4.
