Page 61

5

DATA MANAGEMENT

With the virtual explosion in volume and diversity of geomagnetic data sets in recent times, the need for a coherent interagency data management policy is reaching critical levels. The combination of near-real-time availability of magnetic observatory data provided by global networks and low-cost mass storage media offers exciting possibilities for the research community. However, these advances also present problems for the management of data sets, including problems with data quality and the proliferation of multiple versions of data sets. It is possible that irreplaceable data bases could be lost because of the lack of a general consensus within the geomagnetic community about how they should be managed. The scientific community must clearly recognize the need for stewardship of data that are fundamental for research.

The basic issues regarding availability of geophysical data have been addressed in several reports of the National Research Council, including those of the Committee on Data Management and Computing (CODMAC) (1982, 1986, 1988), Geophysical Data: Policy Issues (1988), and Solving the Global Change Puzzle (1991). The present report emphasizes the importance of effective availability of geomagnetic data and associated data products through national data centers, World Data Centers, and distributed data centers.

Data, Metadata, Data Quality, and Formats

High-quality, well-documented data sets are necessary for the success of geomagnetic research. Procedures must be established to assure the quality of both the observations themselves and the metadata (information describing the data) and to assure that these data are machine-readable.
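As a rough illustration of what "machine-readable" metadata could look like, the sketch below stores a description of one data set as a structured record and checks that no descriptive field has been left empty. The field names follow the items this chapter asks documentation to cover (contents, instrumentation, processing, formats); the class name, the field names, and the JSON serialization are assumptions made for illustration, not a format prescribed by the report.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical machine-readable description of one geomagnetic data set."""
    title: str
    contents: str          # what the data set holds (components, cadence, span)
    instrumentation: str   # instruments used to make the observations
    processing: str        # corrections and reductions already applied
    data_format: str       # name of the file format the data are stored in
    quality_notes: List[str] = field(default_factory=list)


# Descriptive fields that must not be blank (an assumed rule for this sketch).
REQUIRED_FIELDS = ("title", "contents", "instrumentation", "processing", "data_format")


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """Return the names of required fields that are empty or whitespace only."""
    record = asdict(meta)
    return [name for name in REQUIRED_FIELDS if not str(record[name]).strip()]


def to_json(meta: DatasetMetadata) -> str:
    """Serialize the record so that other programs can read it."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


if __name__ == "__main__":
    example = DatasetMetadata(
        title="Example observatory minute values",
        contents="H, D, Z components at one-minute cadence",
        instrumentation="Fluxgate magnetometer",
        processing="",  # deliberately left blank to show the check
        data_format="ASCII table",
    )
    print(missing_fields(example))
    print(to_json(example))
```

A record that fails the check can be returned to the data provider before the data set is archived, which is one simple way to make the quality of the metadata part of the procedure.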



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





The use of data for research provides an effective test of its quality. The experiences of the researchers in the use of data are a valuable component of the quality-control process. Scientific peer review of data is another valuable component of the quality-control process.

In addition to information about data quality, the documentation must describe the data set contents, instrumentation, processing, and data formats. Because geomagnetic data are used by several scientific disciplines that employ different nomenclature, the documentation scheme must contain a data dictionary in which all scientific and technical terms applicable to geomagnetic data are clearly defined.

Given the diversity of sources of data required for geomagnetism, it seems unlikely that a single data format could be used. However, it may be possible to use one of several established data formats. Software systems are already available to do this.

After the data are collected, prepared, and stored in the system, their availability and relevant properties must be made known to interested scientists. Some mechanisms have been established to do so. The interagency Global Change Master Directory provides the scientific community with high-level information about data availability and access. NASA is the lead agency in the development of the Master Directory, which is supported by all of the U.S. agencies involved with geomagnetic data and by many foreign governments. Metadata needed for the Master Directory are being put into the Directory Interchange Format. Use of this standard for geomagnetism will simplify scientific access to data.

Data Centers

There are several relevant issues surrounding the use of national centers for managing geomagnetic data bases. They include the following: the complementary roles of centralized versus distributed data centers; the problem of converting the volumes of analog data currently being archived at the World Data Centers into digital form; the urgent matter of “data rescue,” that is, the identification, acquisition, and archiving of geomagnetic data sets in danger of being lost or destroyed; and the stewardship of geomagnetic data.

Both centralized and distributed data centers have advantages for managing geomagnetic data. Centralized data centers, funded by federal agencies, offer the stability required to preserve the data for posterity. Distributed data systems provide close contact with experts who are knowledgeable about the data. Combined centralized and distributed systems can provide the advantages of both.

The World Data Centers currently house massive quantities of analog data, such as magnetograms on microfilm and tabular data on paper records. These data need to be converted into digital electronic form.

In addition, many data bases are in danger of being lost or destroyed; steps must be taken to ensure that this does not occur. Not all data in each data center are duplicated at other centers. Thus, if records at one center are lost, there might be no means to recover them. Furthermore, the sole copies of many data sets that are the results of completed projects (data never sent to any data center) may be stored, with or without cataloging, on shelves or in tape cabinets at research institutions. Scientific personnel are needed to identify and retrieve such data for inclusion in the general geomagnetic data base. A window of opportunity now exists for rescuing these various data sets through cooperative efforts with colleagues in other countries, particularly in the Commonwealth of Independent States.

Data Availability

Data are being made available to users in a variety of ways. Increasingly, users require that data from many locations be made available in as near to real time as possible. Users want to retrieve these data through on-line systems.

The availability of geomagnetic data in real time is an important objective, but it also presents some significant problems. For example, data must be retrieved from the observatories over satellite networks, and can contain many spikes, gaps, time shifts, and other quality problems that have to be corrected before the data can be used for research. The INTERMAGNET program is currently implementing such a near-real-time capability from a worldwide distribution of magnetic observatories, and is dealing with many of the operational difficulties that accompany such a program.

On-line data access is currently available for some applications and is very appealing to users, who can simply download data directly into their computers. But large data bases that are continually being updated require large storage spaces and can be labor-intensive to maintain.

There is a need to provide existing digital data in a form that is long-lasting, inexpensive, and compact, such as CD-ROMs. For longer-term needs, selection of the storage medium must allow for random access and operation on multiple computer platforms. NOAA, USGS, and other agencies are currently distributing large-volume data bases on CD-ROMs, but many data are not yet available on this medium. In view of the limited lifetime, cost, and vulnerability of magnetic tape, CD-ROMs represent a very cost-effective way to distribute data.

Derived Products

Developments in technology have revolutionized the collection and analysis of geomagnetic data. It is now feasible to monitor a global distribution of observatories in real time and generate better products at increased temporal resolution.
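Before products can be generated from near-real-time observatory data, the spikes and gaps mentioned under Data Availability have to be found. The sketch below shows one simple way this screening might be done on a series of one-minute field values. It is an illustration only: the function names, the use of `None` to mark a missing sample, and the threshold value are assumptions, and it does not describe the procedures actually used by INTERMAGNET or any data center.

```python
from typing import List, Optional, Tuple


def find_gaps(values: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each run of missing samples."""
    gaps = []
    start = None
    for i, v in enumerate(values):
        if v is None:
            if start is None:
                start = i          # a new gap begins here
        elif start is not None:
            gaps.append((start, i - 1))
            start = None
    if start is not None:          # series ends inside a gap
        gaps.append((start, len(values) - 1))
    return gaps


def find_spikes(values: List[Optional[float]], threshold_nt: float) -> List[int]:
    """Flag single samples that jump away from both neighbours and back.

    A sample is flagged when it differs from the previous and the next
    valid sample by more than threshold_nt, in the same direction.
    """
    spikes = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue               # cannot judge a sample next to a gap
        d_prev = cur - prev
        d_next = cur - nxt
        if abs(d_prev) > threshold_nt and abs(d_next) > threshold_nt and d_prev * d_next > 0:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Made-up minute values in nanotesla, with one spike and one two-sample gap.
    series = [100.0, 100.5, 101.0, 250.0, 101.5, None, None, 102.0]
    print(find_spikes(series, threshold_nt=50.0))
    print(find_gaps(series))
```

Flagging rather than silently correcting keeps the original record intact, so the decision about how to repair a spike or fill a gap can be documented alongside the data.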
Both the magnetospheric ring current index, Dst, and the auroral ionospheric electrojet index, AE, can now be determined more effectively and disseminated more rapidly. A polar cap index is now possible, and a new family of power spectra indices could be developed. The understanding of the physical processes coupling the solar wind and the geomagnetic field has progressed to the point that solar wind data are critical for deriving magnetospheric models and for making accurate forecasts of impending activity.

Mathematical models and charts are among the more important products derived from measurements of the Earth's magnetic field. These tools provide information needed for the protection of life and property, for commercial activities, and for scientific research. Geomagnetic models, which are based on millions of field measurements, are a necessity for safe navigation. They are built into the navigation systems of thousands of civilian and military aircraft, ships, and boats. Models and charts of the main field are routinely used for processing magnetic survey measurements taken in the search for minerals and petroleum. They are used to calculate the paths of cosmic rays, to orient spacecraft, and to find the positions of field-line conjugate points. These products require a continuing abundance of high-quality geomagnetic field measurements.

Recommendations

High-quality, well-documented data sets are essential for the success of geomagnetic research. The following recommendations address the stewardship and distribution of geomagnetic data. The order of the recommendations does not imply a priority ranking.

Procedures should be implemented to assure the quality of both the observations themselves and the metadata describing the nature of the data: contents, instrumentation, processing, and formats. Data and metadata should be recorded in machine-readable form.

The best way to learn about the quality of data is for investigators to use them for scientific research. Research that “exercises” data sets should be encouraged, and the experiences of the researchers should be captured as part of the quality-control process. Scientific peer review of data should be a key component of the quality-control process.

A source book should be developed to encourage more interdisciplinary use of geomagnetic data. This book should describe past, present, and future programs and sources of data, along with a glossary that clearly defines all applicable scientific and technical terms, with clear descriptions as to how to access particular data sets.

As part of the national geomagnetic initiative, special attention should be given to stewardship of data. The activities of centralized and distributed data centers should be integrated to achieve the fullest utilization and to maintain the highest quality of existing and future geomagnetic data sets. The massive quantities of analog data, such as magnetograms on microfilm and tabular data on paper records, that are currently housed at various facilities around the world should be converted into digital form. Data bases in danger of being lost or destroyed should be preserved and properly duplicated. Data sets from completed projects now stored at institutions that no longer have an interest in using them should be identified and turned over to the appropriate data centers. An appropriate entity or organization should be identified to take responsibility for ensuring that these recommendations are implemented.

Key geomagnetic data should be made available to users in as close to real time as possible. Many of the data could be available virtually instantaneously through on-line systems. Larger, more comprehensive data sets should be distributed on CD-ROMs with as short a delay time after recording as possible. In developing “real-time” systems, advantage should be taken of INTERMAGNET's extensive experience in dealing with the many operational difficulties that accompany retrieving data from observatories over satellite networks.

In order to minimize the redundancy of effort among various investigators who use these data and to minimize the required level of computer literacy, there should be close integration of the efforts of USGS, NOAA, and NASA in supplying the relevant data bases from observatories, surveys, and satellites to assure the compatible formatting of data and their distribution on similar media. Software to accomplish such an exchange should be readily available and easy to use.

New descriptors of geomagnetic activity that measure the amplitude and rate of change of magnetic fluctuations over a range of time scales should be developed. The suite of existing indices (for example, AE indices, Dst, and the polar cap index) should be computed at higher resolution and corrected for quiet diurnal variations from the global network of geomagnetic observatories envisaged in this initiative.

There should be close coordination among the principal partners concerned with developing, cross-checking, and using the International Geomagnetic Reference Field, particularly universities, NASA, the U.S. Naval Oceanographic Office, the USGS, the British Geological Survey, IZMIRAN (the Russian Institute of Terrestrial Magnetism), and industry.
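The recommendation above on new descriptors of geomagnetic activity can be made concrete with a small sketch. It computes, for several window lengths, the amplitude (range) of the field within each window and the largest sample-to-sample rate of change. This is not an established index and is not proposed by the report; the function names, the choice of non-overlapping windows, and the window lengths are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def activity_descriptors(
    samples_nt: Sequence[float], window: int, step_minutes: float = 1.0
) -> List[Tuple[float, float]]:
    """Return (amplitude in nT, peak rate in nT/min) per non-overlapping window.

    samples_nt   evenly spaced field values in nanotesla
    window       number of samples per window (the "time scale")
    step_minutes spacing between consecutive samples, in minutes
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    results = []
    # Trailing samples that do not fill a whole window are ignored.
    for start in range(0, len(samples_nt) - window + 1, window):
        chunk = samples_nt[start:start + window]
        amplitude = max(chunk) - min(chunk)
        rates = [abs(b - a) / step_minutes for a, b in zip(chunk, chunk[1:])]
        peak_rate = max(rates) if rates else 0.0
        results.append((amplitude, peak_rate))
    return results


def multi_scale(
    samples_nt: Sequence[float], windows: Sequence[int] = (5, 15, 60)
) -> Dict[int, List[Tuple[float, float]]]:
    """Evaluate the descriptors at several window lengths."""
    return {w: activity_descriptors(samples_nt, w) for w in windows}


if __name__ == "__main__":
    # Made-up one-minute values in nanotesla.
    field = [0.0, 1.0, 3.0, 2.0, 2.0, 6.0]
    for scale, values in multi_scale(field, windows=(3, 6)).items():
        print(scale, values)
```

Evaluating the same two quantities at several window lengths is one way to see whether a disturbance is dominated by slow, large-amplitude changes or by rapid fluctuations, which is the distinction the recommendation asks new descriptors to capture.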