B2. DATA MANAGEMENT SYSTEMS
Appendix B2 was largely developed by the following workshop group: Joe Allen and Herbert Meyers (Group Leaders), R. Barnes, J. Cain, W. Campbell, S. Cande, V. Chandler, D. Chapin, R. Clauer, A. W. Green, D. Herzog, W. Hinze, J. Joselyn, J. Kappenman, J. MacQueen, L. Newitt, N. Peddie, J. Phillips, C. Russell, M. Teague, R. Walker.
New Directions and New Needs
Management of data in the future will be a greater challenge for what have previously been termed “data centers.” New developments in technology, and new demands for improved services by the scientific community (as expressed by the working group reports in this document), will tax the resources and ingenuity of these organizations. With the advent of distributed computing systems, data centers will not only be repositories of archived data physically residing in such centers; even more importantly, they will become the information and management vehicle that enables scientists in all disciplines to locate the data they need more easily and efficiently. This is especially important for the needed interdisciplinary studies, which will require a relational data base of such files. Such a data base will require a significantly increased level of coordination and cooperation among the centers.
The new demands for access will require modern and efficient approaches to accommodate the increasing volume and complexity of data acquired. With this new confederacy in information management, the centers cannot easily mandate standard formats, but they can help guide the many remote cells in use of data formats and access software independent of the platforms being used.
Magnetism is a pervasive and fundamental area of science that cuts across the traditional boundaries of disciplines that often define and divide groups. Some scientists are concerned with the intense magnetic fields of sunspots and other solar features. Fortunately, these can be observed remotely from Earth, and there are fascinating data that suggest that the maximum magnetic fields of solar active regions have been systematically and monotonically increasing in intensity over at least the past three solar cycles. At the same time, the well-known decline in intensity of the internal dipole moment is reducing the volume and size of the magnetosphere that shields against (or organizes) the influx of energetic charged particles from the galaxy and from the Sun. The frequency and intensity of magnetic storms are also increasing with time. Are these changes interrelated? If so, what is the prospect for the future?
Some workers will be concerned with the heliospheric magnetic field that spreads in waving folds through interplanetary space and sweeps around with the rotating Sun. Others will be mainly interested in changes of magnetic fields sensed by geostationary satellites just inside the magnetopause or at lower altitudes by polar-orbiting spacecraft. Many will be concerned with magnetic field measurements in space at lower altitudes by more satellites so that spatial patterns of geomagnetic field variation may be identified as arising from currents in space, from near-surface sources, or deep in the Earth's interior. Some will be interested in the long-term global field and its slow, almost geological, time scales of change; others in the rapid, local variations or globally organized patterns of external currents that cause rapid change at one or a few sites or that produce significant changes over large regions of Earth.
Finally, there will be the spacecraft operators, electric power system operators, and makers of sensitive microelectronics, for whom large, rapid changes in the geomagnetic field are a potential source of disaster because of their effects on sensitive technology. All of these groups will have a common need for access to high-quality geomagnetic data collected with accurate instruments at key locations, processed to comparable standards over time and space, and reduced to products whose definitions and assumptions are well known and understood.

Data Bases
Of initial concern is the need to identify the sources of geomagnetic data. What data are collected, by which groups, and at what sites? Are data collected in the past adequate in type, quality, and coverage (spatial and temporal); adequately documented to support the conclusions already drawn from them; and adequately documented for them to be combined with newer data to support present and future research? Those same questions can be asked of geomagnetic data being collected today, and of data that will be taken in the future either by temporary, limited campaigns or by worldwide monitoring arrays. Also, as technology changes and costs of maintaining old methods of data collection rise beyond the ability of sponsoring groups to continue supporting them, how can priorities be set for what should be continued and what terminated?

Data Archives
The transition in philosophy of data center function—from management of all data at data centers to distribution of many of the data—brings forward many issues. Well-organized and funded projects that make data available need only keep the centers informed of their directories. However, as such projects mature, and especially after the data collection period is completed, the centers must take a more active role. In the implementation of a project, documentation of data is frequently neglected, and as a project matures and participants leave, it often happens that this neglect is never corrected. How can data centers maintain an active role throughout such a project, keep abreast of the activities that generate data, coordinate documentation, and arrange transfer to an archive? How can centers be made aware that some data may be at risk of loss to future research and be encouraged to take a hand in their preservation?
Not all data collection efforts are well integrated and organized, especially those involving internationally produced data. In such instances, the centers need to maintain their traditional role of negotiating standardization of formats and assisting, where possible, in achieving adequate archiving of the data produced. Such efforts relate both to primary observations (such as geomagnetic data collected from fixed stations) and to indices derived from such observations. With limited resources, a balance of effort is needed between centralized and distributed components.

Data Access
Data access concepts are of special concern because many data sets exist that cannot be reached by all who need to use them. Limited accessibility for a particular data base may not reflect a deliberate policy decision; it may simply result from the fact that plans for collecting the data did not include data processing and availability as important goals. Limited accessibility may also be a consequence of the physical characteristics of the media containing the data (for example, analog magnetograms on paper or strip charts held as rolls stored in a closet). Film media (35-mm microfilm or microfiche) were for many years a convenient way to capture vast amounts of geomagnetic data and preserve them compactly at an archive, so that they could be retrieved and inexpensively copied upon request. However, the requirement for digital data for computer analysis that characterizes research today relegates film records to a secondary role.
Once geomagnetic data are in digital form, how are they to be stored and transferred from one site to another? Are tapes, with their inherent serial-access problems, still a viable answer? Has the 8-mm helical-scan videotape cassette, with its vast capacity, become the new medium of choice? What are the relative merits and problems of optical media? What will technology provide next (for example, digital audio tape, optical tape)? How long-lived are the various choices, how universal are the recording and playback devices, and how robust but inexpensive are the media? These are key questions. Also, how many users today prefer on-line data bases for real-time or retrospective access, and what type of analysis does on-line access support? How can images and data plots be made accessible on-line together with digital data? The ubiquitous personal computer has revolutionized the collection, processing, exchange, and analysis of data over the past decade. Now, globally interconnected workstations are clearly the wave of the next decade. As data are made more accessible through the net or on various media, access, display, and manipulation software are essential for efficient utilization. This places a premium on having platform-independent data systems.

Data Products
Much of the driving force that justifies national and international support for programs to collect geomagnetic data comes from a demand for derived data products such as maps, models, or magnetic activity indices. Some models and charts are primarily used for navigation while others describe the main field. External fields in the magnetosphere and ionosphere are interesting scientifically but must be removed to study or to chart the main field. Models of the crustal field and maps of magnetic anomalies have geological importance but can be contaminated by natural fluctuations of external fields. When the complication of secular change is added, it becomes challenging to create accurate models and charts with data that are less than ideal. In addition, changing user needs require higher time resolution and data from locations not previously monitored. Developments in technology have revolutionized the ability to collect and analyze data. Understanding of the physical processes that couple the solar wind and the geomagnetic field has progressed to the point that solar wind data are critical for driving magnetospheric models and making accurate forecasts of impending activity. These advances have created opportunities for significant improvements in data products.
Components of Data Management

Data Bases
Data are the most fundamental and essential component of geophysical endeavors. However, transfer of data to a data base is frequently a low-priority concern of scientists once the data have served the intended purposes for a particular research project or objective. Often a research project will collect a large amount of data, use only a small portion of it for the intended purpose, and then disregard or dispose of the entire data set. Until recently, few research programs have taken the necessary steps to document their data sets by providing accounts of such factors as the type of data being collected, technical specifications of the instruments that were used, the temporal and spatial resolutions of the observations, or even how the data are being stored, and few research programs have seen to it that the data are archived at some publicly accessible archival center. Data base management is an often neglected aspect of many (if not most) research programs, despite the fact that data constitute the evidentiary pillar upon which science itself rests.
Furthermore, it seems clear that most scientists are aware of only a fraction of the data bases that have been, are being, or will be collected, even within the limited field of geomagnetism. This represents a loss in at least two ways: first, because data may already exist (or be on the way) that could serve the needs of other research objectives; and second, because the mere knowledge of an existing (or developing) data base can serve the serendipitous function of generating new research ideas and promoting interdisciplinary cooperation between research groups. Because of the vast proliferation of data sets from land, air, sea, and satellite programs, it is not currently possible for anyone to keep track of all the data bases being accumulated in the diverse disciplines within geomagnetism.
Thus, two operational requirements for geomagnetic data bases are as follows:
1. to identify the sources of geomagnetic data from land, air, sea, and satellite that have been, are being, or will be collected from ongoing or temporary observational campaigns; and
2. to address the issues relating to the formatting, storing, documenting, and archiving of data bases.
The following few acronyms (spelled out in Appendix 3 of this volume) represent a small sample of the variety of projects that involve geomagnetic data bases.
[Table of project acronyms not reproduced here.]
Although many of these acronyms are well known, it is unlikely that anyone is knowledgeable about all of these programs. It is even less likely that anyone knows the type of magnetic data being recorded, the spatial resolution, sampling rates, instrument sensitivity, status of the data, principal investigator, or how to obtain the data. In addition to temporary campaigns, there are many ongoing geomagnetic data bases that may be more familiar to the research community, such as the USGS magnetic observatory network, the NGDC worldwide digital data collection, and the magnetic models produced by the USGS, the U.S. Naval Oceanographic Office (NAVOCEANO), NASA, and IZMIRAN (the Russian Institute of Terrestrial Magnetism, Ionosphere, and Radio Wave Propagation).
Clearly, there is a need to identify, catalog, specify (document), and make known the many diverse projects and programs that involve geomagnetic data bases. Perhaps an organization such as the American Geophysical Union (AGU) could provide the forum wherein this (largely) informational requirement could be fulfilled. Researchers could provide the relevant details to AGU, which could, in turn, publish this information periodically in its transactions (EOS). It would be the responsibility of every research program to notify AGU of its activities related to geomagnetic (or other) data bases for the benefit and elucidation of everyone in the community.
It is not enough, however, that the scientific community be informed of existing and prospective data bases in geomagnetism. It is also imperative that these data be transferred to standardized formats on stable media, such as CD-ROMs or magneto-optical (MO) disks, and be made readily accessible to potential users.
There are almost as many formats for data bases as there are projects acquiring them. Often the choice of format is arbitrary, determined by the existing data-processing software at the organization gathering the data. This need not be the case, however: uniform data formats could be established that would greatly facilitate the compatibility of data bases with everyone's data-processing capabilities. Formats could be developed for 1-second, 5-second, 10-second, 1-minute, 1-hour, and other temporal resolutions; the magnetic field values themselves could be stored in standardized formats for vector or scalar data, in variation (voltage) or field (magnetic) units, with a variety of degrees of resolution (1 nanotesla, 0.1 nanotesla, 0.05 nanotesla, and so on). Even much of the documentation of a data base could be included in the header portion of the data records, which would ensure that those unfamiliar with the data set have the best opportunity to be well informed of the nature of the data and any issues relating to their quality. This is certainly not an easy matter to resolve, but the effort needs to be made to standardize the formats of geomagnetic data as much as possible, in order to minimize the need for multiple data-access software programs, minimize the potential for error in retrieving the data, and maximize availability and ease of use.
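To illustrate the kind of self-describing format sketched above, the fragment below keeps the documentation in a header and stores each field value as an integer count of a declared resolution step. All field names and conventions here are hypothetical, not an existing standard.

```python
# Hypothetical self-describing geomagnetic record format: the header carries
# the documentation (station, cadence, components, resolution), so the data
# records can be interpreted without external notes.

HEADER = {
    "station": "BOU",        # station code (illustrative)
    "cadence_seconds": 60,   # 1-minute values
    "components": "XYZ",     # vector data in field (nT) units
    "resolution_nT": 0.1,    # quantization step declared in the header
}

def encode_record(t_seconds, x, y, z, resolution_nT=0.1):
    """Store each component as an integer count of resolution steps."""
    scale = 1.0 / resolution_nT
    return (t_seconds, round(x * scale), round(y * scale), round(z * scale))

def decode_record(rec, resolution_nT=0.1):
    """Recover field values (nT) from an encoded record."""
    t, xi, yi, zi = rec
    return (t, xi * resolution_nT, yi * resolution_nT, zi * resolution_nT)

rec = encode_record(0, 20345.7, -1234.5, 47001.2)
print(decode_record(rec))   # values round-trip to within 0.1 nT
```

Because the resolution is declared in the header rather than assumed, a reader unfamiliar with the data set can still decode it correctly.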
In addition to formats, there is also a need to standardize data base storage media and to take advantage of advancements in technology that offer small, high-volume, high-density, low-cost, and easy-to-use data storage devices that ensure the preservation of data bases for decades to come. As was recommended in the NAS/NRC report Geophysical Data: Policy Issues, the scientific community itself should be aware of the problems and take steps to ensure the preservation and integrity of collected data. CD-ROMs and MO disks have long lifetimes—too long to be accurately determined, but apparently spanning decades or more. They are small, relatively insensitive to environmental influences, and can store massive quantities of easily accessible data. Each CD-ROM currently costs less than $2 to produce and holds approximately 650 megabytes of data. There is an investment cost for the equipment and software to premaster a CD-ROM for manufacture (mass production), so an organization would need to assume these responsibilities, particularly where the data are obtained by groups that do not have or cannot afford such equipment. Because these devices hold such large volumes of data, criteria would have to be established for combining several different data bases on the same CD-ROM to maximize the use of space and minimize production cost.

Data Archives
It is not possible to revisit the past and reobserve the magnetic field as it was in a certain place years ago. Thus, the record of current and historical observations is critical to ongoing and future research efforts. If the data are not properly archived, research requiring a long history of observations could not be conducted without waiting for a new long history of observations to take place. Moreover, who knows whether features of the magnetic field, such as magnetic jerks and magnetic storms, will be the same during the next century as they were in this one?
The most important new scientific understandings of the Earth's core processes and their relationship to other phenomena require data covering as early a period as possible. Magnetic declination data from ship observations date back at least to the fifteenth century (these data were collected for completely practical, operational reasons, but today they serve as research tools). They need to be ferreted out and added to the geomagnetic archive. Direction and intensity data from archaeomagnetic studies, and paleomagnetic intensity and direction observations (lake sediments and lava flows), can be used to extend our knowledge of core processes.
Observations that are less time-dependent, such as aeromagnetic and shipborne survey data, become meaningful only when surveys conducted over many years by numerous institutions become available in an archive for syntheses, analyses, and research. Preparation and analysis of these survey data require access to accurate local, regional, and global magnetic field models, and to dynamic models that provide a reference for moment-to-moment change throughout the day.
The collection and processing of magnetic data represent an investment in resources that becomes more valuable as time passes. Data are an important national resource that, with proper care, will contribute to the solutions of numerous scientific and human problems now and in future decades.

Identification of Data To Be Archived
Although it is technically possible to archive all data pertaining to geomagnetism, this may not be economically feasible or necessary. Data from observational systems exist in raw, processed, and interpreted forms. Additionally, the data may exist in several different resolutions. Collections of data at various organizations should be identified and evaluated. Not all data are available in machine-readable form. Major portions of the data need to be digitized, microfilmed, or scanned. Priorities need to be established and decisions need to be made.
Some archival problems and related issues include the following:
Sensor data are often not available at the full observing resolution. The data that are available are often sampled, averaged, or summarized. For example, observatory data are often processed to obtain 1-minute values, although the digital data collected are usually at much higher time resolutions. Similar conditions exist for aeromagnetics, ship-towed magnetics, satellite observations, and other types of measurements. The higher-resolution observations are generally not now available at national data centers and World Data Centers.
There are many types of nonstandard measurements made for which there is no recognized archive.
Many of the older data, such as magnetograms and aeromagnetic and ship data, are available only as paper records. In many cases the data have been manually digitized at sampling rates much lower than present practice.
Many paper records are deteriorating.
The types of geomagnetic data that need to be considered for archiving include those from repeat stations, satellites, paleomagnetism, archaeomagnetism, observatories, aeromagnetic surveys, ship-towed surveys, sea-bottom instruments, land surveys, historical compass readings, anomalous compass reading reports, electromagnetic data, rock properties, and perhaps others. For a few of these there are active archiving activities, but for others there are only passive activities or no archive at all.

Identification of Archive Centers
The existence and the programs of the U.S. national data centers and the World Data Centers are fairly well known. They operate as archive centers for U.S. and international programs. Are there, or should there be, other archive centers? In the case of federal organizations, the U.S. General Accounting Office (GAO) and the National Archives and Records Administration (NARA) have specified certain requirements and responsibilities that an archive center must meet. These include provisions for backups, environmental controls, periodic sampling, periodic migration to more modern archival media, and implementation of new technology. There is a trend toward NASA and NSF support of discipline centers at universities where researchers will oversee the data processing and distribution. These are not archive centers, but they offer the opportunity to contribute to the process of making higher quality data available for the archives if their design includes eventual or periodic data transfer.
Distributed Data Systems
The technology exists to allow a user to obtain data from a distributed system regardless of where the data are stored or the node of entry. Problems include the present difficulty of reliably transferring several gigabytes of data and concerns about long-term security. Recommendations and relationships need to be developed among the nodes, the long-term archive centers, and the funding agencies to assure viability and success of the national archive and data distribution systems. A balance needs to be established between the desire to have data under the control of those actively using the data and the need to protect against the risk that the data will “vanish” when interest and/or support for the data base goes away. Overall stewardship of the geomagnetic archive needs to reside at a single center even though there are many remote nodes performing many of the processing, distribution, and analysis functions.
The National Geophysical Data Center and its collocated World Data Center-A in Boulder, Colorado, are the archive centers for U.S. national geomagnetic data and for geomagnetic data relating to national and international programs, respectively. All data-collecting agencies, funding agencies, and research programs should coordinate with NGDC and WDC-A at the beginning stages of new data campaigns and research programs to assure that adequate provisions and resources will be available for data management activities.

Data Access
This section considers some of the technical issues involved in providing scientific access to magnetic field observations. For this discussion it is assumed that the data are available in digital form. Three major issues must be considered: the data must be of high quality, they must be available in a timely fashion, and they must be properly archived. Technically, both the quality issue and the timeliness issue can be addressed most readily through on-line distribution of data. In this approach, data would be placed on-line as soon as they are processed. Users can access the data over computer networks. The data need not be located at a central location; they can be stored at any location that is accessible by network. Systems for the distribution of on-line data and for distributed inventory tracking are becoming common. Network access is now worldwide, and the low data rate associated with magnetic field observations (whether from ground observatories or spacecraft) makes it practical to deliver magnetic data electronically. As problems with the on-line data are discovered, they can readily be corrected.
When data are held in a large collection—whether archive or working data base—they must be readily retrievable in order to be accessible. In practice, this means that desired data must be easily identified and called out of the larger mass, which makes it critically important that a data base storage and access system provide the user with a fast, simple means of browsing. This might be a relational data base management system that allows searches on data criteria, for example, by amplitude or orientation. It might also be a simple visualization technique that lets a user display an analog image of a selected length or array of the data. Browse and visualization techniques that provide effective access to the contents of a large data base are essential.
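One way to realize the criterion-based browsing described above is a relational query over precomputed summary attributes. The sketch below uses an invented table layout and made-up values to show how a user could pull out only the disturbed intervals from a large holding.

```python
# Minimal sketch of criterion-based browsing: index a summary attribute
# (here, the hourly range amplitude in nT) so queries can select intervals
# of interest without scanning the full-resolution data. Table and column
# names are invented for this example.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE hourly_summary (
    station TEXT, day TEXT, hour INTEGER, range_nT REAL)""")
rows = [
    ("BOU", "1992-02-01", 3, 12.5),
    ("BOU", "1992-02-01", 4, 210.0),   # a disturbed (storm-time) hour
    ("BOU", "1992-02-01", 5, 95.0),
]
con.executemany("INSERT INTO hourly_summary VALUES (?, ?, ?, ?)", rows)

# Browse: which hours exceeded a 100 nT range?
disturbed = con.execute(
    "SELECT hour, range_nT FROM hourly_summary WHERE range_nT > 100").fetchall()
print(disturbed)   # → [(4, 210.0)]
```

The same summary table could drive a visualization front end: the query result tells the user which segments of the full-resolution data are worth displaying.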
After the data have matured and a sufficient quantity has been accumulated, the data can be moved to permanent archival media. In cases involving many sources, an archival system with random access is desirable. At this writing (February 1992), the only archival random-access media are optical media; of these, only one (the CD-ROM) has both hardware and logical data standards in place. Logical standards concern naming conventions and directory structures; they make the CD-ROM vendor-independent. Because the volumes of magnetic data are relatively small, the limited capacity of the CD-ROM (650 MB) is not a major concern. A master of each CD-ROM must be generated; a master currently costs about $800, and the price per disk (copied from the master) is less than $2. CD-ROM readers cost $300 to $500. Recently, write-once CDs have become available. On CD-ROM, magnetic field observations can easily be distributed to the wide range of potential users.

Derived Products

Indices
Geomagnetic indices encode the level of short-term fluctuation in the magnetic field above the normal (quiet-day) diurnal variation on both local and global scales. Indices in common use include the K-index family, Dst, and AE. The K index has served the geophysics community well for 60 years, but a simple range index defined over 3-hour intervals is no longer computationally necessary or adequate. New descriptors that measure the amplitude and rate of change of magnetic fluctuations over a range of time scales are needed. Modern data collection platforms allow spectral analysis in near real time. Data sampled at 1-second resolution could be analyzed in place, and the power in specified frequency bands indexed and transmitted over satellite links.
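The in-place spectral indexing suggested above can be sketched as follows: a plain discrete Fourier transform over a short window of 1-second samples, summing the power that falls in a chosen frequency band. This is an illustrative computation, not a proposed standard; the window length and band edges are arbitrary choices.

```python
import math

def band_power(samples, dt, f_lo, f_hi):
    """Power in the band [f_lo, f_hi] Hz via a plain DFT (fine for short windows)."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove the baseline field level
    total = 0.0
    for k in range(1, n // 2):
        f = k / (n * dt)                     # frequency of DFT bin k
        if f_lo <= f <= f_hi:
            re = sum(x[j] * math.cos(2 * math.pi * k * j / n) for j in range(n))
            im = sum(x[j] * math.sin(2 * math.pi * k * j / n) for j in range(n))
            total += (re * re + im * im) / (n * n)
    return total

# Two minutes of 1-second samples containing a 20-s (0.05 Hz) oscillation:
sig = [50000 + 5.0 * math.sin(2 * math.pi * 0.05 * t) for t in range(120)]
in_band  = band_power(sig, 1.0, 0.04, 0.06)   # band containing the signal
out_band = band_power(sig, 1.0, 0.10, 0.20)   # band away from the signal
print(in_band > 100 * out_band)   # → True
```

A station processor running a computation of this kind could index the power in several such bands each interval and transmit only the index values over a satellite link.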
The Dst and AE indices are based on separate networks of observatories and meet the need for global activity indices. However, because some of the stations do not deliver digital data, it takes years to construct the indices, which are presently issued at 1-minute to 1-hour resolutions. In addition, deficiencies in the spatial distribution of the observatories used to derive the Dst and AE indices have been identified. The first three recommendations are these: (1) a new family of indices based on magnetic power spectra at local observatories should be developed; (2) the suite of AE indices should be computed at 1-minute resolution from an improved spatial distribution of digital stations; and (3) Dst should be computed at 1-minute resolution from an improved distribution of digital stations with improved correction for quiet diurnal variations.
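For readers unfamiliar with the AE construction mentioned above: AU and AL are the upper and lower envelopes of the H-component disturbances across the contributing auroral-zone observatories, and AE = AU − AL. A minimal sketch follows; the station codes and deviation values are illustrative, not real measurements.

```python
# Sketch of the AE index construction: from the deviations of the horizontal
# (H) component at a ring of auroral-zone stations, take the upper envelope
# (AU), the lower envelope (AL), and their difference (AE = AU - AL).

def ae_indices(h_deviation_by_station):
    """h_deviation_by_station: {station: deviation of H from quiet level, nT}."""
    values = list(h_deviation_by_station.values())
    au = max(values)          # strongest eastward-electrojet signature
    al = min(values)          # strongest westward-electrojet signature
    return au, al, au - al

# Illustrative snapshot (station codes and values invented):
au, al, ae = ae_indices({"ABK": -420.0, "DIK": 35.0, "CWE": 80.0, "FCC": -150.0})
print(au, al, ae)   # → 80.0 -420.0 500.0
```

The construction makes clear why station distribution matters: a gap in longitude coverage can let the true envelope pass between stations unobserved, biasing AE downward.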
For operational use, indices should be available (as nearly as possible) in real time. Accurate forecasts are possible if solar wind data from the forward Lagrangian position (L-1) are used. In addition, solar wind parameters provide the boundary conditions necessary to drive magnetospheric and ionospheric models. Accordingly, the solar wind plasma and magnetic field data from the L-1 position should be acquired continuously.
The variation observed by a single station in the polar cap can provide a measure of the efficiency of coupling between the interplanetary and geomagnetic fields and can serve as a warning of increasing activity. The polar cap index should be computed at 1-minute resolution.

Models and Charts
Many kinds of mathematical models, charts, and similar products are created from magnetic data. They can be discussed conveniently by considering three categories. The first category includes models and charts that describe the main field and those whose primary use is for navigation. Both kinds may be either national or global in coverage. The second category, models of externally caused fields, includes magnetospheric and ionospheric models. The third category includes models of the crustal field, maps of magnetic anomalies, and data sets consisting of grid values derived from anomaly maps. These three kinds of products and associated requirements are described below.
Main Field and Navigation (Global and National). These products provide the vital information on the variation of the compass that is so essential for safe navigation of aircraft, ships, and boats. They also provide information on the strength of the main field needed by exploration geophysicists for enhancing magnetic survey measurements taken in the search for petroleum and minerals, and information on the change of declination often needed by land surveyors. Currently, NASA, NAVOCEANO, and the USGS produce global geomagnetic models. These agencies, along with the British Geological Survey (BGS) and IZMIRAN, participated in the recent (1991) revision of the IGRF. World charts for 1990, based on NAVOCEANO and BGS models and aimed at satisfying DOD requirements, have been issued by the Defense Mapping Agency (DMA). The USGS will issue world charts for 1990, based on the IGRF and aimed at satisfying the needs of science and commerce. The USGS has produced national models for the United States for 1990 and will issue associated charts of D and F. The USGS also provides a dial-in service for obtaining model field elements via terminal and modem.
The main challenge faced by workers in this field is how to create accurate models and charts with data that are often less than ideal. Other challenges include improving forecasts of secular variation; providing better access to models, model information, and associated software; and coping with reduced funding. Good models and charts require good data. Needed are frequent global surveys, like the Magsat satellite survey of 1979-1980, and a well-distributed network of magnetic observatories. Secular-variation forecasts, currently based on empirical analysis, would improve if a workable theory of main field generation were available. The increasing need for greater accessibility to models, model values, and associated software could be met by greater exploitation of modern technology, including network communications.
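As a concrete illustration of what "model values" means here, the sketch below evaluates only the degree-1 (dipole) part of a spherical harmonic main-field model from its Gauss coefficients. The coefficient values are approximately those of the 1990 IGRF and are used for illustration only; a full model sums many more terms.

```python
import math

# Degree-1 (dipole) field from Gauss coefficients, geocentric coordinates.
# Coefficients are approximately IGRF 1990 values (nT), for illustration only.
A = 6371.2                                   # reference radius, km
G10, G11, H11 = -29775.0, -1851.0, 5411.0

def dipole_field(r_km, colat_deg, elon_deg):
    """Return (B_r, B_theta, B_phi) in nT at geocentric (r, colatitude, east longitude)."""
    th = math.radians(colat_deg)
    ph = math.radians(elon_deg)
    c = (A / r_km) ** 3
    eq = G11 * math.cos(ph) + H11 * math.sin(ph)
    br = 2 * c * (G10 * math.cos(th) + eq * math.sin(th))
    bt = c * (G10 * math.sin(th) - eq * math.cos(th))
    bp = c * (G11 * math.sin(ph) - H11 * math.cos(ph))
    return br, bt, bp

def magnitude(b):
    return math.sqrt(sum(x * x for x in b))

# The dipole field is roughly twice as strong over the poles as at the equator:
print(round(magnitude(dipole_field(A, 0.0, 0.0))))    # near-pole magnitude, nT
print(round(magnitude(dipole_field(A, 90.0, 0.0))))   # equatorial magnitude, nT
```

A chart or dial-in service of the kind described above is, at bottom, a convenient front end to an evaluation like this one, extended to higher degree and to the epoch requested by the user.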
External Field Models. Models of the externally caused part of the geomagnetic field are useful for correcting ground-based observations and for basic research on the magnetosphere. More accurate, dynamic representations of all of the components of the magnetosphere and ionosphere (including solar quiet, the auroral electrojet, and the field-aligned currents) are needed. These models should provide realistic values of the externally caused field at and near the surface of the Earth.
Crustal Models, Anomaly Maps, and Gridded Data. These products are especially needed for interpreting the geology of the lithosphere. It is vital that the original data upon which these products are based, as well as the digital form of anomaly maps, be saved and made available. The perennial problem of mismatch between maps of neighboring areas must be solved by one or more of the following: regional-scale tie lines, low-altitude satellite surveys, and high-altitude aerial surveys, which would provide long-line data. Furthermore, visualization and interpretation techniques, and the associated software, must be developed and made available.
Archive centers must be located in federal organizations that have a long-term commitment to archiving and servicing geomagnetic data.
Nodes for data processing, quality control, analysis, and distribution are necessary at institutions performing research. However, strong support to the national archive centers must be maintained to provide standard and custom user services and especially to capture and archive data that would otherwise be lost.
Organizations providing data to a national data center (or node) must provide information on quality control and complete documentation of the data. These should appear as digital records accompanying the data, wherever possible. The nodes and national archive centers must perform quality control on the data in their systems.
Magnetic maps need to be digitized in those cases where the trackline data are no longer available in usable form.
On-line directories and inventories need to be made available. These should describe not only the data held at national centers, but also those existing elsewhere. Such a system should describe the existence of digital data, paper records, maps, and special analyses.
Survey data need to be provided as total field observations as well as residuals. All corrections applied to the data should be included as part of the data record or within the documentation.
A rock-properties data base needs to be developed to support paleomagnetism and interpretation of magnetic surveys.
A task group should be established to investigate the possibility of making some version of classified and proprietary data available for the research community. Reviews of the need for continued restrictions should be made periodically in order to move data into the public domain.
Data at institutions should not be discarded without first contacting the appropriate national center. The national center, with the help of advisory groups, will evaluate the need to archive the data and will seek funds for data rescue.
Data distributed by nodes or national data centers should include access software and be made available in workstation formats.
Funding agencies should support long-term visits of research scientists to national data centers to perform cooperative analysis of the data and to provide a strong link between the data centers and the research community. Representatives of data centers should periodically visit active research and data collection organizations to assure that the needs of these organizations are being met and to arrange for any special assistance that the data center could offer to support the research and flow of data.
Institutions must develop an archive policy for those data that are not sent to a national archive center.