1
About the Data Centers
Investigation of environmental change requires the ability to compare changing conditions through time and between locations. Such comparisons are enabled by access to environmental data stored in government data centers (Table 1.1). These centers have been collecting environmental data for operational and scientific purposes for decades, and, with the lengthening record, the potential usefulness of these data continues to grow (Sidebar 1.1).
TABLE 1.1 U.S. Government Data Centers, Their Sponsoring Agencies, and Their Scientific Specialties
Center | Agency | Specialty
National Data Centers | |
Carbon Dioxide Information Analysis Center (CDIAC) | DOE | Atmospheric trace gases, global carbon cycle, solar and atmospheric radiation
Center for International Earth Science Information Network (CIESIN) | Columbia University^a | Agriculture, biodiversity, ecosystems, world resources, population, environmental assessment and health, land use and land cover change
Earth Resources Observation Systems (EROS) Data Center (EDC) | USGS | Cartographic and land remote-sensing data products
National Earthquake Information Center (NEIC) | USGS | Earthquake information, seismograms
National Climatic Data Center (NCDC) | NOAA | Climate, meteorology, alpine environments, ocean-atmosphere interactions, vegetation, paleoclimatology
National Geophysical Data Center (NGDC) | NOAA | Bathymetry, topography, geomagnetism, habitat, hazards, marine geophysics
National Oceanographic Data Center (NODC) | NOAA | Physical, chemical, and biological oceanographic data
National Snow and Ice Data Center (NSIDC) | NOAA | Snow, land ice, sea ice, atmosphere, biosphere, hydrosphere
National Space Science Data Center (NSSDC) | NASA | Astronomy, astrophysics, solar and space physics, lunar and planetary science
Distributed Active Archive Centers (DAACs) | |
Oak Ridge National Laboratory (ORNL) DAAC | NASA | Terrestrial biogeochemistry, ecosystem dynamics
Socioeconomic Data and Applications Center (SEDAC) | NASA | Population and administrative boundaries
Land Processes (EDC) DAAC <http://edcdaac.usgs.gov/landdaac/main.html> | NASA | Land remote sensing imagery, elevation, land cover
Much of the research on the interactions of natural and human-induced changes in the global environment and the implications for society is coordinated by the United States Global Change Research Program (USGCRP). The USGCRP was established through a presidential initiative in 1989 as a multiagency effort to:
- develop and coordinate a comprehensive and integrated program to increase the effectiveness and usefulness of government-supported global change research;
- address scientific uncertainties about natural and human-induced Earth system changes;
- observe, understand, predict, evaluate, and communicate the societal and environmental implications of global change; and
- provide a sound scientific basis for U.S. policies and resource management (Subcommittee on Global Change Research, 2000).
SIDEBAR 1.1 History of Data Centers
Data centers are permanent facilities that focus on the long-term maintenance, distribution, and archiving of data and data products. There are 13 discipline-based World Data Centers in the United States, including centers for atmospheric trace gases, glaciology, human interactions in the environment, marine geology and geophysics, meteorology, oceanography, paleoclimatology, remotely sensed land data, rockets and satellites, rotation of the Earth, seismology, solar-terrestrial physics, and solid Earth geophysics. In addition to these World Data Centers, federal science agencies maintain nine national data centers, which provide access to an array of publicly available datasets.
Scientists have always collected data, but the creation of data centers for improved archiving and distribution is a relatively recent and evolving activity. For example, U.S. government collection of weather observations began during the War of 1812, although weather records had been kept in personal "weather diaries" in the United States as early as 1644 (Shea, 1987). In 1817 a system of weather observation was established at the land offices of the General Land Office, and in 1942 a central Analysis Center was created to prepare and distribute computer weather forecasts; this center later became part of the National Meteorological Center (Shea, 1987). The Weather Records Center in Asheville, North Carolina, was created by the Federal Records Act of 1950 (Public Law 754, 81st Congress; CFR § 506[c]), which combined the efforts of the Weather Bureau and the Air Force and Navy Tabulation Units. In 1957, during the International Geophysical Year, the National Climatic Data Center was established; it now maintains the World Data Center for Meteorology in Asheville. The National Climatic Data Center is the world's largest active archive of weather data (NOAA, 2002).
The National Space Science Data Center was established as part of the National Aeronautics and Space Administration's (NASA's) Goddard Space Flight Center in 1966 and is primarily responsible for the long-term maintenance of space science data (NASA, 2002a). Distributed Active Archive Centers (DAACs) provide access to the complex multidisciplinary Earth Science Enterprise data from the Earth Observing System Data and Information System (EOSDIS). DAACs differ from data centers in that they focus on the most scientifically active part of a mission or experiment rather than on the long-term stewardship of all data. Because NASA has no permanent storage facility for its Earth Science data, these data are transferred to the National Oceanic and Atmospheric Administration or the U.S. Geological Survey 15 years after collection (NRC, 2002). The Langley Research Center (LaRC) DAAC, for example, was created in 1989 and maintains no heritage archives. Other DAACs, such as those at the Goddard Space Flight Center (GSFC) and the Earth Resources Observation Systems (EROS) Data Center, evolved from existing data centers in the early 1990s. The Goddard Space Flight Center DAAC maintains atmospheric science and hydrology records dating from 1978. The Land Processes DAAC evolved out of the USGS EROS Data Center, which was created in 1972 to archive, process, and distribute Landsat data. These DAACs are among 16 major data archives, data centers, and services that disseminate NASA's Earth Science and Space Science Enterprise data (NRC, 2002).
Most of the data collected through the USGCRP, as well as data for the operational purposes of individual agencies, are housed in environmental data centers.
Since their inception (Sidebar 1.1), however, demands on government data centers for archiving and distributing data have evolved and increased. Data from space missions, process studies, and field experiments continue to flow into the data centers.
The number of datasets and files and the volume of holdings have increased dramatically with the advent of new measurement programs, many of which are space based. For example, the amount of data that the National Oceanic and Atmospheric Administration (NOAA) archives increased from 20 terabytes in 1979 to 760 terabytes in 1999 (NOAA, 2001). Moreover, the use and integration of data across scientific disciplines have increased substantially. In 1979 there were 95,400 requests (accesses) for NOAA's data compared to 4,200,530 in 1999 (NOAA, 2001). Increasing numbers of individuals and organizations outside the research communities seek information for legal matters, decision making, commercial strategies, education, and general curiosity. These users require specialized information and datasets tailored to their individual applications and place a heavy demand on data center user services. At the same time, many data center budgets have remained flat or declined, making it difficult for data centers to fulfill their missions.
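To put the NOAA figures quoted above in perspective, the overall growth factors and the equivalent compound annual growth rates can be computed directly. The short Python sketch below is illustrative only: the function names are hypothetical, and the annual rate assumes smooth compound growth between 1979 and 1999, which the source does not claim; only the endpoint values come from the text (NOAA, 2001).

```python
def growth_factor(start: float, end: float) -> float:
    """Ratio of the final value to the initial value."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate r such that
    start * (1 + r) ** years == end (an assumed smooth-growth model)."""
    return (end / start) ** (1.0 / years) - 1.0


# Endpoint figures quoted in the text (NOAA, 2001); the span is 1979-1999.
YEARS = 1999 - 1979

ARCHIVE_TB = (20, 760)          # NOAA archive volume, terabytes
REQUESTS = (95_400, 4_200_530)  # requests (accesses) for NOAA data

if __name__ == "__main__":
    for label, (start, end) in [("Archive volume", ARCHIVE_TB),
                                ("Data requests", REQUESTS)]:
        factor = growth_factor(start, end)
        rate = annual_growth_rate(start, end, YEARS)
        print(f"{label}: {factor:.1f}x over {YEARS} years, "
              f"about {rate:.1%} per year")
```

Under these assumptions, archive volume grew 38-fold and requests roughly 44-fold, both corresponding to about 20 percent growth per year sustained over two decades.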
Meeting the environmental challenges of the twenty-first century will require increasing reliance on the full spectrum of environmental data. These data are critical for understanding how the Earth system operates and how to ensure a sustainable future in the face of environmental variability and change. Scientists are interested in issues such as the composition of the atmosphere, changing ecosystems, the way carbon cycles through the environment, the human dimensions of climate change, the variability and change of climate, and the global water cycle. Businesses are under pressure to use resources efficiently while minimizing harm to the environment. Policy makers must make decisions on activities that may affect the environment and must determine how best to adapt to environmental changes. Finally, educators work to communicate knowledge to create a more informed populace. Data centers serve all of these user groups, although each requires different products, services, and degrees of assistance.
For example, information providers already know which products they want and will be the least tolerant of barriers to immediate delivery of those products; they must be offered direct access to standard or custom products via Web services. Information browsers are reasonably familiar with a data product domain but not necessarily with the scope or character of the holdings in that domain, and they may wish to perform exploratory analyses of the domain to help identify product subsets of interest. Information seekers have a constrained notion of what they seek (e.g., a geophysical parameter, region, or season) but may be unfamiliar with the corresponding providers and products.
Nine national data centers and eight distributed active archive centers (DAACs) collect, disseminate, and archive environmental data (Table 1.1). Data center holdings vary and include data collected from a variety of measurement platforms (satellite, aircraft, ship, and ground) with different temporal and spatial resolutions and degrees of documentation. In addition, each center focuses on specific scientific disciplines, such as oceanography, remote sensing, climatology, or snow and ice. Data centers also differ in the timing of data distribution: some deliver data only on request, others deliver in real time, and still others operate on a subscription basis.
Government data centers are repositories for the nation's environmental data. Their methods of data archiving and stewardship are complemented by strategies for ingesting large volumes of raw data. In addition, data centers perform a valuable service to the scientific community through data quality control, integration, and value-added activities, such as processing data and developing tools for data analysis and presentation. In many cases, they have achieved a commendable level of customer service and satisfaction.
Growing volumes of data, more diverse data types, changing user communities, and the steadily rising demands of users and data providers are precipitating a crisis in the ability of data centers to fulfill their missions. As a result, the centers may have to make trade-offs among maintaining existing holdings, incorporating new holdings, serving more users, and providing quality services.
These challenges prompted the USGCRP to ask the National Research Council (NRC) to host a workshop to examine the extent to which emerging technologies can help data centers meet user needs and maintain the long-term record of environmental change. The Committee on Coping with Increasing Demands on Government Data Centers (Appendix A) was charged to examine
- technological solutions that could enhance the ability of users to find, interpret, and analyze information held in environmental data centers and
- technological solutions that could help data centers collect, store, share, manage, and distribute large volumes of data.
This report is the product of the requested NRC workshop, which provided a starting point for identifying technological approaches that would build on present data center operations in the areas of data search, retrieval, sharing, and storage. Data ingest appears to offer fewer opportunities for technological innovation. This report is not a conclusive technology assessment but a summary and discussion of the challenges and approaches identified at the workshop. Individual data center operations differ, and many data centers are already implementing new technologies, though to varying degrees. Chapter 2 discusses these technological approaches and potential means of implementation. The agenda, participants, and working group conclusions from the workshop are outlined in Appendixes B, C, and D, respectively. Terms and acronyms used in the report are defined in Appendixes E and F.
Finally, over the past decade, many NRC reports have addressed topics that intersect with this workshop’s focus. This report is not a comprehensive review of individual data center operations, an important topic addressed by NRC (1997). The issue of community access to data