1
About the Data Centers

Investigation of environmental change requires the ability to compare changing conditions through time and between locations. Such comparisons are enabled by access to environmental data stored in government data centers (Table 1.1). These centers have been collecting environmental data for operational and scientific purposes for decades, and, with the lengthening record, the potential usefulness of these data continues to grow (Sidebar 1.1).

TABLE 1.1 U.S. Government Data Centers, Their Sponsoring Agencies, and Their Scientific Specialties

Center | Agency | Specialty

National Data Centers

Carbon Dioxide Information Analysis Center (CDIAC) <http://cdiac.esd.ornl.gov/home.htm> | DOE | Atmospheric trace gases, global carbon cycle, solar and atmospheric radiation
Center for International Earth Science Information Network (CIESIN) <http://www.ciesin.org> | Columbia University^a | Agriculture, biodiversity, ecosystems, world resources, population, environmental assessment and health, land use and land cover change



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




Government Data Centers: Meeting Increasing Demands

Earth Resources Observation Systems (EROS) Data Center (EDC) <http://edc.usgs.gov> | USGS | Cartographic and land remote-sensing data products
National Earthquake Information Center (NEIC) <http://neic.usgs.gov> | USGS | Earthquake information, seismograms
National Climatic Data Center (NCDC) <http://lwf.ncdc.noaa.gov/oa/ncdc.html> | NOAA | Climate, meteorology, alpine environments, ocean-atmosphere interactions, vegetation, paleoclimatology
National Geophysical Data Center (NGDC) <http://www.ngdc.noaa.gov/ngdc.html> | NOAA | Bathymetry, topography, geomagnetism, habitat, hazards, marine geophysics
National Oceanographic Data Center (NODC) <http://www.nodc.noaa.gov> | NOAA | Physical, chemical, and biological oceanographic data
National Snow and Ice Data Center (NSIDC) <http://nsidc.org> | NOAA | Snow, land ice, sea ice, atmosphere, biosphere, hydrosphere
National Space Science Data Center (NSSDC) <http://nssdc.gsfc.nasa.gov> | NASA | Astronomy, astrophysics, solar and space physics, lunar and planetary science

Distributed Active Archive Centers (DAACs)

Oak Ridge National Laboratory (ORNL) DAAC <http://www-eosdis.ornl.gov> | NASA | Terrestrial biogeochemistry, ecosystem dynamics
Socioeconomic Data and Applications Center (SEDAC) <http://sedac.ciesin.org> | NASA | Population and administrative boundaries
Land Processes (EDC) DAAC <http://edcdaac.usgs.gov/landdaac/main.html> | NASA | Land remote sensing imagery, elevation, land cover

NSIDC DAAC <http://nsidc.org/daac> | NASA | Sea ice, snow cover, ice sheet data, brightness, temperature, polar atmosphere
Goddard Space Flight Center (GSFC) DAAC <http://daac.gsfc.nasa.gov/DAAC_DOCS/gdaac_home.html> | NASA | Ocean color, hydrology and precipitation, land biosphere, atmospheric dynamics, and chemistry
Langley Research Center (LaRC) DAAC <http://asd-www.larc.nasa.gov/scar/langley.intro.html> | NASA | Radiation budget, clouds, aerosols, and tropospheric chemistry
Physical Oceanography DAAC (PO.DAAC) <http://podaac.jpl.nasa.gov> | NASA | Atmospheric moisture, climatology, heat flux, ice, ocean wind, sea surface height, temperature
Alaska Synthetic Aperture Radar (SAR) Facility DAAC <http://www.asf.alaska.edu> | NASA | Sea ice, polar processes

NOTE: DOE = Department of Energy; EPA = Environmental Protection Agency; FGDC = Federal Geographic Data Committee; NASA = National Aeronautics and Space Administration; NIH = National Institutes of Health; NOAA = National Oceanic and Atmospheric Administration; NSF = National Science Foundation; USDA = U.S. Department of Agriculture; USGS = U.S. Geological Survey.

^a The center is supported by contracts from 22 nonfederal and federal (e.g., EPA, NIH, FGDC, USDA, NSF) agencies.

Much of the research on the interactions of natural and human-induced changes in the global environment and the implications for society is coordinated by the United States Global Change Research Program (USGCRP). The USGCRP was established through a presidential initiative in 1989 as a multiagency effort to develop and coordinate a comprehensive and integrated program to increase the effectiveness and usefulness of government-supported global change research; address scientific uncertainties about natural and human-induced Earth system changes; observe, understand, predict, evaluate, and communicate the societal and environmental implications of global change; and provide a sound scientific basis for U.S. policies and resource management (Subcommittee on Global Change Research, 2000).

SIDEBAR 1.1 History of Data Centers

Data centers are permanent facilities that focus on the long-term maintenance, distribution, and archiving of data and data products. There are 13 discipline-based World Data Centers in the United States, including centers for atmospheric trace gases, glaciology, human interactions in the environment, marine geology and geophysics, meteorology, oceanography, paleoclimatology, remotely sensed land data, rockets and satellites, rotation of the Earth, seismology, solar-terrestrial physics, and solid Earth geophysics. In addition to these World Data Centers, federal science agencies maintain nine national data centers, which provide access to an array of publicly available datasets.

Scientists have always collected data, but the creation of data centers for improved archiving and distribution is a relatively recent and evolving activity. For example, U.S. government collection of weather observations began during the War of 1812, although weather records had been maintained in personal “weather diaries” in the United States as long ago as 1644 (Shea, 1987). In 1817 a system of weather observation was established at Weather Bureau land offices, and in 1942 a central Analysis Center was created to prepare and distribute computer weather forecasts, which later became part of the National Meteorological Center (Shea, 1987). The Weather Records Center in Asheville, North Carolina, was created by the Federal Records Act of 1950 (Public Law 754, 81st Congress; CFR § 506[c]), which combined the efforts of the Weather Bureau and the Air Force and Navy Tabulation Units.
The National Climatic Data Center was established in 1957, during the International Geophysical Year, and now maintains the World Data Center for Meteorology in Asheville. The National Climatic Data Center is the world’s largest active archive of weather data (NOAA, 2002). The National Space Science Data Center was established as part of the National Aeronautics and Space Administration’s (NASA’s) Goddard Space Flight Center in 1966 and is primarily responsible for the long-term maintenance of space science data (NASA, 2002a).

Distributed Active Archive Centers (DAACs) provide access to the complex multidisciplinary Earth Science Enterprise data from the Earth Observing System Data and Information System (EOSDIS). The DAACs differ from data centers because their focus is on the most scientifically active part of a mission or experiment, rather than on the long-term stewardship of all data. Because a permanent storage facility is not available for NASA Earth Science data, they are transferred to the National Oceanic and Atmospheric Administration or the U.S. Geological Survey 15 years after collection (NRC, 2002). The Langley Research Center (LaRC) DAAC, for example, was created in 1989 and maintains no heritage archives. Other DAACs, such as those at the Goddard Space Flight Center (GSFC) and the Earth Resources Observation Systems (EROS) Data Center, evolved from data centers in the early 1990s. The Goddard Space Flight Center DAAC maintains atmospheric science and hydrology records dating from 1978. The Land Processes Data Center evolved out of the USGS EROS Data Center, created for long-term data storage in 1972 to archive, process, and distribute Landsat data. These DAACs are among 16 major data archives, data centers, and services that disseminate NASA’s Earth Science and Space Science Enterprise data (NRC, 2002).

Most of the data collected through the USGCRP, as well as data for the operational purposes of individual agencies, are housed in environmental data centers. Since their inception (Sidebar 1.1), however, demands on government data centers for archiving and distributing data have evolved and increased. Data from space missions, process studies, and field experiments continue to flow into the data centers. The number of datasets and files and the volume of holdings have increased dramatically with the advent of new measurement programs, many of which are space based.
For example, the amount of data that the National Oceanic and Atmospheric Administration (NOAA) archives increased from 20 terabytes in 1979 to 760 terabytes in 1999 (NOAA, 2001). Moreover, the use and integration of data across scientific disciplines have increased substantially. In 1979 there were 95,400 requests (accesses) for NOAA’s data, compared to 4,200,530 in 1999 (NOAA, 2001). Increasing numbers of individuals and organizations outside the research communities seek information for legal matters, decision making, commercial strategies, education, and general curiosity. These users require specialized information and datasets tailored to their individual applications and place a heavy demand on data center user services. At the same time, many data center budgets have remained flat or declined, making it difficult for data centers to fulfill their missions.

The environmental challenges of the twenty-first century will increase reliance on the full spectrum of environmental data. These data are critical for understanding how the Earth system operates and how to ensure a sustainable future in the face of environmental variability and change. Scientists are interested in issues such as the composition of the atmosphere, changing ecosystems, the way carbon cycles through the environment, the human dimensions of climate change, the variability and change of climate, and the global water cycle. Commercial concerns are prodded to use resources efficiently while minimizing harm to the environment. Policy makers must make decisions on activities that may affect the environment and must determine how best to adapt to environmental changes. Finally, educators work to communicate knowledge to create a more informed populace.

Data centers serve all of these user groups, although each requires different products, services, and degrees of assistance. For example, information providers already know what products they want; they will be the least tolerant of barriers to immediate delivery of those products. These users must be offered direct access to standard or custom products via Web services. Information browsers are reasonably familiar with a data product domain but not necessarily with the scope or character of holdings in that domain. They may also wish to perform exploratory analyses on the domain to help identify product subsets of interest. Information seekers have a constrained notion (e.g., geophysical parameter, region, season) of what they seek but may be unfamiliar with the corresponding providers and products.
Nine national data centers and eight distributed active archive centers (DAACs) collect, disseminate, and archive environmental data (Table 1.1). Data center holdings vary and include data collected from a variety of measurement platforms (satellite, aircraft, ship, ground) with different temporal and spatial resolutions and degrees of documentation. In addition, each center focuses on specific scientific disciplines, such as oceanography, remote sensing, climatology, or snow and ice. Data centers also differ in the timing of data distribution: some deliver data only on request, others deliver in real time, and others deliver by subscription.

Government data centers are repositories for the nation’s environmental data. Methods of data archiving and stewardship are complemented by strategies for ingesting large volumes of raw data. In addition, data centers perform a valuable service to the scientific community through data quality control, integration, and value-added activities, such as processing data and developing tools for data analysis and presentation. In many cases, they have been successful in developing a laudable level of customer service and satisfaction.

Increasing amounts of data, differing data types, changing user communities, and steadily increasing demands of users and data providers are precipitating a crisis in the ability of data centers to fulfill their missions. In recognition of this crisis, the centers may have to make trade-offs between maintaining existing holdings and incorporating new holdings, serving more users, or providing quality services.

These challenges prompted the USGCRP to ask the National Research Council (NRC) to host a workshop to examine the extent to which emerging technologies can help data centers meet user needs and maintain the long-term record of environmental change. The Committee on Coping with Increasing Demands on Government Data Centers (Appendix A) was charged to examine technological solutions that could enhance the ability of users to find, interpret, and analyze information held in environmental data centers and technological solutions that could help data centers collect, store, share, manage, and distribute large volumes of data.

This report results from the requested NRC workshop, which provided a starting point in identifying technological approaches that would build on present data center operations in the areas of data search, retrieval, sharing, and storage. Methods for data ingest appear to offer fewer opportunities for technological innovation. This report is not a conclusive technology assessment but a summary and discussion of the challenges and approaches identified at the workshop. Individual data center operations differ, and in many cases, data centers implement new technologies, though to varying degrees.
Chapter 2 expands on these technological approaches and potential means of implementation. The agenda, participants, and working group conclusions from the workshop are outlined in Appendixes B, C, and D, respectively. Terms and acronyms used in the report are defined in Appendixes E and F.

Finally, over the past decade, many NRC reports have addressed topics that intersect with this workshop’s focus. This report is not a comprehensive review of individual data center operations, an important topic addressed by NRC (1997). The issue of community access to data was the subject of NRC (2001), and NRC (1995) covered the topic of federated distributed data centers.