National Academies Press: OpenBook

Government Data Centers: Meeting Increasing Demands (2003)

Chapter: 1. About the Data Centers

Suggested Citation:"1. About the Data Centers." National Research Council. 2003. Government Data Centers: Meeting Increasing Demands. Washington, DC: The National Academies Press. doi: 10.17226/10664.

1
About the Data Centers

Investigation of environmental change requires the ability to compare changing conditions through time and between locations. Such comparisons are enabled by access to environmental data stored in government data centers (Table 1.1). These centers have been collecting environmental data for operational and scientific purposes for decades, and, with the lengthening record, the potential usefulness of these data continues to grow (Sidebar 1.1).

TABLE 1.1 U.S. Government Data Centers, Their Sponsoring Agencies, and Their Scientific Specialties

| Center | Agency | Specialty |
|---|---|---|
| National Data Centers | | |
| Carbon Dioxide Information Analysis Center (CDIAC) <http://cdiac.esd.ornl.gov/home.htm> | DOE | Atmospheric trace gases, global carbon cycle, solar and atmospheric radiation |
| Center for International Earth Science Information Network (CIESIN) <http://www.ciesin.org> | Columbia University (a) | Agriculture, biodiversity, ecosystems, world resources, population, environmental assessment and health, land use and land cover change |
| Earth Resources Observation Systems (EROS) Data Center (EDC) <http://edc.usgs.gov> | USGS | Cartographic and land remote-sensing data products |
| National Earthquake Information Center (NEIC) <http://neic.usgs.gov> | USGS | Earthquake information, seismograms |
| National Climatic Data Center (NCDC) <http://lwf.ncdc.noaa.gov/oa/ncdc.html> | NOAA | Climate, meteorology, alpine environments, ocean-atmosphere interactions, vegetation, paleoclimatology |
| National Geophysical Data Center (NGDC) <http://www.ngdc.noaa.gov/ngdc.html> | NOAA | Bathymetry, topography, geomagnetism, habitat, hazards, marine geophysics |
| National Oceanographic Data Center (NODC) <http://www.nodc.noaa.gov> | NOAA | Physical, chemical, and biological oceanographic data |
| National Snow and Ice Data Center (NSIDC) <http://nsidc.org> | NOAA | Snow, land ice, sea ice, atmosphere, biosphere, hydrosphere |
| National Space Science Data Center (NSSDC) <http://nssdc.gsfc.nasa.gov> | NASA | Astronomy, astrophysics, solar and space physics, lunar and planetary science |
| Distributed Active Archive Centers (DAACs) | | |
| Oak Ridge National Laboratory (ORNL) DAAC <http://www-eosdis.ornl.gov> | NASA | Terrestrial biogeochemistry, ecosystem dynamics |
| Socioeconomic Data and Applications Center (SEDAC) <http://sedac.ciesin.org> | NASA | Population and administrative boundaries |
| Land Processes (EDC) DAAC <http://edcdaac.usgs.gov/landdaac/main.html> | NASA | Land remote sensing imagery, elevation, land cover |
| NSIDC DAAC <http://nsidc.org/daac> | NASA | Sea ice, snow cover, ice sheet data, brightness, temperature, polar atmosphere |
| Goddard Space Flight Center (GSFC) DAAC <http://daac.gsfc.nasa.gov/DAAC_DOCS/gdaac_home.html> | NASA | Ocean color, hydrology and precipitation, land biosphere, atmospheric dynamics, and chemistry |
| Langley Research Center (LaRC) DAAC <http://asd-www.larc.nasa.gov/scar/langley.intro.html> | NASA | Radiation budget, clouds, aerosols, and tropospheric chemistry |
| Physical Oceanography DAAC (PO.DAAC) <http://podaac.jpl.nasa.gov> | NASA | Atmospheric moisture, climatology, heat flux, ice, ocean wind, sea surface height, temperature |
| Alaska Synthetic Aperture Radar (SAR) Facility DAAC <http://www.asf.alaska.edu> | NASA | Sea ice, polar processes |

NOTE: DOE = Department of Energy; EPA = Environmental Protection Agency; FGDC = Federal Geographic Data Committee; NASA = National Aeronautics and Space Administration; NIH = National Institutes of Health; NOAA = National Oceanic and Atmospheric Administration; NSF = National Science Foundation; USDA = U.S. Department of Agriculture; USGS = U.S. Geological Survey.

a The center is supported by contracts from 22 nonfederal and federal (e.g., EPA, NIH, FGDC, USDA, NSF) agencies.

Much of the research on the interactions of natural and human-induced changes in the global environment and the implications for society is coordinated by the United States Global Change Research Program (USGCRP). The USGCRP was established through a presidential initiative in 1989 as a multiagency effort to:

  • develop and coordinate a comprehensive and integrated program to increase the effectiveness and usefulness of government-supported global change research;

  • address scientific uncertainties about natural and human-induced Earth system changes;

  • observe, understand, predict, evaluate, and communicate the societal and environmental implications of global change; and

  • provide a sound scientific basis for U.S. policies and resource management (Subcommittee on Global Change Research, 2000).

SIDEBAR 1.1 History of Data Centers

Data centers are permanent facilities that focus on the long-term maintenance, distribution, and archiving of data and data products. There are 13 discipline-based World Data Centers in the United States, including centers for atmospheric trace gases, glaciology, human interactions in the environment, marine geology and geophysics, meteorology, oceanography, paleoclimatology, remotely sensed land data, rockets and satellites, rotation of the Earth, seismology, solar-terrestrial physics, and solid Earth geophysics. In addition to these World Data Centers, federal science agencies maintain nine national data centers, which provide access to an array of publicly available datasets. Scientists have always collected data, but the creation of data centers for improved archiving and distribution is a relatively recent and evolving activity. For example, U.S. government collection of weather observations began during the War of 1812, although weather records had been maintained in personal “weather diaries” in the United States as long ago as 1644 (Shea, 1987). In 1817 a system of weather observation was established at Weather Bureau land offices, and in 1942 a central Analysis Center was created to prepare and distribute computer weather forecasts, which later became part of the National Meteorological Center (Shea, 1987). The Weather Records Center in Asheville, North Carolina, was created by the Federal Records Act of 1950 (Public Law 754, 81st Congress; CFR § 506[c]), which combined the efforts of the Weather Bureau and the Air Force and Navy Tabulation Units. In 1957 the National Climatic Data Center was established during the International Geophysical Year and now maintains the World Data Center for Meteorology in Asheville. The National Climatic Data Center is the world’s largest active archive of weather data (NOAA, 2002).

The National Space Science Data Center was established as part of the National Aeronautics and Space Administration’s (NASA’s) Goddard Space Flight Center in 1966 and is primarily responsible for the long-term maintenance of space science data (NASA, 2002a). Distributed Active Archive Centers (DAACs) provide
access to the complex multidisciplinary Earth Science Enterprise data from the Earth Observing System Data and Information System (EOSDIS). The DAACs differ from data centers because the focus is on the most scientifically active part of a mission or experiment, rather than on the long-term stewardship of all data. Because a permanent storage facility is not available for NASA Earth Science data, they are transferred to the National Oceanic and Atmospheric Administration or the U.S. Geological Survey 15 years after collection (NRC, 2002). The Langley Research Center (LaRC) DAAC, for example, was created in 1989 and maintains no heritage archives. Other DAACs, such as those at the Goddard Space Flight Center (GSFC) and the Earth Resources Observation Systems (EROS) Data Center, evolved from data centers in the early 1990s. The Goddard Space Flight Center DAAC maintains records from 1978 on atmospheric science and hydrology. The Land Processes Data Center evolved out of the USGS EROS Data Center, created for long-term data storage in 1972 to archive, process, and distribute Landsat data. These DAACs are among 16 major data archives, data centers, and services that disseminate NASA’s Earth Science and Space Science Enterprise data (NRC, 2002).

Most of the data collected through the USGCRP, as well as data for the operational purposes of individual agencies, are housed in environmental data centers.

Since their inception (Sidebar 1.1), however, demands on government data centers for archiving and distributing data have evolved and increased. Data from space missions, process studies, and field experiments continue to flow into the data centers.

The number of datasets and files and the volume of holdings have increased dramatically with the advent of new measurement programs, many of which are space based. For example, the amount of data that the National Oceanic and Atmospheric Administration (NOAA) archives increased from 20 terabytes in 1979 to 760 terabytes in 1999 (NOAA, 2001). Moreover, the use and integration of data across scientific disciplines have increased substantially. In 1979 there were 95,400 requests (accesses) for NOAA’s data compared to 4,200,530 in 1999 (NOAA, 2001). Increasing numbers of individuals and organizations outside the research communities seek information for legal matters, decision making, commercial strategies, education, and general curiosity. These users require specialized information and datasets tailored to their
individual applications and place a heavy demand on data center user services. At the same time, many data center budgets have remained flat or declined, making it difficult for data centers to fulfill their missions.
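To put the NOAA figures above in perspective, the following minimal sketch computes the overall growth factor and an equivalent compound annual growth rate for archive volume and data requests between 1979 and 1999. The function names are hypothetical, and the annual rate assumes steady exponential growth over the 20 years, which the report does not claim; only the four endpoint values come from the text (NOAA, 2001).

```python
# Endpoint values quoted in the text (NOAA, 2001).
ARCHIVE_TB = {1979: 20, 1999: 760}
REQUESTS = {1979: 95_400, 1999: 4_200_530}


def growth_factor(start, end):
    """Ratio of the final value to the initial value."""
    return end / start


def annual_growth_rate(start, end, years):
    """Equivalent compound annual growth rate.

    Assumption (not from the report): growth was smooth and
    exponential between the two endpoints.
    """
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    years = 1999 - 1979
    for label, series in (("archive volume", ARCHIVE_TB), ("requests", REQUESTS)):
        factor = growth_factor(series[1979], series[1999])
        rate = annual_growth_rate(series[1979], series[1999], years)
        print(f"{label}: x{factor:.1f} overall, about {rate:.1%} per year")
```

Under that assumption, both holdings and requests grew at roughly one-fifth per year, so demand on user services rose at least as fast as the archive itself.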

The environmental challenges of the twenty-first century will increase reliance on the full spectrum of environmental data. These data are critical for understanding how the Earth system operates and how to ensure a sustainable future in the face of environmental variability and change. Scientists are interested in issues such as the composition of the atmosphere, changing ecosystems, the way carbon cycles through the environment, the human dimensions of climate change, the variability and change of climate, and the global water cycle. Commercial concerns are prodded to use resources efficiently while minimizing harm to the environment. Policy makers must make decisions on activities that may affect the environment and must determine how best to adapt to environmental changes. Finally, educators work to communicate knowledge to create a more informed populace. Data centers serve all of these user groups, although each requires different products, services, and degrees of assistance.

For example, information providers already know what products they want; they will be the least tolerant of barriers to immediate delivery of those products. These users must be offered direct access to standard or custom products via Web services. Information browsers are reasonably familiar with a data product domain but not necessarily with the scope or character of holdings in that domain. They may also wish to perform exploratory analyses on the domain to help identify product subsets of interest. Information seekers have a constrained notion (e.g., geophysical parameter, region, season, etc.) of what they seek but may be unfamiliar with the corresponding providers and products.

Nine national data centers and eight distributed active archive centers (DAACs) collect, disseminate, and archive environmental data (Table 1.1). Data center holdings vary and include data collected from a variety of measurement platforms (satellite, aircraft, ship, ground) with different temporal and spatial resolutions and degrees of documentation. In addition, each center focuses on specific scientific disciplines, such as oceanography, remote sensing, climatology, or snow and ice. Data centers also differ in the timing of data distribution: some deliver data only on request, others deliver in real time, and others deliver on a subscription basis.

Government data centers are repositories for the nation’s environmental data. Methods of data archiving and stewardship are complemented by strategies for ingesting large volumes of raw data. In addition, data centers perform a valuable service to the scientific
community through data quality control, integration, and value-added activities, such as processing data and developing tools for data analysis and presentation. In many cases, they have been successful in developing a laudable level of customer service and satisfaction.

Increasing amounts of data, differing data types, changing user communities, and steadily increasing demands of users and data providers are precipitating a crisis in the ability of data centers to fulfill their missions. In recognition of this crisis, the centers may have to make trade-offs between maintaining existing holdings and incorporating new holdings, serving more users, or providing quality services.

These challenges prompted the USGCRP to ask the National Research Council (NRC) to host a workshop to examine the extent to which emerging technologies can help data centers meet user needs and maintain the long-term record of environmental change. The Committee on Coping with Increasing Demands on Government Data Centers (Appendix A) was charged to examine

  • technological solutions that could enhance the ability of users to find, interpret, and analyze information held in environmental data centers and

  • technological solutions that could help data centers collect, store, share, manage, and distribute large volumes of data.

This report results from the requested NRC workshop, which provided a starting point in identifying technological approaches that would build on present data center operations in the areas of data search, retrieval, sharing, and storage. Methods for data ingest appear to have fewer opportunities for technological innovation. This report is not a conclusive technology assessment but a summary and discussion of the challenges and approaches identified at the workshop. Individual data center operations differ, and in many cases, data centers implement new technologies, though to varying degrees. Chapter 2 expounds upon these technological approaches and potential means of implementation. The agenda, participants, and working group conclusions from the workshop are outlined in Appendixes B, C, and D, respectively. Terms and acronyms used in the report are defined in Appendixes E and F.

Finally, over the past decade, many NRC reports have addressed topics that intersect with this workshop’s focus. This report is not a comprehensive review of individual data center operations, an important topic addressed by NRC (1997). The issue of community access to data was the subject of NRC (2001), and NRC (1995) covered the topic of federated distributed data centers.
