2
Overview of NGDC

The purpose of data centers is to serve users not only today but also in future generations. Doing this well requires that data centers participate in all the major stages in the life cycle of a dataset:

  1. Data collection and product generation. Data centers can seek to clarify both the information being captured and the inputs or parameters imported from other sources and to facilitate the process of recording them.

  2. Management of active datasets. Data centers can strive to understand the data needs of their users; prepare guide information to assist users in evaluating the relevance of the data to their purposes; develop data-handling tools and services to help users find and work with the data; contact experts on behalf of users with complex scientific queries; and reprocess data in response to scientific demands.

  3. Long-term archive. Data centers can assemble and present useful information about datasets to ensure a greater likelihood that the data will remain useful beyond the period when a high volume of exchange, access, and manipulation takes place.1

This chapter describes the National Geophysical Data Center’s (NGDC’s) activities in the data life cycle. This description is based on

1  

National Research Council, 1998, Review of NASA’s Distributed Active Archive Centers, National Academy Press, Washington, D.C., pp. 41-43.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



information gathered from meetings, interviews with NGDC staff, and background material related to the committee’s review criteria (Appendix B). An analysis of these issues is given in Chapter 3.

HOLDINGS

Overview

NGDC holdings include seafloor and lakefloor analyses, descriptions, and sample inventories; trackline geophysical measurements; hydrographic sounding surveys; multibeam bathymetry tracks and surveys; sidescan sonar and multichannel seismic profiles; hazards information; ecosystems data and assessments; and solar, magnetospheric, ionospheric, geomagnetic, and cosmic ray data (Appendix C). Data are collected from a variety of platforms: ship, submarine, aircraft, ground and seafloor stations, and satellites. Satellite data held by NGDC include particles and fields, spacecraft anomalies, solar imagery, and solar radiation data from the National Oceanic and Atmospheric Administration’s (NOAA’s) geostationary and polar-orbiting satellites and the Air Force’s Defense Meteorological Satellite Program (DMSP). The DMSP holdings make up 97 percent of the total data volume at NGDC (Figure 2.1) and account for most of the growth in data volume at NGDC in the late 1990s (Figure 2.2). The archive will continue to grow rapidly if NGDC acquires other large datasets currently under discussion (see “Data Acquisition and Transfer Strategy” below). Although almost all of the newer datasets are in digital form, the center also maintains substantial holdings of paper, film, and microfilm records, as well as slide sets and posters (see Appendix C for a list of datasets).

About 25 percent of the total volume of all NGDC data is online.2 Of the digital data holdings, 51 percent of solid earth geophysics (SEG) datasets, 62 percent of solar-terrestrial physics (STP) datasets, and 84 percent of marine geology and geophysics (MGG) datasets are online.3 In general, analog datasets are more difficult for staff to manage than digital datasets because the relevant metadata often do not reside with the analog records, which makes it harder to assemble useful datasets. Similarly, small, unique datasets are often more difficult to manage than large datasets such as DMSP because of the diversity of their formats and metadata. Most datasets are updated or added to regularly. About 50 percent of the datasets have been updated within the last two years (2001 and 2002), and 80 percent have been updated within the last 10 years (Appendix C).

2   Presentation to the committee by Michael Loughridge, director, National Geophysical Data Center, August 13, 2002.
3   Based on averages of the percent online of datasets for each division listed in Appendix A. Datasets vary in size, and the datasets given in Appendix A are highly aggregated, so these figures differ from the total amount of NGDC data online. Nevertheless, they provide an indication of where each division stands in making its data available online.

FIGURE 2.1 Digital data holdings archived as of September 2002, showing the relative volumes of different data types. DMSP data make up 97 percent of the volume of digital holdings. SOURCE: National Geophysical Data Center.

FIGURE 2.2 Growth in NGDC holdings from 1988 to 2002. Data volumes include backup copies. The changes in slope reflect the addition of two major data streams: DMSP in 1993, and film scans of DMSP and new bathymetry data in 1999. SOURCE: National Geophysical Data Center.

Data Acquisition and Transfer Strategy

NGDC is required to archive certain data to fulfill its mission (Appendix D) or to support agreements with other agencies. For example, in some disciplines principal investigators funded by the National Science Foundation and the Office of Naval Research are required to deposit their data in a national archive, such as NGDC, although this requirement is not always enforced and resources for data conditioning and archiving are rarely forthcoming.4 NGDC division chiefs also actively seek relevant datasets and consider requests to take responsibility for data from external organizations, including other divisions of NOAA (e.g., the National Ocean Service, Office of Oceanic and Atmospheric Research), universities, federal agencies (e.g., National Imagery and Mapping Agency, Air Force Weather Agency, U.S. Geological Survey [USGS]), international organizations (e.g., International Hydrographic Organization, Intergovernmental Oceanographic Commission, Ocean Drilling Program), and industry.5 Potential future sources of data include the Solar X-ray Imager (SXI) on GOES-12, a network of continuously operating Global Positioning System reference stations, high-resolution sidescan sonar imagery, and shallow-water multibeam bathymetry.6 In addition, NOAA, in coordination with the Department of Defense and the National Aeronautics and Space Administration, is preparing the National Polar-Orbiting Operational Environmental Satellite System (NPOESS). The satellites will carry a space environment sensor suite that is similar to the suite carried on the DMSP spacecraft and could be managed by NGDC.

NGDC’s reported criteria for acquiring or rejecting new data are given in Box 2.1.

4   NSF’s Ocean Sciences Division is considering modifying its data archival requirements. Under the proposed new guidelines, principal investigators (PIs) will be able to send data to any scientific data repository as long as that repository has an agreement to eventually transfer the data to a national archive for permanent stewardship. In their final grant reports, PIs will have to demonstrate that the archive requirement has been satisfied. Personal communication from David Epp, program director, NSF Marine Geology and Geophysics program, March 12, 2003.
5   Background material prepared by NGDC for the August 13-14, 2002, committee meeting.
6   Since the review took place, NGDC has begun receiving GOES SXI data, high-resolution sidescan sonar imagery, and data from Continuously Operating Reference Stations.
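Footnote 3 explains that the per-division figures are averages of each dataset's percent online, so they can differ from the share of total data volume that is online. The short Python sketch below, which uses made-up volumes rather than NGDC figures, shows how far apart the two summaries can be when one large dataset dominates a division.

```python
def percent_online_summaries(datasets):
    """Summarize how much of a division's holdings are online, two ways.

    `datasets` is a list of (volume, percent_online) pairs, one per dataset.
    Returns (unweighted, weighted):
      unweighted: plain mean of the per-dataset percentages (each dataset counts equally)
      weighted:   percentage of the division's total volume that is online
    """
    if not datasets:
        raise ValueError("need at least one dataset")
    unweighted = sum(pct for _, pct in datasets) / len(datasets)
    total_volume = sum(volume for volume, _ in datasets)
    weighted = sum(volume * pct for volume, pct in datasets) / total_volume
    return unweighted, weighted


# Hypothetical division (volumes in arbitrary units, not NGDC figures):
# one large dataset that is 10 percent online and three small ones fully online.
division = [(1000.0, 10.0), (10.0, 100.0), (10.0, 100.0), (10.0, 100.0)]
unweighted, weighted = percent_online_summaries(division)
print(f"mean of per-dataset percentages: {unweighted:.1f}%")  # 77.5%
print(f"share of total volume online:    {weighted:.1f}%")    # 12.6%
```

The unweighted mean corresponds to the kind of indicator footnote 3 describes; the volume-weighted figure is the kind of measure behind the center-wide 25 percent, which is stated in terms of total volume.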

Box 2.1 NGDC Criteria for Acquiring Data

  - relevance to the NGDC, NESDIS, and NOAA missions
  - current or potential scientific significance of the data
  - ability of NGDC to provide a useful service for the data
  - demand for the data
  - immediate and long-term availability of the data elsewhere
  - resources required for acquisition, archive, stewardship, and dissemination
  - ability of NGDC to acquire required resources
  - existing requirement to archive data of that type

NOTE: The criteria do not appear in order of priority.
SOURCE: Background material prepared by NGDC for the November 13-15, 2002, committee meeting.

Over the years, NGDC has worked with thousands of data providers, some of whom use NGDC as their primary distribution avenue. Examples include digital bathymetry data from NOAA’s National Ocean Service, unclassified and unrestricted geophysical maps from the Naval Oceanographic Office Geomagnetic Data Library,7 and solar data from the Solar Optical Observation Network. In addition, NGDC holds global and regional datasets compiled from organizations around the world. About 60 percent of NGDC datasets by data type partially or completely replicate data from other sources (Appendix C); their distribution through NGDC benefits users because many of these replicated datasets are not easily accessible from the original source.

In addition to acquiring data, NGDC transfers data to other organizations (see Chapter 1, “History of NGDC”). Most of the transferred data were managed by the Solid Earth Geophysics Division, which now includes ecological datasets and has a considerably more environmental focus than it has had historically.

DATA USERS

NGDC users include scientific, technical, and lay users in government agencies, universities, and private companies in the United States and abroad.8 Users fall into two categories: Web visitors and customers who purchase data.
The latter are well defined; Web users are categorized by Internet domain names, including NOAA and other government agencies (dot-gov); universities and other educational institutions (dot-edu); foreign government, industry, and academia (country-specific domain names); private industry and publishing (dot-com, dot-org, and dot-net); and the general public (dot-com, dot-org, and dot-net). However, a substantial fraction of users (37 percent) cannot be classified even by these broad user categories. NGDC infers that scientists are no longer the dominant users of NGDC data because dot-com users far exceed dot-edu users (Figure 2.3).

FIGURE 2.3 Profile of the top domains accessing the NGDC Web site in FY 2002. Apart from the U.S. government and educational communities, users (87 percent) are difficult to characterize by domain name. SOURCE: Data from the National Geophysical Data Center.

NOAA has no formal priorities for responding to requests from different user groups, although internal users are apparently given the highest priority. Staff in the National Weather Service and the Office of Oceanic and Atmospheric Research, for example, are considered the most important users of satellite data.9 Similarly, NGDC reports that it does not give priority to any user group,10 but NGDC staff told the committee that the center tries to meet the needs of its sophisticated users (i.e., scientists, government agencies). Whether the center addresses the needs of unskilled users depends on how easily the data request can be filled.

7   <http://www.ngdc.noaa.gov/seg/potfld/gdl/map_dds.html>.
8   NOAA Organizational Handbook, <http://www.rdc.noaa.gov/~ohb/E/EH0000.html>.
9   Presentation to the committee by Charles Wooldridge, chief of staff, NOAA’s National Environmental Satellite, Data, and Information Service, November 13, 2002.
10   Background material prepared by NGDC for the November 13-15, 2002, committee meeting.

DATA ACCESS

Users obtain data from NGDC in three ways: (1) by downloading data from the NGDC Web site, (2) by ordering data from NGDC’s online store, and (3) by requesting data from NGDC staff directly. Data downloaded from the NGDC Web site or FTP servers are free of charge, whereas information ordered from the online store may carry a charge. In no case does NGDC charge more than the cost of preparing a product for dissemination and distributing it to the public (incremental cost).11 This practice complies with the guidelines set forth in Office of Management and Budget Circular A-130.12

With the rapid growth of Internet usage, the total number of distinct hosts served online has increased exponentially over the past decade, with a doubling interval of about 29 months (Figure 2.4a). In 2002 nearly 800,000 distinct hosts were served. Trends in online access by particular user groups were not available because of inconsistencies in year-to-year tracking, but trends in offline user groups are shown in Figure 2.4b. The fraction of users requesting offline data dropped dramatically from fiscal year (FY) 1998 to FY 2002, with the fraction of foreign and general public users decreasing the most (Figure 2.4b).

NGDC’s network connectivity appears to be sufficient to enable users to download the data volumes of interest. An OC-12 line was installed in August 2002, and the center has a 1,000-Mbps connection to organizations in Boulder (e.g., National Center for Atmospheric Research, National Institute of Standards and Technology, University of Colorado) and beyond. Several servers can be directed to fill user requests, a lesson NGDC learned when a National Public Radio interview led users to overwhelm the center’s Web server in 1995.13

11   Presentation to the committee by Mary Glackin, deputy assistant administrator for satellite and information services, August 13, 2002.
12   Federal data policy is set forth in the Paperwork Reduction Act (as amended in 1995), and specific guidelines to agencies are given in OMB Circular A-130, Management of Federal Information Resources (1994). Federal information is disseminated to the public on an unrestricted basis for no more than the incremental cost. See 44 U.S.C. § 3506(b)(1)(c) and <http://www.whitehouse.gov/omb/circulars/a130/a130.html>.
13   Personal communication from David Clark, assistant director, NGDC, February 28, 2003. The activity required to bring down the NGDC server at that time (2,563 hosts, 58,550 files, 1,071,826,408 bytes transferred) was trivial compared with the routine activity of NGDC servers today.
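The domain-name categories described under Data Users amount to a lookup on each requesting host's top-level domain. The Python below is a hypothetical illustration, not NGDC's log-processing code; the category labels, the two-letter country-code rule, and the sample host names are assumptions. It also shows why the scheme leaves so many users uncharacterized: dot-com, dot-org, and dot-net each map to more than one user group, and hosts known only by a numeric address cannot be classified at all.

```python
import ipaddress
from collections import Counter

# Broad groups modeled on the text; labels are illustrative assumptions.
AMBIGUOUS = "industry, publishing, or general public (dot-com/org/net)"
CATEGORY_BY_TLD = {
    "gov": "government (dot-gov)",
    "edu": "educational (dot-edu)",
    "com": AMBIGUOUS,
    "org": AMBIGUOUS,
    "net": AMBIGUOUS,
}
FOREIGN = "foreign (country-code domain)"
UNCLASSIFIED = "unclassified"


def classify_host(host):
    """Assign a requesting host to a broad user category by its top-level domain."""
    try:
        ipaddress.ip_address(host)
        return UNCLASSIFIED  # no resolved name, only a numeric address
    except ValueError:
        pass  # not an IP address, so treat it as a host name
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return UNCLASSIFIED
    tld = labels[-1]
    if tld in CATEGORY_BY_TLD:
        return CATEGORY_BY_TLD[tld]
    # Simplification: every other two-letter TLD counts as foreign,
    # so a U.S. host under .us would be misfiled.
    if len(tld) == 2 and tld.isalpha():
        return FOREIGN
    return UNCLASSIFIED


def profile(hosts):
    """Count distinct hosts per category, as in a domain profile of Web users."""
    return Counter(classify_host(h) for h in set(hosts))


# Hypothetical host names; 192.0.2.7 is a documentation-only address.
hosts = ["lab.agency.gov", "cs.university.edu", "dialup.example.net",
         "www.example.com", "geo.example.de", "192.0.2.7", "cs.university.edu"]
for category, count in profile(hosts).most_common():
    print(f"{count:2d}  {category}")
```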

FIGURE 2.4a Quarterly online user statistics. From FY 1993 to FY 2003, the number of distinct hosts (blue bars) grew to 800,000. SOURCE: National Geophysical Data Center.

FIGURE 2.4b Decrease in offline (e.g., phone calls, faxes) data and information requests from FY 1988 to FY 2002. The NGDC Web site was established in FY 1995. The figures for FY 1999 through FY 2002 are for data requests only. SOURCE: National Geophysical Data Center.
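A doubling interval translates directly into growth rates. Assuming steady exponential growth (an idealization; the actual host counts in Figure 2.4a need not follow a perfect exponential), a quantity with doubling time T grows by a factor of 2^(t/T) over a period t, and two counts N1 and N2 observed a time t apart imply T = t ln 2 / ln(N2/N1). The Python sketch below applies this to the 29-month interval quoted under Data Access.

```python
import math


def doubling_time(count_start, count_end, elapsed):
    """Doubling time implied by exponential growth from count_start to count_end."""
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("need 0 < count_start < count_end for a positive doubling time")
    return elapsed * math.log(2) / math.log(count_end / count_start)


def growth_factor(elapsed, doubling):
    """Multiplicative growth over `elapsed` for a given doubling time (same units)."""
    return 2 ** (elapsed / doubling)


DOUBLING_MONTHS = 29  # doubling interval quoted in the text
print(f"growth per year:   x{growth_factor(12, DOUBLING_MONTHS):.2f}")   # x1.33
print(f"growth per decade: x{growth_factor(120, DOUBLING_MONTHS):.1f}")  # x17.6
```

Under this assumption, a 29-month doubling interval corresponds to roughly a one-third increase per year, or about a seventeenfold increase over a decade.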

SERVICES

NGDC reports that providing services is its main function.14 This is a commendable priority for any data center. NGDC services include ensuring data quality, developing tools for working with the data, providing background information for interpreting the data, answering questions from data users, and linking the center’s Web site to relevant holdings that reside elsewhere. Some services are provided to all users at no cost; others are designed for specific clients and are undertaken on a reimbursable basis.

Each division has its own customer service group that is responsible for helping individual users find the data they need. The customer service staff members the committee talked with at the site visit seemed enthusiastic and dedicated, but they told the committee that their job was getting harder as the number of unsophisticated users increased and the number of customer service staff members decreased (see “Management” below). Contact information for customer service is easily found on the NGDC Web site.

The divisions are also responsible for providing background material to help users, especially less sophisticated users, learn about the data. There is no education and outreach program as such. Instead, the three scientific divisions determine which general information, tutorials, or other resources to provide, which meetings to attend, which schools to visit, and so on. For example, the MGG division offers a few educational resources (e.g., a description of the data types found in each subdiscipline, a tutorial on why seafloor data are important) and direct links to tutorials prepared by other organizations (e.g., “volcano expedition” by the Scripps Institution of Oceanography).15 The natural hazards datasets managed by the SEG division include an education section aimed at young students.
The STP division offers background information on most subjects at about the middle-school or possibly high-school level, but it does not have links to other educational resources. Many subject areas (e.g., geomagnetism, marine geophysics) also have a set of frequently asked questions. Finally, NGDC has two representatives on the NESDIS Outreach and Education Team.

Some services are provided by NGDC staff in partnership with private vendors. Examples include interactive map and other geospatial services, which are being developed in partnership with ESRI, and a Web interface to the Blue Angel commercial metadata software package, which facilitates metadata updates.

Metadata are key to understanding data quality, and NGDC reports that it complies with Federal Geographic Data Committee metadata standards, which document data quality, among other things. Assessing the quality of the data, working with data providers to correct errors, and creating metadata for each dataset are the responsibility of the NGDC staff member assigned to the dataset. Creating appropriate metadata is a difficult task, especially when there are many different sources of data. NGDC recognizes the importance of metadata and requests, but does not always receive, fully documented data. Data that are not sufficiently documented or quality controlled may be flagged or not placed into the database.16 Such data may be made available in the future, but NGDC is not anxious to acquire more undocumented datasets because of the cost and difficulty of managing such data.

NGDC also participates in developing standards with other agencies (e.g., many of NGDC’s datasets are formatted for and made accessible through the Global Change Master Directory), professional societies, and international organizations. For example, NGDC developed a NESDIS-wide tool for relating metadata to international standards.

NGDC staff members also provide services to other organizations, mostly other divisions of NOAA or other government agencies. Such services range from distributing gravity data on behalf of the National Geodetic Survey to providing the archive for nonnavigational charts for the Office of Coast Survey17 to digitizing and distributing geomagnetic data for the National Imagery and Mapping Agency.18 Some of these activities are carried out on a reimbursable basis.

14   Background material prepared by NGDC for the August 13-14, 2002, committee meeting.
15   <http://www.ngdc.noaa.gov/mgg/education.html>.

ARCHIVE AND STEWARDSHIP

The NGDC computer and storage facility employs a configuration of rack-mounted servers with high-bandwidth networks within NGDC, to other agencies in Boulder, and to the Internet. The computers functioning as Web servers and managing the storage and archiving facilities run the Linux operating system and use both disks and tape robots for storage. NGDC is currently migrating data from 8-mm tape and IBM 3480 cartridges to an IBM 3590 tape robot system.
Another robotic system uses Linear Tape Open (LTO) tapes for backing up every computer in the center. NGDC provides data to Web users from data stored online (on disk) and nearline (on the tape robots). The use of multiple small computers as Web servers provides backup and also facilitates scaling to growing needs. All data users are served from the NGDC Boulder site; there are no mirrored sites for Internet service, although NGDC’s Space Physics Interactive Data Resource services are mirrored in five countries. Copies of the digital data are kept at a backup facility approximately five miles away. NGDC staff told the committee that data are stored in National Archives and Records Administration-approved climate-controlled environments at both sites. About 10 percent of the archive tapes are randomly tested each year, as specified in NGDC’s performance measures (Appendix E).19 The test compares the volume of data recovered with the volume of data originally written to the tape. The oldest tapes in the 3480 archive are 10 years old.

16   Presentation by NGDC staff at the November 13-15, 2002, committee meeting.
17   Committee teleconference with Charles Challstrom, director of NOAA’s National Geodetic Survey, and Maureen Kinney, deputy chief of NOAA’s Office of Coast Survey, November 15, 2002.
18   Background material prepared by NGDC for the August 13-14, 2002, committee meeting.

MANAGEMENT

Budget

NGDC receives three types of funding: (1) base funding, which is allocated from the National Environmental Satellite, Data, and Information Service (NESDIS); (2) funding from other parts of NOAA, particularly the NOAA-wide Environmental Services Data and Information Management Program; and (3) reimbursable work and data sales (e.g., datasets, custom data products, posters, slide sets). Base funding accounted for 47 percent ($4.3 million) of the NGDC budget in FY 2002, other NOAA sources for 29 percent ($2.6 million), and reimbursable projects for 15 percent ($1.3 million). In addition, the center received $0.79 million for one-time, nonrecurring expenses, such as hardware and software.

Corrected for inflation, NGDC’s base funding has remained relatively flat for the last 10 years, and the total NGDC budget, which has risen and fallen, is now at about the same level as in 1992 (Figure 2.5). Some NGDC staff members believe budgets are flat because NOAA does not consider the center to address “mainstream” NOAA issues.20 The budget picture by division is more variable (Figure 2.6). The budgets for the MGG and SEG divisions peaked in 1995 (driven by reimbursable work in MGG and by funding from other NOAA sources in SEG) and have declined since then.
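The FY 2002 figures quoted at the start of this Budget discussion can be cross-checked with a few lines of arithmetic. Assuming the total budget is simply the sum of the four reported amounts (the four funding components shown in Figure 2.5), the total comes to about $9.0 million; the three stated percentages agree with it to within rounding, and the one-time funds make up the remaining 9 percent or so. The Python below is only that arithmetic; the total is derived here, not a figure reported by NGDC.

```python
# FY 2002 NGDC funding as reported in the text, in millions of dollars.
funding = {
    "base (NESDIS)": 4.3,
    "other NOAA sources": 2.6,
    "reimbursable projects": 1.3,
    "one-time, nonrecurring (direct cite)": 0.79,
}
# Percentages stated in the text; none is given for the one-time funds.
reported_percent = {
    "base (NESDIS)": 47,
    "other NOAA sources": 29,
    "reimbursable projects": 15,
}


def funding_shares(amounts):
    """Return the combined total and each source's percentage of it."""
    total = sum(amounts.values())
    return total, {name: 100 * value / total for name, value in amounts.items()}


total, pct = funding_shares(funding)
print(f"total: ${total:.2f} million")
for name, value in pct.items():
    stated = reported_percent.get(name)
    note = f"(text: {stated}%)" if stated is not None else "(no percentage given)"
    print(f"{name:38s} {value:5.1f}% {note}")
```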
Unlike the MGG and SEG budgets, the budgets for STP and the Information Services Division (ISD) have generally grown over the last decade.

Base funding is allocated to the divisions by the NGDC director. NGDC staff told the committee that the Office of the Director21 and ISD are funded first, and the remaining resources are allocated among the three science divisions based on a fixed percentage of the federal salaries of each division. Base funding has been insufficient to cover base operations over the past 10 years.22 The divisions make up budget shortfalls or expand into new areas by seeking reimbursable and other NOAA funding.

19   Background material prepared by NGDC for the November 13-15, 2002, committee meeting.
20   Interviews with NGDC staff members, November 14, 2003.
21   Funding for the Office of the Director includes the director’s staff salaries and center-wide expenses, including performance bonuses, mailing, rent, utilities, phone, network, meeting exhibits, supplies, maintenance, National Snow and Ice Data Center (NSIDC) allocation, and travel.
22   Base operations include labor, rent, utilities, and the NSIDC allocation. From budget information prepared by NGDC for the November 13-15, 2002, committee meeting.

FIGURE 2.5 Ten-year budget history (FY 1992-FY 2002), corrected for inflation, for NGDC. From bottom to top, base funding is shown in violet, funding from non-NESDIS parts of NOAA is shown in maroon, funding from reimbursable work and data sales is shown in yellow, and direct cite funding (one-time, nonrecurring expenses, such as hardware and software) is shown in aqua. SOURCE: Calculated from data provided by the National Geophysical Data Center.

Staffing

NGDC has 85 full- and part-time staff members (not counting vacancies), including 43 federal employees, 3 contractors, 2 NOAA Corps officers, 1 National Ocean Service (NOS) detailee, 22 University of Colorado Cooperative Institute for Research in Environmental Sciences (CIRES) employees, and 14 visiting scientists. In addition, there are 15 work-study students and interns. The federal workforce consists of 49 full-time equivalents (FTEs), although there are 51 authorized positions at the center. Each division has roughly the same number of FTEs, but the CIRES staff members are concentrated in the solid earth geophysics and solar-terrestrial physics divisions. Funding to pay for the contractors, CIRES employees, work-study students, and to some extent the federal employees comes from reimbursable projects. The visiting scientists (most of whom have retired from NGDC or from federal agencies in the Boulder area), the NOAA Corps officers, and the NOS detailee are not on the payroll.

The average age of the federal employees is in the 50s, and the average age of the total workforce is in the mid-40s.23 More than half the federal employees are eligible for some sort of retirement, including 16 percent eligible now and 48 percent eligible under discontinued service provisions. The aging of the workforce, which increases costs, coupled with flat budgets, has led to a decrease in the staff level at NGDC. The number of federal FTEs has dropped by 50 percent since 1992, while the volume of datasets and the number of Web accesses have grown exponentially (Figures 2.2 and 2.4a). Exponential growth in Web accesses does not, of course, translate into exponential growth in the level of effort needed to manage the data.

Seventy-five percent of the staff members have a bachelor’s degree or higher, and 10 percent have a PhD. Most of the staff (mainly federal employees) managing the datasets or working with customers have a degree in physical science. Nearly all the scientific programming and a small fraction of the network and database administration are provided by CIRES employees, most of whom have engineering, physical science, or computer science backgrounds. Nevertheless, NGDC staff members told the committee that the center does not have the scientific expertise to manage all of its diverse holdings. The center supplements its expertise with individuals in other NOAA laboratories (e.g., the Pacific Marine Environmental Laboratory provides expertise in hazards), at the University of Colorado, and among its visiting scientists.
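The base-funding allocation described under Budget (the Office of the Director and ISD funded first, with the remainder split among the science divisions according to their federal salaries) can be written as a small function. This is a hypothetical sketch: the text does not say exactly how the "fixed percentage" is set, so the code assumes one common percentage applied to every science division and chosen to use up the remainder, which makes each division's share proportional to its federal salaries. All dollar amounts are invented.

```python
def allocate_base_funding(base, director, isd, federal_salaries):
    """Split base funding: Office of the Director and ISD first, then the science divisions.

    Assumption (not stated in the report): a single common rate is applied to each
    science division's federal salaries, set so that the remainder is fully used.
    """
    remainder = base - director - isd
    if remainder < 0:
        raise ValueError("base funding does not cover the Office of the Director and ISD")
    total_salaries = sum(federal_salaries.values())
    rate = remainder / total_salaries  # the common "fixed percentage", as a fraction
    return rate, {division: rate * salaries for division, salaries in federal_salaries.items()}


# Hypothetical figures in millions of dollars, for illustration only.
rate, allocations = allocate_base_funding(
    base=10.0, director=2.0, isd=2.0,
    federal_salaries={"MGG": 4.0, "SEG": 3.0, "STP": 1.0},
)
print(f"rate: {rate:.0%} of federal salaries")  # rate: 75% of federal salaries
for division, amount in allocations.items():
    print(f"{division}: ${amount:.2f} million")
```

Under this reading, the science divisions receive only what remains after the first-funded items, so any growth in the Office of the Director or ISD comes directly out of their shares.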

Organization

As noted in Chapter 1, the organizational structure of NGDC reflects the center’s history. The MGG and information services divisions have existed for about 20 years, and the SEG and STP divisions have existed since the creation of NGDC. A number of the staff members the committee interviewed identified problems with this organizational structure:

  - Some activities are carried out in parallel by more than one division, leading to inefficiencies.
  - The division heads are able to act autonomously without considering the consequences of their actions for the budget. As a result, the budget is not necessarily aligned with the core activities of the center.
  - Turf battles between the divisions are common, generating morale problems.24

Breaking down the walls between the divisions was seen by some NGDC staff members as one of the most important steps the center should take.

23   Presentation to the committee by Michael Loughridge, director, National Geophysical Data Center, August 13, 2002.
24   Interviews with NGDC staff members, November 14, 2003.

FIGURE 2.6 Ten-year budget history (FY 1992-2002), corrected for inflation, for the four NGDC divisions: Marine Geology and Geophysics (MGG), Solid Earth Geophysics (SEG), Solar-Terrestrial Physics (STP), and Information Services (ISD). Base funding is shown in violet; other NOAA funding (including direct cite) is shown in maroon; and funding from reimbursable work and data sales is shown in yellow. SOURCE: Calculated from data provided by the National Geophysical Data Center.