Data for Science and Society: The Second National Conference on Scientific and Technical Data, Chapter 4

U.S. National Committee for CODATA
National Research Council
Interdisciplinary and Intersectoral Data Applications: A Focus on Environmental Observations


Gateway to the Earth--A Framework for a National Decision Support System

Barbara Ryan

     It is a pleasure to be here this morning. I think you will see from this presentation that a number of the issues Dr. Baker mentioned in his opening remarks are also issues that are important to the U.S. Geological Survey (USGS). I will discuss a framework we are using at the USGS to integrate information collected by our organization over the last 121 years. The USGS was created in 1879, and our budgets aren't quite as large as the National Oceanic and Atmospheric Administration's (NOAA's)--our agency is funded at approximately $1 billion a year. We estimate that the total of our information assets, since the inception of the USGS, is valued at about $20 billion. As a country we need to think about further exploiting these information assets, which was something Dr. Baker alluded to in his opening remarks.

     Although this presentation will focus on what we are doing inside the USGS, the issues we face internally are no different from the issues all of you face in your respective organizations, and certainly in the country as a whole, regardless of sector. Whether it is the public sector, the private sector, universities, or the nonprofit arena, we are all facing these same issues.

     Why would we even want something like a framework for a National Decision Support System? You will see in this presentation that the issues we face as a country are going to require that earth and natural resource information be integrated. Yet, our cultures, our organizations, and our systems and processes result in disintegration of this information, rather than integration. So, I support and commend the entire CODATA effort in integrating scientific and technical information.

     Also, as Dr. Baker briefly mentioned, when Congress gives USGS money, it doesn't explicitly give it to us to do a good job of integrating this information or even to manage it. USGS is, in fact, appropriated money for describing and understanding Earth; minimizing the loss of life and property from natural disasters; managing water, biological, energy, and mineral resources; and enhancing and protecting the quality of life. So it is much easier for USGS to go to Congress and seek appropriated dollars for its earthquake monitoring system or its stream-flow stations. However, within the dollars appropriated for those stations (earthquake stations or stream-gauging stations), we have to build in the dollars required to actually integrate that information.

     The Gateway to the Earth initiative is trying to open natural science information to the world. What do we mean by this? Gateway to the Earth is a coherent set of interfaces that enables diverse users to find, get, and use natural science information in ways that are meaningful to them, not in ways that we as a federal institution say are meaningful. The point here is that the exploitation of these information assets will be doubled and redoubled if users can find and use the data in ways they find useful, rather than only in the prepackaged form in which we send the data out the door.

     The U.S. Geological Survey is organized according to programmatic divisions--a National Mapping Division, a Water Resources Division, a Biological Resources Division, and a Geologic Division. Figures 4.1 through 4.4 describe information assets in each of these program divisions.

Figure 4.1


Figure 4.2


     The Biological Resources Division is our most recent division. It was established when the National Biological Service was abolished. I think many of you in the audience may have heard about the National Biological Information Infrastructure and linkages to the universities through that initiative.

Figure 4.3


     Most of our seismic hazard work is done in the Geologic Division. I believe it was Michael Goodchild, of the University of California, Santa Barbara, who said that the U.S. Geological Survey does a very good job of integrating data horizontally, with wonderful national coverages, whether this involves geologic maps, surficial geology, soils, seismic hazards, or aquifer maps. The challenge to the institution now is to integrate these data vertically so that we can start superimposing water quality data on top of these framework data, and then superimposing exotic species data on top of that. The framework for a National Decision Support System needs to link earth and natural science information housed in a number of federal institutions with health and other environmental data. Can you imagine the power of a system in which you could start looking at cancer statistics superimposed on radon potential, derived from a bedrock geologic map?

Figure 4.4


     For the Water Resources Division, I want to draw particular attention to the middle row on the left-hand side of Figure 4.4, our real-time stream-flow data. I want to point this out because we have already talked a little about the power of the World Wide Web. This data set goes back to about 1888 and includes approximately 7,000 stream-flow stations around the country, many operated in partnership with the National Weather Service, of which approximately 4,500 are telemetered. The ground stations have sensors linked to satellites, with downlinks to our offices, so data as little as 15 minutes old can now be disseminated over the Web.

     Traditionally, these data were published at the end of every year and shared with water resource managers at the state or county level, so for 100 years these managers were the primary recipients of these data. As soon as we started delivering this information over the Web, an entirely new clientele was built around this data set. For example, anglers and recreationists now decide whether to go out on the river on a particular day based on that day's stream flow.

     So, again, when we talk about exploiting these data, we had no way to predict this new user group until we changed the delivery venue, that is, to the World Wide Web rather than an annual report. The data themselves did not change. We still serve the historical clientele of the National Weather Service and state and local water resource managers, and we have now added recreationists and other users of real-time data to the list.

     Historically, data sets within a particular program division were not even integrated within that division. These data sets have to be integrated across divisions and ultimately should include linkages to others' data. We need to have pointers to the National Institutes of Health's (NIH's) data, the Environmental Protection Agency's data, NOAA's data--not that we or any one agency wants to manage those data on behalf of the other institutions, but we ought to be able to point back and forth if you are interested in a particular point on Earth's surface.

     When we talk about a National Decision Support System, users need to be able to access data wherever they are in the country to make land-use or other resource management decisions, without being limited by who owns or manages the data. Initially, we are trying to organize our own data and information. Users ought to be able to search and browse these data by subject, by geographic reference, by expertise, and perhaps by time period. Some of the data and information may not be geospatial at all.

     The USGS also generates 4,000 or 5,000 reports every year. We ought to be able to tag each of those reports to a particular area or place on Earth's surface. Users should be able to enter the USGS Web site and access the data by discipline (e.g., biologic or hydrologic information) or by theme (e.g., hazard information), regardless of discipline. Users may want to access the data by geography, whether a place, an ecological region, a watershed, a point on Earth's surface, or a congressional district. Or they may want to come in by the traditional route of organizational unit, requesting, for example, all the data that NIH or NOAA holds. Finally, a user may want all the data for a particular date, or for the last 100 years.
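     The access paths described above (by discipline, theme, geography, organization, or time) amount to a faceted search over metadata records. The following is a minimal sketch of that idea, assuming each report or data set carries a few metadata tags and a single representative point on Earth's surface. The `Record` fields, the `search` function, the bounding-box convention, and the sample records are all hypothetical illustrations, not a USGS schema or system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical bounding box: (min_lon, min_lat, max_lon, max_lat) in degrees.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata tags for one report or data set."""
    title: str
    discipline: str          # e.g. "hydrology", "biology"
    themes: FrozenSet[str]   # cross-discipline themes, e.g. {"hazards"}
    lon: float               # representative point on Earth's surface
    lat: float
    year: int
    agency: str              # organization that holds the data


def search(records: Iterable[Record], *, discipline: Optional[str] = None,
           theme: Optional[str] = None, bbox: Optional[BBox] = None,
           years: Optional[Tuple[int, int]] = None,
           agency: Optional[str] = None) -> List[Record]:
    """Return records matching every facet supplied; facets left as None are ignored."""
    hits = []
    for r in records:
        if discipline is not None and r.discipline != discipline:
            continue
        if theme is not None and theme not in r.themes:
            continue
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat):
                continue
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        if agency is not None and r.agency != agency:
            continue
        hits.append(r)
    return hits


# Invented placeholder records, for illustration only.
catalog = [
    Record("Report A", "hydrology", frozenset({"floods"}), -122.4, 37.8, 1998, "USGS"),
    Record("Report B", "biology", frozenset({"invasive species"}), -90.1, 29.9, 1995, "USGS"),
    Record("Report C", "geology", frozenset({"hazards"}), -122.3, 37.9, 1989, "USGS"),
]
bay_area = (-123.0, 37.0, -121.5, 38.5)

print([r.title for r in search(catalog, bbox=bay_area)])                      # ['Report A', 'Report C']
print([r.title for r in search(catalog, bbox=bay_area, years=(1990, 2000))])  # ['Report A']
```

     The design choice here is that every facet is optional and facets combine with a logical AND, so a user can enter by geography alone, by theme alone, or by any combination, rather than through a single path fixed by the organization's structure. A production catalog would index these fields rather than scan every record.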

     Traditionally, as organizations, we haven't been able to serve up the data in such a way that users can get access to data based on a query of their choice rather than on how we choose. We have historically sent data out the door based on our own organizational structure and traditions. I believe that most of you in this room have done the same thing.

     Let us spend just a little bit of time pointing to two projects that are described in more detail in our contributed poster.1 One is the National Atlas Initiative and the other is the TerraServer. The National Atlas looks at national- and regional-scale data, while TerraServer looks at neighborhood-scale data. When we talk about future needs as one of the objectives of this conference, we have to look into the technology of advanced spatial search engines because of the sheer volume of data that exist. One has to be able to do meaningful queries or searches on these data.

     The National Atlas is a National Performance Review project with 18 federal agencies and partners. The objective was to cooperatively develop and market a suite of products that delivers different views of scientific, societal, and historical information; makes this information more accessible to individual Americans; provides a national framework of well-maintained data for use by others; and links to current and real-time events. If you log on to the National Atlas Web site, you can reach the Web pages of individual agencies. Right now, more than 700 Web sites are linked to the National Atlas, and 97,000 digital map layers have been downloaded. There are interactive mapping systems and tools so that users can superimpose environmental data on national parklands. Users can also get access to real-time stream-flow data through this site.

     The TerraServer project is a partnership with Microsoft. Microsoft was looking for a large data set in order to test its software delivery mechanisms. It had heard about the USGS digital orthophoto quadrangles (DOQs), which were built largely in cooperation with the Department of Agriculture. These DOQs look like aerial photographs, but they have been rectified to have maplike qualities; a user could, for example, measure the length of a particular lake from a DOQ. This data set is 12 terabytes in size. The USGS entered into a cooperative research and development agreement with Microsoft to increase the visibility of this DOQ data set. Now users can look at a map of the country, click on a particular point on that map, drill down, and pull up, for example, Candlestick Park or a picture of their own backyard. In January, USGS and Microsoft superimposed our digital topographic maps on the DOQ aerial imagery. Before that, users would see only a black-and-white image, with no street names, contour lines, or stream names.

     Right now, Microsoft gets between 5 and 6 million hits a day on the TerraServer site for this data set. We have something on the order of 300,000 to 400,000 Web pages for U.S. Geological Survey data and don't get 5 to 6 million hits a month.

     The challenge is to integrate these large data sets in a seamless fashion. We have to do this cooperatively because federal agencies and the private sector both have contributions to make. Whether we are facing land and water sustainability issues, species and habitat issues, land and water resource management, or restoration issues, a National Decision Support System to address these issues would benefit from integrated earth and natural science information.


1 See Hedy Rossmeissl, USGS, 2000. "Better Access Through Federal and Private Sector Data Integration," abstract and poster submitted to the U.S. National Committee for CODATA's Conference on Data for Science and Society, March 13-14, 2000, Washington, D.C.

Copyright 2001 the National Academy of Sciences