Data for Science and Society: The Second National Conference on Scientific and Technical Data, Chapter 23

U.S. National Committee for CODATA
National Research Council
Promoting Data Applications for Science and Society: Organizational and Management Issues


Promoting Data Access for a Broad User Base

Matthew Schwaller

     Martha Maiden was the invited speaker today. Unfortunately, she was unable to attend, but I'm glad that the National Aeronautics and Space Administration (NASA) has the opportunity to make a presentation this afternoon. Martha's presentation was on broadening the user base. I would like to explain the NASA approach in that regard essentially through a discussion of the Earth Observing System (EOS) Data and Information System (EOSDIS), which is active in acquiring, processing, and providing Earth science data from a number of missions that NASA has launched.

     The EOS Terra AM1 spacecraft, which was launched in December 1999, has just begun to provide some data. I will have a few first-light engineering images from that EOS satellite to show you.

     I also should note that while EOSDIS is active and operating right now, there have been a number of review panels cosponsored by the National Research Council (NRC) to analyze EOS and EOSDIS. NASA has taken that advice and those comments and has worked to make programmatic changes, leading, we hope, to changes in the project implementation of data and information systems.

     Finally, I want to give you a look at how EOSDIS will be implemented in the near term and then, looking further ahead, at what NASA plans to do over the next 6 to 10 years in terms of its data and information system management.

     The overall goal for NASA obviously is to maximize the usability of and access to the data that we already have. The strategy for the Earth Science Enterprise is (1) to be responsive to the missions and the scientific community that NASA serves and to the larger community of users; (2) to evolve from and leverage existing systems to provide linkage and transition to new technologies; and (3) to serve as a conduit for programmatic and project issues to other organizations, both nationally and internationally. The goal for NASA headquarters especially is to prepare data and information systems and services (DISS) requirements, provide program scientist functions, and develop metrics to make sure that the systems we develop meet the objectives of NASA.

     The near-term implementation of data and information system management at NASA obviously relies on EOSDIS, which provides terabytes of data to millions of users every year. What's important is that EOSDIS is not simply a data archive and access system; it also provides for mission operations, that is, for the safe operation of the EOS suite of satellites.

     EOSDIS generates numerous data products, and data production is just beginning for the Terra mission. It will increase again in a quantitative step when the EOS PM spacecraft is launched in December 2000. EOSDIS also provides an active archive of Earth science data for EOS missions, again generating products, providing active access to those products, providing human beings to answer questions, and helping with the analysis and understanding of the data.

     EOSDIS exists in a distributed framework. I notice that there are a number of personnel here from the Distributed Active Archive Centers (DAACs) that NASA supports. Of course, EOSDIS also provides interoperable links with other data archives of the U.S. science agencies and with international partners.

     Figure 23.1 is a context diagram for EOSDIS. Toward the left-hand side of the image, data are acquired from the NASA suite of satellites, from our international partners at other space agencies, and from U.S. agencies such as the National Oceanic and Atmospheric Administration (NOAA), which provide data products for incorporation into the EOSDIS processing system. Data processing and mission control are also part of EOSDIS, as I mentioned, as is an internal network that provides reliable and controlled access for the generation of data products from EOSDIS. The DAACs provide the gateway through which the outside community, and ultimately various users both domestic and international, gain access to EOSDIS data products.
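     The flow in the context diagram can be pictured with a small sketch. This is only an illustration, not EOSDIS software: the class and function names are hypothetical, and routing products to an archive center by discipline is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawData:
    """Input arriving from a satellite, a partner agency, or another provider."""
    source: str
    discipline: str  # hypothetical routing key, not taken from the talk
    payload: str


@dataclass(frozen=True)
class Product:
    """A data product generated by the processing stage."""
    discipline: str
    description: str


def process(raw: RawData) -> Product:
    """Stand-in for product generation inside the processing system."""
    return Product(raw.discipline, f"{raw.payload} [{raw.source}]")


@dataclass
class ArchiveCenter:
    """Stand-in for an archive center acting as the gateway to users."""
    name: str
    holdings: list[Product] = field(default_factory=list)

    def ingest(self, product: Product) -> None:
        self.holdings.append(product)

    def request(self, keyword: str) -> list[Product]:
        """Return every held product whose description contains the keyword."""
        return [p for p in self.holdings if keyword in p.description]


def run_pipeline(inputs: list[RawData], centers: dict[str, ArchiveCenter]) -> None:
    """Acquire, process, and hand each product to the center for its discipline."""
    for raw in inputs:
        product = process(raw)
        centers[product.discipline].ingest(product)


if __name__ == "__main__":
    centers = {"land": ArchiveCenter("land-center"), "ocean": ArchiveCenter("ocean-center")}
    run_pipeline(
        [
            RawData("NASA satellite", "land", "vegetation index"),
            RawData("NOAA", "ocean", "sea surface temperature"),
        ],
        centers,
    )
    for product in centers["ocean"].request("temperature"):
        print(product.description)
```

     Users in this sketch only ever talk to an archive center, which mirrors the gateway role described above.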

Figure 23.1


     The proof that EOSDIS really works is in the engineering images, the first-light images from the new EOS Terra satellite. All images, including the MODIS (Moderate Resolution Imaging Spectroradiometer) images made on February 24 as the covers of the MODIS instrument were just being opened, will be released on the Web site.1 These are only engineering-quality images right now. Public release of Terra science data is planned for April 2000.

     Although EOSDIS works, there have been a number of reviews throughout the years of the program and the project. We have had recommendations from NRC groups, from other science groups, and from our own user groups regarding various implementation and procedural improvements of EOSDIS.

     The basic paradigm for the development of EOSDIS is to start with the objectives and requirements to build a data and information system, which drives the technology program. This has worked, although any approach has its pluses and minuses.

Figure 23.2


     A new paradigm that is being proposed for NASA is to work from the NASA science enterprise objectives into a technology program (see Figure 23.2). The groups defining the objectives, including the science community and the managers, feed into the technology program to generate technologies. Feedback between technology demonstrations and the objectives leads over time to a final resolution in the development of technologies. Ultimately the technologies that are developed are used to implement a data and information system.

     This approach has been adopted within the Earth Science Information Partners (ESIP) Federation. This group was motivated by an NRC committee, which recommended that the responsibility for product generation, publication, and user services should be transferred to a federation of partners selected through a competitive process open to all. The recommendation emphasized competition primarily, and NASA's response was to initiate a working prototype federation consisting of ESIPs, which were chosen by a competitive process. Twenty-four NASA-funded projects were selected.

     The projects took on the task of forming the federation among themselves. The membership of the federation was initially restricted to the first 24 selected, but it has since expanded to include the NASA DAACs. The federation itself is looking for ways to expand its constituency even beyond that group.

     There are three types of information partners in the federation (see Figure 23.3). Type 3 ESIPs provide data and information products and services beyond the reach of the global change research community. In many cases, these are state, local, and regional organizations that have been brought into this quite heterogeneous group of information partners to participate in this NASA data and information management approach.

Figure 23.3


     At the last meeting in February, a formal organizational structure was developed for the federation. A constitution and bylaws were adopted based on the American Chemical Society bylaws. A number of internal working groups and standing committees were formed with the objectives of developing an internal working structure, providing the opportunity for expanding beyond the initial group of partners, and expanding the federation to a larger group of information providers and users.

     That's essentially what I wanted to say about the near-term planning for new approaches to data management and for broadening the user base. In terms of mid-term planning, there is an initiative going on right now that was chartered by Dr. Asrar, the associate administrator for the Earth Science Enterprise, to generate a plan for an Earth Science Enterprise "NewDISS," documenting how the Earth Science Enterprise can best make its data available in a timely manner during the first decade of the 2000s. The team was intended to be drawn from the research groups and also from the NASA centers to look at the lessons learned in the U.S. DISS development and initial operations, and to propose a new approach to the development of these data management systems.

     The lesson learned that motivates the NewDISS is that information technology now changes faster than large operational data systems and services such as EOSDIS can be built. In fact, when EOSDIS was first conceived and the initial requirements development began, the World Wide Web simply did not exist. It was impossible at that time to really predict the kind of changes that would take place over the time it took to develop EOSDIS. At that time, NASA was a major driver for many of the information system technologies that existed; now NASA finds itself standing third or fourth in line behind the entertainment industry, banking, and even real estate. New mechanisms and methodologies are needed for developing and implementing such systems.

     Another lesson learned is that data systems and services should leverage emerging information technology being developed in other sectors, such as the entertainment community and banking. A further lesson learned is that a single data system should not attempt to be all things to all users, which was essentially the motivation for the EOSDIS. A single, large design development effort stifles creativity and may not be appropriate for new applications and related design efforts. Finally, and perhaps most importantly, future information systems will be distributed and heterogeneous in nature. This gets back to the type of partners that we have in the ESIP Federation, which run the gamut from small state, local, and regional data users that provide access to their own individual constituencies to a data and information center like the Goddard DAAC, which will have petabytes of data in its archives available for distribution.

     Some of the NewDISS principles are that future requirements will be driven by high data volumes, the need for an ever-increasing variety of data products, and a diverse user base. In terms of the NewDISS, and responding to the Earth Science Enterprise constituency, we agreed that science questions and priorities must determine the design and function of systems and services. Another principle is that technology change will occur rapidly. There has to be some method of introducing this change into a data and information system as it is developed, deployed, and utilized. Competition, as I mentioned, is a key tool for the selection of components and infrastructure. Some form of long-term stewardship must also occur. NASA has been looking to its agency partners at NOAA and the U.S. Geological Survey to cooperate on this and to come up with strategies for long-term archiving of EOS data. Finally, principal investigator (PI) processing and PI-led data management will be a significant part of future missions and science. The kind of computing power that was available in an archive during the development of EOSDIS might now be found on the desktops of most of the investigators who participate in the EOS program.

     In terms of the elements of NewDISS that we are evaluating, competition and peer review again are overriding principles. Published open interface standards are needed to create a level playing field. When you have a large number of heterogeneous partners that have to work together, the interface standards are more important than defining precisely what each of the elements does.
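     The point about interface standards can be illustrated with a minimal sketch. It assumes a hypothetical one-method search interface; nothing here is drawn from an actual NewDISS or ESIP specification, and all names are invented for illustration. Two very different partners are built independently, and a federated query works because both honor the same published interface, not because they share any internal design.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Granule:
    """One search result (hypothetical record type)."""
    provider: str
    dataset: str
    year: int


class DataProvider(Protocol):
    """Hypothetical published interface: the only thing partners must agree on."""

    def search(self, dataset: str, year: int) -> list[Granule]:
        ...


class LargeArchive:
    """Stands in for a large archive center that indexes holdings by year range."""

    def __init__(self, name: str, holdings: dict[str, range]) -> None:
        self.name = name
        self._holdings = holdings

    def search(self, dataset: str, year: int) -> list[Granule]:
        years = self._holdings.get(dataset, range(0))
        return [Granule(self.name, dataset, year)] if year in years else []


class RegionalPartner:
    """Stands in for a small regional partner that keeps a flat list of files."""

    def __init__(self, name: str, files: list[tuple[str, int]]) -> None:
        self.name = name
        self._files = files

    def search(self, dataset: str, year: int) -> list[Granule]:
        return [Granule(self.name, d, y) for d, y in self._files
                if d == dataset and y == year]


def federated_search(providers: list[DataProvider], dataset: str, year: int) -> list[Granule]:
    """Query every partner through the shared interface and merge the results."""
    results: list[Granule] = []
    for provider in providers:
        results.extend(provider.search(dataset, year))
    return results


if __name__ == "__main__":
    partners = [
        LargeArchive("archive-a", {"land_cover": range(1999, 2003)}),
        RegionalPartner("state-b", [("land_cover", 2000), ("water_quality", 2000)]),
    ]
    for granule in federated_search(partners, "land_cover", 2000):
        print(granule)
```

     Neither partner class inherits from the other or from a common base; a new partner can join by implementing the search method alone.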

     We hope that NASA will provide the leadership to identify the requirements and set priorities for this new paradigm. We hope also that NASA data will provide the capability to integrate these data services for developing metrics and feedback, to make sure that what is planned and what is implemented actually meet the objectives of the Earth Science Enterprise and of our user constituency.

Figure 23.4


     Figure 23.4 shows pictorially the evolution of the EOSDIS. We started with individual researchers with no data management coordination or a uniform data system, with various data repositories. We now have the EOSDIS model, where the same EOSDIS core system was delivered to a number of different institutions around the country, the DAACs, and they were made to work together with common interfaces, but also with common core software infrastructure. The federation is a group of heterogeneous data centers in which clustering takes place naturally among affinity groups. The final point in the evolution of this system that we can perceive over the next decade is in the NewDISS model that is being designed; there is a greater degree of definition of the interfaces between system elements, so that these elements can originate and operate autonomously but still cluster and coordinate as if they were a uniform data management entity.

1 See <> for additional information.

Copyright 2001 the National Academy of Sciences