the awardees would be helping to demonstrate the feasibility of long-term digital stewardship.

In other fields where federal agencies themselves are not as central to data collection and stewardship efforts, federal capabilities may be more limited. In these cases, nonfederal research sponsors need the support and active participation of research institutions and communities if they are to help ensure the long-term preservation and availability of data. Also, sponsors may be more interested in the initial development of data collections than in maintaining those collections over long periods as an open-ended commitment.

The federal government can also foster data exchange among research institutions and companies in specific, highly applied areas. For example, the Government-Industry Data Exchange Program (GIDEP) is a joint activity of the military services, other federal agencies such as the National Aeronautics and Space Administration and the Department of Energy, defense and space contractors such as Lockheed Martin, Boeing, and Raytheon, and even the Canadian Department of National Defence.21 GIDEP, which has existed since the 1950s, is a mechanism for sharing research, development, design, testing, acquisition, and logistics information among government and industry participants in order to reduce or eliminate expenditures.

In recent years, other organizations and networks, including data centers, have taken on important roles in the stewardship of research data. The San Diego Supercomputer Center (SDSC), managed by the University of California at San Diego, is a high-performance computing center and a national data hosting facility that provides an integrated set of data services (access, manipulation, management, and storage).22 SDSC is a data services provider for the Protein Data Bank and the National Virtual Observatory (NVO). For NVO, SDSC stores two replicas of the Sloan Digital Sky Survey as well as other sky surveys, totaling more than 88 terabytes. SDSC DataCentral also hosts more than 100 community data collections, including Molecular Dynamics Simulation Data (chemistry), Human Brain Dynamics Resource data (neuroscience), and Employment Responses to Global Markets data (economics).

SDSC’s agreements with research communities vary substantially in how they address standards, sharing, formats and ontologies, usage scenarios, and intellectual property. SDSC uses multiple levels of data reliability and data integrity mechanisms.

Research communities and data centers such as SDSC need to develop a common understanding of key issues such as trust, expectations, incentives and penalties, and privacy, security, and confidentiality. Good long-term stewardship requires resources for increased capacity, up-to-date reliability tools, and skilled people. Developing sustainable economic models for long-term stewardship is



Francine Berman, Director, SDSC, presentation to the committee, September 17, 2007.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.