a challenge. In some very high-priority areas, federal support on an “infinite mortgage” basis might be sustainable. In other cases, some combination of relay funding, user fees, endowments, and other mechanisms may need to be employed.

Companies and Journals

Opportunities for new public-private partnerships for data stewardship also exist. For example, Google announced a free service named Palimpsest that would have made massive datasets accessible to researchers, but it canceled the official launch of the project in late 2008.[23] At the same time, Amazon has launched a service to host large public datasets, allowing researchers to upload their own data.[24] Researchers would be charged fees for online data storage and data analysis capability. Such services are attractive in part because many datasets have become so large that they cannot be downloaded over the Internet in a reasonable time.

Some journals play a role in maintaining the data submitted to support published articles. Journals are also participating in initiatives such as Portico, an archive of electronic scholarly literature.[25] However, many journals lack the financial resources to maintain databases for extended periods. These financial constraints, which are especially acute as journals make the transition to electronic publication, could threaten their ability to preserve and supply data either now or in the future.

ANNOTATING DATA FOR LONG-TERM USE

As noted in Chapter 2, raw data are typically of use only to the research group that generated them. To be useful to others, data must be accompanied by metadata that describe the content, structure, processing, access conditions, and source of the data in a form that permits the data to be used by researchers, educators, policy makers, and others. For computational data, for example, annotation might mean preserving the software used to generate the data along with a simulation of the hardware on which the software ran (or, in some cases, the hardware itself). For observational data, the documentation of the hardware, instrumental calibrations, preprocessing of data, and other circumstances of the observation are generally essential for using the data. In some cases, these metadata can be generated automatically, but annotation can be a labor-intensive process.

[23] Alexis Madrigal. 2008. Google shutters its science data service. Wired Science. December 18. Available at http://blog.wired.com/wiredscience/2008/12/googlescienceda.html.

[24] Aaron Rowe. 2008. Amazon hosting, crunching massive public databases. Wired Science. December 5. Available at http://blog.wired.com/wiredscience/2008/12/massive-amounts.html.

[25] See the Portico Web site: http://www.portico.org/.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.