Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age
a challenge. In some very high-priority areas, federal support on an “infinite mortgage” basis might be sustainable. In other cases, some combination of relay funding, user fees, endowments, and other mechanisms may need to be employed.
Companies and Journals
Opportunities for new public-private partnerships for data stewardship also exist. For example, Google announced a free service named Palimpsest that would make massive datasets accessible to researchers, but canceled the official launch of the project in late 2008.23 At the same time, Amazon has launched a service to host large public datasets, allowing researchers to upload their own data.24 Researchers would be charged fees for online data storage and data analysis capability. Hosted analysis of this kind matters because many datasets have become so large that they cannot be downloaded over the Internet in a reasonable time.
Some journals play a role in maintaining the data submitted to support published articles. Journals are also participating in initiatives such as Portico, an archive of electronic scholarly literature.25 However, many journals lack the financial resources to maintain databases for extended periods. These financial constraints, which are especially acute as journals make the transition to electronic publication, could threaten their ability to preserve and supply data either now or in the future.
ANNOTATING DATA FOR LONG-TERM USE
As noted in Chapter 2, raw data are typically of use only to the research group that generated them. To be useful to others, data must be accompanied by metadata that describe the content, structure, processing, access conditions, and source of the data in a form that permits the data to be used by researchers, educators, policy makers, and others. For computational data, for example, annotation might mean preserving the software used to generate the data along with a simulation of the hardware on which the software ran (or, in some cases, the hardware itself). For observational data, the documentation of the hardware, instrumental calibrations, preprocessing of data, and other circumstances of the observation are generally essential for using the data. In some cases, these metadata can be generated automatically, but annotation can be a labor-intensive process.
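As an illustration only, the five kinds of descriptive information listed above (content, structure, processing, access conditions, and source) could be captured in a simple machine-readable record. The following Python sketch is not drawn from any standard or from this report: the class name, field names, and sample values are all hypothetical, and a real project would more likely adopt a metadata schema established in its own discipline.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; all field names are illustrative."""
    content: str             # what the data describe
    structure: str           # file format and layout
    processing: List[str]    # steps applied to the raw data, in order
    access_conditions: str   # who may use the data, and on what terms
    source: str              # instrument, software, or group that produced the data

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Sample values are invented for this sketch.
record = DatasetMetadata(
    content="Hourly air temperature readings",
    structure="CSV with columns: timestamp, station_id, temp_c",
    processing=["removed sensor dropouts", "applied calibration offset"],
    access_conditions="",  # deliberately left blank to show the check below
    source="Example field station (hypothetical)",
)

# Flag incomplete annotation before the dataset is deposited.
print(record.missing_fields())

# Serialize the record so it can travel alongside the data files.
print(json.dumps(asdict(record), indent=2))
```

A check such as `missing_fields` reflects the point that some metadata can be validated or generated automatically, while supplying the values themselves often remains manual work.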