Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age
TABLE 4-1 Long-term Data Reliability Issues
Entity at Risk
What Can Go Wrong?
Corrupted media, disk failure
Simultaneous failure of two copies
Systematic errors in vendor software; malicious user; operator error that deletes multiple copies
Natural disaster, obsolescence of standards
SOURCE: Francine Berman, SDSC, presentation to the committee, September 2007.
The loss of valuable data is especially a problem in small research projects. Large projects often have data management plans and funds set aside for data storage and dissemination. Individual investigators, however, typically face much greater challenges in deciding which data may be useful in the future, in documenting those data thoroughly, and in finding funds from limited budgets for adequate data curation and preservation. Furthermore, although large projects can generate immense quantities of data, small research projects can themselves produce substantial quantities and varied kinds of data.
Some research fields that formerly consisted almost exclusively of small projects, such as molecular biology or ecology, have moved in part toward larger and more data-intensive programs. Some of these fields have groups that oversee the collection and annotation of data for use by others. The social sciences, for example, have long sponsored a specialized institution that has data stewardship as part of its mission (see Box 4-1). Other fields, despite generating much larger quantities of data, continue to be characterized by largely disparate and often inadequate data management efforts.
Not all research data should be preserved, but deciding what to save and what to discard becomes increasingly difficult as ever larger quantities of data are generated. Furthermore, there is a financial trade-off between creating new data and preserving old data. Although the cost of storage per bit is declining rapidly, as described in Chapter 1, data stewardship requires a long-term commitment of attention and resources. As the secondary use of data becomes more important, fields and disciplines need to develop guidance for researchers, research sponsors, and research institutions on what data should be preserved and on whether new organizations or capabilities are needed to perform stewardship functions. A 2002 National Research Council report on geosciences data and collections is a useful example of how a research field can develop criteria for prioritizing the data and collections that should be preserved and for making the trade-offs between creating new data and preserving existing data.[7]
[7] National Research Council. 2002. Geoscience Data and Collections: National Resources in Peril. Washington, DC: The National Academies Press.