TABLE 4-1 Long-term Data Reliability Issues

| Entity at Risk | What Can Go Wrong? | Frequency |
| --- | --- | --- |
| File | Corrupted media, disk failure | 1 year |
| Disk | Simultaneous failure of two copies | 5 years |
| System | Systematic errors in vendor software; malicious user; operator error that deletes multiple copies | 15 years |
| Archive | Natural disaster, obsolescence of standards | 50-100 years |

SOURCE: Francine Berman, SDSC, presentation to the committee, September 2007.

The loss of valuable data is especially a problem in small research projects. Large projects often have data management plans and funds set aside for data storage and dissemination. Individual investigators, however, typically face much greater challenges in deciding which data may be useful in the future, in documenting those data thoroughly, and in finding funds from limited budgets for adequate data curation and preservation. Furthermore, although large projects can generate immense quantities of data, small research projects can themselves produce substantial quantities and varied kinds of data.

Some research fields that formerly consisted almost exclusively of small projects, such as molecular biology or ecology, have moved in part toward larger and more data-intensive programs. Some of these fields have groups that oversee the collection and annotation of data for use by others. The social sciences, for example, have long sponsored a specialized institution that has data stewardship as part of its mission (see Box 4-1). Other fields, despite generating much larger quantities of data, continue to be characterized by largely disparate and often inadequate data management efforts.

Not all research data should be preserved, but deciding what to save and what to discard becomes increasingly difficult as ever larger quantities of data are generated. Furthermore, there is a financial trade-off between creating new data and preserving old data. While the cost of storage per bit is declining rapidly, as described in Chapter 1, data stewardship requires a long-term commitment of attention and resources. As the secondary use of data becomes more important across fields and disciplines, each field needs to develop guidance for researchers, research sponsors, and research institutions on what data should be preserved and on whether new organizations or capabilities are needed to perform stewardship functions. A 2002 National Research Council report on geosciences data and collections is a useful example of how research fields can develop criteria for prioritizing the data and collections that should be preserved, and for making the trade-offs between creating new data and preserving existing data.7

7 National Research Council. 2002. Geoscience Data and Collections: National Resources in Peril. Washington, DC: The National Academies Press.


