Data are a critical part of the research infrastructure, with an importance comparable to that of laboratories, research facilities, and computing devices and networks. Researchers need to access data quickly and from multiple sources. Data need to be annotated so that they can be used by researchers in a wide variety of fields. Data need to be migrated to successive storage platforms as technologies evolve. These observations lead to the committee’s third general principle.
Data Stewardship Principle: Research data should be retained to serve future uses. Data that may have long-term value should be documented, referenced, and indexed so that others can find and use them accurately and appropriately.
As with the two previous broad principles, this principle is not a recommendation but a general statement of intent that can guide specific actions. Also, as with the Data Access and Sharing Principle, the Data Stewardship Principle’s reference to future uses should be seen as limiting rather than broadening the scope of the principle. Decisions must continually be made about which data to save and which data to discard. General heuristics offer some guidance on these decisions.28 Observational data that cannot be re-collected are candidates for being archived indefinitely. Experimental data may or may not be saved depending on whether the experimental conditions can be reproduced precisely at minimal cost. In general, decisions about data retention require focused attention within each research group and field.
Many critical questions involving the retention of data are not directly addressed by the Data Stewardship Principle. For how long should data be retained? In what format and by whom? Who should pay for the preservation of data? These questions can be answered only by the researchers, research institutions, research sponsors, and policy makers who have responsibility for data stewardship.
As with ensuring the integrity and accessibility of data, researchers have unique responsibilities for data stewardship. In an editorial for its September 2008 issue on “petabyte science,” the journal Nature states that “Researchers need to be obliged to document and manage