4 Promoting the Stewardship of Research Data
Pages 95-114



From page 95...
... Secondary use of data is of growing importance in an increasing number of fields. In astronomy, for example, the Sloan Digital Sky Survey, a project for which the open provision of both processed and raw data over the Internet is central, is the facility responsible for the most high-impact papers in astronomy in recent years. Repositories of genomic data, such as the Trace Archive of the National Center for Biotechnology Information (NCBI)
From page 96...
... One expert raises the possibility of a "digital dark age," in which large amounts of digital data stored in a variety of proprietary file formats are permanently lost. Digital media also decay over time, a phenomenon known as "bit rot." Many old magnetic tapes molder in boxes and are now essentially worthless.
From page 97...
... As generations of applications, data formats, operating systems, and digital archives interoperate and succeed one another, multiple locations and systems for data access and sharing might be engaged to preserve a given data collection. Ensuring that archived data are not altered due to human error or intentional mischief is an additional challenge for large data repositories, particularly those utilizing automated processes to ingest large datasets. Table 4-1 shows the various risks to long-term digital data reliability and the time frames in which they might be expected to occur.
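One common way to detect the kind of alteration described above is a fixity check: record a cryptographic checksum for every file at ingest, then recompute and compare later. The following is a minimal sketch, assuming a collection stored as ordinary files under one directory; the function names (sha256_of, build_manifest, verify_manifest) are hypothetical and do not describe the practice of any repository named in this chapter.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the files now under root against a previously recorded manifest."""
    current = build_manifest(root)
    recorded, present = set(manifest), set(current)
    return {
        # Files recorded at ingest that can no longer be found.
        "missing": sorted(recorded - present),
        # Files that appeared after the manifest was recorded.
        "unexpected": sorted(present - recorded),
        # Files whose contents no longer match the recorded checksum.
        "altered": sorted(n for n in recorded & present if manifest[n] != current[n]),
    }
```

In this sketch the manifest would be built once at ingest and stored separately from the data; a later call to verify_manifest that returns three empty lists indicates that no file has been lost, added, or changed. A checksum detects alteration but does not repair it, so it is normally paired with redundant copies.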
From page 98...
... Individual investigators, however, typically face much greater challenges in deciding which data may be useful in the future, in documenting those data thoroughly, and in finding funds from limited budgets for adequate data curation and preservation. Furthermore, although large projects can generate immense quantities of data, small research projects can themselves produce substantial quantities and varied kinds of data.
From page 99...
... - and NIH-sponsored projects that promised to create social science data have not followed through. Investigators typically have little expertise in data annotation or long-term database management.
From page 100...
... Part of a global network of social science data archives, ICPSR is the world's largest archive of digital social science data and is hosted by the University of Michigan.a It is supported by dues from more than 600 member institutions, plus support from government agencies and other research sponsors. ICPSR, which currently houses 7,500 studies and 500,000 data files, has recommended guidelines, but not requirements, for submission of data.
From page 101...
... Over time, ICPSR's archival model has proven to be an effective approach to ensuring data integrity, facilitating data sharing, and providing data stewardship across a range of fields and many institutions. Because many social science data are used for secondary analysis, and because the social sciences reward academic producers of general-purpose data, universities see the value of ICPSR, which makes the membership funding model sustainable.
From page 102...
... FIGURE 4-2 ICPSR LEADS project findings of NSF- and NIH-sponsored awards that created social science data (panel label: NSF, n=938). NOTE: This figure reflects survey results through November 2008 of principal investigators of 1,599 NIH and NSF awards that indicated social science data creation.
From page 103...
... The size and complexity of digital datasets can overwhelm small institutional libraries or archives, which traditionally have dealt with analog textual information. Yet new partnerships and approaches hold the promise of overcoming many of these barriers.
From page 104...
... NOAA must deal with the challenges of an increasing volume and diversity of its data holdings -- which include everything from satellite images of clouds to the stomach contents of fish -- as well as a large number of users. A recent National Research Council report offered nine general principles for effective environmental data management, along with a number of guidelines on how the principles could be applied at NOAA.18 The principles and guidelines developed for NOAA are consistent with the principles laid out in this study, and represent an example of how they apply to an agency with significant data management responsibilities in the earth sciences.
From page 105...
... SDSC utilizes multiple levels of data reliability and data integrity mechanisms. Research communities and data centers such as SDSC need to develop a common understanding on key issues such as trust, expectations, incentives/penalties, and privacy/security/confidentiality.
From page 106...
... For example, Google had announced a free service named Palimpsest that would make massive datasets accessible to researchers, but canceled the official launch of the project in late 2008.23 At the same time, Amazon has launched a service to host large public datasets, allowing researchers to upload their own data.24 Researchers would be charged fees for online data storage and data analysis capability. Many datasets have become so large that they are impossible to download over the Internet in a reasonable time.
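The point about download time can be made concrete with simple arithmetic: transfer time is the dataset size in bits divided by the effective link speed. The following is a rough sketch; the 100-terabyte dataset and 100 megabit-per-second link are illustrative assumptions, not figures from this chapter, and the function name transfer_time_days is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(size_bytes: float,
                       link_bits_per_second: float,
                       efficiency: float = 1.0) -> float:
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is the assumed fraction of nominal bandwidth actually
    achieved (1.0 means the link is fully and exclusively used).
    """
    if size_bytes < 0 or link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, and 0 < efficiency <= 1")
    seconds = (size_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


# Illustrative assumption: 100 TB (decimal) over a 100 Mbit/s link.
days = transfer_time_days(100e12, 100e6)
print(f"{days:.1f} days")
```

Under these assumed numbers the transfer takes roughly three months even with the link fully dedicated, which is why hosting analysis capability next to the data can be more practical than downloading it.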
From page 107...
... In late 2007, the Blue Ribbon Task Force on Sustainable Digital Preservation and Access was created to "analyze previous and current models for sustainable digital preservation, and identify current best practices among existing collections, repositories and analogous enterprises."27 The Task Force is developing recommendations and a research agenda aimed at catalyzing and supporting sustainable economic models for stewardship of digital information, including research data. The Task Force is supported by NSF, the Andrew W
From page 108...
... In addition, the working group calls for the creation of a subcommittee on digital scientific data preservation, access, and interoperability under the National Science and Technology Council that would track and recommend policies on such issues as national and international coordination; education and workforce development; interoperability; data systems implementation and deployment; and data assurance, quality, discovery, and dissemination. At the nongovernmental level, in fall 2008 the National Research Council established a new Board on Research Data and Information.
From page 109...
... Who should pay for the preservation of data? These questions can be answered only by the researchers, research institutions, research sponsors, and policy makers who have responsibility for data stewardship.
From page 110...
... Data stewardship must start at the beginning of a project, not partway through or at the end of a project. Recommendation 9: Researchers should establish data management plans at the beginning of each research project that include appropriate provisions for the stewardship of research data.
From page 111...
... Recommendation 10: As part of the development of standards for the management of digital data, research fields should develop guidelines for assessing the data being produced in that field and establish criteria for researchers about which data should be retained. As research data become more voluminous, complex, and valuable, a need may arise to formalize the process of making data management decisions within research fields.
From page 112...
... RESPONSIBILITIES OF RESEARCH INSTITUTIONS, RESEARCH SPONSORS, AND JOURNALS Researchers need a supportive institutional environment to fulfill their responsibilities toward the stewardship of data. Recommendation 11: Research institutions and research sponsors should study the needs for data stewardship by the researchers they employ and support.
From page 113...
... This committee was not in a position to comprehensively evaluate whether the current, largely decentralized, approach is likely to meet the needs of the research enterprise. The relevant communities are actively engaged in addressing these issues, through groups such as the Blue Ribbon Task Force on Sustainable Digital Preservation and Access mentioned earlier.

