be transferred from one storage platform and software environment to another if they are not to be lost. Digital data also need to be annotated in sufficient detail that future researchers, sometimes in fields well removed from those of the data’s original creators, can both use the data and understand their limitations. Maintaining data collections for long-term use thus requires continued investment and planning, which can compete with expenditures for ongoing research.
In describing issues as broad as those covered in this report, it is essential to have a clear understanding of the basic terms.
Despite the importance of research data, there exists no standard or widely accepted definition of exactly what research data are. For the purposes of this report, we have treated data as information used in scientific, engineering, and medical research as inputs to generate research conclusions (see Box 1-4 for definitions from other reports). This usage encompasses a wide variety of information. It includes textual information, numeric information, instrumental readouts, equations, statistics, images (whether fixed or moving), diagrams, and audio recordings. It includes raw data, processed data, published data, and archived data. It includes the data generated by experiments, by models and simulations, and by observations of natural phenomena at specific times and locations. It includes data gathered specifically for research as well as information gathered for other purposes that is then used in research. It includes data stored on a wide variety of media, including magnetic and optical media.17
Though our concerns in this report lie largely with the application of digital technologies in research, our examination of the issues is not limited to digital data. Nor does this report address just those areas traditionally considered “science.” It applies to all efforts to derive new knowledge about the physical, biological, or social worlds and thus encompasses research in engineering and in all of the physical, biological, behavioral, and social sciences. The report's conclusions apply generally to quantitative data. Many of them also apply to qualitative data, although we have not focused on the issues unique to qualitative data. Finally, this report does not address research in the humanities, which lies outside the committee’s charge and expertise.
As a point of comparison, the Office of Management and Budget defines research data as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This ‘recorded’ material excludes physical objects (e.g., laboratory samples).” See OMB Circular A-110 at http://www.whitehouse.gov/omb/circulars/a110/a110.html.