In the broadest possible sense, the term “utility” in the name of our committee refers to all of the various applications of research data. Both integrity and accessibility are critical elements of utility, because research data can be used effectively only if they are trustworthy and broadly accessible.
However, our focus in this report is on a specific aspect of utility that we refer to as data stewardship—the long-term preservation of data so as to ensure their continued value, sometimes for unanticipated uses. Stewardship goes beyond simply making data accessible. It implies preserving data and metadata so that they can be used by researchers in the same field and in fields other than that of the data’s creators. It implies the active curation and preservation of data over extended periods, which generally requires moving data from one storage platform to another. The term “stewardship” embodies a conception of research in which data are both an end product of research and a vital component of the research infrastructure.
As the examples presented throughout this report illustrate, research data are so varied that they can be described in their entirety only in the most general terms. Different research fields have very different approaches to the treatment of research data. Even at the level of individual research groups, expectations and demands can vary greatly from one investigator to another. This tremendous variety within the research community complicates the task of arriving at conclusions that apply across all fields of research. Research fields are also characterized by diversity in the origins of data and by the size and other characteristics of data collections.
Data are gathered and analyzed in very different ways both among and within disciplines. The sidebars in this and other chapters describe some of the diversity among disciplines, but individual disciplines are themselves highly diverse. Data in physics, for example, range from small datasets generated by a “tabletop” experiment to the terabytes of data generated by an accelerator-based experiment. Databases in the social sciences may be freely available to all researchers in some fields and tightly restricted in other fields. Some fields within a discipline may have traditions of storing data for extended periods while others discard data relatively quickly. (In this report, “field” refers to an area of research smaller than a discipline. In many cases, a field can be roughly associated with the community of researchers who follow and publish articles in a relatively small collection