BOX 1-5

Three Types of Data Collections

The National Science Board (NSB) has organized data collections into the three categories described below. In addition, the NSB defined “collection” to refer “not only to stored data but also to the infrastructure, organizations, and individuals necessary to preserve access to the data.”


Research data collections are the products of one or more focused research projects and typically contain data that are subject to limited processing or curation. They may or may not conform to community standards, such as standards for file formats, metadata structure, and content access policies. Quite often, applicable standards may be nonexistent or rudimentary because the data types are novel and the size of the user community [is] small. Research collections may vary greatly in size but are intended to serve a specific group, often limited to immediate participants. There may be no intention to preserve the collection beyond the end of a project. One reason for this is funding. These collections are supported by relatively small budgets, often through research grants funding a specific project. (Example: The Fluxes Over Snow Surfaces Project, http://www.atd.ucar.edu/rtf/projects/FLOSS.)


Resource or community data collections serve a single science or engineering community. These digital collections often establish community-level standards either by selecting from among preexisting standards or by bringing the community together to develop new standards where they are absent or inadequate. The budgets for resource or community data collections are intermediate in size and generally are provided through direct funding from agencies. Because of changes in agency priorities, it is often difficult to anticipate how long a resource or community data collection will be maintained. (Example: The Arabidopsis Information Resource, http://www.arabidopsis.org.)


Reference data collections are intended to serve large segments of the research and education community. Characteristic features of this category of digital collections are a broad scope and a diverse set of user communities including scientists, students, and educators from a wide variety of disciplinary, institutional, and geographical settings. In these circumstances, conformance to robust, well-established, and comprehensive standards is essential, and the selection of standards by reference collections often has the effect of creating a universal standard. Budgets supporting reference collections are often large, reflecting the scope of the collection and breadth of impact. Typically, the budgets come from multiple sources and are in the form of direct, long-term support, and the expectation is that these collections will be maintained indefinitely. (Example: Protein Data Bank, http://www.pdb.org.)


SOURCE: National Science Board (2005), Long-Lived Data Collections: Enabling Research and Education in the 21st Century, Arlington, VA, National Science Foundation.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.