tive statements, or matters of personal judgment, such as peer reviews, plans for future research, communications with colleagues, or personnel assessments. Of course, the line between research data and subjective judgments is sometimes difficult to draw, since subjective judgments can influence the structure ascribed to data. Nevertheless, a distinction exists, and we do not mean to imply that all of the information associated with research necessarily constitutes research data.

Metadata

As used in this report, the term “metadata” refers to descriptions of the content, context, and structure of information objects, including research data, at any level of aggregation (for example, a single data item, many items, or an entire database). According to the National Science Foundation report Cyberinfrastructure Vision for 21st Century Discovery, metadata “summarize data content, context, structure, interrelationships, and provenance (information on history and origins). They add relevance and purpose to data, and enable the identification of similar data in different data collections.”19 Metadata make it easier for data users to find and use data, particularly if the metadata are machine-readable.

Metadata are extremely diverse, ranging from written descriptions of instruments and software to the largely tacit knowledge on which the success of an investigation often depends. They are a critical part of the context needed to assess the integrity of data and use data accurately. Metadata are themselves data, since they consist of descriptive, factual information about data. Thus, conclusions about data in this report generally apply to metadata as well, although special considerations sometimes apply to metadata.

Until fairly recently, the term “metadata” was used primarily by the library community and by individual research communities.20 As digital data have become more important in a variety of disciplines and fields, the scope and value of metadata have grown, leading to the development of metadata standards. A metadata standard represents an agreed set of terminologies, definitions, and values to be provided for data in a given field or community.21
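To illustrate what a machine-readable record checked against an agreed set of fields and values might look like, the following is a minimal sketch. The field names, the permitted unit values, and the `check_record` helper are all hypothetical and chosen only for illustration; they are not drawn from any actual metadata standard or from the sources cited in this report.

```python
import json

# Hypothetical required fields and permitted values, for illustration only;
# they do not correspond to any real metadata standard.
REQUIRED_FIELDS = {"title", "creator", "instrument", "units", "provenance"}
ALLOWED_UNITS = {"celsius", "kelvin"}


def check_record(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    missing = sorted(REQUIRED_FIELDS - set(record))
    problems = [f"missing field: {name}" for name in missing]
    units = record.get("units")
    if units is not None and units not in ALLOWED_UNITS:
        problems.append(f"unrecognized units: {units}")
    return problems


# An example record describing content, context, structure, and provenance.
record = {
    "title": "Hourly sea-surface temperature",
    "creator": "Example Lab",
    "instrument": "Moored buoy thermistor",
    "units": "celsius",
    "provenance": "Raw sensor readings, averaged hourly",
}

# Serializing to JSON is one common way to make the record machine-readable.
machine_readable = json.dumps(record, sort_keys=True)
```

In this sketch, calling `check_record(record)` on the complete record returns an empty list, while a record that omits required fields or uses a unit outside the agreed values returns a description of each problem. Real standards typically define many more fields and richer rules than this.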

19 NSF Cyberinfrastructure Council. 2005. NSF’s Cyberinfrastructure Vision for 21st Century Discovery. Arlington, VA: National Science Foundation.

20 Tony Gill, Anne J. Gilliland, Maureen Whalen, and Mary S. Woodley. 2008. Introduction to Metadata, Version 3.0. Los Angeles, CA: J. Paul Getty Trust. Available at www.getty.edu/research/conducting_research/standards/intrometadata/index.html.

21 U.S. Geological Survey, Coastal and Marine Biology InfoBank. USGS CMG “Formal Metadata” Definition. Available at walrus.wr.usgs.gov/infobank/programs/html/definition/fmeta.html. Accessed December 8, 2008.


