Box 1.1 Definitions of Key Terms Used in This Report

Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors. A data element is the smallest unit of information to which reference is made. This report is concerned primarily with digital data, although a large portion of raw data is recorded as analog data, which also can be digitized. For purposes of this report the terms data and facts are treated interchangeably, as is the case in legal contexts.

Data in a database may be characterized as predominantly word-oriented (e.g., as in a text, bibliography, directory, dictionary), numeric (e.g., properties, statistics, experimental values), image (e.g., still or moving images, such as a film of microbes under magnification or time-lapse photography of a flower opening), or sound (e.g., a sound recording of a tornado or a fire). Word-oriented, numeric, image, and sound databases are processed by different types of software (text or word processing, data processing, image processing, and sound processing, respectively).

Data can also be referred to as raw, processed, or verified. Raw data consist of original observations, such as those collected by satellite and beamed back to Earth, or initial experimental results, such as laboratory test data. After they are collected, raw data can be processed or refined in many different ways. Processing usually makes data more usable, ordered, or simplified, thus increasing their intelligibility. Verified data are data whose quality and accuracy have been assured. For experimental results, verification signifies that the data have been shown to be reproducible in a test or experiment that repeats the original. For observational data, verification means that the data have been compared with other data whose quality is known or that the instrument with which they were obtained has been properly calibrated and tested.
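The distinction among raw, processed, and verified data can be illustrated with a minimal sketch. The readings, the reference value, the tolerance, and the function names `process` and `verify` are all hypothetical and chosen only for illustration; real verification procedures depend on the discipline and the instrument involved.

```python
from statistics import mean

# Hypothetical raw data: repeated temperature observations (degrees C).
raw_readings = [20.1, 19.9, 20.3, 20.0, 19.7]

# Hypothetical reference value taken from data whose quality is known,
# and an assumed acceptance tolerance.
reference_value = 20.0
tolerance = 0.5


def process(readings):
    """Reduce raw readings to a single, more usable summary value."""
    return round(mean(readings), 2)


def verify(value, reference, tol):
    """Treat a value as verified if it agrees with the reference within tol."""
    return abs(value - reference) <= tol


processed = process(raw_readings)
is_verified = verify(processed, reference_value, tolerance)
print(processed, is_verified)
```

Here processing is a simple averaging step and verification is a comparison against a trusted reference, which corresponds to the observational case described above; verification of experimental results would instead repeat the experiment.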

Digital data may be processed or stored on various types of media, including semiconductor memory (RAM), magnetic media (hard drives, diskettes, tapes), and optical media (CD-ROM, DVD). Data can be made accessible either through portable media or, increasingly, online.

A database is a collection of related data and information (generally numeric, word-oriented, sound, and/or image) organized to permit search and retrieval or processing and reorganizing. A data set is a collection of similar and related data records or data points. Many databases serve as a resource from which specific data points, facts, or textual information are extracted for use in building a derivative database or data product. A derivative database, also called a value-added or transformative database, is built from one or more preexisting databases and frequently includes extractions from multiple databases, as well as original data.
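As a rough illustration of how a derivative database combines extractions from preexisting databases with original data, consider the following sketch. The two source databases, their contents, and the function name `build_derivative` are hypothetical; the values are placeholders rather than reference data.

```python
# Two hypothetical preexisting databases, keyed by compound name.
properties_db = {
    "water": {"boiling_point_c": 100.0},
    "ethanol": {"boiling_point_c": 78.4},
}
bibliography_db = {
    "water": {"source": "Handbook A"},
    "ethanol": {"source": "Handbook B"},
    "methanol": {"source": "Handbook C"},
}

# Original data contributed by the producer of the derivative database.
original_data = {"water": {"note": "measured at 1 atm"}}


def build_derivative(*sources):
    """Merge records from several sources into a new database.

    Records sharing a key are combined field by field; the source
    databases themselves are left unmodified.
    """
    derivative = {}
    for source in sources:
        for key, record in source.items():
            derivative.setdefault(key, {}).update(record)
    return derivative


derivative_db = build_derivative(properties_db, bibliography_db, original_data)
print(derivative_db["water"])
```

The merged record for "water" draws on both preexisting databases and on the producer's original data, which is the sense in which the result is value-added.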

A database producer acquires data in raw, reduced, or otherwise processed form, either directly (through experimentation or observation) or indirectly (from one or more organizations or preexisting databases), for inclusion in a database that the producer is generating. Such database creators, sometimes known as database publishers or originators but referred to in this report as database producers, traditionally hold the intellectual property rights in the databases.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.