Identity

In addition to a signature derivable from their content, data granules need names that identify their content (semantics) independent of the content’s representation (structure). The separation between identity and integrity is subtle but important: although a one-to-one relationship between names and signatures is desirable, in practice it is likely to be one-to-many. A lossless transformation of a granule (e.g., reformatting from HDF to TIFF) may simply be unable to preserve the same signature. Even if the relationship were always one-to-one, users would still need to refer to granules by names that make sense to them, as opposed to the apparently random bit strings of digital signatures.
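The point about lossless transformations can be illustrated with a minimal sketch. It assumes SHA-256 as the signature function and uses JSON and packed binary as stand-ins for two file formats; neither choice comes from the report, and the sample values are invented for illustration.

```python
import hashlib
import json
import struct


def signature(payload: bytes) -> str:
    """Content-derived signature: SHA-256 hex digest of the raw bytes (an assumed choice)."""
    return hashlib.sha256(payload).hexdigest()


# The same three measurements in two lossless representations.
values = [1.5, 2.25, 3.0]
as_json = json.dumps(values).encode("utf-8")   # text representation
as_binary = struct.pack("<3d", *values)        # little-endian doubles

# Both representations decode to identical content...
assert json.loads(as_json) == list(struct.unpack("<3d", as_binary))

# ...but the signatures are computed over different bytes, so they differ.
sig_json = signature(as_json)
sig_binary = signature(as_binary)
```

A single granule name would therefore have to cover both signatures, which is the one-to-many relationship described above.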

Implication: All data streams distributed by NASA and NOAA should have a well-defined granule naming convention. Where possible, services should be supported that map between granule names and signatures, so that names of granules may be recovered from their signatures.
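One way such a mapping service might be organized is sketched below. The `GranuleRegistry` class, its method names, and the granule name `EXAMPLE_L2_2005001_0000` are all hypothetical, and SHA-256 is again an assumed signature function; this is an in-memory illustration, not a description of any existing NASA or NOAA service.

```python
import hashlib
from collections import defaultdict
from typing import Dict, FrozenSet, Optional, Set


class GranuleRegistry:
    """Hypothetical in-memory map between granule names and signatures."""

    def __init__(self) -> None:
        # One name may have many signatures (one per representation).
        self._sigs_by_name: Dict[str, Set[str]] = defaultdict(set)
        # Each signature belongs to exactly one name.
        self._name_by_sig: Dict[str, str] = {}

    def register(self, name: str, payload: bytes) -> str:
        """Record that `payload` is a representation of the granule `name`."""
        sig = hashlib.sha256(payload).hexdigest()
        existing = self._name_by_sig.get(sig)
        if existing is not None and existing != name:
            raise ValueError(f"signature already registered under {existing!r}")
        self._name_by_sig[sig] = name
        self._sigs_by_name[name].add(sig)
        return sig

    def name_for(self, sig: str) -> Optional[str]:
        """Recover a granule's name from one of its signatures."""
        return self._name_by_sig.get(sig)

    def signatures_for(self, name: str) -> FrozenSet[str]:
        """List every known signature for a granule name."""
        return frozenset(self._sigs_by_name.get(name, ()))


registry = GranuleRegistry()
name = "EXAMPLE_L2_2005001_0000"  # hypothetical granule name
sig_hdf = registry.register(name, b"bytes of the HDF rendering")
sig_tiff = registry.register(name, b"bytes of the TIFF rendering")
```

Keeping the signature-to-name direction single-valued is a design choice of this sketch: it guarantees that a name can always be recovered unambiguously from a signature.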

Quality

Quality is simply a set of assertions about a data granule that provide sufficient ancillary/contextual information (metadata) about the granule to enable effective interpretation of its contents. Conventionally these assertions either are packaged with the granule (as embedded metadata) or are implicit in the granule’s parent data set (e.g., MODIS ocean color product). However, if a granule’s identity can be reliably established, then quality assertions can be provided by services that accept the granule’s name as input. This avoids the problem of constantly extending data formats to accommodate new forms of embedded metadata.

Implication: NASA and NOAA should provide services by which a data granule’s name may be used to recover all metadata relevant to that granule.
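A name-keyed metadata service could look roughly like the following sketch. `QualityService`, `Assertion`, the field names, and the sample values are hypothetical and chosen only to show that new assertions can be attached to a granule name without touching the granule's file format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Assertion:
    """One quality statement about a granule (hypothetical structure)."""
    source: str  # who makes the assertion
    key: str     # what is being asserted
    value: str   # the asserted value


class QualityService:
    """Hypothetical lookup of quality assertions by granule name."""

    def __init__(self) -> None:
        self._by_name: Dict[str, List[Assertion]] = {}

    def assert_about(self, name: str, assertion: Assertion) -> None:
        """Attach a new assertion to a granule; the granule itself is untouched."""
        self._by_name.setdefault(name, []).append(assertion)

    def metadata_for(self, name: str) -> List[Assertion]:
        """Return all assertions recorded for the named granule."""
        return list(self._by_name.get(name, []))


service = QualityService()
granule = "EXAMPLE_L2_2005001_0000"  # hypothetical granule name
service.assert_about(granule, Assertion("producer", "cloud_fraction", "0.12"))
# A later, independent party adds a new kind of assertion with no format change.
service.assert_about(granule, Assertion("validation-team", "matchup_bias", "low"))
found = service.metadata_for(granule)
```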

Lineage

Static assertions about a data granule are only part of the context required to interpret the data. Even more important is the lineage of the data: the graph of antecedent data granules and transformations from which the granule was produced. Lineage, the “pedigree” of a data granule, is often a key determinant of a granule’s fitness for a particular use. For example, retrospectively updating the calibration of a low-level data product invalidates any products derived from it. If the lineage of the derived products is available, then such broken dependencies will be obvious.
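The calibration example can be sketched as a traversal of a lineage graph. The `LineageGraph` class, the granule names, and the transformation labels below are hypothetical; the sketch assumes lineage is recorded as edges from each antecedent granule to the products derived from it.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


class LineageGraph:
    """Hypothetical record of which granules were derived from which."""

    def __init__(self) -> None:
        # antecedent name -> names of products derived directly from it
        self._derived: Dict[str, Set[str]] = defaultdict(set)
        # product name -> label of the transformation that produced it
        self._transformation: Dict[str, str] = {}

    def record(self, product: str, inputs: Iterable[str], transformation: str) -> None:
        """Record that `product` was made from `inputs` by `transformation`."""
        for antecedent in inputs:
            self._derived[antecedent].add(product)
        self._transformation[product] = transformation

    def invalidated_by(self, changed: str) -> Set[str]:
        """Return every granule derived, directly or indirectly, from `changed`."""
        affected: Set[str] = set()
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            for product in self._derived.get(current, ()):
                if product not in affected:
                    affected.add(product)
                    queue.append(product)
        return affected


graph = LineageGraph()
graph.record("L2_a", ["L1B_a"], "retrieval")
graph.record("L2_b", ["L1B_b"], "retrieval")
graph.record("L3_mean", ["L2_a", "L2_b"], "spatial-average")

# Recalibrating L1B_a breaks everything downstream of it, but not L2_b.
broken = graph.invalidated_by("L1B_a")
```

With the lineage recorded this way, the set of broken dependencies falls out of a simple breadth-first search rather than requiring manual bookkeeping.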
