  • Development of process monitoring tools. To maintain active oversight without being overwhelmed by production operations, the algorithm developer must have tools to efficiently track the performance of the algorithm in the production environment. Thus, a central task is to create the theoretical and empirical envelope of expected values, for both instrument data and the results of the algorithm, so exceptions can trigger retention of diagnostic material for off-line analysis.

  • Maintaining product consistency as instruments or algorithms change. Each change should be accompanied by a theoretical and an empirical analysis to determine whether post hoc corrections can be applied to the product within acceptable error or whether reprocessing the entire record will be necessary.

  • Documenting the process. Inadequate documentation often prevents scientific data products from being used to their full potential, and adequate documentation is clearly essential for products monitoring long-term global change. The critical issue is whether future generations of scientists will be able to determine from archived and contemporary observations whether an apparent change is real or is an artifact of the observational and data-processing methods. Two concepts, still in their embryonic stages within the Earth system science community, are likely to become a standard part of data operations.
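The expected-value envelope described in the first item above can be sketched as a simple bounds check with a retention hook. This is a minimal illustration, not an operational design; the bounds, readings, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    """Expected-value envelope for a monitored quantity (bounds are illustrative)."""
    lower: float
    upper: float

    def check(self, value: float) -> bool:
        """Return True if the value falls inside the expected envelope."""
        return self.lower <= value <= self.upper


def monitor(samples, envelope, retain):
    """Flag out-of-envelope samples and hand them to a retention callback.

    The callback stands in for "retention of diagnostic material for
    off-line analysis"; here it simply records the exception.
    """
    exceptions = []
    for i, value in enumerate(samples):
        if not envelope.check(value):
            retain(i, value)  # hold diagnostic material for later study
            exceptions.append((i, value))
    return exceptions


# Example: instrument readings with one out-of-envelope spike at index 3.
held = []
readings = [250.1, 251.3, 249.8, 310.7, 250.6]
flagged = monitor(readings, Envelope(lower=240.0, upper=260.0),
                  lambda i, v: held.append((i, v)))
```

In practice the envelope would come from the theoretical and empirical analysis of the algorithm, and the retention hook would archive the full diagnostic context rather than a single value.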

The first is dataset publication—an institutionalized procedure to accomplish everything necessary to produce a high-quality, self-standing product that can be widely distributed with confidence and pride. Essential ingredients include a rigorous review of presentation and content prior to publication and a formal wrapper analogous to a book cover that uniquely identifies the product by key reference metadata, such as a title, subject, identifier, authors, and publisher. An emerging minimal standard for such metadata is the Dublin Core.11 Adhering to this standard would at once enable library-style searches and referencing by a very broad community on the same footing as books or journal articles. A verifiable checksum and institutional signature would be added to the wrapper to provide users with an assurance of authenticity, an important attribute of trustworthy information.
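The wrapper idea above can be sketched as a small metadata record plus a checksum. The field names follow the Dublin Core element set (where "creator" is the term for author); the dataset bytes, identifier, and institution names are hypothetical, and SHA-256 is used here as one reasonable choice of verifiable checksum. The institutional signature is omitted.

```python
import hashlib


def make_wrapper(data: bytes, metadata: dict) -> dict:
    """Build a 'book cover' wrapper: Dublin Core-style metadata plus a checksum."""
    return {
        "title": metadata["title"],
        "subject": metadata["subject"],
        "identifier": metadata["identifier"],
        "creator": metadata["creator"],      # Dublin Core's term for "author"
        "publisher": metadata["publisher"],
        "checksum": hashlib.sha256(data).hexdigest(),
    }


def verify(data: bytes, wrapper: dict) -> bool:
    """Recompute the checksum to confirm the product has not been altered."""
    return hashlib.sha256(data).hexdigest() == wrapper["checksum"]


# Hypothetical product and metadata, for illustration only.
product = b"gridded sea surface temperature anomalies, v1"
wrapper = make_wrapper(product, {
    "title": "Sea Surface Temperature Anomalies",
    "subject": "oceanography",
    "identifier": "doi:10.0000/example",  # hypothetical identifier
    "creator": "Example Lab",
    "publisher": "Example Archive",
})
```

Any recipient can recompute the checksum from the distributed data and compare it with the wrapper, giving the assurance of authenticity the text describes.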

The second new idea is science concept modeling: a process that aims to capture formally, in an object-oriented metadatabase, a logically complete description of the science concepts, and the assumed relationships among them, that were actually invoked in converting the input data into the final product. This includes a declarative representation of the algorithms and the decision criteria for control functions. At the highest conceptual level of such a model, abstract scientific concepts, such as four-dimensional fields of physical variables, are defined by the mathematical equations and the names that link them heuristically to the scientific literature. Beneath this descriptive layer are links to a variety of finite representa-
