ANNEX 3
Producing Trustworthy Scientific Information in the EOS Information Federation
The most significant challenge for the EOS Information Federation would be to provide trustworthy information that scientists can integrate into their research and teaching. Well-designed, integrated approaches to product design, the production environment, and interaction with users would allow the federation and its members to resolve long-standing issues of data and information quality and trustworthiness that have rarely been adequately addressed.
Perhaps the most important step in producing trustworthy information is managing the transition from algorithm development to maintaining a production environment. One set of issues involves scientists who develop algorithms to create information from instrument data or from other data products. If the new information proves useful and in high demand, the scientist must either become the manager of a production data facility or find a way to transfer production responsibility to others. Because this transition recurs within EOS and the broader community, the federation should develop a template to guide it.
The second set of issues concerns maintaining a comprehensive perspective on product marketing. The intended purposes of a product must be thought through as part of the process of designing the product and developing an approach to distribution and delivery. Once the product is flowing to users, its success in stimulating and facilitating scientific progress must be evaluated. If a product is not being used or is not contributing to scientific advance, then its design should be reevaluated.
The following considerations will be important in creating streams of trustworthy Earth observations and derivative information:
- Development of process monitoring tools. To maintain active oversight without being overwhelmed by production operations, the algorithm developer needs tools to track the algorithm's performance in the production environment efficiently. A central task, therefore, is to define the theoretical and empirical envelope of expected values for both the instrument data and the algorithm's results, so that exceptions (values falling outside the envelope) can trigger retention of diagnostic material for off-line analysis.
- Maintaining product consistency as instruments or algorithms change. Each change should be accompanied by a theoretical and an empirical analysis to determine whether post hoc corrections can be applied to the product within acceptable error or whether reprocessing the entire record will be necessary.
- Documenting the process. Inadequate documentation often prevents scientific data products from being used to their full potential, and adequate documentation is clearly essential for products used to monitor long-term global change. The critical issue is whether future generations of scientists will be able to determine, from archived and contemporary observations, whether an apparent change is real or an artifact of the observational and data-processing methods. Two concepts, still at an embryonic stage within the Earth system science community, are likely to become a standard part of data operations.
The first is dataset publication: an institutionalized procedure that accomplishes everything necessary to produce a high-quality, self-standing product that can be widely distributed with confidence and pride. Essential ingredients include a rigorous review of presentation and content before publication and a formal wrapper, analogous to a book cover, that uniquely identifies the product by key reference metadata such as title, subject, identifier, authors, and publisher. An emerging minimal standard for such metadata is the Dublin Core.¹¹ Adhering to this standard would immediately enable library-style searching and citation by a very broad community, on the same footing as books or journal articles. A verifiable checksum and an institutional signature would be added to the wrapper to assure users of the product's authenticity, an important attribute of trustworthy information.
The second new idea is science concept modeling, a process that aims to capture formally, in an object-oriented metadatabase, a logically complete description of the science concepts, and the relationships assumed among them, that were actually invoked in converting the input data into the final product, including a declarative representation of the algorithms and of the decision criteria for control functions. At the highest conceptual level of such a model, abstract scientific concepts, such as four-dimensional fields of physical variables, are defined by mathematical equations and by the names that link them heuristically to the scientific literature. Beneath this descriptive layer are links to a variety of finite representations of the concepts and their associated data structures, together with criteria for establishing approximate equivalence among such representations (i.e., equality within permitted tolerances). At this level the mathematical equations are replaced by finite transformation algorithms that act on those representations and are expressed in a programming language that minimizes hidden side effects. Such a science concept model would extend the object-oriented modeling used in the ECS Data Model¹² to the metadata and scientific theory surrounding operational processing. By isolating how and where the algorithm developer or production scientist inserted qualitative judgments, such a model would aid the design of effective oversight procedures, which could in turn be described and recorded in the same way.
- Interactions of personnel. The operations of the EOS Information Federation would involve delicate relationships between professional research scientists and information systems specialists. A variety of effective role models is more likely to emerge than a crisp formula for success. Regardless of the size of the data-processing operation, leaders and managers must be sensitive both to the importance of addressing critical underlying issues and to the personal motivations and professional aspirations of every group of specialists.
- Cost benefits should accompany the effort to produce trustworthy information.
A thoughtfully integrated processing environment leads to a higher-quality, better-documented product, with greater prospects of accurately recording the state of the Earth system and of surviving the sieve of time as a trusted source of useful long-term information. Immediate benefits come from more effective use of skilled people. Long-term benefits accrue to society through the existence of reliable information about the Earth, on which economic and policy decisions can be based. Though quality assurance appears to be costly, the alternative is the effective loss of expensive and irreplaceable data about the state of our planet.
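The envelope of expected values described under process monitoring tools can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an EOS design: the empirical band (mean plus or minus k standard deviations of past values, with k = 4), the variable names, the sample histories, and the physical limits are all hypothetical. Values outside the envelope count as exceptions, and the whole granule record is retained for off-line analysis.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Range of expected values for one monitored quantity."""
    name: str
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def build_envelope(name, history, k=4.0,
                   physical_min=float("-inf"), physical_max=float("inf")):
    """Intersect an empirical band (mean +/- k population standard deviations
    of past values; k is an assumed choice) with theoretical physical limits."""
    mean = statistics.fmean(history)
    spread = k * statistics.pstdev(history)
    return Envelope(name, max(physical_min, mean - spread),
                    min(physical_max, mean + spread))


@dataclass
class ProcessMonitor:
    """Flags values outside their envelopes and retains diagnostics."""
    envelopes: dict
    retained: list = field(default_factory=list)

    def check(self, granule_id: str, values: dict) -> bool:
        exceptions = {
            name: value for name, value in values.items()
            if name in self.envelopes and not self.envelopes[name].contains(value)
        }
        if exceptions:
            # Keep the whole record, not only the offending values, for off-line analysis.
            self.retained.append({"granule": granule_id,
                                  "values": dict(values),
                                  "exceptions": exceptions})
            return False
        return True


# Hypothetical quantities, histories, and limits.
monitor = ProcessMonitor({
    "radiance": build_envelope("radiance", [88.0, 95.0, 91.0, 93.5], physical_min=0.0),
    "sst_kelvin": build_envelope("sst_kelvin", [290.0, 291.0, 289.0, 290.0, 290.0],
                                 physical_min=271.0, physical_max=308.0),
})
print(monitor.check("G001", {"radiance": 92.4, "sst_kelvin": 290.1}))  # True
print(monitor.check("G002", {"radiance": 92.4, "sst_kelvin": 312.7}))  # False
```

Intersecting the statistical band with physical limits keeps the check from accepting physically impossible values when past data happen to be noisy, which is one way to combine the theoretical and empirical sides of the envelope.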
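For the product-consistency question, one simple way to frame the choice between a post hoc correction and full reprocessing is to run the old and new algorithm versions over an overlapping sample, fit a correction from one to the other, and compare the residual error with an acceptable tolerance. The sketch below assumes the correction is linear (an offset plus a gain) and measures error as the root-mean-square residual; both choices, the tolerance, and the sample values are hypothetical, and a real analysis would also examine how the differences depend on location, season, and regime.

```python
import math


def fit_linear_correction(old, new):
    """Least-squares fit of new ~= intercept + slope * old.

    Assumes at least two samples and that the old values are not all equal.
    """
    n = len(old)
    mean_old = sum(old) / n
    mean_new = sum(new) / n
    sxx = sum((x - mean_old) ** 2 for x in old)
    sxy = sum((x - mean_old) * (y - mean_new) for x, y in zip(old, new))
    slope = sxy / sxx
    return mean_new - slope * mean_old, slope


def correction_or_reprocess(old, new, tolerance):
    """Return the suggested action plus the fitted correction and its RMS error."""
    intercept, slope = fit_linear_correction(old, new)
    residuals = [y - (intercept + slope * x) for x, y in zip(old, new)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    action = "post hoc correction" if rms <= tolerance else "reprocess record"
    return action, intercept, slope, rms


# Hypothetical outputs of two algorithm versions on the same inputs.
old_version = [1.0, 2.0, 3.0, 4.0, 5.0]
new_linear = [1.1, 2.3, 3.5, 4.7, 5.9]         # differs by an offset and a gain
new_curved = [x * x / 3 for x in old_version]  # differs nonlinearly
print(correction_or_reprocess(old_version, new_linear, tolerance=0.05)[0])  # post hoc correction
print(correction_or_reprocess(old_version, new_curved, tolerance=0.05)[0])  # reprocess record
```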
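The dataset-publication wrapper can be illustrated with Python's standard library. In this sketch the cover carries five Dublin Core elements (title, subject, identifier, creator, publisher; "creator" is the Dublin Core element for authors), a SHA-256 checksum covers the data, and a signature covers the cover and checksum together. One loud assumption: an HMAC with a shared secret key stands in for the institutional signature only to keep the example self-contained. Anyone who can verify an HMAC holds the same secret and could also forge it, so a real deployment would use a public-key signature instead. All metadata values and the key are hypothetical.

```python
import hashlib
import hmac
import json

# Dublin Core elements placed on the "book cover" of the product.
COVER_ELEMENTS = ("title", "subject", "identifier", "creator", "publisher")


def _canonical(obj) -> bytes:
    """Serialize deterministically so the same wrapper always signs the same way."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def publish(data: bytes, metadata: dict, institution_key: bytes) -> dict:
    """Build a wrapper holding cover metadata, a checksum, and a signature."""
    wrapper = {
        "cover": {name: metadata[name] for name in COVER_ELEMENTS},
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # HMAC is a stand-in for an institutional public-key signature.
    wrapper["signature"] = hmac.new(institution_key, _canonical(wrapper),
                                    hashlib.sha256).hexdigest()
    return wrapper


def verify(data: bytes, wrapper: dict, institution_key: bytes) -> bool:
    """Check that neither the data nor the cover has been altered."""
    unsigned = {k: v for k, v in wrapper.items() if k != "signature"}
    expected = hmac.new(institution_key, _canonical(unsigned),
                        hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, wrapper["signature"])
            and hashlib.sha256(data).hexdigest() == wrapper["sha256"])


key = b"hypothetical-institution-key"
granule = b"example product bytes"
published = publish(granule, {
    "title": "Hypothetical Sea Surface Temperature Product",
    "subject": "sea surface temperature",
    "identifier": "example-dataset-0001",
    "creator": "A. Scientist; B. Scientist",
    "publisher": "Hypothetical Data Center",
}, key)
print(verify(granule, published, key))         # True
print(verify(granule + b"!", published, key))  # False: data altered
```

Changing the data, any cover element, or the key causes verification to fail, which is the assurance of authenticity the wrapper is meant to give.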
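Two ideas from science concept modeling, finite representations of a single concept and approximate equivalence among them, can be made concrete with a toy example. Here a hypothetical zonal-mean temperature field is represented on a 1-degree grid and on a 5-degree grid. A side-effect-free transformation (piecewise-linear interpolation, chosen only for simplicity) maps the coarse representation onto the fine grid, and the two are compared element by element within a permitted tolerance. The field formula, the grids, and the tolerances are assumptions for illustration, not part of any ECS model.

```python
import bisect
import math


def surface_temperature(lat_deg: float) -> float:
    """Hypothetical zonal-mean temperature field (K) standing in for a science concept."""
    return 300.0 - 30.0 * math.sin(math.radians(lat_deg)) ** 2


def sample(field, grid):
    """A finite representation: the field evaluated at each grid point."""
    return [field(x) for x in grid]


def regrid(values, src, dst):
    """Transformation with no side effects: piecewise-linear interpolation
    from grid src to grid dst.

    Assumes src is strictly increasing and every dst point lies in [src[0], src[-1]].
    """
    out = []
    for x in dst:
        # Index of the src interval containing x, clamped so i + 1 stays valid.
        i = min(max(bisect.bisect_right(src, x) - 1, 0), len(src) - 2)
        t = (x - src[i]) / (src[i + 1] - src[i])
        out.append(values[i] + t * (values[i + 1] - values[i]))
    return out


def approximately_equivalent(a, b, tolerance):
    """Equality within a permitted tolerance, element by element."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


fine_grid = [float(lat) for lat in range(-60, 61)]       # 1-degree spacing
coarse_grid = [float(lat) for lat in range(-60, 61, 5)]  # 5-degree spacing
fine = sample(surface_temperature, fine_grid)
coarse_on_fine = regrid(sample(surface_temperature, coarse_grid), coarse_grid, fine_grid)
print(approximately_equivalent(coarse_on_fine, fine, tolerance=0.1))   # True
print(approximately_equivalent(coarse_on_fine, fine, tolerance=0.01))  # False
```

With these numbers the largest interpolation error is a few hundredths of a kelvin near the equator, so the two representations agree within 0.1 K but not within 0.01 K; the tolerance, not the representation, decides whether they count as the same concept.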
NOTES
1. NRC (1995b).
2. NRC (1982).
3. NASA (1984).
4. NASA (1986).
5. Dutton (1989).
6. Ibid.
7. NRC (1995a).
8. NASA (1986).
9. Handy (1992).
10. NRC (1997).
11. Weibel et al. (1995).
12. Dopplick (1995).
REFERENCES AND BIBLIOGRAPHY
Dopplick, T. 1995. A Science User's Guide to the EOSDIS Core System (ECS) Development Process. Science Office, EOSDIS Core System Project, Technical Paper 160-TP-003-001.
Dutton, J.A. 1989. The EOS data and information system: Concepts for design. IEEE Transactions on Geoscience and Remote Sensing 27:109-116.
Handy, C. 1992. Balancing corporate power: A new federalist paper. Harvard Business Review (Nov.-Dec.):59-72.
National Aeronautics and Space Administration. 1984. Earth Observing Science: Science and Mission Requirements Working Group Report. Technical Memorandum 86129. NASA, Washington, D.C.
National Aeronautics and Space Administration. 1986. Report of the EOS Data Panel. Technical Memorandum 87777. NASA, Washington, D.C.
National Research Council. 1982. Data Management and Computation, Volume 1, Issues and Recommendations. National Academy Press, Washington, D.C.
National Research Council. 1995a. A Review of the U.S. Global Change Research Program and NASA's Mission to Planet Earth/Earth Observing System. National Academy Press, Washington, D.C.
National Research Council. 1995b. Earth Observations from Space: History, Promise, Reality. National Academy Press, Washington, D.C.
National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data. National Academy Press, Washington, D.C.
National Research Council. 1998. Toward an Earth Science Enterprise Federation: Results from a Workshop. National Academy Press, Washington, D.C.
Weibel, S., and C. Lagoze. 1997. An element set to support resource discovery—the state of the Dublin Core: January 1997. International Journal on Digital Libraries 1(2):176-186.