4.2.6 Data Mediators/Middleware

In the middleware approach, an intermediate processing layer (a “mediator”) decouples the underlying heterogeneous, distributed data sources from the client layer of end users and applications.11 The mediator layer (i.e., the middleware) performs the core functions of data transformation and integration, and communicates with both the database “wrappers” and the user application layer. (A “wrapper” is a software component associated with an underlying data source. It generally handles access to that source, extraction and retrieval of selected data, and translation of source data formats into a common data model designed for the integration system.)
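The division of labor between wrappers and mediator can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the flat `Record` common model, and the two toy source layouts are hypothetical and are not taken from any of the systems discussed in this section.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

# Hypothetical common data model: a flat mapping of field name to value.
Record = Dict[str, str]


class TabularWrapper:
    """Wrapper for a tab-delimited source (assumed columns: acc, gene_symbol)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> Iterator[Record]:
        # Access the source, extract rows, and translate them to the common model.
        reader = csv.DictReader(io.StringIO(self.text), delimiter="\t")
        for row in reader:
            yield {"id": row["acc"], "gene": row["gene_symbol"], "source": "tabular"}


class XmlWrapper:
    """Wrapper for an XML source (assumed layout: <entry id=...><gene>...</gene></entry>)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self) -> Iterator[Record]:
        root = ET.fromstring(self.text)
        for entry in root.iter("entry"):
            yield {
                "id": entry.get("id", ""),
                "gene": entry.findtext("gene", default=""),
                "source": "xml",
            }


class Mediator:
    """Sits between the wrappers and the client; clients never see source formats."""

    def __init__(self, wrappers) -> None:
        self.wrappers = list(wrappers)

    def query(self, gene: str) -> List[Record]:
        # Collect common-model records from every wrapper and select matches.
        return [r for w in self.wrappers for r in w.fetch() if r["gene"] == gene]


# Toy sources standing in for heterogeneous databases.
TABULAR = "acc\tgene_symbol\nP1\tTP53\nP2\tBRCA1\n"
XML = "<entries><entry id='X9'><gene>TP53</gene></entry></entries>"

mediator = Mediator([TabularWrapper(TABULAR), XmlWrapper(XML)])
results = mediator.query("TP53")
```

In this sketch the client issues one query against the mediator and receives records from both sources in the same shape. Adding a new source would mean writing one more wrapper, with no change to the client code.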

The common model for data derived from the underlying data sources is the responsibility of the mediator. This model must be sufficiently rich to accommodate the various data formats of existing biological data sources, which may include unstructured text files, semistructured XML and HTML files, and structured relational, object-oriented, and nested complex data models. In addition, the internal data model must facilitate the structuring of integrated biological objects for presentation to the user application layer. Finally, the mediator provides services such as filtering, managing metadata, and resolving semantic inconsistencies among source databases.
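Two of the mediator services named above, filtering and resolving semantic inconsistency, can be sketched as follows. This is only an illustration under stated assumptions: the synonym table, the record fields, and the `integrate` function are hypothetical, and real systems reconcile far richer vocabularies than organism names.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Hypothetical synonym table: sources name the same organism differently.
ORGANISM_SYNONYMS = {
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def normalize_organism(name: str) -> str:
    """Map a source-specific organism label to one canonical name."""
    cleaned = name.strip()
    return ORGANISM_SYNONYMS.get(cleaned.lower(), cleaned)


def integrate(
    records: List[Dict[str, Any]], organism: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Merge per-source records into nested integrated objects.

    Records describing the same gene in the same (normalized) organism are
    combined; the optional organism argument acts as a filter.
    """
    merged: Dict[tuple, Dict[str, list]] = defaultdict(
        lambda: {"sources": [], "annotations": []}
    )
    for rec in records:
        org = normalize_organism(rec["organism"])
        if organism is not None and org != organism:
            continue  # filtering service
        obj = merged[(rec["gene"], org)]
        obj["sources"].append(rec["source"])
        obj["annotations"].extend(rec.get("annotations", []))
    return [
        {"gene": gene, "organism": org, **fields}
        for (gene, org), fields in sorted(merged.items())
    ]


# Toy records as two hypothetical wrappers might deliver them.
records = [
    {"gene": "TP53", "organism": "human", "source": "db_a",
     "annotations": ["tumor suppressor"]},
    {"gene": "TP53", "organism": "H. sapiens", "source": "db_b",
     "annotations": ["apoptosis"]},
    {"gene": "Trp53", "organism": "mouse", "source": "db_b"},
]

integrated = integrate(records, organism="Homo sapiens")
```

Here the two TP53 records, which disagree on how to spell the organism, end up in a single integrated object that lists both contributing sources, while the mouse record is filtered out.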

The mediator approach comes in many flavors in the life sciences. IBM’s DiscoveryLink for the life sciences is one of the best known.12 The Kleisli system provides an internal nested complex data model and a powerful query and transformation language for data integration.13 K2 shares many design principles with Kleisli in supporting a complex data model, but adopts more object-oriented features.14 OPM supports a rich object model and a global schema for data integration.15 TAMBIS provides a global ontology (see Section 4.2.8 on ontologies) to facilitate queries across multiple data sources.16 TSIMMIS is a mediation system for information integration with its own data model (Object-Exchange Model, OEM) and query language.17

4.2.7 Databases as Models

A natural progression for databases established to meet the needs and interests of specialized communities, such as research on cell signaling pathways or programmed cell death, is the evolution of

11 G. Wiederhold, “Mediators in the Architecture of Future Information Systems,” IEEE Computer 25(3):38-49, 1992; G. Wiederhold and M. Genesereth, “The Conceptual Basis for Mediation Services,” IEEE Expert, Intelligent Systems and Their Applications 12(5):38-47, 1997. (Both cited in Chung and Wooley, 2003.)

12 L.M. Haas et al., “DiscoveryLink: A System for Integrated Access to Life Sciences Data Sources,” IBM Systems Journal 40(2):489-511, 2001.

13 S. Davidson, C. Overton, V. Tannen, and L. Wong, “BioKleisli: A Digital Library for Biomedical Researchers,” International Journal of Digital Libraries 1(1):36-53, 1997; L. Wong, “Kleisli, a Functional Query System,” Journal of Functional Programming 10(1):19-56, 2000. (Both cited in Chung and Wooley, 2003.)

14 J. Crabtree, S. Harker, and V. Tannen, “The Information Integration System K2,” available at http://db.cis.upenn.edu/K2/K2.doc; S.B. Davidson, J. Crabtree, B.P. Brunk, J. Schug, V. Tannen, G.C. Overton, and C.J. Stoeckert, Jr., “K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources,” IBM Systems Journal 40(2):512-531, 2001. (Both cited in Chung and Wooley, 2003.)

15 I-M.A. Chen and V.M. Markowitz, “An Overview of the Object-Protocol Model (OPM) and OPM Data Management Tools,” Information Systems 20(5):393-418, 1995; I-M.A. Chen, A.S. Kosky, V.M. Markowitz, and E. Szeto, “Constructing and Maintaining Scientific Database Views in the Framework of the Object-Protocol Model,” Proceedings of the Ninth International Conference on Scientific and Statistical Database Management, Institute of Electrical and Electronics Engineers, Inc., New York, 1997, pp. 237-248. (Cited in Chung and Wooley, 2003.)

16 N.W. Paton, R. Stevens, P. Baker, C.A. Goble, S. Bechhofer, and A. Brass, “Query Processing in the TAMBIS Bioinformatics Source Integration System,” Proceedings of the 11th International Conference on Scientific and Statistical Database Management, IEEE, New York, 1999, pp. 138-147; R. Stevens, P. Baker, S. Bechhofer, G. Ng, A. Jacoby, N.W. Paton, C.A. Goble, and A. Brass, “TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources,” Bioinformatics 16(2):184-186, 2000. (Both cited in Chung and Wooley, 2003.)

17 Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, “Object Exchange Across Heterogeneous Information Sources,” Proceedings of the IEEE Conference on Data Engineering, IEEE, New York, 1995, pp. 251-260. (Cited in Chung and Wooley, 2003.)


