The following HTML text is provided to enhance online
readability. Many aspects of typography translate only awkwardly to HTML.
Please use the page image
as the authoritative form to ensure accuracy.
Mathematical Challenges from Theoretical/Computational Chemistry
DARC, Daylight) or self-written software.
A key element to the success of such chemical information systems was the recognition that a two-dimensional chemical structure diagram can be treated as a labeled graph (Sussenguth, 1965). Many algorithms and concepts from graph theory (Harary, 1976) have been applied to chemical information problems, for example, the concepts of graph isomorphism to identify whether a particular compound is in a database and subgraph isomorphism to identify compounds that contain substructures of interest, algorithms to detect the smallest set of smallest rings as an aid to unique atom numbering heuristics, and subgraph (''clique-detection''—see Box 3.4) algorithms to detect the maximum common substructure in two molecules. These ideas have recently been extended to provide rapid searches of databases of tens to hundreds of thousands of molecules to find those that match a three-dimensional query—typically based on distances, angles, and torsions between points in the stored three-dimensional structure of the molecules (Borman, 1989; Bures et al., 1994). Generally there are between 4 and 20 distance and angle constraints to be matched in a three-dimensional query: the number of hits decreases with the number of constraints. Recently, commercial programs have been updated to include consideration of conformational flexibility.
Graph algorithms are used increasingly to solve similar problems in molecular modeling and computational chemistry (Martin et al., 1992). For example, to use a molecular mechanics program to optimize a molecular structure, each atom must be assigned an atom type based on its substructural environment. Chemical information tools are used to recognize such substructures. Modeled three-dimensional structures are stored in a chemical information database with the result that it is easy to find a prebuilt analogue of a new compound one wishes to build. Others have devised programs that build three-dimensional structures of molecules from their two-dimensional structures by finding the maximum common overlapping substructures in a database of three-dimensional structures and piecing these together (Wipke and Hahn, 1988; Leach et al., 1990).
Methods based on graph theory have also been used to find common three-dimensional features within a set of molecules (Crandell and Smith, 1983; Brint and Willett, 1987; Martin et al., 1993). In particular, the Bron-Kerbosh clique-detection algorithm has been found to be especially fast (Brint and Willett, 1987). Such common three-dimensional features might represent a pharmacophore, the set of three-dimensional features that determines whether or not a molecule will show a particular biological activity. For example, Figure 3.6 depicts three different molecules that activate the D2 dopamine receptor. The figure shows that although these molecules look different in two dimensions, in three dimensions they share the arrangement of a hydrogen bond donor, its projection to a receptor hydrogen bond acceptor, a positively charged nitrogen, and its projection to an anionic site on the receptor (Martin et al., 1993).
Two-dimensional structures describe the connectedness of the atoms in a molecule. The training of a chemist involves learning how to translate these two-dimensional pictures into chemical properties. Thus, an OH means one thing to a chemist, but something different to other folks. Since molecules are really three-dimensional (with added dimensions of properties), it is important to translate the two-dimensional structure into three dimensions for computer processing. People have used the same type of graph-processing algorithms to detect parts of molecules that have certain three-dimensional properties and to then glue the three-dimensional pieces together much as when using a Tinkertoy. The methods are expert systems in that they use other knowledge, not first principles. They operate by using graph-matching ideas.
Clique-detection methods are also used to propose docking orientations of small molecules to macromolecules (Kuntz et al., 1982; Kuhl et al., 1984; Smellie et al., 1991; Kuntz, 1992). The computer program DOCK searches databases of tens of thousands of molecules to find those that fit into a macromolecular binding site of known three-dimensional structure (Kuntz, 1992). A number of structurally novel enzyme inhibitors have been identified by this means.