Mathematical Challenges from Theoretical/Computational Chemistry


CHAPTER 4 continued

Statistical Analyses of Families of Structures

The diversity of chemical structures is one of the hallmarks of modern experimental chemistry. The problems of diversity and similarity are most prevalent in the study of biological molecules, for which very different sequences (that is, fundamental structures) give rise to molecules that have very similar overall three-dimensional structures and often very similar functional properties. The Human Genome Project is devoted to characterizing the myriad of proteins encoded in humans, but a still larger universe of proteins exists in other living beings. Furthermore, it is easy to understand that the existing proteins are just a small subset of all possible random heteropolymers. The same type of combinatorial complexity exists for many other classes of molecules. In nature, we are familiar with the complexity of alkaloids or terpenes.
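
To give a sense of scale for the combinatorial complexity just described, a rough count can be worked out under two illustrative assumptions that are not taken from the text: an alphabet of 20 amino acids and a chain of N = 100 residues.

```latex
\[
  \#\{\text{sequences of length } N\} = 20^{N},
  \qquad
  20^{100} = 2^{100} \times 10^{100} \approx 1.27 \times 10^{130}.
\]
```

Any realistic collection of natural or synthesized sequences is a vanishing fraction of a space this large.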

More and more molecular scientists are trying to learn how to use information from a variety of different molecules to understand the structure and function of a given one. It is now becoming possible, by using combinatorial syntheses in the laboratory, to make 10 million variants of a single protein or 10,000 covalently connected frameworks such as those in a natural product. The best known of these techniques is that employed to make catalytic antibodies, but many other approaches are possible. A variety of mathematical problems arise when one tries to make use of the resulting longitudinal data about molecular systems.

For naturally occurring biomolecules, one of the most important approaches is to understand the evolutionary relationships between macromolecules. The study of these relationships has given rise to a variety of mathematical questions in probability theory and sequence analysis. Biological macromolecules can be related to each other by various similarity measures, and at least in simple models of molecular evolution, these similarity measures give rise to an ultrametric organization of the proteins. A good deal of work has gone into developing algorithms that take the known sequences and infer from them a parsimonious model of their biological descent.
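
The ultrametric property mentioned above can be made concrete with a small check. The sketch below assumes a symmetric matrix of pairwise distances (the values are invented for illustration) and tests the strong triangle inequality d(i,k) <= max(d(i,j), d(j,k)) on every triple. It is only a diagnostic for tree-like organization, not a method for inferring descent.

```python
from itertools import combinations

def is_ultrametric(dist, tol=1e-9):
    """Return True if every triple satisfies d(i,k) <= max(d(i,j), d(j,k)).

    Assumes `dist` is a symmetric square matrix with a zero diagonal.
    """
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        a, b, c = sorted((dist[i][j], dist[j][k], dist[i][k]))
        # In an ultrametric, the two largest distances of any triple are equal.
        if c - b > tol:
            return False
    return True

# Hypothetical distances for four sequences: 0 and 1 are close, 2 and 3 are close.
tree_like = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]

# Same matrix with one distance changed, which breaks the condition.
not_tree_like = [
    [0, 2, 6, 5],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [5, 6, 4, 0],
]

print(is_ultrametric(tree_like))
print(is_ultrametric(not_tree_like))
```

Measured similarity data rarely satisfy the condition exactly, so in practice one would ask how far a matrix is from ultrametric rather than whether it passes a strict test.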

Similar analyses based on the three-dimensional structure of molecules also present ongoing mathematical problems. At the moment, the use of evolutionary similarity to infer three-dimensional structure is a common and very important approach for people who have practical interests in the prediction of biomolecular structure. Use of the theory of spin glasses to characterize random heteropolymers has also allowed the phrasing of interesting questions, such as the probability of obtaining a foldable protein molecule in a single experiment. This is a question in which the statistics of low-lying energy states on the surface and the statistics of sequences must be analyzed jointly and related to each other. Experiments of this type have recently been done and seem to agree in many respects with the results of theory, but many questions of physicochemical principle and of mathematical analysis remain open for this theory.

An emerging technology is the use of multiple rounds of mutation, recombination, and selection to obtain interesting macromolecules or combinatorial covalent structures. Very little is understood as yet about the mathematical constraints on finding molecules in this way, but the mathematics of such artificial evolution approaches should be quite challenging. Understanding the navigational problems in a high-dimensional sequence space may also have great relevance to understanding natural evolution. Is it punctuated, or is it gradual, as many have claimed in the past? Artificial evolution approaches may obviate the need to completely understand and design biological macromolecules, but there will be a large number of interesting mathematical problems connected with the design of efficient artificial evolution experiments.
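
The cycle of mutation, recombination, and selection described above can be illustrated with a toy simulation. Everything specific in this sketch is an assumption made for illustration: the fitness function (positions matching an arbitrary target string), the population size, the mutation rate, and the rule of keeping the top fifth of each round. Real selection experiments score molecules by function, not by similarity to a known answer.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids

def fitness(seq, target):
    # Hypothetical score: number of positions that match an arbitrary target.
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq, rate, rng):
    # Replace each position with a random letter with probability `rate`.
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in seq)

def recombine(a, b, rng):
    # Single-point crossover between two parents of equal length.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(target, pop_size=50, rounds=30, rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)]
    best_per_round = []
    for _ in range(rounds):
        # Selection: rank the population and keep the top fifth unchanged.
        pop.sort(key=lambda s: fitness(s, target), reverse=True)
        best_per_round.append(fitness(pop[0], target))
        survivors = pop[: pop_size // 5]
        # Refill the population with mutated recombinants of the survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(recombine(a, b, rng), rate, rng))
        pop = survivors + children
    return best_per_round

history = evolve("MKTAYIAKQR")  # arbitrary 10-letter target, not a real protein
print(history)
```

Because the survivors are carried over unchanged, the best score never decreases from one round to the next. Whether it rises in steps or smoothly depends on the assumed parameters, which is a small-scale version of the design questions raised above.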

