Calculating the Secrets of Life

Applications of the Mathematical Sciences in Molecular Biology

Eric S. Lander and
Michael S. Waterman, Editors

Committee on the Mathematical Sciences in
Genome and Protein Structure Research

Board on Mathematical Sciences

Commission on Physical Sciences, Mathematics, and Applications

National Research Council

NATIONAL ACADEMY PRESS
Washington, D.C. 1995








National Academy Press · 2101 Constitution Avenue, N.W. · Washington, D.C. 20418

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This report has been reviewed by a group other than the authors according to procedures approved by a Report Review Committee consisting of members of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine.

The National Research Council established the Board on Mathematical Sciences in 1984. The objectives of the Board are to maintain awareness and active concern for the health of the mathematical sciences and to serve as the focal point in the National Research Council for issues connected with the mathematical sciences. The Board holds symposia and workshops and prepares reports on emerging issues and areas of research and education, conducts studies for federal agencies, and maintains liaison with the mathematical sciences communities, academia, professional societies, and industry.

Support for this project was provided by the Fondation des Treilles, Alfred P. Sloan Foundation, National Science Foundation, Department of Energy, and National Library of Medicine.

Library of Congress Cataloging-in-Publication Data

Calculating the secrets of life : applications of the mathematical sciences in molecular biology / Eric S. Lander, editor.
p. cm.
Includes bibliographical references and index.
ISBN 0-309-04886-9
1. Genetics -- Mathematical models. 2. Genetics -- Statistical methods. 3. Molecular biology -- Mathematical models. 4. Molecular biology -- Statistical methods. I. Lander, Eric S.
QH438.4.M3C35 1994
574.8'8'0151--dc20
94-37628 CIP

Copyright 1995 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America.

COMMITTEE ON THE MATHEMATICAL SCIENCES IN GENOME AND PROTEIN STRUCTURE RESEARCH

ERIC S. LANDER, Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology, Chair
WALTER GILBERT, Harvard University
HERBERT HAUPTMAN, Medical Foundation of Buffalo
MICHAEL S. WATERMAN, University of Southern California
JAMES H. WHITE, University of California at Los Angeles

Staff

JOHN R. TUCKER, Director
RUTH E. O'BRIEN, Staff Associate

BOARD ON MATHEMATICAL SCIENCES

SHMUEL WINOGRAD, IBM Corporation, Chair
JEROME SACKS, National Institute of Statistical Sciences, Vice-Chair
LOUIS AUSLANDER, City University of New York
HYMAN BASS, Columbia University
LAWRENCE D. BROWN, Cornell University
AVNER FRIEDMAN, University of Minnesota
JOHN F. GEWEKE, University of Minnesota
JAMES GLIMM, State University of New York at Stony Brook
GERALD J. LIEBERMAN, Stanford University
PAUL S. MUHLY, University of Iowa
RONALD F. PEIERLS, Brookhaven National Laboratory
DONALD ST. P. RICHARDS, University of Virginia
KAREN K. UHLENBECK, University of Texas at Austin
MARY F. WHEELER, Rice University
ROBERT J. ZIMMER, University of Chicago

Ex Officio Member

JON R. KETTENRING, Bell Communications Research, Chair, Committee on Applied and Theoretical Statistics

Staff

JOHN R. TUCKER, Director
RUTH E. O'BRIEN, Staff Associate
BARBARA WRIGHT, Administrative Assistant

COMMISSION ON PHYSICAL SCIENCES, MATHEMATICS, AND APPLICATIONS

RICHARD N. ZARE, Stanford University, Chair
RICHARD S. NICHOLSON, American Association for the Advancement of Science, Vice-Chair
STEPHEN L. ADLER, Institute for Advanced Study
SYLVIA T. CEYER, Massachusetts Institute of Technology
SUSAN L. GRAHAM, University of California at Berkeley
ROBERT J. HERMANN, United Technologies Corporation
RHONDA J. HUGHES, Bryn Mawr College
SHIRLEY A. JACKSON, Rutgers University
KENNETH I. KELLERMANN, National Radio Astronomy Observatory
HANS MARK, University of Texas at Austin
THOMAS A. PRINCE, California Institute of Technology
JEROME SACKS, National Institute of Statistical Sciences
L.E. SCRIVEN, University of Minnesota
A. RICHARD SEEBASS III, University of Colorado at Boulder
LEON T. SILVER, California Institute of Technology
CHARLES P. SLICHTER, University of Illinois at Urbana-Champaign
ALVIN W. TRIVELPIECE, Oak Ridge National Laboratory
SHMUEL WINOGRAD, IBM T. J. Watson Research Center
CHARLES A. ZRAKET, MITRE Corporation (retired)

NORMAN METZGER, Executive Director

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Robert M. White is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an advisor to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Kenneth I. Shine is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy's purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce Alberts and Dr. Robert M. White are chairman and vice chairman, respectively, of the National Research Council.

Preface

Molecular biology represents one of the greatest intellectual syntheses in the twentieth century. It has fused the traditional disciplines of genetics and biochemistry into an agent for understanding virtually any problem in biology or medicine. Moreover, it has produced a set of powerful techniques, called recombinant DNA technology, applicable to fundamental research and to biological engineering.

Even as molecular biology establishes itself as the dominant paradigm throughout biology, the field itself is undergoing a new and profound transformation. With the availability of ever more powerful tools, molecular biologists have begun to assemble massive databases of information about the structure and function of genes and proteins. It is becoming clear that it will soon be possible to catalogue virtually all genes and to identify virtually all basic protein structures. What began as an enterprise akin to butterfly collecting has become an effort to construct biology's equivalent of the Periodic Table: a complete delineation of the molecular building blocks of life on this planet. The new thrust is most obvious in the Human Genome Project,1 but it is paralleled by similarly oriented efforts in structural and functional biology as well.

As molecular biology works toward characterizing the genetic basis of biological processes, mathematical and computational sciences are beginning to play an increasingly important role: they will be essential for organization, interpretation, and prediction of the burgeoning experimental information. The role of mathematical theory in biology is, to be sure, different from its role in physics (which is more amenable to description by a set of simple equations), but it is no less crucial.

The National Research Council organized the Committee on the Mathematical Sciences in Genome and Protein Structure Research to evaluate whether there was a need for increased interaction between mathematics and molecular biology. In its initial meeting, the committee unanimously agreed that a need was evident. Focusing on the impediments to progress in the area, the committee concluded that the greatest obstacle to progress at the interface of these fields was not a lack of talented mathematicians, talented biologists, or grant funding. Rather, the major barrier was communication: mathematicians interested in working on problems in molecular biology faced an uphill battle in learning about a completely new and fast-moving field. In most cases, researchers working successfully at the interface of mathematics and molecular biology had solved this problem by finding a colleague willing to invest considerable time to teach them enough to be able to identify important problems and to begin productive work.

The committee decided that it could make its greatest contribution not by writing a report confirming the need for interactions between mathematics and molecular biology, but rather by (to put it in biological terms) lowering the activation energy barrier for those interested in working at the interface. Specifically, the committee members agreed to produce a book that could serve as an introduction to the interface between mathematics and molecular biology.

This book of signed chapters is the result of some three years of effort to create a product that would be interesting and accessible to both mathematicians and biologists. The book is not intended as a textbook, but rather as an introduction and an invitation to learn more. Each chapter aims to describe an important biological problem to which mathematical methods have made a significant contribution. As the examples make clear, mathematical and statistical issues have contributed key insights and advances to molecular biology, and, conversely, molecular biology has posed new challenges in the mathematical sciences.

The book highlights those areas of the mathematical, statistical, and computational sciences that are important in cutting-edge research in molecular biology. It also tries to illustrate to the molecular biology community the role of mathematical methodologies in solving biomolecular problems. Although there is a growing community of researchers working at the interface of molecular biology and the mathematical sciences, the need still far outstrips the supply. The Board on Mathematical Sciences hopes this book will inspire more individuals to become involved.

This book would not have been possible without sustained efforts by a number of people, to whom the committee and the Board on Mathematical Sciences are grateful: John Tucker, Lawrence Cox, Hans Oser, and John Lavery played key roles in coordinating the study. Ruth O'Brien, Roseanne Price, and Susan Maurizi edited the text and oversaw production. Anonymous reviewers contributed to the clarity and understanding of the final text. The Alfred P. Sloan Foundation, the National Science Foundation, the Department of Energy, and the National Library of Medicine provided financial support. The Fondation des Treilles hosted and supported a week-long meeting at which the committee members presented extended lectures that became the basis for most of the chapters here. The committee wishes to thank all of these people and organizations for their assistance.

1 Dausset, J., and H. Cann, 1994, "Our Genetic Patrimony," Science 264 (September 30), 1991; National Research Council, 1988, Mapping and Sequencing the Human Genome, Washington, D.C.: National Academy Press.

Contents

1 The Secrets of Life: A Mathematician's Introduction to Molecular Biology
Eric S. Lander and Michael S. Waterman

2 Mapping Heredity: Using Probabilistic Models and Algorithms to Map Genes and Genomes
Eric S. Lander

3 Seeing Conserved Signals: Using Algorithms to Detect Similarities between Biosequences
Eugene W. Myers

4 Hearing Distant Echoes: Using Extremal Statistics to Probe Evolutionary Origins
Michael S. Waterman

5 Calibrating the Clock: Using Stochastic Processes to Measure the Rate of Evolution
Simon Tavaré

6 Winding the Double Helix: Using Geometry, Topology, and Mechanics of DNA
James H. White

7 Unwinding the Double Helix: Using Differential Mechanics to Probe Conformational Changes in DNA
Craig J. Benham

8 Lifting the Curtain: Using Topology to Probe the Hidden Action of Enzymes
Dewitt Sumners

9 Folding the Sheets: Using Computational Methods to Predict the Structure of Proteins
Fred E. Cohen

Appendix: Chapter Authors

Index
