MATHEMATICS AND 21ST CENTURY BIOLOGY

Committee on Mathematical Sciences Research for DOE’s Computational Biology

Board on Mathematical Sciences and Their Applications

Division on Engineering and Physical Sciences

NATIONAL RESEARCH COUNCIL OF THE NATIONAL ACADEMIES

THE NATIONAL ACADEMIES PRESS
Washington, D.C. www.nap.edu








THE NATIONAL ACADEMIES PRESS
500 Fifth Street, N.W.
Washington, DC 20001

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

This study was supported by Contract No. DE-AT01-03ER25552 between the National Academy of Sciences and the Department of Energy. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the organizations or agencies that provided support for the project.

International Standard Book Number 0-309-09584-0 (Book)
International Standard Book Number 0-309-54856-X (PDF)
Library of Congress Catalog Card Number 2005024164

Additional copies of this report are available from the National Academies Press, 500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800) 624-6242 or (202) 334-3313 (in the Washington metropolitan area); Internet, http://www.nap.edu

Copyright 2005 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce M. Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Wm. A. Wulf is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy’s purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce M. Alberts and Dr. Wm. A. Wulf are chair and vice chair, respectively, of the National Research Council.

www.national-academies.org

COMMITTEE ON MATHEMATICAL SCIENCES RESEARCH FOR DOE’S COMPUTATIONAL BIOLOGY

MAYNARD V. OLSON, University of Washington, Chair
PETER J. BICKEL, University of California at Berkeley
JACK D. COWAN, University of Chicago
NINA FEDOROFF, Pennsylvania State University
LESLIE GREENGARD, New York University
RICHARD HUDSON, University of Chicago
JAMES KEENER, University of Utah
ROBERT LIPSHUTZ, Affymetrix, Inc.
JILL P. MESIROV, Massachusetts Institute of Technology
CLAUDIA NEUHAUSER, University of Minnesota
STANISLAV Y. SHVARTSMAN, Princeton University
GARY D. STORMO, Washington University
MICHAEL S. WATERMAN, University of Southern California
PETER G. WOLYNES, University of California at San Diego
WING H. WONG, Stanford University
JOHN WOOLEY, University of California at San Diego

Staff

SCOTT WEIDMAN, Director, Board on Mathematical Sciences and Their Applications
JENNIFER SLIMOWITZ, Program Officer (through February 18, 2005)
BARBARA WRIGHT, Administrative Assistant

BOARD ON MATHEMATICAL SCIENCES AND THEIR APPLICATIONS

DAVID W. McLAUGHLIN, New York University, Chair
TANYA STYBLO BEDER, Tribeca Investments, LLC
PATRICK L. BROCKETT, University of Texas at Austin
ARAVINDA CHAKRAVARTI, Johns Hopkins University School of Medicine
PHILLIP COLELLA, Lawrence Berkeley National Laboratory
LAWRENCE CRAIG EVANS, University of California at Berkeley
JOHN E. HOPCROFT, Cornell University
ROBERT KASS, Carnegie Mellon University
KATHRYN B. LASKEY, George Mason University
C. DAVID LEVERMORE, University of Maryland
ROBERT LIPSHUTZ, Affymetrix, Inc.
CHARLES M. LUCAS, AIG
CHARLES MANSKI, Northwestern University
JOYCE McLAUGHLIN, Rensselaer Polytechnic Institute
PRABHAKAR RAGHAVAN, Verity, Inc.
STEPHEN M. ROBINSON, University of Wisconsin-Madison
EDWARD WEGMAN, George Mason University
DETLOF VON WINTERFELDT, University of Southern California

Staff

SCOTT WEIDMAN, Director, Board on Mathematical Sciences and Their Applications
JENNIFER SLIMOWITZ, Program Officer (through February 18, 2005)
BARBARA WRIGHT, Administrative Assistant

For more information on BMSA, see its Web site at http://www7.nationalacademies.org/bms

Preface

This report was commissioned by the Office of Advanced Scientific Computing Research (OASCR) at the Department of Energy (DOE). This office, which has broad responsibilities for applications of mathematics and computing to all fields of science of importance to DOE, sought advice as specified in the charge to the committee:

The study will recommend mathematical sciences research activities to the Department of Energy that will enable science to make effective use of the large amount of existing genomic information and the much larger and more diverse collections of structural and functional genomic information that are being created. The recommended activities should cover both current research needs and also include some higher-risk research that might lead to innovative approaches for the future.

In discussions with OASCR officials, it became apparent that the intent was to sponsor a broad, scientifically based view of the opportunities that now lie at the interface between the mathematical sciences and biology. “The mathematical sciences” was to be broadly defined to include statistics, computational science, and all areas of applied mathematics.[1] Although the Department of Energy is an agency with deep roots in applying the mathematical sciences to the physical sciences, as well as a pioneer in selected biological applications such as protein-structure determination and genome sequencing, there was no intent that the committee analyze specific DOE programs or restrict itself to DOE’s existing programmatic boundaries. Hence, the recommendations are stated in general terms and are applicable to programs at any of the funding organizations whose missions encompass the mathematical sciences, biology, and the interactions between these fields, including but not limited to DOE. The committee has worked very hard to provide substantiated guidance about the scientific opportunities that these organizations are poised to support.

This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the NRC’s Report Review Committee. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process. We wish to thank the following individuals for their review of this report:

James Collins, Boston University
Terry Gaasterland, Rockefeller University
David Haussler, University of California at Santa Cruz
Douglas Lauffenburger, Massachusetts Institute of Technology
Simon Levin, Princeton University

Although the reviewers listed above have provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations, nor did they see the final draft of the report before its release. The review of this report was overseen by Ronald Douglas, Texas A&M University. Appointed by the National Research Council, he was responsible for making certain that an independent examination of this report was carried out in accordance with institutional procedures and that all review comments were carefully considered. Responsibility for the final content of this report rests entirely with the authoring committee and the institution.

In addition, the committee thanks Mark Daly, Avner Friedman, and Alan Perelson for their remarks and suggestions during the study process.

[1] An upcoming National Academies report from the Computer Science and Telecommunications Board will address the interface between computer science and biology.

Contents

EXECUTIVE SUMMARY

1 THE NATURE OF THE FIELD
  Introduction
  The Mathematics-Biology Interface
  What Has Changed in Recent Years?
  What Makes Computational Biology Problems Hard?
  Factors Common to Successful Interactions Between the Mathematical Sciences and the Biosciences
  Preparing the Ground for Improved Synergies of Benefit to Both Fields
  Structure of This Report
  References

2 HISTORICAL SUCCESSES
  The Beginnings of Population Biology
  Inference of Gene Function by Homology
  Evolutionary Processes in Populations
  Modeling
  Medical and Biological Imaging
  Summary
  References

3 UNDERSTANDING MOLECULES
  Introduction
  The Mathematics-Biology Connection
  Areas of Mathematical Applications for Molecules
  Sequence Analysis
  Structure Analysis
  Dynamics
  Interactions
  Future Directions
  References

4 UNDERSTANDING CELLS
  Introduction
  Exemplification of These Issues
  Cellular Structures
  Discovery of Cellular Networks and Their Functions
  From Networks to Cellular Functions
  From Cells to Tissues
  Data Integration
  Biological Considerations
  Future Directions
  References

5 UNDERSTANDING ORGANISMS
  Cardiac Physiology
  Circulatory Physiology
  Respiratory Physiology
  Information Processing
  Endocrine Physiology
  Morphogenesis and Pattern Formation
  Locomotion
  Cancer
  Delivery of Therapy to Target Tumor Cells
  Mechanisms of Drug Action
  Growth and Differentiation of Cell Populations
  Development of Resistance
  In Vivo Dynamics of the HIV-1 Infection
  Future Directions
  References

6 UNDERSTANDING POPULATIONS
  Population Genetics
  Ecological Aspects of Populations
  A Synthesis of Ecology and Evolution
  References

7 UNDERSTANDING COMMUNITIES AND ECOSYSTEMS
  Computation
  Future Directions
  References

8 CROSSCUTTING THEMES
  The “Small n, Large P” Problem
  Finding Patterns in Gene-Expression Data
  Supervised Learning
  Unsupervised Learning
  Analysis of Ordered Systems
  Applications of Hidden Markov Models to the Analysis of DNA, RNA, and Protein Sequences
  Profile HMMs
  HMMs in Gene Finding
  Applications of Monte Carlo Methods in Computational Biology
  Gibbs Sampling in Motif Finding
  Inference of Regulatory Networks
  Sampling Protein Conformations
  Lessons from Mathematical Themes of Current Import
  Processing of Low-Level Data
  Epilogue
  References
