1
The Nature of the Field

INTRODUCTION

Biology is in dramatic flux due to a surge of new sources of data, access to high-performance computing, increasing reliance on quantitative research methods, and an internally driven need to produce more quantitative and predictive models of biological processes. The growing infusion of mathematical tools and reasoning into biology may therefore be expected to further transform the life sciences during the decades ahead. This transformation will have profound effects on all areas of basic and applied biology.

Nonetheless, we are not starting from scratch in applying the mathematical sciences to biology. To a greater extent than is widely recognized—by biologists and nonbiologists alike—there has been a string of dramatic successes over more than a century that have been critical to advances in biology and have also led to new mathematics. The role that biological problems played in motivating the development of modern statistics is just one example that will be described in Chapter 2, “Historical Successes.”

THE MATHEMATICS-BIOLOGY INTERFACE

The interface between mathematics and biology can be examined across scales of biological problems and across all the major areas of mathematical sciences. Biological scales range from molecules, cells, organisms, and populations to communities.1 Much of the remainder of this report is organized around these biological scales, articulating examples of the biological problems to be addressed at each scale. These scales may be briefly described as follows:

• Molecules. Molecular biology focuses on the chemical components of life and their interactions. These components differ greatly in size and complexity, ranging from atoms and simple ions, through the basic molecular building blocks of life such as nucleotides and amino acids, sugars, and fats, to polymers and homogeneous and heterogeneous aggregates of the more basic units, forming macromolecular assemblies and supramolecular structures that carry out many of the fundamental processes in the life of a cell. The structures of these objects, as well as the dynamics of molecules and interactions between them, are central to biological function.
• Cells. Cell biology is concerned with the self-replicating units of life, including bacterial, plant, and animal cells, as well as the viruses and other parasites that infect them. The study of the cell also includes consideration of many interconnected units or subcellular structures, such as organelles, which range in complexity from peroxisomes, proteasomes, or lysosomes, to mitochondria and chloroplasts, up to the nucleolus and the nucleus itself for eukaryotic organisms, and other structural components intrinsic to cell function such as the endoplasmic reticulum. The mechanisms and consequences of cell–cell communication are also of primary interest.
• Organisms. Organismal biology includes both the properties of whole organisms and the complex multicellular structures of which they are composed: the tissues, organs, organ systems, and integrative processes that create a robust whole out of diverse parts. Organisms sustain health and well-being in the face of considerable insults and environmental disturbances, a process known as homeostasis. Another feature at this scale is the study of the breakdown of this robustness, in other words, the etiology and nature of disease.
• Populations. Population biology concerns groups of organisms of the same species. Genetic variation among individuals is of primary interest, as is the behavior of populations over time in real environments (for example, speciation, population fluctuation, and extinction).
• Communities and ecosystems. Community ecology is the study of assemblages of populations of different species and their interactions. Interactions between living and nonliving components, nutrient and carbon fluxes, and overall responses of ecosystems to changes in the physical environment are central issues of ecosystem ecology. The ecology of infectious disease is an example of an interspecies interaction of considerable recent interest.

1 Dividing life into levels, or scales, is obvious, is essential for understanding, and reflects an intrinsic feature of biology. Nonetheless, the levels interact, and some of the division is for human convenience or is an artifact of scholarly history. Characterizing any one level requires at least considering its immediately adjacent levels; one could also provide a finer subdivision of some of the scales, but for clarity, the committee used the most commonly employed and obvious distinctions, ones that are important for how biologists think about the object of their study and that provide a means for mathematicians to think about how to engage biology.

Across all these levels of biological organization, modeling of biological processes plays a central role. Much of this report describes the types of models, and the associated mathematical techniques, that have been productive in biology. In considering this diverse landscape, it is important to understand that the word “model” has many meanings in biology. Concepts in biology are often illustrated by simple verbal or visual models that are entirely qualitative. For example, most models of gene-regulatory circuits are of this nature: They specify, in simple drawings, which components of a pathway inhibit, and which stimulate, either their own synthesis or that of other components. Other models may be formulated mathematically even though they are primarily intended for heuristic use rather than for data analysis: Simple differential equations describing idealized predator-prey interactions are in this category. Finally, sophisticated models designed to capture subtle features of large, real data sets are also diverse. Some are sets of partial differential equations that would look familiar to a classical physicist.
Others are designed to capture subtle statistical properties of data sets without reference to operative biological mechanisms. Many are stochastic models that guide sampling from combinatorially explosive sets of possible relationships between biological objects: Examples include coalescent models of possible phylogenetic relationships between DNA sequences or between organisms defined by sets of discrete phenotypes. Hidden Markov models of transcription-factor binding sites in the regulatory regions of genes are also in this category. Throughout the report, this diversity of models should be kept firmly in mind. Of course, diversity in types of models is found in all fields of science. Nonetheless, biology is perhaps unique in the extent to which diverse modeling practices are the rule and a small set of standard paradigms applicable to broad classes of problems is the exception.
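The idealized predator-prey equations mentioned above are a standard example of a model meant for heuristic use. The following is a minimal Python sketch, not drawn from this report, assuming the classical Lotka-Volterra form dx/dt = αx − βxy, dy/dt = δxy − γy; the function names and parameter values are illustrative assumptions. It integrates the system with a classical fourth-order Runge-Kutta scheme and tracks a quantity that the exact equations conserve.

```python
import math


def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the classical predator-prey equations:
    dx/dt = alpha*x - beta*x*y   (prey)
    dy/dt = delta*x*y - gamma*y  (predators)
    """
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_step(f, state, dt, *params):
    """Advance `state` by one step of size dt using fourth-order Runge-Kutta."""
    k1 = f(state, *params)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), *params)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), *params)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), *params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, y0, alpha, beta, delta, gamma, dt=0.001, t_end=30.0):
    """Integrate from t = 0 to t_end; return a list of (t, prey, predators) rows."""
    state = (x0, y0)
    rows = [(0.0, x0, y0)]
    for i in range(1, int(round(t_end / dt)) + 1):
        state = rk4_step(lotka_volterra, state, dt, alpha, beta, delta, gamma)
        rows.append((i * dt, state[0], state[1]))
    return rows


def invariant(x, y, alpha, beta, delta, gamma):
    """A quantity that is exactly conserved along true solutions of the model."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


if __name__ == "__main__":
    params = dict(alpha=1.0, beta=0.1, delta=0.075, gamma=1.5)  # illustrative values
    rows = simulate(10.0, 5.0, **params)
    t_peak, prey_peak, _ = max(rows, key=lambda row: row[1])
    drift = invariant(*rows[-1][1:], **params) - invariant(10.0, 5.0, **params)
    print(f"prey peaks at t = {t_peak:.2f} ({prey_peak:.1f} individuals)")
    print(f"drift in conserved quantity: {drift:.2e}")
```

Because exact orbits of this model are closed curves, a steady drift in the conserved quantity signals that the step size is too coarse; the model is useful for building intuition about population cycles rather than for fitting field data.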

WHAT HAS CHANGED IN RECENT YEARS?

While scientists have been studying biological systems at these various scales for many years and applying varying levels and types of mathematics and statistics to them, recent achievements in biology and technology have combined to create a dramatically new world of opportunity for the application of mathematics to biology. Rather suddenly, new experimental methods and technologies have allowed the generation of biological information at an astonishing rate. This phenomenon is playing out at all scales of biological analysis:

• On the molecular scale, the human-genome sequence and the sequences of many other genomes have been determined; methods are also available to measure the expression levels of all genes in an organism in a single experiment. New techniques in protein chemistry, as well as new radiation sources for structural analyses, are accelerating the rate at which proteins can be detected in complex biological samples, purified, and characterized structurally at atomic resolution.
• On the cellular scale, new methods of cellular imaging are making it possible to track subcellular processes and to trace the propagation of signals at millisecond timescales.
• For organisms, reductions in the cost of noninvasive imaging such as computed tomography (CT) scanning, magnetic resonance imaging (MRI), and positron emission tomography (PET) are making these methods available as routine experimental tools. New methods, such as high-throughput patch-clamp studies, are providing electrophysiology data at previously unattainable rates.
• For populations, it is now possible to measure the genetic differences between organisms at hundreds of thousands of sites in a single, inexpensive experiment.
• At the level of communities, new remote-sensing technologies are making it possible to measure entire ecosystems across multiple observation channels at high resolution.

The data that guide biology are diverse, and their integration is challenging. Data sets span the entire range from genomic data to satellite data. Data may be collected at one point in time or continuously, resulting in a real-time data stream (Turner et al., 2004; Running et al., 2004). In addition, in some cases (e.g., the National Science Foundation (NSF)-funded Long-term Ecological Research Sites or the Framingham Heart Study), data have been collected about biological entities over long periods of time.

Biological data may contain large errors of unknown origin, owing to complex interactions that are poorly understood or to inherently stochastic processes. Quantities may be inferred using proxy data; for instance, seasonal or annual temperature in the past can be inferred from tree rings. Model development and model parameterization must take these sources of uncertainty (complex interactions and stochastic processes) into account.

In the past, data were almost exclusively collected by individual researchers or small groups of researchers, and they remained the property of those who collected them. Now, data are increasingly collected by larger groups of scientists and made publicly available. The Protein Data Bank and the National Center for Ecological Analysis and Synthesis are two examples of institutions established to promote the sharing of biological data. One impediment to synthetic analytical work that relies on large data sets collected by different groups has been the lack of commonly accepted standards for collecting, archiving, and annotating data and of agreement on what kinds of data should be collected. Nonetheless, there are increasing efforts to reach agreement. For instance, Ecological Metadata Language (EML) is a way to standardize data annotation that is increasingly embraced by the ecological community.

Some of the greatest computational challenges today come from data collected at the two extreme ends of spatial scales: genomic data and satellite data. The explosive rate of genomic data generation is well known. In the case of more global data, Palumbi and colleagues (2003), for example, present a number of new data acquisition methods, including remote sensing to measure characteristics of the ocean (temperature, wind, surface elevation) or to trace changes in ocean currents, and DNA sequencing to assess spatial and temporal trends in genetic diversity.
New computational methods need to be developed to extract useful information from these vast amounts of data.

The analysis of spatial data poses particular challenges due to the correlations that are inherent in spatial processes and due to local interactions and stochastic effects. As an example, a method widely used for detecting anomalies in space is the spatial-scan statistic. This statistic has been applied to the detection of disease outbreaks or invasive species (Patil and Taillie, 2003). One challenge is to couple on-the-ground observations and remotely sensed data. Visualization tools are indispensable when analyzing spatial data.

In parallel with the accelerating rate of data acquisition, there has been an increase in the computational power available to the scientific community, whether on the desktop, in a research unit, or through the Internet from a national resource or a grid of independent systems. Before computers were widely available, discoveries were made by combining data from experiments or observational studies and their statistical analysis with simple models that served as the conceptual framework. When computing power became available, this paradigm was extended to include computation. This process started in areas of biology that are closely aligned with the physical sciences (e.g., protein structure determination) and gradually spread throughout biology. Concomitantly, biologists became increasingly dependent on sophisticated data-analysis tools and complex, data-driven models.

When mathematical models are refined to the point of being good representations of a biological system, they often become analytically intractable, and biologists must turn to computation. In many cases, such models are systems of differential equations, which are fairly amenable to solution on computers thanks to mathematical advances of the past few decades. The numerical analysis community continues to enlarge the set of partial differential equations for which reliable, fast solvers exist. Robust techniques are now available for solving problems in electrostatics, diffusion, elasticity, and fluid dynamics. However, the needs of biology are among the most challenging. Particularly because biology often calls for multiscale models that include both deterministic and stochastic elements, the solution of sets of biologically motivated equations frequently exceeds current capabilities. There are not yet adequate capabilities for evaluating the range of uncertainties embedded in a computational model due to its parameters, discretization, and structure, and few tools are available to deal with these issues when the models are applied to large systems. Mathematical methods are also being used in new ways to inform experimental design.
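The simplest way to see how parameter uncertainty propagates into a model's predictions is Monte Carlo sampling. The sketch below is illustrative only: it assumes a logistic growth model, dN/dt = rN(1 − N/K), and hypothetical independent normal distributions for r and K, none of which come from this report. Because the logistic model has a closed-form solution, no numerical solver is needed; for a realistic multiscale model, each draw would require a full simulation, which is one reason the problem becomes hard for large systems. The sketch addresses parameter uncertainty only, not uncertainty arising from discretization or model structure.

```python
import math
import random
import statistics


def logistic(t, n0, r, k):
    """Closed-form solution of dN/dt = r*N*(1 - N/K) with N(0) = n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def propagate_uncertainty(n0, t, r_mean, r_sd, k_mean, k_sd, n_draws=10_000, seed=1):
    """Monte Carlo propagation of parameter uncertainty to the prediction N(t).

    Assumes r and K are independent and normally distributed; draws with
    r <= 0 or K <= 0 are rejected and redrawn. Returns the 2.5th percentile,
    median, and 97.5th percentile of the predicted N(t).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    predictions = []
    while len(predictions) < n_draws:
        r = rng.gauss(r_mean, r_sd)
        k = rng.gauss(k_mean, k_sd)
        if r <= 0 or k <= 0:
            continue  # reject biologically meaningless parameter values
        predictions.append(logistic(t, n0, r, k))
    cuts = statistics.quantiles(predictions, n=40)  # 39 cut points, 2.5% apart
    return cuts[0], statistics.median(predictions), cuts[-1]


if __name__ == "__main__":
    lo, mid, hi = propagate_uncertainty(n0=10, t=5, r_mean=0.8, r_sd=0.1,
                                        k_mean=1000, k_sd=100)
    print(f"N(5): median {mid:.0f}, 95% interval [{lo:.0f}, {hi:.0f}]")
```

Rerunning the function with one standard deviation set to zero shows roughly how much of the spread each parameter contributes, a crude guide to which measurement would most sharpen the prediction.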
Traditionally, experimental design decisions, such as choosing the nature of a perturbation, the response measurements, whether and how to perform gene disruption, and the timing and scope of response measurement, have been made by the experimental biologist with only minimal consideration of the computational analysis that will be performed on the resulting data. This was perhaps unavoidable, as techniques for gene disruption and high-throughput assays have until recently been the major limiting factors. However, as experimental genomic science advances, more options are becoming available. In the future, experimental design considerations must be tightly coupled to the mathematical representations to be used to model the system and to the computational and statistical methods to be used for model identification and parameter estimation.

For example, switches for transcription or protein modification have recently become available (Shimizu-Sato, 2002; Zeidler et al., 2004), making it feasible to implement oscillatory perturbations in a systematic manner. A natural question arises: Does oscillatory perturbation have any advantage over traditional impulse or step perturbation? And if so, how do we quantify the advantages? A related question is what mathematical and statistical techniques are useful for the analysis of the response to such oscillatory signals. To answer these questions, we need to study how to estimate the system parameters from each type of perturbation and how the estimation may be affected by intrinsic and measurement noise. The mathematical techniques involved include Laplace transforms, optimal time and frequency sampling, and estimation theory under ill-posed conditions. Even for a stochastic network as simple as an autoregulatory gene whose protein activities are modulated by an input signal, these questions had not been studied until very recently (Lipan and Wong, 2005), and much more remains to be done. Although the benefits of oscillatory perturbations are widely appreciated for the analysis of physical systems, this approach is rarely used in the study of cellular systems. Careful mathematical and simulation studies may help to interest experimental investigators in evaluating the promise of novel perturbations of biological systems.

From the point of view of software, there are a number of open issues that are common to many scientific disciplines. These issues include the fact that codes are complex and difficult to support, especially on parallel computers (Post and Kendall, 2004). Software architectures that allow problem specification and access to all layers of code development would constitute an important step forward. Development platforms based on the Message Passing Interface (MPI), such as the Portable, Extensible Toolkit for Scientific Computation (PETSc), can accelerate the rate of progress, and efforts like DOE’s Scientific Discovery through Advanced Computing (SciDAC) program are promising. An upcoming report from the National Academies’ Computer Science and Telecommunications Board will discuss, in particular, the interface between biology and the computing world.

Another issue is verification and validation.
Verification is defined as determining whether the calculations truly correspond to the equations that constitute the analytical model. There are well-defined techniques for verification, but they are rarely used systematically. Validation is a broader concept that is generally understood to mean the assessment of model quality: Does the software correspond to biological reality? A part of the validation process that is common in the physical sciences but little used in biology is the conduct of experiments designed specifically to test the computational model itself rather than to study new phenomena. That is, we need to treat the output of a computational model as a testable hypothesis and then design biological experiments that try to disprove the hypothesis by collecting appropriate data or by exploring whether qualitative features of the computer output exist in real systems. In order for such approaches to contribute significantly to progress in understanding biology, the experimental community will need to be convinced of the value of such studies, which may not directly address the experimentalists’ goals.

WHAT MAKES COMPUTATIONAL BIOLOGY PROBLEMS HARD?

While the challenges posed by rapidly increasing amounts of data cut across all the sciences, those posed by increased amounts of data in biology are uniquely difficult. At all scales of analysis, biology involves large numbers of types of objects, large numbers of objects of each type, and complex interactions between objects. In addition, biological objects can possess individuality, a history (e.g., of external stimuli, environmental insults, or inheritance), and a contingent existence (e.g., the location of components or of neighbors can be significant). Except for relatively small contributions from phenomena such as the bilateral body plan (when and where relevant), schemes for simplification that arise from symmetry are rarely possible at any scale in biology. The systems are extraordinarily heterogeneous in space and time, yet stunningly robust in the face of perturbation. Interactions across vastly different scales can have dramatic effects on system behavior. Thus, a tremendous quantity of data must be managed in creating useful biological models. Moreover, some of those data are very difficult to obtain. These issues are described in more detail below for the molecular scale; similar issues arise at the other scales and across scales:

• Large number of types of objects and objects of each type. At the molecular scale, with minor exceptions, proteins are synthesized from only 20 different amino acids. These units combine to produce tens of thousands of independently encoded proteins in humans, and there are many different mechanisms that can lead to the creation of variants (sometimes astonishing numbers of variants) of each of these independently encoded proteins.
Analogous phenomena occur with nucleic acids and the polymers of sugars, fats, and other molecules.
• Complex interactions between objects. DNA and RNA interact intramolecularly and with each other. DNA is the template for creating RNA, and RNA is the template for proteins. RNA and protein combine to form superstructures that themselves play central roles in the translation of RNA into proteins. Proteins interact intramolecularly and with other proteins, as well as with RNA, DNA, and a large variety of other molecules to act as enzymes, structural components, signals, receptors of signals, and inhibitors of signals.
• Robustness. The many types of molecules in biological systems combine to form extraordinarily robust subcellular organelles, cells, tissues, organs, organ systems, organisms, populations, and communities. Biological interaction networks achieve this robustness through high levels of redundancy, modularity, heterogeneity, and feedback. It is not uncommon to find that genetic ablation of a normally critical signaling pathway not only fails to kill a cell but causes only the most subtle changes in its behavior. Other pathways can often provide similar functions even if they do not normally do so when the primary pathway is present. Similarly, feedback mechanisms make biological systems extraordinarily robust against both internal and external perturbations. Genetic variation is a particularly important example of an internal perturbation. Between the two copies of the genome present in each human, there are millions of sequence differences, many of which affect the regulation of genes and the structures of the encoded proteins.
• Complex interactions across scales. All of this complexity is present at each scale of organization and in the interactions between scales. For example, it is infeasible to simulate organ-scale electrophysiology by modeling the ion fluxes through every membrane channel in every cell. There is, at present, no systematic way of bypassing this problem. Existing approaches tend to be hybrid methods, which overcome such bottlenecks by using different models on different temporal or spatial scales, coupled with heuristic models to transfer information between them. Statistical methods are also used to integrate information obtained from fine-scale calculations to estimate the net response of an organ, tissue, or neural network.

FACTORS COMMON TO SUCCESSFUL INTERACTIONS BETWEEN THE MATHEMATICAL SCIENCES AND THE BIOSCIENCES

As the committee examined the historical record and contemporary experience in applying mathematics to biology, a few simple observations that commonly underlie successful interactions came to the fore:

• The biological problem has always been primary.
Successful applications of mathematics to biology are driven by a deep understanding of the relevant biology. Until this understanding is in place, it is not possible to state the problem with sufficient clarity and at a sufficient level of abstraction to allow a meaningful mathematical formulation. Successful applications always involve major simplifications of the actual system. However, these simplifications must preserve the system’s essential features. This first observation gives rise to the following recommendation:

Recommendation: Funding agencies supporting mathematical research related to the life sciences should give preference to proposals that indicate a clear understanding of the specific biological objectives of the research and include a realistic plan for how mathematicians and biologists will collaborate to achieve them.

There are dual benefits of preferential support for proposals rich in both biological understanding and clarity about the mechanisms by which the collaboration will advance. Naturally, well-organized and well-posed research aimed at important biological problems will pay off early on, will help sustain further studies, and will open up new directions for fruitful inquiry. In addition, establishing such preferential support minimizes the risk of applying mathematics to poorly posed biological problems and maximizes the potential impact of quantitative tools. Rigorous prioritization will support a structural change in the biological sciences that encourages the use of quantitative approaches of all kinds. More generally, success stories based on such considerations will be readily exported to other biological research problems and will serve to validate the role of mathematicians in biology qua mathematicians rather than just as technical contributors and to validate, for experimental biologists, the role of mathematics itself in understanding biology.2

• As the committee discusses in more detail below, cultural and linguistic barriers create a potentially large divide between mathematicians and biologists. It is only after achieving a common language in which to discuss a particular problem that mathematics can be applied effectively. The common ground can lie anywhere along the spectrum from the language of biology to that of mathematics, but it has to be found, and each side has to move toward the other to do so. That said, it is important to recognize that communication barriers that appear to be linguistic often have deeper roots.
Many of the difficulties that researchers trained in the physical sciences, engineering, and mathematics have in communicating with biologists relate to fundamental differences between biology and the physical sciences. Basic laws typically do not exist, and even basic principles are often still undiscovered. Once they understand that progress is possible despite these obstacles, some nonbiologists thrive in this strange, new scientific environment. Others find that their skills are best applied in better-defined settings.
• Initial progress has almost always depended on existing mathematical tools, often quite elementary ones. The complexity, particularly at early stages of analysis, is in the biology, not the mathematics. Any improvements to mathematical tools come later.
• Formulation of the problem has been as important as solving it. As they are first formulated, biological problems are typically ill posed or incompletely posed. The process of translating them into formal statements in the language of mathematics introduces a rigor that often uncovers questions that might not otherwise have been asked. The translation process causes both bioscientists and mathematicians to think carefully about all of the parts of the system and to decide systematically which variables, effects, and interactions to take into account. This process is also a critical test of whether the biologists and mathematicians working together on a problem have actually arrived at a common language.
• Even though many biological problems have been solved using simple mathematics, a sophisticated and experienced mathematical scientist has often been required to find the solution. This paradox arises because of the difficulty of abstracting the problem from its biological messiness and sifting through the enormous collection of tools and methods already potentially available for addressing it. In addition, the solution often involves applying familiar mathematical methods in unfamiliar ways or contexts.

There have been cases where mathematical techniques were applied to biological problems with inadequate appreciation for the finer points of the biology, leading some to overstate the significance of their mathematical results. The result was statements such as that of Mayr (1982, p. 304), who, when explaining the role mathematics played in shaping the thinking of the ancient Greeks, wrote “This was the first of countless episodes in the history of biology where mathematics or the physical sciences exerted a harmful influence on the development of biology.” This notion has held back the full introduction and exploitation of the power of mathematics in the study of biology. At the same time, a healthy skepticism is necessary for making progress in the sciences, and too-universal acceptance of approaches can impede progress as much as outright rejection. A balance of different approaches often yields the greatest gains, as eloquently expressed by Naeem (2002) in the context of ecology: “. . . ecological truth lies in the confluence of observation, theory, and experiment. It is through discourse among empiricists and theorists that findings and theory are sorted and matched and where there is a lack of correspondence, new challenges identified.”

2 Of course this gap does not exist if one individual is well grounded in both fields. However, it is more common, and generally more practical, to collaborate rather than to learn two disparate fields.

PREPARING THE GROUND FOR IMPROVED SYNERGIES OF BENEFIT TO BOTH FIELDS

Progress in the life sciences will increasingly depend on deep and broad integration of mathematical analysis into the study of all levels of biological organization. No one level of organization stands out as offering singularly attractive opportunities for mathematical applications. The challenges faced at different levels have distinctive characteristics, but there are also unifying themes.

Recommendation: Funding agencies supporting mathematical research related to the life sciences should be receptive to research proposals that pertain to any level of biological organization: molecules, cells, organisms, populations, and ecosystems. While much current research can be productively confined to a particular level, there are also substantial challenges and rewards associated with analyzing interactions between levels.

The empirical factors for success listed in the previous section all point to one critical element: A true collaboration that brings together skills from the mathematical sciences and a deep knowledge of biology must be established. In response to this basic need, funding programs, research institutions, and groups can experiment with conditions that facilitate such collaboration. Some of the factors to be addressed include these:

• Communication. It is clear from the above that mathematical scientists and biologists have to find a common language so that all of the essential richness of a biological problem can be captured and formulated in mathematical terms. This can and should happen in both directions, with some biologists developing a deeper and more sophisticated understanding of quantitative methods and many mathematical scientists expanding their understanding of biology to appreciate the scope of the problems to be addressed. (The primary model in the mind of the committee is mathematical scientists contributing to biology research teams, not, for the most part, biologists learning all the necessary mathematics and statistics.)
Interestingly, some of the most successful practitioners at the interface have come out of the physical and mathematical sciences, bringing a deep understanding of quantitative methods as well as biol- ogy, but neither to the exclusion of the other. • Timescales. The professional timescales of the fields are often mis- matched, and both sides of the collaboration need to develop an apprecia- tion for this reality. On the one hand, if a biological challenge demands the development of deep new mathematics or statistics, this process will typically require a detour of months or years, time that is not consistent with the competitive nature of researchers in the biological sciences and the expectations of them. On the other hand, existing mathematical meth- ods might require the generation of additional data (e.g., to enable good bounds on parameters or uncertainties), which might be time consuming and initially unrewarding to biologists.

• Recognition and advancement. If mathematical scientists are to invest time and effort in learning biology and to contribute what, from a mathematical perspective, may be relatively simple methods, then the mathematical sciences must adjust their reward systems. This difficulty is an age-old problem in academic departments: It flares up whenever practitioners in a field venture out to the interface with another field and devote more intellectual energy to transitioning research results than to directly advancing their own field’s research agenda. Of course, university departments will not adjust something as fundamental as their internal reward systems in the absence of external stimuli and rewards. While simply putting forward funds for collaborations at the interface will provide some incentive, the funding agencies need to consider special honorific awards, special programs, and possibly other mechanisms to encourage the needed changes in systems for recognition and advancement. Such adjustments would also help bridge the gap between the expected and actual timescales of doing biology and doing mathematics, and agencies could consider mechanisms that accommodate both timescales. Providing more funding at the interface, as planned, is the first step.

Recommendation: Funding agencies supporting mathematical research related to the life sciences should place increased emphasis on funding mechanisms and novel approaches to the organization of interdisciplinary research. The goal should be to foster effective collaboration between mathematical scientists and bioscientists by working to eliminate barriers posed by inadequate communication, disparate timescales for achieving research objectives, inequitable recognition of contributors to interdisciplinary projects, and cultural divisions within universities, research institutes, and national laboratories.
In spite of the committee’s belief that most problems in biology can initially be addressed with fairly standard mathematics or statistics, there are occasions when exceptionally innovative researchers may be driven by the particularities of a problem to break out of traditional mathematical paradigms and develop truly novel methods. R.A. Fisher’s work on the analysis of variance is a dramatic example addressed in Chapter 2, “Historical Successes.”

There are also many examples where interesting mathematical problems were abstracted away from the biological problems that motivated them, leading to mathematical sciences research that is valuable in its own right. Examples of this type are particularly common in combinatorics, algorithmics, and computational complexity theory. A typical example is the “adjacent ones” problem, which first arose in the 1950s in the context of fine-structure genetic mapping. Once posed, it continued to interest mathematicians for 40 years and occasionally found new biological applications, including in the Human Genome Project (Benzer, 1959; Alizadeh et al., 1995). The committee’s sense is that the flow of research problems from biology back into mathematics is likely to become increasingly common as research expands at the interface of the two fields.

It is important that more biologists recognize the value of true collaboration with mathematical scientists. There is a common presumption that mathematical sciences research can be done in a vacuum: that mathematical scientists learn about a problem, retreat to their offices for several months, and reappear only when they have completed their research. This model does not describe applied work at all, but many biologists have never engaged in the iterative give-and-take that melds the complementary skills of mathematical and biological scientists to create an advance that neither could have achieved alone. Similarly, many biologists have not seen how much more powerful a method adapted by an experienced mathematical scientist for a particular application can be than off-the-shelf formulas or software.

The charge to the committee asked for recommendations on how the DOE’s applied mathematics program can best support its computational biology aims. One thrust for that program should be the refinement of general-purpose tools whose broad biological utility has already been established. Some knowledge of biological applications is often important for pointing this research in optimally useful directions, but intimate familiarity with specific biological problems may be unnecessary. A good example of this dynamic involves applications of Markov chain Monte Carlo (MCMC) methods in biology.
These applications are now sufficiently well established that one can identify classes of mathematical problems, such as those concerning the convergence properties of Markov chains, whose solutions would almost surely prove relevant to a wide array of biological problems.

Recommendation: Funding agencies supporting mathematical research related to the life sciences should support the refinement of general-purpose tools whose broad biological utility has already been established. Such research might require specialized review criteria, particularly when the focus is on tool enhancement rather than breakthrough research.

The committee believes that most near-term advances in computational biology, at all scales, will come from adapting established mathematical tools to biological problems. Biology is complicated, and what is needed is insight about which complications can be ignored and which are essential; it is easier to reach that insight when dealing with well-characterized mathematical tools than with novel ones that might add complexity. These insights will guide the application of sophisticated, but often familiar, mathematical tools to extract as much information as possible from large data sets. In some happy instances, this process will spawn new mathematics. However, no amount of mathematical sophistication can overcome the intrinsic complexity of biological systems. The key will be to achieve steady improvements in our ability to simplify and approximate these systems without losing their essential characteristics. While this process of reduction will certainly require researchers with a good sense of the power and limitations of relevant mathematical tools, it will predominantly require an intimate knowledge of the living systems that the researchers are attempting to approximate. By working for the most part with well-established mathematical tools, the mathematician and the biologist can focus on what data might be missing or what approaches might not have been tried, in order to make the problem tractable. It should be easier to ascertain which features of the complexity can be neglected, which are essential, and which approaches can provide the best input for mathematical analysis.

The range of mathematical sciences methods that have successfully contributed to biology is very large, as indicated in the rest of this report. Therefore, recommending that the DOE applied mathematics program cover those demonstrated areas of mathematics is not a restriction; in fact, it would require a substantial enlargement of that program’s traditional scope. Some of the most promising areas are discussed in the chapter “Crosscutting Themes,” but these should be seen as illustrative, not exclusive. As biology itself advances, the range of applicable mathematical methods might well expand.
Openness, or inclusiveness, will be important to ensure that the methods of mathematics can contribute most effectively to biology.

The federal agencies have recently set up processes to be more responsive to tool development, to the more general aspects of infrastructure support, to the provision of new methods, and to the development of new instruments, new approaches, or software, along with the more traditional forms of infrastructure such as equipment. The agencies have also provided some funding to support what is called discovery science: data mining or exploratory work aimed at gaining a novel insight rather than testing a specific hypothesis. Interdisciplinary research, in general, often requires carefully constructed review processes to permit effective evaluation of novel approaches. More specifically, the plans for generalized tool development will need similarly careful review and a mandate provided through the call for proposals.

Recommendation: Funding agencies supporting mathematical research related to the life sciences should give priority to research that addresses intrinsic characteristics of biological systems that reappear at many levels of biological organization: high dimensionality, heterogeneity, robustness, and the existence of multiple spatial and temporal scales.

The committee attempted to identify subdisciplines of mathematics in which broadly based advances would be particularly likely to enhance biological research. However, it concluded that since critical advances had come from nearly every subdiscipline within the mathematical sciences, any such prognostication would be mere guesswork. The committee believes that excellent biology research can be achieved only by answering key questions within that discipline; specifying a priori the tools to be developed inverts that goal. Nevertheless, it is clear that if DOE’s applied mathematics program is to contribute to computational biology, it should focus on research linked to the intrinsic characteristics of biological systems that reappear at many levels of biological organization: high dimensionality, heterogeneity, robustness, and the existence of multiple spatial and temporal scales. All areas of biology will benefit from improved mathematical representations of biological systems.

STRUCTURE OF THIS REPORT

Future biologists will use an enormous variety of mathematical tools. Their research will be distinguished less by the tools they use than by the problems they aspire to solve. For this reason, this report is organized primarily around biological, rather than mathematical, themes. Its survey of mathematical challenges in biology, which ranges from molecular to ecological levels of organization, is necessarily cursory. However, the report provides an introduction to the diverse challenges that characterize contemporary applications of mathematics to biology.
The daunting task facing policy makers will be to develop mechanisms that encourage the deep integration of mathematics and biology needed for sustained progress across this vast, exciting, and rapidly evolving scientific frontier.

REFERENCES

Alizadeh, F., R.M. Karp, D.K. Weisser, and G. Zweig. 1995. Physical mapping of chromosomes using unique probes. J. Comput. Biol. 2: 159-184.
Benzer, S. 1959. On the topology of the genetic fine structure. Proc. Natl. Acad. Sci. U.S.A. 45: 1607-1620.
Lipan, O., and W.H. Wong. 2005. The use of oscillatory signals in the study of genetic networks. Proc. Natl. Acad. Sci. U.S.A. 10.1073.
Mayr, E. 1982. The Growth of Biological Thought. Cambridge, Mass.: Belknap Press.
Naeem, S., M. Loreau, and P. Inchausti. 2002. Biodiversity and ecosystem functioning: The emergence of a synthetic ecological framework. Pp. 3-11 in Biodiversity and Ecosystem Functioning, M. Loreau, S. Naeem, and P. Inchausti, eds. New York: Springer.
Palumbi, S.R., S.D. Gaines, H. Leslie, and R.R. Warner. 2003. New wave: High-tech tools to help marine reserve research. Front. Ecol. Environ. 1(2): 73-79.
Patil, G.P., and C. Taillie. 2003. Geographic and network surveillance via scan statistics for critical area detection. Statist. Sci. 18(4): 457-465.
Post, D.E., and R.P. Kendall. 2004. Software project management and quality engineering practices for complex, coupled multiphysics, massively parallel computational simulations: Lessons learned from ASCI. Int. J. High Perform. Comput. Applic. 18(4): 399-416.
Running, S.W., R.R. Nemani, F.A. Heinsch, M. Zhao, M. Reeves, and H. Hashimoto. 2004. A continuous satellite-derived measure of global terrestrial primary production. BioScience 54(6): 547-560.
Shimizu-Sato, S., E. Huq, J.M. Tepperman, and P.H. Quail. 2002. A light switchable gene promoter system. Nat. Biotechnol. 20(10): 1041-1044.
Turner, D.P., S.V. Ollinger, and J.S. Kimball. 2004. Integrating remote sensing and ecosystem process models for landscape- to regional-scale analysis of the carbon cycle. BioScience 54(6): 573-584.
Zeidler, M.P., C. Tan, Y. Bellaiche, S. Cherry, S. Hader, U. Gayko, and N. Perrimon. 2004. Temperature-sensitive control of protein activity by conditionally splicing inteins. Nat. Biotechnol. 22(7): 871-876.