The Nature of the Field
Biology is in dramatic flux due to a surge of new sources of data, access to high-performance computing, increasing reliance on quantitative research methods, and an internally driven need to produce more quantitative and predictive models of biological processes. The growing infusion of mathematical tools and reasoning into biology may therefore be expected to further transform the life sciences during the decades ahead. This transformation will have profound effects on all areas of basic and applied biology.
Nonetheless, we are not starting from scratch in applying the mathematical sciences to biology. To a greater extent than is widely recognized—by biologists and nonbiologists alike—there has been a string of dramatic successes over more than a century that have been critical to advances in biology and have also led to new mathematics. The role that biological problems played in motivating the development of modern statistics is just one example that will be described in Chapter 2, “Historical Successes.”
THE MATHEMATICS-BIOLOGY INTERFACE
The interface between mathematics and biology can be examined across scales of biological problems and across all the major areas of mathematical sciences. Biological scales range from molecules, cells, organisms, and populations to communities. Much of the remainder of this report is organized around these biological scales, articulating examples of the biological problems to be addressed at each scale. These scales may be briefly described as follows:
Molecules. Molecular biology focuses on the chemical components of life and their interactions. These components differ greatly in size and complexity, ranging from atoms and simple ions, through the basic molecular building blocks of life such as nucleic and amino acids, sugars, and fats, to polymers and homogeneous and heterogeneous aggregates of the more basic units, forming macromolecular assemblies and super-molecular structures that carry out many of the fundamental processes in the life of a cell. The structures of these objects, as well as the dynamics of molecules and interactions between them, are central to biological function.
Cells. Cell biology is concerned with the self-replicating units of life, including bacterial, plant, and animal cells, as well as the viruses and other parasites that infect them. The study of the cell also includes consideration of many interconnected units or subcellular structures, such as organelles, which range in complexity from peroxisomes, proteasomes, or lysosomes, to mitochondria and chloroplasts, up to the nucleolus and the nucleus itself for eukaryotic organisms, and other structural components intrinsic to cell function such as the endoplasmic reticulum. The mechanisms and consequences of cell–cell communication are also of primary interest.
Organisms. Organismal biology includes both the properties of whole organisms and the complex multicellular structures of which they are composed—the tissues, organs, organ systems, and integrative processes that create a robust whole out of diverse parts. Organisms sustain health and well-being in the face of considerable insults and environmental disturbances, a process known as homeostasis. Another feature at this scale is the study of the breakdown of this robustness—in other words, the etiology and nature of disease.
Populations. Population biology concerns groups of organisms of the same species. Genetic variation among individuals is of primary interest, as is the behavior of populations over time in real environments—for example, speciation, population fluctuation, and extinction.
Communities and ecosystems. Community ecology is the study of assemblages of populations of different species and their interactions. Interactions between living and nonliving components, nutrient and carbon fluxes, and overall responses of ecosystems to changes in the physical environment are central issues of ecosystem ecology. The ecology of infectious disease is an example of an interspecies interaction of considerable recent interest.
Across all these levels of biological organization, modeling of biological processes plays a central role. Much of this report describes the types of models, and the associated mathematical techniques, that have been productive in biology. In considering this diverse landscape, it is important to understand that the word “model” has many meanings in biology. Concepts in biology are often illustrated by simple verbal or visual models that are entirely qualitative. For example, most models of gene-regulatory circuits are of this nature: They specify, in simple drawings, which components of a pathway inhibit, and which stimulate, either their own synthesis or that of other components. Other models may be formulated mathematically even though they are primarily intended for heuristic use rather than for data analysis: Simple differential equations describing idealized predator-prey interactions are in this category. Finally, sophisticated models designed to capture subtle features of large, real data sets are also diverse. Some are sets of partial differential equations that would look familiar to a classical physicist. Others are designed to capture subtle statistical properties of data sets without reference to operative biological mechanisms. Many are stochastic models that guide sampling from combinatorially explosive sets of possible relationships between biological objects: Examples include coalescent models of possible phylogenetic relationships between DNA sequences or between organisms defined by sets of discrete phenotypes. Hidden Markov models of sites of transcription-factor binding in the regulatory regions of genes are also in this category. Throughout the report, this diversity of models should be kept firmly in mind. Of course, diversity in types of models is found in all fields of science. Nonetheless, biology is perhaps unique in the extent to which diversity in modeling practices is the rule, with the existence of a small set of standard paradigms that are applicable to broad sets of problems being the exception.
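As an illustration of the heuristic category described above, the idealized predator-prey interaction can be written as the classical Lotka-Volterra equations and integrated numerically. The following is a minimal sketch, not a model taken from this report: the function names, the parameter values, and the choice of a fourth-order Runge-Kutta integrator are all assumptions made for illustration.

```python
import math


def derivatives(prey, pred, alpha, beta, delta, gamma):
    """Lotka-Volterra right-hand side (illustrative parameter names).

    alpha: prey growth rate, beta: predation rate,
    delta: predator growth per prey eaten, gamma: predator death rate.
    """
    d_prey = alpha * prey - beta * prey * pred
    d_pred = delta * prey * pred - gamma * pred
    return d_prey, d_pred


def rk4_step(prey, pred, dt, params):
    """Advance the two populations by one classical Runge-Kutta step."""
    k1 = derivatives(prey, pred, *params)
    k2 = derivatives(prey + 0.5 * dt * k1[0], pred + 0.5 * dt * k1[1], *params)
    k3 = derivatives(prey + 0.5 * dt * k2[0], pred + 0.5 * dt * k2[1], *params)
    k4 = derivatives(prey + dt * k3[0], pred + dt * k3[1], *params)
    prey_next = prey + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    pred_next = pred + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return prey_next, pred_next


def simulate(prey0, pred0, params, dt=0.01, steps=2000):
    """Return the list of (prey, predator) states, including the initial one."""
    trajectory = [(prey0, pred0)]
    for _ in range(steps):
        trajectory.append(rk4_step(*trajectory[-1], dt, params))
    return trajectory


def invariant(prey, pred, alpha, beta, delta, gamma):
    """Quantity conserved by the exact (continuous-time) solution."""
    return (delta * prey - gamma * math.log(prey)
            + beta * pred - alpha * math.log(pred))
```

Checking that `invariant` stays nearly constant along a simulated trajectory is a quick sanity test of the integrator. The model itself remains a heuristic device: it omits stochasticity, spatial structure, and resource limits.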
WHAT HAS CHANGED IN RECENT YEARS?
While scientists have been studying biological systems at these various scales for many years and applying varying levels and types of mathematics and statistics to them, recent achievements in biology and technology have combined to create a dramatically new world of opportunity for the application of mathematics to biology. Rather suddenly, new experimental methods and technologies have allowed the generation of biological information at an astonishing rate. This phenomenon is playing out at all scales of biological analysis:
On the molecular scale, the human-genome sequence and the sequences of many other genomes have been determined; methods are also available to measure the expression levels of all genes in an organism in a single experiment. New techniques in protein chemistry, as well as new radiation sources for structural analyses, are accelerating the rate at which proteins can be detected in complex biological samples, purified, and characterized structurally at atomic resolution.
On the cellular scale, new methods of cellular imaging are making it possible to track subcellular processes and to trace the propagation of signals at millisecond timescales.
For organisms, reductions in the cost of noninvasive imaging such as computed tomography (CT) scanning, magnetic resonance imaging (MRI), and positron emission tomography (PET) are making these methods available as routine experimental tools. New methods, such as high-throughput patch-clamp studies, are providing electrophysiology data at previously unattainable rates.
For populations, it is now possible to measure the genetic differences between organisms at hundreds of thousands of sites in a single, inexpensive experiment.
At the level of communities, new remote-sensing technologies are making it possible to measure entire ecosystems across multiple observation channels at high resolution.
The data that guide biology are diverse, and their integration is challenging. Data sets span the entire range from genomic data to satellite data. Data may be collected at one point in time or continuously, resulting in a real-time data stream (Turner et al., 2004; Running et al., 2004). In addition, there are some cases (e.g., the National Science Foundation (NSF)-funded Long-term Ecological Research Sites or the Framingham Heart Study) where data have been collected about biological entities over long periods of time.
Biological data may contain large errors of unknown origin owing to complex interactions that are either poorly understood or inherently due to stochastic processes. Quantities may be inferred using proxy data—for instance, seasonal or annual temperature in the past can be inferred from tree rings. Model development and model parameterization must take these sources of uncertainty—complex interactions and stochastic processes—into account.
In the past, data were almost exclusively collected by individual researchers or small groups of researchers, and they remained the property of those who collected them. Now, data are increasingly collected by larger groups of scientists and made publicly available. The Protein Data Bank and the National Center for Ecological Analysis and Synthesis are two examples of institutions established to promote the sharing of biological data. One impediment to synthetic analytical work that relies on large data sets collected by different groups has been the lack of commonly accepted standards for collecting, archiving, and annotating data and of agreement on what kinds of data should be collected. Nonetheless, there are increasing efforts to come to some agreement. For instance, Ecological Metadata Language (EML) is a way to standardize data annotation that is increasingly embraced by the ecological community.
Some of the greatest computational challenges today come from data collected at the two extreme ends of spatial scales: genomic data and satellite data. The explosive rate of genomic data generation is well known. In the case of more global data, Palumbi and colleagues (2003), for example, present a number of new data acquisition methods, including remote sensing to measure characteristics of the ocean (temperature, wind, surface elevation) or trace changes in ocean currents and DNA sequencing to assess spatial and temporal trends in genetic diversity. New computational methods need to be developed to extract useful information from these vast amounts of data.
The analysis of spatial data poses particular challenges due to the correlations that are inherent in spatial processes and due to local interactions and stochastic effects. As an example, a method widely used for detecting anomalies in space is the spatial-scan statistic. This statistic has been applied to the detection of disease outbreaks or invasive species (Patil and Taillie, 2003). One challenge is to couple on-the-ground observations and remotely sensed data. Visualization tools are indispensable when analyzing spatial data.
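To make the idea of a spatial-scan statistic concrete, the sketch below scans contiguous windows along a one-dimensional row of regions and scores each window with a Poisson log likelihood ratio of the kind commonly used for such scans. It is a simplified illustration under stated assumptions, not the method of the cited work: the one-dimensional layout, the function names, and the example counts are hypothetical, every region is assumed to have a positive population, and the Monte Carlo step normally used to assess significance is omitted.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio for an excess of cases inside one window.

    Returns 0.0 when the window holds no more cases than expected.
    """
    if cases_in <= expected_in:
        return 0.0
    llr = cases_in * math.log(cases_in / expected_in)
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr


def scan_1d(cases, population, max_width):
    """Find the contiguous window [start, end) with the largest score.

    Expected counts assume cases are spread in proportion to population.
    Returns (best_score, (start, end)), or (0.0, None) if no excess is found.
    """
    total_cases = sum(cases)
    total_pop = sum(population)
    best_score, best_window = 0.0, None
    for start in range(len(cases)):
        for width in range(1, max_width + 1):
            end = start + width
            if end > len(cases):
                break
            observed = sum(cases[start:end])
            expected = total_cases * sum(population[start:end]) / total_pop
            score = poisson_llr(observed, expected, total_cases)
            if score > best_score:
                best_score, best_window = score, (start, end)
    return best_score, best_window
```

For example, with hypothetical counts `[2, 3, 2, 12, 11, 3, 2, 2]` over eight equally populated regions and `max_width=3`, the highest-scoring window covers the two regions with elevated counts. Real applications scan two-dimensional windows and must account for the spatial correlation discussed above.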
In parallel with the accelerating rate of data acquisition, there has been an increase in the computational power available to the scientific community: on the desktop, in a research unit, or through the Internet, from a national resource or a grid of independent systems. Before computers were widely available, discoveries were made by combining data from experiments or observational studies and their statistical analysis with simple models that served as the conceptual framework. When computing power became available, this paradigm was extended to include computation. This process started in areas of biology that are closely aligned with the physical sciences (e.g., protein structure determination) and gradually spread throughout biology. Concomitantly, biologists became increasingly dependent on sophisticated data-analysis tools and complex, data-driven models.
When analytical models are improved to the point of being good representations of a biological system, they are often analytically intractable, and biologists must turn to computation. In many cases, such models are systems of differential equations, which are fairly amenable to solution on computers thanks to mathematical advances of the past few decades. The numerical analysis community continues to increase the set of partial differential equations for which reliable, fast solvers exist. Robust techniques are now available for solving problems in electrostatics, diffusion, elasticity, and fluid dynamics. However, the needs of biology are among the most challenging. Because biology often calls for multiscale models that include both deterministic and stochastic elements, the solution of sets of biologically motivated equations frequently exceeds current capabilities. There are not yet adequate capabilities for evaluating the range of uncertainties embedded in a computational model due to its parameters, discretization, and structure. Few tools are available to deal with these issues when the models are applied to large systems.
Mathematical methods are also being used in new ways to inform experimental design. Traditionally, experimental design decisions, such as choosing the nature of a perturbation, the response measurements, whether and how to carry out gene disruption, and the timing and scope of response measurement, have been made by the experimental biologist with only minimal consideration of the computational analysis that will be performed on the resulting data. This was perhaps unavoidable, as techniques for gene disruption and high-throughput assays have until recently been the major limiting factors. However, as experimental genomic science advances, options are becoming increasingly available. In the future, experimental design considerations must be tightly coupled to the mathematical representations to be used to model the system and the computational and statistical methods to be used for model identification and parameter estimation.
For example, switches for transcription or protein modification have recently become available (Shimizu-Sato, 2002; Zeidler et al., 2004), making it feasible to implement oscillatory perturbations in a systematic manner. A natural question arises: Does oscillatory perturbation have any advantage over traditional impulse or step perturbation? And if so, how do we quantify the advantages? A related question is what mathematical and statistical techniques are useful for the analysis of the response to such oscillatory signals. To answer these questions, we need to study how to estimate the system parameters based on each type of perturbation and how the estimation may be affected by intrinsic and measurement noise. The mathematical techniques involved include Laplace transforms, optimal time and frequency sampling, and estimation theory under ill-posed conditions. Even for a stochastic network as simple as an autoregulatory gene whose protein activities are modulated by an input signal, these questions have not been studied until very recently (Lipan and Wong, 2005), and much more remains to be done. Although the benefits of oscillatory perturbations are widely appreciated for the analysis of physical systems, this approach is rarely used in the study of cellular systems. Careful mathematical and simulation studies may help to interest experimental investigators in evaluating the promise of novel perturbations of biological systems.
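A toy calculation shows how an oscillatory input can expose a system parameter. Assume, purely for illustration, a deterministic and noise-free linear system dx/dt = -k x + sin(ωt), which is far simpler than the stochastic autoregulatory network discussed above. Its transfer function is H(s) = 1/(s + k), so the steady-state response amplitude is A = 1/sqrt(k² + ω²), and k can be recovered from a measured amplitude as k = sqrt(1/A² - ω²). The sketch below simulates the response and inverts that relation; all names and numerical settings are hypothetical.

```python
import math


def simulate_response(decay_rate, omega, dt=0.001, t_end=40.0):
    """Integrate dx/dt = -decay_rate*x + sin(omega*t) from x(0) = 0 with RK4."""
    def rhs(t, x):
        return -decay_rate * x + math.sin(omega * t)

    x = 0.0
    times, values = [], []
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        t = i * dt
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt / 2 * k1)
        k3 = rhs(t + dt / 2, x + dt / 2 * k2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        times.append(t + dt)
        values.append(x)
    return times, values


def estimate_decay_rate(times, values, omega, settle_time):
    """Recover the decay rate from the steady-state amplitude.

    Samples before settle_time are discarded so the initial transient
    does not bias the amplitude estimate.
    """
    amplitude = max(abs(v) for t, v in zip(times, values) if t >= settle_time)
    return math.sqrt(1.0 / amplitude ** 2 - omega ** 2)
```

With `decay_rate=0.5` and `omega=2.0`, the estimate from `estimate_decay_rate(times, values, 2.0, settle_time=30.0)` lies close to the true value. With intrinsic and measurement noise the amplitude would have to be estimated statistically, which is where the questions of sampling and estimation theory raised above come in.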
From the point of view of software, there are a number of open issues that are common to many scientific disciplines. These issues include the fact that codes are complex and difficult to support, especially on parallel computers (Post and Kendall, 2004). Software architectures that allow problem specification and access to all layers of code development would constitute an important step forward. Message Passing Interface (MPI)-based development platforms such as the Portable, Extensible Toolkit for Scientific Computation (PETSc) can accelerate the rate of progress, and efforts like DOE’s Scientific Discovery through Advanced Computing (SciDAC) program are promising. An upcoming report from the National Academies’ Computer Science and Telecommunications Board will discuss, in particular, the interface between biology and the computing world.
Another issue is verification and validation. Verification is defined as determining whether the calculations truly correspond to the equations that constitute the analytical model. There are well-defined techniques for verification, but they are rarely used systematically. Validation is a broader concept that is generally understood to mean the assessment of the model quality; that is, does the software correspond to biological reality? A part of the validation process that is common in the physical sciences but little used in biology is the conduct of experiments designed specifically to test the computational model itself rather than to study new phenomena. That is, we need to consider the output of a computational model as a testable hypothesis and then design biological experiments that try to disprove the hypothesis by collecting appropriate data or exploring whether qualitative features of the computer output exist in real systems. In order for such approaches to contribute significantly to progress in understanding biology, the experimental community will need to be convinced of the value of such studies, which may not directly address the experimentalists’ goals.
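One widely used verification technique is a convergence-order test: solve a problem with a known exact solution at two step sizes and check that the error shrinks at the rate the numerical method is supposed to deliver. The sketch below applies this to forward Euler on the decay equation dy/dt = -rate * y, whose exact solution is y0 * exp(-rate * t). The test problem and function names are assumptions chosen for illustration. This checks only that the code solves the stated equations, and says nothing about validation against biological reality.

```python
import math


def euler_solve(rate, y0, t_end, steps):
    """Forward Euler approximation of y(t_end) for dy/dt = -rate * y."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-rate * y)
    return y


def observed_order(rate=1.0, y0=1.0, t_end=1.0, steps=100):
    """Estimate the convergence order by halving the step size once.

    For a first-order method such as forward Euler, halving the step
    should roughly halve the error, giving an observed order near 1.
    """
    exact = y0 * math.exp(-rate * t_end)
    err_coarse = abs(euler_solve(rate, y0, t_end, steps) - exact)
    err_fine = abs(euler_solve(rate, y0, t_end, 2 * steps) - exact)
    return math.log2(err_coarse / err_fine)
```

An observed order far from the theoretical value is a strong hint that the implementation does not match the equations it claims to solve.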
WHAT MAKES COMPUTATIONAL BIOLOGY PROBLEMS HARD?
While the challenges posed by rapidly increasing amounts of data cut across all the sciences, those challenges posed by increased amounts of data in biology are uniquely difficult. At all scales of analysis, biology involves large numbers of types of objects, large numbers of objects of each type, and complex interactions between objects. In addition, biological objects can possess individuality, a history (e.g., of external stimuli, environmental insults, or inheritance), and a contingent existence (e.g., the location of components or of neighbors can be significant). Except for relatively small contributions from phenomena such as bilateral body plan (when and where relevant), schemes for simplification that arise from symmetry are rarely possible at any scale in biology. The systems are extraordinarily heterogeneous in space and time, yet stunningly robust in the face of perturbation. Interactions across vastly different scales can have dramatic effects on system behavior. Thus, a tremendous quantity of data must be managed in creating useful biological models. Moreover, some of those data are very difficult to obtain. These issues are described in more detail below for the molecular scale; similar issues arise at the other scales and across scales:
Large number of types of objects and objects of each type. At the molecular scale, with minor exceptions, proteins are synthesized from only 20 different amino acids. These units combine to produce tens of thousands of independently encoded proteins in humans, and there are many different mechanisms that can lead to the creation of variants—sometimes astonishing numbers of variants—of each of these independently encoded proteins. Analogous phenomena occur with nucleic acids and the polymers of sugars, fats, and other molecules.
Complex interactions between objects. DNA and RNA interact intramolecularly and with each other. DNA is the template for creating RNA, and RNA is the template for proteins. RNA and protein combine to form superstructures that themselves play central roles in the translation of RNA into proteins. Proteins interact intramolecularly and with other proteins, as well as with RNA, DNA, and a large variety of other molecules to act as enzymes, structural components, signals, receptors of signals, and inhibitors of signals.
Robustness. The many types of molecules in biological systems combine to form extraordinarily robust subcellular organelles, cells, tissues, organs, organ systems, organisms, populations, and communities. Biological interaction networks achieve this robustness through high levels of redundancy, modularity, heterogeneity, and feedback. It is not uncommon to find that genetic ablation of a normally critical signaling pathway not only fails to kill a cell but causes only the most subtle changes in its behavior. Other pathways can often provide similar functions even if they do not normally do so when the primary pathway is present. Similarly, feedback mechanisms make biological systems extraordinarily robust against both internal and external perturbations. Genetic variation is a particularly important example of an internal perturbation. Between the two copies of the genome present in each human, there are millions of sequence differences, many of which affect the regulation of genes and the structures of the encoded proteins.
Complex interactions across scales. All of this complexity is present at each scale of organization and in the interactions between scales. For example, it is infeasible to simulate organ-scale electrophysiology by modeling the ion fluxes through every membrane channel in every cell. There is, at the present time, no systematic way of bypassing this problem. Existing approaches tend to be hybrid methods, which overcome such bottlenecks by using different models on different temporal or spatial scales coupled with heuristic models to transfer information between them. Statistical methods are also used to integrate information obtained from fine-scale calculations to estimate the net response of an organ, tissue, or neural network.
FACTORS COMMON TO SUCCESSFUL INTERACTIONS BETWEEN THE MATHEMATICAL SCIENCES AND THE BIOSCIENCES
As the committee examined the historical record and contemporary experience in applying mathematics to biology, a few simple observations that commonly underlie successful interactions came to the fore:
The biological problem has always been primary. Successful applications of mathematics to biology are driven by a deep understanding of the relevant biology. Until this understanding is in place, it is not possible to state the problem with sufficient clarity and at a sufficient level of abstraction to allow a meaningful mathematical formulation. Successful applications always involve major simplifications of the actual system. However, these simplifications must preserve the system’s essential features. This first observation gives rise to the following recommendation:
Recommendation: Funding agencies supporting mathematical research related to the life sciences should give preference to proposals that indicate a clear understanding of the specific biological objectives of the research and include a realistic plan for how mathematicians and biologists will collaborate to achieve them.
There are dual benefits of preferential support for proposals rich in both biological understanding and clarity about the mechanisms by which the collaboration will advance. Naturally, well-organized and well-posed research aimed at important biological problems will pay off early on, will help sustain further studies, and will open up new directions for fruitful inquiry. In addition, establishing such preferential support minimizes the risk of applying mathematics to poorly posed biological problems and maximizes the potential impact of quantitative tools. Rigorous prioritization will support a structural change in the biological sciences that encourages the use of quantitative approaches of all categories. More generally, success stories based on such considerations will be readily exported to other biological research problems and will serve to validate the role of mathematicians in biology qua mathematicians rather than just as technical contributors and to validate, for experimental biologists, the role of mathematics itself in understanding biology.
As the committee discusses in more detail below, cultural and linguistic barriers create a potentially large divide between mathematicians and biologists. It is only after achieving a common language in which to discuss a particular problem that mathematics can be applied effectively. The common ground can lie anywhere along the spectrum from the language of biology to that of mathematics, but it has to be found, and each side has to move toward the other to do so. That said, it is important to recognize that communication barriers that appear to be linguistic often have deeper roots. Many of the difficulties that researchers trained in the physical sciences, engineering, and mathematics have in communicating with biologists relate to fundamental differences between biology and the physical sciences. Basic laws typically do not exist, and even basic principles are often still undiscovered. Once they understand that progress is possible despite these obstacles, some nonbiologists thrive in this strange, new scientific environment. Others find that their skills are best applied in better-defined settings.
Initial progress has almost always depended on existing mathematical tools, often quite elementary ones. The complexity, particularly at early stages of analysis, is in the biology, not the mathematics. Any improvements to mathematical tools come later.
Formulation of the problem has been as important as solving it. As they are first formulated, biological problems are typically ill posed or incompletely posed. The process for translating them into formal statements in the language of mathematics introduces a rigor that often uncovers questions that might not otherwise have been asked. The translation process causes both bioscientists and mathematicians to think carefully about all of the parts of the system and to decide systematically which variables, effects, and interactions to take into account. This process is also a critical test of whether the biologists and mathematicians working together on a problem have actually arrived at a common language.
Even though many biological problems have been solved using simple mathematics, a sophisticated and experienced mathematical scientist has often been required to find the solution. This paradox arises because of the difficulty of abstracting the problem from its biological messiness and sifting through the enormous collection of tools and methods already potentially available for addressing it. In addition, the solution often involves applying familiar mathematical methods in unfamiliar ways or contexts.
There have been cases where mathematical techniques were applied to biological problems with inadequate appreciation for the finer points of the biology, leading some to overstate the significance of their mathematical results. The result was statements such as that of Mayr (1982, p. 304), who when explaining the role mathematics played in evolving the thinking of the ancient Greeks, wrote “This was the first of countless episodes in the history of biology where mathematics or the physical sciences exerted a harmful influence on the development of biology.” This notion has held back the full introduction and exploitation of the power of mathematics in the study of biology. At the same time, a healthy skepticism is necessary for making progress in the sciences, and too-universal acceptance of approaches can impede progress as much as outright rejection. A balance of different approaches often yields the greatest gains, as eloquently expressed by Naeem (2002) in the context of ecology: “… ecological truth lies in the confluence of observation, theory, and experiment. It is through discourse among empiricists and theorists that findings and theory are sorted and matched and where there is a lack of correspondence, new challenges identified.”
PREPARING THE GROUND FOR IMPROVED SYNERGIES OF BENEFIT TO BOTH FIELDS
Progress in the life sciences will increasingly depend on deep and broad integration of mathematical analysis into the study of all levels of biological organization. No one level of organization stands out as offering singularly attractive opportunities for mathematical applications. The challenges faced at different levels have distinctive characteristics, but there are also unifying themes.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should be receptive to research proposals that pertain to any level of biological organization: molecules, cells, organisms, populations, and ecosystems. While much current research can be productively confined to a particular level, there are also substantial challenges and rewards associated with analyzing interactions between levels.
The empirical factors for success listed in the previous section all point to one critical element: A true collaboration that brings together skills from the mathematical sciences and a deep knowledge of biology must be established. In response to this basic need, funding programs, research institutions, and groups can experiment with conditions to facilitate such an establishment. Some of the factors to be addressed include these:
Communication. It is clear from the above that mathematical scientists and biologists have to find a common language so that all of the essential richness of a biological problem can be captured and formulated in mathematical terms. This can and should happen in both directions, with some biologists developing a deeper and more sophisticated understanding of quantitative methods and many mathematical scientists expanding their understanding of biology to appreciate the scope of the problems to be addressed. (The primary model in the mind of the committee is mathematical scientists contributing to biology research teams, not, for the most part, biologists learning all the necessary mathematics and statistics.) Interestingly, some of the most successful practitioners at the interface have come out of the physical and mathematical sciences, bringing a deep understanding of quantitative methods as well as biology, but neither to the exclusion of the other.
Timescales. The professional timescales of the fields are often mismatched, and both sides of the collaboration need to develop an appreciation for this reality. On the one hand, if a biological challenge demands the development of deep new mathematics or statistics, this process will typically require a detour of months or years, time that is not consistent with the competitive nature of researchers in the biological sciences and the expectations of them. On the other hand, existing mathematical methods might require the generation of additional data (e.g., to enable good bounds on parameters or uncertainties), which might be time consuming and initially unrewarding to biologists.
Recognition and advancement. If mathematical scientists are to invest time and effort in learning biology and to contribute what, from a mathematical perspective, may be relatively simple methods, then the mathematical sciences must adjust their reward systems. This difficulty is an age-old problem in academic departments: It flares up as practitioners in a field venture out to the interface with another field and devote more intellectual energy to transitioning research results than to directly advancing their own field’s research agenda. Of course, university departments will not adjust something as fundamental as their own internal reward system in the absence of external stimuli and external rewards. While simply putting forward funds for collaborations at the interface will provide some incentive, the funding agencies need to consider special honorific awards and special programs, and possibly other mechanisms, to encourage the needed changes in systems for recognition and advancement. Adjustments would also help with the differences in timescales between the expectations and realities of doing biology and doing mathematics, and agencies could consider mechanisms to satisfy both timescales. Provision of more funding at the interface, as planned, is the first step.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should place increased emphasis on funding mechanisms and novel approaches to the organization of interdisciplinary research. The goal should be to foster effective collaboration between mathematical scientists and bioscientists by working to eliminate barriers posed by inadequate communication, disparate timescales for achieving research objectives, inequitable recognition of contributors to interdisciplinary projects, and cultural divisions within universities, research institutes, and national laboratories.
In spite of the committee’s belief that most problems in biology can initially be addressed with fairly standard mathematics or statistics, there are occasions where exceptionally innovative researchers may be driven by the particularities of a problem to break out of traditional mathematics paradigms and develop truly novel methods. R.A. Fisher’s work on the analysis of variance is a dramatic example addressed in Chapter 2, “Historical Successes.”
There are also many examples where interesting mathematical problems were abstracted away from the biological problems that motivated them, leading to mathematical sciences research that is valuable in its own right. Examples of this type are particularly common in combinatorics, algorithmics, and computational complexity theory. A typical example is the “consecutive ones” problem, which first arose in the 1950s in the context of fine-structure genetic mapping. Once posed, it continued to interest mathematicians for 40 years and occasionally found new biological applications, including in the Human Genome Project (Benzer, 1959; Alizadeh et al., 1995). The committee’s sense is that the flow of research problems from biology back into mathematics is likely to become increasingly common as research expands at the interface of the two fields.
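To make the combinatorial problem concrete, the following is a minimal sketch, not taken from the report, of the question it asks: given a 0/1 matrix, can the columns be reordered so that the ones in every row form a single contiguous block? In a physical-mapping reading, rows might stand for clones and columns for probes; that labeling, the function names, and the small matrices are illustrative assumptions. The search below simply tries every column order, which is feasible only for very small inputs; efficient methods (for example, PQ-tree algorithms) exist but are not shown here.

```python
from itertools import permutations


def ones_are_consecutive(row):
    """Return True if the 1s in `row` form one contiguous block (or there are none)."""
    positions = [i for i, value in enumerate(row) if value == 1]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def find_consecutive_ones_order(matrix):
    """Brute-force search for a column order that makes every row's 1s contiguous.

    Returns a tuple of column indices, or None if no such order exists.
    Exponential in the number of columns: an illustration, not a practical solver.
    """
    if not matrix:
        return ()
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if all(ones_are_consecutive([row[j] for j in order]) for row in matrix):
            return order
    return None


# Hypothetical incidence data: each row marks which of four probes hit a clone.
clone_probe = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order = find_consecutive_ones_order(clone_probe)

# Three pairwise-overlapping rows on three columns cannot all be made contiguous.
triangle = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
no_order = find_consecutive_ones_order(triangle)
```

Here `order` is a column arrangement consistent with all three rows, while `no_order` is `None`, signaling that the data cannot be explained by any single linear order.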
It is important that more biologists recognize the value of true collaboration with mathematical scientists. There is a common presumption that mathematical sciences research can be done in a vacuum—that is, that mathematical scientists tend to learn about a problem, retreat to their offices for several months, and reappear only when they have completed their research. This model is not at all true in applied areas, but many biologists have not been engaged in the iterative give-and-take that melds the complementary skills of mathematical and biological scientists to create an advance that neither could have achieved alone. Similarly, many biologists have not seen the powerful difference between using off-the-shelf formulas or software and using a method that is adapted by an experienced mathematical scientist for a particular application.
The charge to the committee asked for recommendations on how the DOE’s applied mathematics program can best support its computational biology aims. One thrust for that program should be the refinement of general-purpose tools whose broad biological utility has already been established. Some knowledge of biological applications is often important for pointing this research in optimally useful directions, but intimate familiarity with specific biological problems may be unnecessary. A good example of this dynamic involves applications of Markov chain Monte Carlo (MCMC) methods in biology. These applications are now sufficiently well established that classes of mathematical problems, such as those governing the convergence properties of Markov chains, can be identified whose solution would almost surely prove relevant to a wide array of biological problems.
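As a rough illustration of what a convergence question looks like in practice, the sketch below runs several random-walk Metropolis chains from dispersed starting points and compares between-chain and within-chain variability using the Gelman-Rubin statistic, one commonly used diagnostic. It is not drawn from the report: the standard normal target, the starting points, the step sizes, and all function names are assumptions chosen for illustration.

```python
import math
import random
import statistics


def metropolis_chain(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, density(proposal) / density(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains.

    Values near 1 suggest the chains agree; values well above 1 suggest
    they have not yet mixed.
    """
    n = len(chains[0])
    within = statistics.mean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance([statistics.mean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def run_diagnostic(n_steps, step_size, seed=0):
    """Run four chains from dispersed starts; discard the first half as burn-in."""
    rng = random.Random(seed)
    starts = [-10.0, -3.0, 3.0, 10.0]
    chains = [
        metropolis_chain(log_std_normal, s, n_steps, step_size, rng)[n_steps // 2:]
        for s in starts
    ]
    return gelman_rubin(chains)


r_hat_long = run_diagnostic(n_steps=5000, step_size=2.4)
r_hat_short = run_diagnostic(n_steps=20, step_size=0.5)
```

With long runs the statistic should sit close to 1, whereas the deliberately short runs with small steps leave the chains near their starting points and the statistic well above 1. A diagnostic of this kind can flag a failure to mix but cannot prove convergence, which is part of why the underlying theory remains an active mathematical question.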
Recommendation: Funding agencies supporting mathematical research related to the life sciences should support the refinement of general-purpose tools whose broad biological utility has already been established. Such research might require specialized review criteria, particularly when the focus is on tool enhancement rather than breakthrough research.
The committee believes that most advances in the near future in computational biology at all scales will come from adapting established mathematical tools to biological problems. Biology is complicated, and what is needed is insight about which complications can be ignored and which are essential; it is easier to reach that insight when dealing with well-characterized mathematical tools rather than novel ones that might add complexity. These insights will guide the application of sophisticated, but often familiar, mathematical tools to extract as much information as possible from large data sets. In some happy instances, this process will spawn new mathematics. However, no amount of mathematical sophistication can overcome the intrinsic complexity of biological systems. The key will be to achieve steady improvements in our ability to simplify and approximate these systems without losing their essential characteristics. While this process of reduction will certainly require researchers with a good sense of the power and limitations of relevant mathematical tools, it will predominantly require an intimate knowledge of the living systems that they are attempting to approximate. By working for the most part with well-established mathematical tools, the mathematician and the biologist can focus on what data might be missing or what approaches might not have been tried, in order to make the problem tractable. It should then be easier to ascertain which features of the complexity can be neglected, which are essential, and which approaches can provide the best input for mathematical analysis.
The range of mathematical sciences methods that have successfully contributed to biology is very large, as indicated in the rest of this report. Therefore, recommending that the DOE applied mathematics program cover those demonstrated areas of mathematics is not a restriction; in fact, it would require a substantial enlargement of that program’s traditional scope. Some of the most promising areas are discussed in the chapter “Crosscutting Themes,” but these should be seen as illustrative, not exclusive. As biology itself proceeds, the range of applicable mathematical methods might well expand. Openness, or inclusiveness, will be important to ensure that the methods of mathematics can contribute most effectively to biology.
The federal agencies have recently set up processes to be more responsive to tool development, to the more general aspects of infrastructure support, to the provision of new methods, and to the development of new instruments, approaches, and software, alongside more traditional forms of infrastructure such as equipment. The agencies have also provided some funding to support what is called discovery science: data mining or exploratory work aimed at gaining a novel insight rather than testing a specific hypothesis. Interdisciplinary research, in general, often requires review processes carefully constructed to permit effective evaluation of novel approaches. More specifically, the plans for generalized tool development will need similarly careful review and a mandate provided through the call for proposals.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should give priority to research that addresses intrinsic characteristics of biological systems that reappear at many levels of biological organization: high dimensionality, heterogeneity, robustness, and the existence of multiple spatial and temporal scales.
The committee attempted to identify subdisciplines of mathematics in which broadly based advances would be particularly likely to enhance biological research. However, it concluded that since critical advances had come from nearly every subdiscipline within the mathematical sciences, any such prognostication would be mere guesswork. The committee believes that excellent biology research can be achieved only by answering key questions within that discipline. Specifying a priori the tools to be developed inverts that goal. However, it is clear that if DOE’s applied mathematics program is to contribute to computational biology, it should focus on research that is linked to the intrinsic characteristics of biological systems that reappear at many levels of biological organization: high dimensionality, heterogeneity, robustness, and the existence of multiple spatial and temporal scales. All areas of biology will benefit from improved mathematical representations of biological systems.
STRUCTURE OF THIS REPORT
Future biologists will use an enormous variety of mathematical tools. What will be distinctive about their research are the problems they aspire to solve rather than the tools they use to solve them. For this reason, this report is organized primarily around biological, rather than mathematical, themes. Its survey of mathematical challenges in biology, which ranges from molecular to ecological levels of organization, is necessarily cursory. However, the report provides an introduction to the diverse challenges that characterize contemporary applications of mathematics to biology. The daunting task facing policy makers will be to develop mechanisms that encourage the deep integration of mathematics and biology needed for sustained progress across this vast, exciting, and rapidly evolving scientific frontier.
REFERENCES
Alizadeh, F., R.M. Karp, D.K. Weisser, and G. Zweig. 1995. Physical mapping of chromosomes using unique probes. J. Comput. Biol. 2: 159-184.
Benzer, S. 1959. On the topology of the genetic fine structure. Proc. Natl. Acad. Sci. U.S.A. 45: 1607-1620.
Lipan, O., and W.H. Wong. 2005. The use of oscillatory signals in the study of genetic networks. Proc. Natl. Acad. Sci. U.S.A. 10.1073.
Mayr, E. 1982. The Growth of Biological Thought. Cambridge, Mass.: Belknap Press.
Naeem, S., M. Loreau, and P. Inchausti. 2002. Biodiversity and ecosystem functioning: The emergence of a synthetic ecological framework. Pp. 3-11 in Biodiversity and Ecosystem Functioning, M. Loreau, S. Naeem, and P. Inchausti, eds. New York: Springer.
Palumbi, S.R., S.D. Gaines, H. Leslie, and R.R. Warner. 2003. New wave: High-tech tools to help marine reserve research. Front. Ecol. Environ. 1(2): 73-79.
Patil, G.P., and C. Taillie. 2003. Geographic and network surveillance via scan statistics for critical area detection. Statist. Sci. 18(4): 457-465.
Post, D.E., and R.P. Kendall. 2004. Software project management and quality engineering practices for complex, coupled multiphysics, massively parallel computational simulations: Lessons learned from ASCI. Int. J. High Perform. Comput. Applic. 18(4): 399-416.
Running, S.W., R.R. Nemani, F.A. Heinsch, M. Zhao, M. Reeves, and H. Hashimoto. 2004. A continuous satellite-derived measure of global terrestrial primary production. Bioscience 6: 547-560.
Shimizu-Sato, S., E. Huq, J.M. Tepperman, and P.H. Quail. 2002. A light switchable gene promoter system. Nat. Biotechnol. 20(10): 1041-1044.
Turner, D.P., S.V. Ollinger, and J.S. Kimball. 2004. Integrating remote sensing and ecosystem process models for landscape- to regional-scale analysis of the carbon cycle. Bioscience 6: 573-584.
Zeidler, M.P., C. Tan, Y. Bellaiche, S. Cherry, S. Hader, U. Gayko, and N. Perrimon. 2004. Temperature-sensitive control of protein activity by conditionally splicing inteins. Nat. Biotechnol. 22(7): 871-876.