The exponentially increasing amounts of biological data at all scales of biological organization, along with comparable advances in computing power, create the potential for scientists to construct quantitative, predictive models of biological systems. Broad success would transform basic biology, medicine, agriculture, and environmental science. The main push in biology during the coming decades will be toward an increasingly quantitative understanding of biological function; the rate at which progress occurs will depend on a deeper, effective implementation of quantitative methods and a quantitative perspective within the biological sciences.
The success of this transformation will depend in part on the creation and nurturance of a robust interface between biology and mathematics, which should become a top priority of science policy. The policy challenges will be substantial and multifaceted. The interface between biology and mathematics is an interdisciplinary frontier sprawling across a vast expanse of intellectual terrain that is extraordinarily diverse, indistinctly marked, and growing. The committee will explore this frontier in the chapters that follow. While it is not possible to capture all of the terrain in a single study, the committee attempted to identify striking features that exemplify the opportunities and also the challenges.
The committee offers five recommendations.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should be receptive to research proposals that pertain to any level of biological organization: molecules, cells, organisms, populations, and ecosystems. While much current research can be productively confined to a particular level, there are also substantial challenges and rewards associated with analyzing interactions between levels.
The biological sciences are already becoming more quantitative and data-intensive; indeed, the explosion of data production and the potential for quantitative analysis replete with estimates of precision are the most visible qualities of the biological sciences of the 21st century. Progress in the biosciences will increasingly depend on deep and broad integration of mathematical analysis into studies at all levels of biological organization. No one level of organization stands out as offering singularly attractive opportunities for mathematical applications. The challenges faced at different levels have distinctive characteristics, but there are also unifying themes. Some chapters of the report are organized around the different levels of biological organization, but others—including “The Nature of the Field,” “Historical Successes,” and “Crosscutting Themes”—look more broadly at the commonalities of past and current applications of mathematics to biology.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should give preference to proposals that indicate a clear understanding of the specific biological objectives of the research and include a realistic plan for how mathematicians and biologists will collaborate to achieve them.
The committee regards the interface between mathematics and biology as biology-driven. Research that proceeds by abstracting biological problems away from specific biological contexts and explores the properties of the resultant abstraction is less likely to be effective than research that stays more tightly focused on actual biological questions. However, to maximize productivity, the most powerful and appropriate mathematical tools should be selected to address important biological problems, and this quest benefits from involving the dedicated expertise of mathematical scientists. There are also many cases where results developed within pure mathematics, or in applications of mathematics to physical systems and engineering, later find powerful applications to biology, but this process, too, is most productive when it is biology-driven. Furthermore, the committee was impressed with the sheer scope of mathematical applications to biology and the diverse types of mathematics that are playing important roles in the life sciences. Hence, it strongly cautions against prejudging which subfields of mathematical research are most likely to contribute to biology.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should give priority to research that addresses intrinsic characteristics of biological systems that reappear at many levels of biological organization: high dimensionality, heterogeneity, robustness, and the existence of multiple spatial and temporal scales.
Biological systems at all scales are characterized by high dimensionality, heterogeneity, robustness to perturbations, and the existence of strongly interacting, highly disparate spatial and temporal scales. While these characteristics also appear in some physical systems that have been successfully modeled mathematically—the modeling of heterogeneous and multiscale phenomena is in particular a vibrant topic of mathematical and engineering research—the modeling of biological systems will require greatly expanded capabilities in these areas. As is widely documented in the report, the characteristics enumerated above recur at all levels of biological organization, from molecules to ecosystems.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should support the refinement of general-purpose tools whose broad biological utility has already been established. Such research might require specialized review criteria, particularly when the focus is on tool enhancement rather than breakthrough research.
Although the committee feels strongly that mathematical research based on premature abstraction of biological problems risks irrelevance, there are more and more instances where mathematical tools have already proven their utility in a broad range of biological applications. Many examples are described in this report. In some instances, as biological applications of these tools have expanded, limitations on their effectiveness have become apparent. Nonetheless, there are opportunities here for effective and important mathematical research that is less tightly tied to particular biological applications than is typically the case. Such research will have varying degrees of innate mathematical interest but can have an important impact on biology.
Recommendation: Funding agencies supporting mathematical research related to the life sciences should place increased emphasis on funding mechanisms and novel approaches to the organization of interdisciplinary research. The goal should be to foster effective collaboration between mathematical scientists and bioscientists by working to eliminate barriers posed by inadequate communication, disparate timescales for achieving research objectives, inequitable recognition of contributors to interdisciplinary projects, and cultural divisions within universities, research institutes, and national laboratories.
The committee’s charge was to explore research areas at the interface between mathematics and biology that are likely to offer particular promise in the years ahead. Hence, it did not undertake a broad examination of funding mechanisms, training, and the organization of interdisciplinary research projects. However, these issues came up so frequently in its deliberations and are so central to the future prosperity of research at this interface that the committee recommends they receive increased attention. Given the many cultural factors that impede optimum collaboration between mathematical scientists and bioscientists, it would be desirable to explore a variety of mechanisms for overcoming them or minimizing their deleterious effects.
RATIONALE FOR THE RECOMMENDATIONS
The committee’s recommendations are notable both for what they say and what they omit. There is, for example, no call in this report for a major initiative to develop an in silico cell or any other major, potentially multiagency initiative with a specified goal. A recommendation suggesting such a high level of administrative organization so singularly directed seems premature at the least and, the committee believes, would likely be counterproductive at this time. The committee opted for a patient, broadly based, vigorous effort to expand research at the interface between mathematics and biology rather than for a commitment to a small number of high-profile projects with monolithic goals. Because this decision was perhaps the most consequential outcome of the committee’s deliberation, it is appropriate to summarize the committee’s rationale.
The committee undertook this study at a time of dramatic change throughout the biological sciences. During the past decade, a “perfect storm” of developments has touched off broad changes in biology. Unlike most past discontinuities in the biosciences, this one was not triggered by major scientific discoveries: It was triggered instead by a confluence of new technologies that has swept broadly across science and society, as well as by developments internal to biology. Key contributors to this perfect storm include the following:
The development and widespread adoption of automated instruments that produce high fluxes of digital data relevant to all levels of biological organization. These instruments have transformed DNA sequencing; analysis of mRNA and protein populations; determination of protein structures; structural and functional imaging of subcellular organelles, cells, tissues, organs, and whole organisms; electrophysiology; analysis of genetic variation in populations; and ecological changes across the entire biosphere.
The arrival of networked, high-performance computing systems on the desktops of all biologists. This sudden access to computing resources—a phenomenon whose roots lie in the same technological revolution that enabled the development of the high-throughput instruments discussed above—has infused quantitative methods into all facets of biological research. High-performance computing impacts the whole range of research activities, from the low-level processing of raw data and the development of new theoretical frameworks to the organization, dissemination, and analysis of large biological databases.
The success of the Human Genome Project in establishing accurate, whole-genome sequences as central resources in biology. Genome sequences have given biologists their first taste of “complete knowledge” and stimulated intensive efforts to improve our ability to recognize genes in genomic sequence, to discern their functions, and to infer their evolutionary histories. Genome sequences have also led to a renewed appreciation of the molecular unity of life. The conservation of nucleotide and amino acid sequences and the associated conservation of molecular functions have made sequence comparison the centerpiece of genome analysis. Hence, the analytical challenges in genomics are expanding with O(n²) complexity, where n is the number of known nucleotides in the DNA sequence, a number that is itself growing exponentially.
The maturation of a phase of molecular and cellular biology during which biologists acquired robust, albeit largely qualitative, descriptions of the basic molecular pathways that allow the self-replication and development of organisms and that govern their utilization of energy and interactions with their environments. This great flowering of the biological sciences gained momentum following the discovery of the double helical structure of DNA. While much productive research continues to expand upon and refine the basic paradigms of the late 20th century, biology is also visibly in transition. Bioscientists in many research areas recognize the need for a more quantitative, integrated, and predictive understanding of living systems rather than a simple expansion of current modes of biological analysis to encompass ever more phenomena.
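The quadratic scaling noted in the Human Genome Project item above can be made concrete with a small calculation. The following is a minimal sketch, assuming a deliberately simple cost model in which an all-against-all comparison of n nucleotides fills an n-by-n matrix; the function names and the starting value of n are illustrative assumptions, not taken from the report.

```python
def all_against_all_cost(n: int) -> int:
    """Cells in an n-by-n comparison matrix (assumed cost model)."""
    return n * n


def growth_table(n0: int, doublings: int) -> list[tuple[int, int, int]]:
    """Return (doubling index, n, cost) rows as n doubles repeatedly."""
    rows = []
    n = n0
    for k in range(doublings + 1):
        rows.append((k, n, all_against_all_cost(n)))
        n *= 2  # exponential growth in the amount of known sequence
    return rows


if __name__ == "__main__":
    for k, n, cost in growth_table(1_000, 3):
        print(f"doubling {k}: n = {n:>6}, comparison cost = {cost:>12}")
```

Under this assumed model, each doubling of n quadruples the comparison cost, so the analytical workload doubles twice as often as the data does.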
Collectively, these developments are transforming biology into a more quantitative, data-intensive science, a transformation that has important implications for the interface between mathematics and biology. As noted in the Preface, the committee interpreted “mathematics” broadly so as to include computational science and statistics, as well as all aspects of applied mathematics. Its study also spanned all levels of biological organization, from molecules to ecosystems. Given this breadth of view, it is not surprising that the committee is impressed by the richness and diversity of current research at the interface between mathematics and biology. That richness and diversity pose a substantial challenge to science policy. Certainly a strong case can be made for increased policy attention to this interdisciplinary frontier. However, in the committee’s view, it would be unwise to channel this attention too narrowly into the pursuit of particular high-profile opportunities.
The tension between diversified efforts to strengthen important research areas and the commitment of major resources to high-profile projects is an enduring feature of the science-policy landscape. The Human Genome Project was perhaps an ideal model of a successful high-profile project. Much as envisioned by the National Research Council report Mapping and Sequencing the Human Genome,1 it led to a flourishing of technical advances in our ability to analyze DNA and a much closer connection between research on model experimental organisms and human biology. It also culminated in the generation of reference databases that have become indispensable tools for everyday research on many organisms. These databases have likewise become critical frameworks around which expanding knowledge of molecular and cellular processes can be rationally organized. The Human Genome Project introduced high-throughput technology to the biological sciences, which in turn led to a profound change in how biological research is conducted and to the data-rich world in which biologists now work, a world that enables the introduction of more quantitative approaches. On a philosophical level, the Human Genome Project was big science in the service of small science. Throughout its history, the project empowered rather than displaced small laboratories as the engine of biological innovation, and in its aftermath it continues to do so.
Other science-policy initiatives that made more equivocal contributions also provide historical context for the committee’s recommendations. For example, the War on Cancer, launched in the early 1970s, is often cited as a misguided effort to concentrate resources on an ill-defined goal toward whose achievement contemporary science offered no clear path. The mainstream verdict on the War on Cancer holds that patient, diversified support for molecular and cell biology would have been more appropriate than a high-profile project organized around the easily articulated, practical theme of eradicating cancer. However, even an initiative such as the War on Cancer, much of whose rhetoric appears embarrassing in retrospect, can be a powerful stimulant of needed changes in scientific priorities. There is little doubt that the War on Cancer accelerated the diversification of molecular biology beyond its bacterial roots, helped lay the foundation for the recombinant-DNA revolution, and brought basic biology into closer partnership with medicine. There were also collateral benefits from the War on Cancer’s vigorous pursuit of a largely incorrect hypothesis, which was that retroviruses were a major cause of human cancer. The War on Cancer encouraged development of experimental techniques for isolating and growing retroviruses and expanded knowledge about their life cycles, which proved invaluable in confronting the AIDS epidemic. Nonetheless, the committee does not believe that relying on collateral, perhaps serendipitous, contributions from a high-profile project would be an effective way to bring quantitative methods into the biological sciences and quantitative descriptions into our understanding of biology.
In considering research opportunities at the interface between biology and mathematics, these historical precedents, the Human Genome Project and the War on Cancer, influenced the committee’s thinking about a key policy question. Should funding agencies channel resources into grand challenges such as these as a way of stimulating interactions between mathematics and biology in a big way? It is not difficult to identify candidates for such grand challenges at all levels of biological organization: They would include the development of a comprehensive, predictive computer model of a particular free-living cell, organ, or ecosystem. At various levels of ambition, such initiatives are already under way. It is also not difficult to see that the rapid expansion in biological data requires a multiplicative, rather than merely an incremental, expansion in the number of researchers working on mathematical aspects of biology. In fact, a narrowly defined, high-profile project like the two featured above might slow the overall introduction of quantitative methods into the biological sciences, might retard the general training of biologists in more quantitative methods, and might not develop the range of mathematical applications that could transform many areas of biology. Mathematical scientists and methods tuned to that particular grand challenge would, of course, be greatly encouraged and benefit directly, and bioscientists involved in the project could come to appreciate the role of mathematics. However, the science-policy dilemma is whether or not biology is best served at this time by the type of organized multiagency, multi-investigator coordination, and focused infusion of resources, that made the Human Genome Project a success. If defined ambitiously, an all-out effort to create a predictive computer model of a free-living cell (or any similar project at other levels of biological organization) would require an enormous concentration of experimental, theoretical, and computational resources around a well-specified, centrally sanctioned goal.
While recognizing the potential of such an initiative to stimulate coordinated action in an underdeveloped research area, the committee opted instead to recommend a long-term, broad, and diversified nurturance of the interface between mathematics and biology. Two themes that recurred during the committee’s deliberations influenced this choice—the primacy of the biology problem and the lack of predictability.
Primacy of Biology
In applications of mathematics to biology, the committee returned again and again to the primacy of the biological problem. The primary goal of funding agencies and researchers working at the interface between mathematics and biology should be to solve particular biological problems, not to accomplish particular feats in the mathematical description of living systems. Hence, an all-out effort to “understand” the bacterium Escherichia coli or the yeast Saccharomyces cerevisiae, if undertaken, should have biological goals. Perhaps a predictive computer model is part of what is needed, but it should not be the central goal. Indeed, computer modelers participating in such projects should be guided by the biological objectives. Some modeling approaches will be more appropriate to particular objectives than others. Both biological progress and mathematical progress are likely to be optimized by intimate coupling of whatever modeling is done to defined biological objectives. Implicit in this view is the committee’s sense that we are far away from having an in silico cell. A very large amount of experimental bioscience research would be a prerequisite for the modeling, and a wide range of subcellular elements with their own daunting complexities might well have to be tackled first, both to provide models or prototypes and test beds and to facilitate understanding what is needed and what can be ignored in constructing a successful in silico model of a cell. An analogy with the history of artificial intelligence research may help clarify the committee’s thinking. One could envision a “Turing test” for the in silico cell. To conduct such a test, an experimentalist would design manipulations and measurements to be carried out on the target cells; results would then be returned based on the experimental manipulation of real cells on the one hand and their computer simulation on the other. 
For the simulator to pass the Turing test, it should be impossible for the experimentalist to devise manipulations and measurements that would distinguish between the two sources of data. Grand as this challenge might be for 21st century biology, we are too far from meeting it for it to be a dominant organizing principle for current research. While some efforts in this direction are clearly worthwhile, it should be kept in mind that premature efforts by the artificial intelligence community to pass the original Turing test floundered. So, too, would a present-day effort to meet the corresponding challenge of an in silico cell. Progress in artificial intelligence has depended on breaking the ultimate task into many smaller, more accessible tasks, each of which is approached with a variety of strategies. Similarly, the committee concluded that contemporary biology would be best off adopting an incremental and diversified approach to the creation of more quantitative, predictive descriptions of living systems.
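The indistinguishability criterion described above can be phrased, for a single measurement, as a statistical check. The following is a minimal sketch, assuming that one experiment yields a list of numeric readings from real cells and another list from the simulator, and that a two-sided permutation test on the difference of means is an acceptable stand-in for "distinguishing" the two sources. The function names, the 0.05 threshold, and the choice of test are illustrative assumptions rather than anything the committee specifies.

```python
import random
import statistics


def permutation_p_value(real, simulated, trials=2000, seed=0):
    """Two-sided permutation test on the difference of sample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.mean(real) - statistics.mean(simulated))
    pooled = list(real) + list(simulated)
    n_real = len(real)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # relabel readings at random
        diff = abs(
            statistics.mean(pooled[:n_real]) - statistics.mean(pooled[n_real:])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (trials + 1)


def indistinguishable(real, simulated, alpha=0.05):
    """True if this one measurement fails to separate the two sources."""
    return permutation_p_value(real, simulated) >= alpha


if __name__ == "__main__":
    real_cells = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
    good_model = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
    poor_model = [5.0, 5.1, 4.9, 5.0, 5.05, 4.95]
    print(indistinguishable(real_cells, good_model))
    print(indistinguishable(real_cells, poor_model))
```

Failing to reject the null hypothesis for one measurement is only weak evidence. Passing the test as the committee frames it would require that no manipulation or measurement the experimentalist can devise separates the two sources, which is a far more demanding standard than this sketch captures.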
The committee found the history of applications of mathematics to biology to be full of unexpected turns and reciprocal influences on the two fields and expects this dynamic to continue. Success in big-science initiatives depends on an element of predictability about how areas of science will develop. Certainly there was technical risk that the Human Genome Project would prove premature, and there was even some risk that genome sequences would prove so difficult to interpret that their impact on biology would be minimal. Nonetheless, by the late 1980s, it was abundantly clear that DNA sequencing was capable of providing much useful information about biology, that there were open-ended opportunities to lower its cost and increase its throughput, and that genome sequences would play a very important role in the future of biological research. Similarly, it was apparent by the late 1960s that the successes of the first decades of molecular biological research on bacteria should be extended to eukaryotic and metazoan organisms. The committee is less confident that the future directions of the interplay between mathematics and biology can be reliably predicted in 2005. While it is confident that mathematical methods will become steadily more deeply integrated into biological research, the committee regards the directions in which the biological sciences will evolve in the decades ahead—and the detailed ways in which mathematics will facilitate that evolution—to be highly uncertain. The excitement that surrounds this area of scientific research stems from a blend of opportunity and unpredictability. Many areas of biological research are at points of instability. The ways in which these instabilities resolve will shape the future of the relationship between mathematics and biology.
The committee is confident that deepening interactions between mathematics and biology will transform the biosciences. Of equal interest is the possibility that the areas of mathematics that interact most strongly with biology will themselves be altered by these interactions. Indeed, much of modern mathematics was shaped by four centuries of intimate interaction with the physical sciences and engineering. As the prominence of the biosciences increases, and as they interact more intensively with mathematics, a similar dynamic may be expected to occur. As discussed above, biological processes have different characteristics from the processes commonly encountered in engineering and physical science. In comparison with scientists involved in materials science, plasma physics, or cosmology, bioscientists work on muddier problems. The vast scales of time and space that characterize the world of biology are complemented by nonquantitative, organizational features that are extraordinarily complex. The number of different interacting components is huge (ranging up to millions or even billions of entities), and they can all possess individual characteristics and contingent properties and be influenced by historical events. The systems are typically far from equilibrium or even stable steady states. High-order interactions between the components are the rule: The amount of feedback regulation in the simplest cell greatly exceeds that presently incorporated into devices designed by humans. (Indeed, it is this reliance on feedback regulation that accounts for the robustness of living systems.) Small events at one spatial or temporal scale often have large effects at another very different scale. These generalizations apply to cells, whose components are molecules, and also to ecosystems, whose components are commonly taken to be populations of individual members of many species.
Calculus, the mathematics of continuous quantities and infinitesimally small elements, has been the essential language for describing the physical world and is the language employed in the physical sciences. Biology, however, has discrete elements, and the quantitative language of the computational and information sciences appears far better suited to be the language of biology. As a consequence of these many ways in which biology differs from the physical sciences, the committee looks forward to biology’s many influences on mathematics, including some explicitly new mathematics.
An important goal in developing this report was to illustrate, in diverse contexts, these distinctive characteristics of biological systems. They may appear intimidating to nonbiologists at first, but on closer inspection, it is apparent that there has been great progress in dealing with them in the past and that this process is expanding as more mathematicians address biologically motivated problems. Historically, some new mathematics simply emerged from the inner workings of the human brain, without the direct influence of external reality. However, many of the finest moments of pure and applied mathematics have arisen in response to humankind’s quest to understand the physical world. The committee believes that the 21st century’s intensifying quest to understand the living world will provide an equally rich stimulus for future triumphs.