The Nature of the Field

INTRODUCTION
Biology is in dramatic flux due to a surge of new sources of data,
access to high-performance computing, increasing reliance on quantita-
tive research methods, and an internally driven need to produce more
quantitative and predictive models of biological processes. The growing
infusion of mathematical tools and reasoning into biology may therefore
be expected to further transform the life sciences during the decades
ahead. This transformation will have profound effects on all areas of basic
and applied biology.
Nonetheless, we are not starting from scratch in applying the math-
ematical sciences to biology. To a greater extent than is widely recog-
nized—by biologists and nonbiologists alike—there has been a string of
dramatic successes over more than a century that have been critical to
advances in biology and have also led to new mathematics. The role that
biological problems played in motivating the development of modern sta-
tistics is just one example that will be described in Chapter 2, “Historical
Successes.”
THE MATHEMATICS-BIOLOGY INTERFACE
The interface between mathematics and biology can be examined
across scales of biological problems and across all the major areas of math-
ematical sciences. Biological scales range from molecules, cells, organisms,
and populations to communities.[1] Much of the remainder of this report is
organized around these biological scales, articulating examples of the bio-
logical problems to be addressed at each scale. These scales may be briefly
described as follows:
• Molecules. Molecular biology focuses on the chemical components
of life and their interactions. These components differ greatly in size and
complexity, ranging from atoms and simple ions, through the basic mo-
lecular building blocks of life such as nucleic and amino acids, sugars,
and fats, to polymers and homogeneous and heterogeneous aggregates
of the more basic units, forming macromolecular assemblies and super-
molecular structures that carry out many of the fundamental processes
in the life of a cell. The structures of these objects, as well as the dynamics
of molecules and interactions between them, are central to biological
function.
• Cells. Cell biology is concerned with the self-replicating units of
life, including bacterial, plant, and animal cells, as well as the viruses
and other parasites that infect them. The study of the cell also includes
consideration of many interconnected units or subcellular structures, such
as organelles, which range in complexity from peroxisomes, proteasomes,
or lysosomes, to mitochondria and chloroplasts, up to the nucleolus and
the nucleus itself for eukaryotic organisms, and other structural
components intrinsic to cell function such as the endoplasmic reticulum.
The mechanisms and consequences of cell–cell communication are also of
primary interest.
• Organisms. Organismal biology includes both the properties of
whole organisms and the complex multicellular structures of which they
are composed—the tissues, organs, organ systems, and integrative pro-
cesses that create a robust whole out of diverse parts. Organisms sustain
health and well-being in the face of considerable insults and environmen-
tal disturbances, a process known as homeostasis. Another feature at this
scale is the study of the breakdown of this robustness—in other words,
the etiology and nature of disease.
[1] Dividing life into levels, or scales, is obvious, is essential for
understanding, and reflects an intrinsic feature of biology. Nonetheless,
the levels interact, and some of the division is for human convenience or
is an artifact of scholarly history. Characterizing any one level requires
at least considering its immediately adjacent levels; one could also
provide a finer subdivision of some of the scales, but for clarity, the
committee used the most commonly employed and obvious distinctions, ones
that are important for how biologists think about the object of their study
and that provide a means for mathematicians to think about how to engage
biology.

• Populations. Population biology concerns groups of organisms of
the same species. Genetic variation among individuals is of primary inter-
est, as is the behavior of populations over time in real environments—for
example, speciation, population fluctuation, and extinction.
• Communities and ecosystems. Community ecology is the study of as-
semblages of populations of different species and their interactions. Inter-
actions between living and nonliving components, nutrient and carbon
fluxes, and overall responses of ecosystems to changes in the physical
environment are central issues of ecosystem ecology. The ecology of in-
fectious disease is an example of an interspecies interaction of consider-
able recent interest.
Across all these levels of biological organization, modeling of
biological processes plays a central role. Much of this report describes the
types of models—and the associated mathematical techniques—that have
been productive in biology. In considering this diverse landscape, it is
important to understand that the word “model” has many meanings in
biology. Concepts in biology are often illustrated by simple verbal or vi-
sual models that are entirely qualitative. For example, most models of
gene-regulatory circuits are of this nature: They specify, in simple draw-
ings, which components of a pathway inhibit, and which stimulate, either
their own synthesis or that of other components. Other models may be
formulated mathematically even though they are primarily intended for
heuristic use rather than for data analysis: Simple differential equations
describing idealized predator-prey interactions are in this category. Fi-
nally, sophisticated models designed to capture subtle features of large,
real data sets are also diverse. Some are sets of partial differential equa-
tions that would look familiar to a classical physicist. Others are designed
to capture subtle statistical properties of data sets without reference to
operative biological mechanisms. Many are stochastic models that guide
sampling from combinatorially explosive sets of possible relationships
between biological objects: Examples include coalescent models of pos-
sible phylogenetic relationships between DNA sequences or between or-
ganisms defined by sets of discrete phenotypes. Hidden Markov models
of sites of transcription-factor binding in the regulatory regions of genes
are also in this category. Throughout the report, this diversity of models
should be kept firmly in mind. Of course, diversity in types of models is
found in all fields of science. Nonetheless, biology is perhaps unique in
the extent to which diversity in modeling practices is the rule, with the
existence of a small set of standard paradigms that are applicable to broad
sets of problems being the exception.
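
As a minimal sketch of the heuristic kind of model mentioned above, the
idealized predator-prey equations are often written in the Lotka-Volterra
form dx/dt = a*x - b*x*y and dy/dt = -c*y + d*x*y, where x is prey and y is
predators. The report does not state a particular system, so this form, the
parameter names (a, b, c, d), their values, and the fourth-order
Runge-Kutta integrator are all illustrative assumptions.

```python
import math

def lotka_volterra(state, a=1.0, b=0.1, c=1.5, d=0.075):
    """Right-hand side of the idealized predator-prey equations.

    Parameter values are arbitrary illustrative choices, not fitted data.
    """
    x, y = state  # x: prey density, y: predator density
    return (a * x - b * x * y, -c * y + d * x * y)

def rk4_step(f, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )

def simulate(x0=10.0, y0=5.0, dt=0.01, steps=2000):
    """Return the trajectory as a list of (prey, predator) pairs."""
    trajectory = [(x0, y0)]
    for _ in range(steps):
        trajectory.append(rk4_step(lotka_volterra, trajectory[-1], dt))
    return trajectory

def invariant(state, a=1.0, b=0.1, c=1.5, d=0.075):
    """Quantity conserved by the exact solution; useful as a sanity check."""
    x, y = state
    return d * x - c * math.log(x) + b * y - a * math.log(y)

if __name__ == "__main__":
    traj = simulate()
    print("final state:", traj[-1])
    print("invariant drift:", abs(invariant(traj[-1]) - invariant(traj[0])))
```

Because the exact solution conserves the quantity computed by
`invariant`, a small drift suggests that the numerical orbit is closed, as
the idealized model predicts. A model of this kind is meant to build
intuition about cycles, not to fit field data.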

WHAT HAS CHANGED IN RECENT YEARS?
While scientists have been studying biological systems at these vari-
ous scales for many years and applying varying levels and types of math-
ematics and statistics to them, recent achievements in biology and tech-
nology have combined to create a dramatically new world of opportunity
for the application of mathematics to biology. Rather suddenly, new ex-
perimental methods and technologies have allowed the generation of bio-
logical information at an astonishing rate. This phenomenon is playing
out at all scales of biological analysis:
• On the molecular scale, the human-genome sequence and the se-
quences of many other genomes have been determined; methods are also
available to measure the expression levels of all genes in an organism in a
single experiment. New techniques in protein chemistry, as well as new
radiation sources for structural analyses, are accelerating the rate at which
proteins can be detected in complex biological samples, purified, and char-
acterized structurally at atomic resolution.
• On the cellular scale, new methods of cellular imaging are making
it possible to track subcellular processes and to trace the propagation of
signals at millisecond timescales.
• For organisms, reductions in the cost of noninvasive imaging such
as computed tomography (CT) scanning, magnetic resonance imaging
(MRI), and positron emission tomography (PET) are making these meth-
ods available as routine experimental tools. New methods, such as high-
throughput patch-clamp studies, are providing electrophysiology data at
previously unattainable rates.
• For populations, it is now possible to measure the genetic differ-
ences between organisms at hundreds of thousands of sites in a single,
inexpensive experiment.
• At the level of communities, new remote-sensing technologies are
making it possible to measure entire ecosystems across multiple observa-
tion channels at high resolution.
The data that guide biology are diverse, and their integration is chal-
lenging. Data sets span the entire range from genomic data to satellite
data. Data may be collected at one point in time or continuously, resulting
in a real-time data stream (Turner et al., 2004; Running et al., 2004). In
addition, there are some cases (e.g., the National Science Foundation
(NSF)-funded Long-term Ecological Research Sites or the Framingham
Heart Study) where data have been collected about biological entities over
long periods of time.

Biological data may contain large errors of unknown origin, owing either
to complex interactions that are poorly understood or to processes that
are inherently stochastic. Quantities may be inferred using proxy data; for
instance, seasonal or annual temperature in the past can be inferred from
tree rings. Model development and model parameterization must take
these sources of uncertainty—complex interactions and stochastic pro-
cesses—into account.
In the past, data were almost exclusively collected by individual re-
searchers or small groups of researchers, and they remained the property
of those who collected them. Now, data are increasingly collected by
larger groups of scientists and made publicly available. The Protein Data
Bank and the National Center for Ecological Analysis and Synthesis are
two examples of institutions established to promote the sharing of bio-
logical data. One impediment to synthetic analytical work that relies on
large data sets collected by different groups has been the lack of com-
monly accepted standards for collecting, archiving, and annotating data
and of agreement on what kinds of data should be collected. Nonetheless,
there are increasing efforts to come to some agreement. For instance, Eco-
logical Metadata Language (EML) is a way to standardize data annota-
tion that is increasingly embraced by the ecological community.
Some of the greatest computational challenges today come from data
collected at the two extreme ends of spatial scales: genomic data and sat-
ellite data. The explosive rate of genomic data generation is well known.
In the case of more global data, Palumbi and colleagues (2003), for ex-
ample, present a number of new data acquisition methods, including re-
mote sensing to measure characteristics of the ocean (temperature, wind,
surface elevation) or trace changes in ocean currents, and DNA sequencing
to assess spatial and temporal trends in genetic diversity. New
computational methods need to be developed to extract useful information
from these vast amounts of data.
The analysis of spatial data poses particular challenges due to the cor-
relations that are inherent in spatial processes and due to local interac-
tions and stochastic effects. As an example, a method widely used for
detecting anomalies in space is the spatial-scan statistic. This statistic has
been applied to the detection of disease outbreaks or invasive species (Patil
and Taillie, 2003). One challenge is to couple on-the-ground observations
and remotely sensed data. Visualization tools are indispensable when
analyzing spatial data.
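
The idea behind a spatial-scan statistic can be sketched as follows,
assuming the common circular-window, Poisson-model formulation associated
with Kulldorff rather than the specific method of the work cited above.
Every window is scored by a log-likelihood ratio comparing observed cases
inside it with the number expected from its share of the population, and
the highest-scoring window is reported. The grid layout, the function
names, and the demonstration data are hypothetical, and every cell is
assumed to have a positive population.

```python
import math

def poisson_llr(observed, expected, total):
    """Log-likelihood ratio for one window under a Poisson model.

    Returns 0 unless the window holds more cases than expected, so only
    excesses are flagged.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    remaining = total - observed
    outside = 0.0
    if remaining > 0:
        outside = remaining * math.log(remaining / (total - expected))
    return inside + outside

def scan(cells, max_radius):
    """Scan circular windows centered on each cell.

    cells: list of (x, y, population, cases) tuples.
    Returns (best_llr, center, radius) for the highest-scoring window.
    """
    total_cases = sum(cell[3] for cell in cells)
    total_pop = sum(cell[2] for cell in cells)
    best = (0.0, None, None)
    for cx, cy, _, _ in cells:
        for radius in range(max_radius + 1):
            window = [
                cell for cell in cells
                if (cell[0] - cx) ** 2 + (cell[1] - cy) ** 2 <= radius ** 2
            ]
            observed = sum(cell[3] for cell in window)
            expected = total_cases * sum(cell[2] for cell in window) / total_pop
            llr = poisson_llr(observed, expected, total_cases)
            if llr > best[0]:
                best = (llr, (cx, cy), radius)
    return best

def demo_grid():
    """Hypothetical 5x5 grid: uniform population, one cell with excess cases."""
    cells = []
    for x in range(5):
        for y in range(5):
            cases = 12 if (x, y) == (3, 3) else 2
            cells.append((x, y, 100, cases))
    return cells

if __name__ == "__main__":
    print(scan(demo_grid(), max_radius=2))
```

This sketch only locates the most anomalous window. In practice the
significance of that window is usually assessed by recomputing the maximum
score on many data sets simulated under the null hypothesis, a step
omitted here.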
In parallel with the accelerating rate of data acquisition, there has been
an increase in the computational power available to the scientific commu-
nity—on the desktop, in a research unit, or through the Internet, from a
national resource or a grid of independent systems. Before computers
were widely available, discoveries were made by combining data from
experiments or observational studies and their statistical analysis with
simple models that served as the conceptual framework. When comput-
ing power became available, this paradigm was extended to include com-
putation. This process started in areas of biology that are closely aligned
with the physical sciences (e.g., protein structure determination) and
gradually spread throughout biology. Concomitantly, biologists became
increasingly dependent on sophisticated data-analysis tools and complex,
data-driven models.
When analytical models are improved to the point of being good rep-
resentations of a biological system, they are often analytically intractable,
and biologists must turn to computation. In many cases, such models are
systems of differential equations, which are fairly amenable to solution on
computers thanks to mathematical advances of the past few decades. The
numerical analysis community continues to increase the set of partial dif-
ferential equations for which reliable, fast solvers exist. Robust techniques
are now available for solving problems in electrostatics, diffusion, elastic-
ity, and fluid dynamics. However, the needs of biology are among the
most challenging. Particularly because it often calls for multiscale models,
which include both deterministic and stochastic elements, the solution of
sets of biologically motivated equations frequently exceeds current capa-
bilities. There are not yet adequate capabilities for evaluating the range of
uncertainties embedded in a computational model due to its parameters,
discretization, and structure. Few tools are available to deal with these
issues when the models are applied to large systems.

Mathematical methods are also being used in new ways to inform experimental
design. Traditionally, experimental design decisions, such as choosing the
nature of
a perturbation, the response measurements, whether to or how to do gene
disruption, and the timing and scope of response measurement, have been
made by the experimental biologist with only minimal consideration of
the computational analysis that will be performed based on the resulting
data. This was perhaps unavoidable, as techniques for gene disruption
and high-throughput assays have until recently been the major limiting
factors. However, as experimental genomic science advances, options are
becoming increasingly available. In the future, experimental design con-
siderations must be tightly coupled to the mathematical representations
to be used to model the system and the computational and statistical meth-
ods to be used for model identification and parameter estimation.
For example, switches for transcription or protein modification have
recently become available (Shimizu-Sato, 2002; Zeidler et al., 2004), mak-
ing it feasible to implement oscillatory perturbations in a systematic man-
ner. A natural question arises: Does oscillatory perturbation have any ad-
vantage over traditional impulse or step perturbation? And if so, how do
we quantify the advantages? A related question is what mathematical and
statistical techniques are useful for the analysis of the response to such
oscillatory signals. To answer these questions, we need to study how to
estimate the system parameters based on each type of perturbation and
how the estimation may be affected by intrinsic and measurement noise.
The mathematical techniques involved include Laplace transforms, opti-
mal time and frequency sampling, and estimation theory under ill-posed
conditions. Even for a stochastic network as simple as an autoregulatory
gene whose protein activities are modulated by an input signal, these
questions have not been studied until very recently (Lipan and Wong,
2005), and much more remains to be done. Although the benefits of oscil-
latory perturbations are widely appreciated for the analysis of physical
systems, this approach is rarely used in the study of cellular systems. Care-
ful mathematical and simulation studies may help to interest experimen-
tal investigators in evaluating the promise of novel perturbations of bio-
logical systems.
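
To illustrate how an oscillatory perturbation can expose a system
parameter, consider a deliberately simplified stand-in for the stochastic
gene network discussed above: a deterministic, linear model
dx/dt = u(t) - gamma*x driven by u(t) = base + A*sin(omega*t). Its Laplace
transfer function is H(s) = 1/(s + gamma), so the steady-state output
amplitude is A/sqrt(gamma^2 + omega^2), and gamma can be recovered from the
measured gain. The model, the function names, and all numerical values are
assumptions made for this sketch, and noise is ignored entirely.

```python
import math

def steady_state_amplitude(gamma, omega, amp_in=1.0, base=2.0,
                           dt=0.001, t_end=40.0):
    """Simulate dx/dt = base + amp_in*sin(omega*t) - gamma*x with RK4.

    Returns the output amplitude measured over the final input period,
    after the initial transient has decayed.
    """
    def rate(t, x):
        return base + amp_in * math.sin(omega * t) - gamma * x

    x = base / gamma  # start at the unperturbed steady state
    n_steps = int(round(t_end / dt))
    period_start = t_end - 2.0 * math.pi / omega
    lowest, highest = float("inf"), float("-inf")
    for i in range(n_steps):
        t = i * dt
        k1 = rate(t, x)
        k2 = rate(t + 0.5 * dt, x + 0.5 * dt * k1)
        k3 = rate(t + 0.5 * dt, x + 0.5 * dt * k2)
        k4 = rate(t + dt, x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        if t + dt >= period_start:
            lowest, highest = min(lowest, x), max(highest, x)
    return 0.5 * (highest - lowest)

def estimate_gamma(amp_out, omega, amp_in=1.0):
    """Invert the gain formula |H(i*omega)| = 1/sqrt(gamma^2 + omega^2)."""
    gain = amp_out / amp_in
    return math.sqrt(1.0 / gain ** 2 - omega ** 2)

if __name__ == "__main__":
    amplitude = steady_state_amplitude(gamma=0.5, omega=1.0)
    print("measured amplitude:", amplitude)
    print("estimated gamma:", estimate_gamma(amplitude, omega=1.0))
```

Repeating the measurement at several driving frequencies would give a
frequency-response curve. How well such estimates survive intrinsic and
measurement noise is exactly the open question raised in the text.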
From the point of view of software, there are a number of open issues
that are common to many scientific disciplines. These issues include the
fact that codes are complex and difficult to support, especially on parallel
computers (Post and Kendall, 2004). Software architectures that allow
problem specification and access to all layers of code development would
constitute an important step forward. Message passing interface
(MPI)-based development platforms such as the Portable, Extensible Toolkit
for Scientific Computation (PETSc) can accelerate the rate of progress, and
efforts like DOE’s Scientific Discovery through Advanced Computing
(SciDAC) program are promising. An upcoming report from the National
Academies’ Computer Science and Telecommunications Board will dis-
cuss, in particular, the interface between biology and the computing
world.
Another issue is verification and validation. Verification is defined
as determining whether the calculations truly correspond to the equa-
tions that constitute the analytical model. There are well-defined tech-
niques for verification, but they are rarely used systematically. Valida-
tion is a broader concept that is generally understood to mean the
assessment of the model quality—that is, does the software correspond
to biological reality? A part of the validation process that is common in
the physical sciences but little used in biology is the conduct of experi-
ments designed specifically to test the computational model itself rather
than to study new phenomena. That is, we need to consider the output of
a computational model as a testable hypothesis and then design biologi-
cal experiments that try to disprove the hypothesis by collecting appro-
priate data or exploring whether qualitative features of the computer
output exist in real systems. In order for such approaches to contribute
significantly to progress in understanding biology, the experimental
community will need to be convinced of the value of such studies, which may
not directly address the experimentalists’ goals.
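
One widely used verification technique, shown here as a minimal sketch, is
a convergence test: solve a problem with a known exact solution at two step
sizes and check that the error shrinks at the rate the numerical method is
supposed to deliver. The example assumes a simple decay equation
dx/dt = -k*x solved by forward Euler, a first-order method. The equation,
the function names, and the parameter values are illustrative choices and
do not come from the report.

```python
import math

def euler_decay(k, x0, t_end, n_steps):
    """Integrate dx/dt = -k*x from 0 to t_end with forward Euler."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += dt * (-k * x)
    return x

def observed_order(k=1.0, x0=1.0, t_end=1.0, n_steps=100):
    """Estimate the convergence order by halving the step size.

    For a correct forward-Euler implementation the result should be
    close to 1.
    """
    exact = x0 * math.exp(-k * t_end)
    err_coarse = abs(euler_decay(k, x0, t_end, n_steps) - exact)
    err_fine = abs(euler_decay(k, x0, t_end, 2 * n_steps) - exact)
    return math.log2(err_coarse / err_fine)

if __name__ == "__main__":
    print("observed order of accuracy:", observed_order())
```

An observed order far from the theoretical one usually signals a coding
error. A test like this checks only that the code solves its equations
correctly; it says nothing about whether those equations describe the
biology, which is the separate question of validation.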
WHAT MAKES COMPUTATIONAL BIOLOGY PROBLEMS HARD?
While the challenges posed by rapidly increasing amounts of data cut
across all the sciences, those challenges posed by increased amounts of
data in biology are uniquely difficult. At all scales of analysis, biology
involves large numbers of types of objects, large numbers of objects of
each type, and complex interactions between objects. In addition, biologi-
cal objects can possess individuality, a history (e.g., of external stimuli,
environmental insults, or inheritance), and a contingent existence (e.g.,
the location of components or of neighbors can be significant). Except for
relatively small contributions from phenomena such as bilateral body plan
(when and where relevant), schemes for simplification that arise from
symmetry are rarely possible at any scale in biology. The systems are ex-
traordinarily heterogeneous in space and time, yet stunningly robust in
the face of perturbation. Interactions across vastly different scales can have
dramatic effects on system behavior. Thus, a tremendous quantity of data
must be managed in creating useful biological models. Moreover, some of
those data are very difficult to obtain. These issues are described in more
detail below for the molecular scale; similar issues arise at the other scales
and across scales:
• Large number of types of objects and objects of each type. At the molecu-
lar scale, with minor exceptions, proteins are synthesized from only 20
different amino acids. These units combine to produce tens of thousands
of independently encoded proteins in humans, and there are many differ-
ent mechanisms that can lead to the creation of variants—sometimes as-
tonishing numbers of variants—of each of these independently encoded
proteins. Analogous phenomena occur with nucleic acids and the poly-
mers of sugars, fats, and other molecules.
• Complex interactions between objects. DNA and RNA interact in-
tramolecularly and with each other. DNA is the template for creating
RNA, and RNA is the template for proteins. RNA and protein combine to
form superstructures that themselves play central roles in the translation
of RNA into proteins. Proteins interact intramolecularly and with other
proteins, as well as with RNA, DNA, and a large variety of other mol-
ecules to act as enzymes, structural components, signals, receptors of sig-
nals, and inhibitors of signals.
• Robustness. The many types of molecules in biological systems com-
bine to form extraordinarily robust subcellular organelles, cells, tissues,
organs, organ systems, organisms, populations, and communities.
Biological interaction networks achieve this robustness through high levels
of redundancy, modularity, heterogeneity, and feedback. It is not uncom-
mon to find that genetic ablation of a normally critical signaling pathway
not only fails to kill a cell but causes only the most subtle changes in its
behavior. Other pathways can often provide similar functions even if they
do not normally do so when the primary pathway is present. Similarly,
feedback mechanisms make biological systems extraordinarily robust
against both internal and external perturbations. Genetic variation is a
particularly important example of an internal perturbation. Between the
two copies of the genome present in each human, there are millions of
sequence differences, many of which affect the regulation of genes and
the structures of the encoded proteins.
• Complex interactions across scales. All of this complexity is present at
each scale of organization and in the interactions between scales. For ex-
ample, it is infeasible to simulate organ-scale electrophysiology by mod-
eling the ion fluxes through every membrane channel in every cell. There
is, at the present time, no systematic way of bypassing this problem. Ex-
isting approaches tend to be hybrid methods, which overcome such bottle-
necks by using different models on different temporal or spatial scales
coupled with heuristic models to transfer information between them. Sta-
tistical methods are also used to integrate information obtained from fine-
scale calculations to estimate the net response of an organ, tissue, or neu-
ral network.
FACTORS COMMON TO SUCCESSFUL INTERACTIONS BETWEEN THE MATHEMATICAL SCIENCES AND THE BIOSCIENCES
As the committee examined the historical record and contemporary
experience in applying mathematics to biology, a few simple observations
that commonly underlie successful interactions came to the fore:
• The biological problem has always been primary. Successful appli-
cations of mathematics to biology are driven by a deep understanding of
the relevant biology. Until this understanding is in place, it is not possible
to state the problem with sufficient clarity and at a sufficient level of ab-
straction to allow a meaningful mathematical formulation. Successful ap-
plications always involve major simplifications of the actual system. How-
ever, these simplifications must preserve the system’s essential features.
This first observation gives rise to the following recommendation:
Recommendation: Funding agencies supporting mathematical
research related to the life sciences should give preference to
proposals that indicate a clear understanding of the specific
biological objectives of the research and include a realistic plan for
how mathematicians and biologists will collaborate to achieve
them.
There are dual benefits of preferential support for proposals rich in
both biological understanding and clarity about the mechanisms by which
the collaboration will advance. Naturally, well-organized and well-posed
research aimed at important biological problems will pay off early on,
will help sustain further studies, and will open up new directions for fruit-
ful inquiry. In addition, establishing such preferential support minimizes
the risk of applying mathematics to poorly posed biological problems and
maximizes the potential impact of quantitative tools. Rigorous
prioritization will support a structural change in the biological sciences
that encourages the use of quantitative approaches of all categories. More
generally, success stories based on such considerations will be readily ex-
ported to other biological research problems and will serve to validate the
role of mathematicians in biology qua mathematicians rather than just as
technical contributors and to validate, for experimental biologists, the role
of mathematics itself in understanding biology.[2]
• As the committee discusses in more detail below, cultural and lin-
guistic barriers create a potentially large divide between mathematicians
and biologists. It is only after achieving a common language in which to
discuss a particular problem that mathematics can be applied effectively.
The common ground can lie anywhere along the spectrum from the lan-
guage of biology to that of mathematics, but it has to be found, and each
side has to move toward the other to do so. That said, it is important to
recognize that communication barriers that appear to be linguistic often
have deeper roots. Many of the difficulties that researchers trained in the
physical sciences, engineering, and mathematics have in communicating
with biologists relate to fundamental differences between biology and the
physical sciences. Basic laws typically do not exist, and even basic prin-
ciples are often still undiscovered. Once they understand that progress is
possible despite these obstacles, some nonbiologists thrive in this strange,
new scientific environment. Others find that their skills are best applied
in better-defined settings.
• Initial progress has almost always depended on existing math-
ematical tools, often quite elementary ones. The complexity, particularly
at early stages of analysis, is in the biology, not the mathematics. Any
improvements to mathematical tools come later.
[2] Of course, this gap does not exist if one individual is well grounded
in both fields. However, it is more common, and generally more practical,
to collaborate rather than to learn two disparate fields.

• Formulation of the problem has been as important as solving it. As
they are first formulated, biological problems are typically ill posed or
incompletely posed. The process for translating them into formal state-
ments in the language of mathematics introduces a rigor that often uncov-
ers questions that might not otherwise have been asked. The translation
process causes both bioscientists and mathematicians to think carefully
about all of the parts of the system and to decide systematically which
variables, effects, and interactions to take into account. This process is
also a critical test of whether the biologists and mathematicians working
together on a problem have actually arrived at a common language.
• Even though many biological problems have been solved using
simple mathematics, a sophisticated and experienced mathematical scien-
tist has often been required to find the solution. This paradox arises be-
cause of the difficulty of abstracting the problem from its biological messi-
ness and sifting through the enormous collection of tools and methods
already potentially available for addressing it. In addition, the solution
often involves applying familiar mathematical methods in unfamiliar
ways or contexts.
There have been cases where mathematical techniques were applied
to biological problems with inadequate appreciation for the finer points
of the biology, leading some to overstate the significance of their
mathematical results. Such cases prompted statements like that of Mayr
(1982, p. 304), who, when explaining the role mathematics played in
evolving the thinking of the ancient Greeks, wrote, “This was the first of
countless episodes in the history of biology where mathematics or the
physical sciences exerted a harmful influence on the development of
biology.” This
notion has held back the full introduction and exploitation of the power
of mathematics in the study of biology. At the same time, a healthy skep-
ticism is necessary for making progress in the sciences, and too-universal
acceptance of approaches can impede progress as much as outright rejec-
tion. A balance of different approaches often yields the greatest gains, as
eloquently expressed by Naeem (2002) in the context of ecology: “. . . eco-
logical truth lies in the confluence of observation, theory, and experiment.
It is through discourse among empiricists and theorists that findings and
theory are sorted and matched and where there is a lack of correspon-
dence, new challenges identified.”
PREPARING THE GROUND FOR IMPROVED SYNERGIES OF BENEFIT TO BOTH FIELDS
Progress in the life sciences will increasingly depend on deep and
broad integration of mathematical analysis into the study of all levels of
biological organization. No one level of organization stands out as offer-
ing singularly attractive opportunities for mathematical applications. The
challenges faced at different levels have distinctive characteristics, but
there are also unifying themes.
Recommendation: Funding agencies supporting mathematical
research related to the life sciences should be receptive to re-
search proposals that pertain to any level of biological organi-
zation: molecules, cells, organisms, populations, and ecosys-
tems. While much current research can be productively confined
to a particular level, there are also substantial challenges and
rewards associated with analyzing interactions between levels.
The empirical factors for success listed in the previous section all point
to one critical element: a true collaboration that brings together skills from
the mathematical sciences and a deep knowledge of biology must be
established. In response to this basic need, funding programs, research
institutions, and groups can experiment with conditions that make such
collaborations easier to establish. Some of the factors to be addressed
include these:
• Communication. It is clear from the above that mathematical scien-
tists and biologists have to find a common language so that all of the
essential richness of a biological problem can be captured and formulated
in mathematical terms. This can and should happen in both directions,
with some biologists developing a deeper and more sophisticated under-
standing of quantitative methods and many mathematical scientists ex-
panding their understanding of biology to appreciate the scope of the
problems to be addressed. (The primary model in the mind of the com-
mittee is mathematical scientists contributing to biology research teams,
not, for the most part, biologists learning all the necessary mathematics
and statistics.) Interestingly, some of the most successful practitioners at
the interface have come out of the physical and mathematical sciences,
bringing a deep understanding of quantitative methods as well as biol-
ogy, but neither to the exclusion of the other.
• Timescales. The professional timescales of the fields are often
mismatched, and both sides of the collaboration need to develop an
appreciation for this reality. On the one hand, if a biological challenge
demands the development of deep new mathematics or statistics, this process
will typically require a detour of months or years, a delay that is hard to
reconcile with the competitive pace of research in the biological sciences
and the expectations placed on biologists. On the other hand, existing
mathematical methods might require the generation of additional data (e.g.,
to enable good bounds on parameters or uncertainties), which might be
time-consuming and initially unrewarding to biologists.
• Recognition and advancement. If mathematical scientists are to in-
vest time and effort in learning biology and to contribute what, from a
mathematical perspective, may be relatively simple methods, then the
mathematical sciences must adjust their reward systems. This difficulty
is an age-old problem in academic departments: It flares up as practitio-
ners in a field venture out to the interface with another field and devote
more intellectual energy to transitioning research results than to directly
advancing their own field’s research agenda. Of course, university de-
partments will not adjust something as fundamental as their own inter-
nal reward system in the absence of external stimuli and external re-
wards. While simply putting forward funds for collaborations at the
interface will provide some incentive, the funding agencies need to con-
sider special honorific awards and special programs, and possibly other
mechanisms, to encourage the needed changes in systems for recognition
and advancement. Adjustments would also help with the differences in
timescales between the expectations and realities of doing biology and
doing mathematics, and agencies could consider mechanisms to satisfy
both timescales. Provision of more funding at the interface, as planned, is
the first step.
Recommendation: Funding agencies supporting mathematical
research related to the life sciences should place increased em-
phasis on funding mechanisms and novel approaches to the
organization of interdisciplinary research. The goal should be
to foster effective collaboration between mathematical scientists
and bioscientists by working to eliminate barriers posed by in-
adequate communication, disparate timescales for achieving
research objectives, inequitable recognition of contributors to
interdisciplinary projects, and cultural divisions within univer-
sities, research institutes, and national laboratories.
In spite of the committee’s belief that most problems in biology can
initially be addressed with fairly standard mathematics or statistics, there
are occasions where exceptionally innovative researchers may be driven
by the particularities of a problem to break out of traditional mathematics
paradigms and develop truly novel methods. R.A. Fisher’s work on the
analysis of variance is a dramatic example addressed in Chapter 2, “His-
torical Successes.”
There are also many examples where interesting mathematical prob-
lems were abstracted away from the biological problems that motivated
them, leading to mathematical sciences research that is valuable in its own
right. Examples of this type are particularly common in combinatorics,
algorithmics, and computational complexity theory. A typical example is
the “adjacent ones” problem, which first arose in the 1950s in the context
of fine-structure genetic mapping. Once posed, it continued to interest
mathematicians—and occasionally found new biological applications, in-
cluding in the Human Genome Project—for 40 years (Benzer, 1959;
Alizadeh et al., 1995). The committee’s sense is that the flow of research
problems from biology back into mathematics is likely to become increas-
ingly common as research expands at the interface of the two fields.
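
As an illustration only, the “adjacent ones” problem can be stated in the form in which it is commonly studied (often called the consecutive ones property): given a matrix of 0s and 1s, can the columns be reordered so that the 1s in every row are adjacent? The sketch below is a brute-force check written for this discussion, not a method taken from Benzer (1959) or Alizadeh et al. (1995); the function names are hypothetical, and the exhaustive search is practical only for a handful of columns.

```python
from itertools import permutations


def has_consecutive_ones(row):
    """Return True if all 1s in the row form a single contiguous block."""
    ones = [i for i, value in enumerate(row) if value == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_column_order(matrix):
    """Search every column order for one that makes each row's 1s adjacent.

    Returns a tuple of column indices, or None if no such order exists.
    This is exhaustive search (factorial time), meant only to state the
    problem precisely; it is an assumption-laden sketch, not a practical
    mapping algorithm.
    """
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if all(has_consecutive_ones([row[j] for j in order]) for row in matrix):
            return order
    return None


# Rows might stand for probes and columns for clones (an assumed reading).
mappable = [[1, 0, 1],
            [0, 1, 1]]
# Three overlapping pairs cannot all be adjacent in one linear order.
unmappable = [[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]]

print(find_column_order(mappable))
print(find_column_order(unmappable))
```

In a mapping setting, a successful column order corresponds to a linear arrangement consistent with every observed overlap, while a failure signals either errors in the data or a structure that is not linear.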
It is important that more biologists recognize the value of true col-
laboration with mathematical scientists. There is a common presumption
that mathematical sciences research can be done in a vacuum—that is,
that mathematical scientists tend to learn about a problem, retreat to their
offices for several months, and reappear only when they have completed
their research. This picture is inaccurate in applied areas, but many
biologists have not been engaged in the iterative give-and-take that melds
the complementary skills of mathematical and biological scientists to cre-
ate an advance that neither could have achieved alone. Similarly, many
biologists have not seen the powerful difference between using off-the-
shelf formulas or software and using a method that is adapted by an expe-
rienced mathematical scientist for a particular application.
The charge to the committee asked for recommendations on how the
DOE’s applied mathematics program can best support its computational
biology aims. One thrust for that program should be the refinement of
general-purpose tools whose broad biological utility has already been es-
tablished. Some knowledge of biological applications is often important
for pointing this research in optimally useful directions, but intimate fa-
miliarity with specific biological problems may be unnecessary. A good
example of this dynamic involves applications of Markov chain Monte
Carlo (MCMC) methods in biology. These applications are now suffi-
ciently well established that classes of mathematical problems, such as
those governing the convergence properties of Markov chains, can be
identified whose solution would almost surely prove relevant to a wide
array of biological problems.
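
To make the notion of “convergence properties” concrete, the following minimal sketch runs several random-walk Metropolis chains on a toy target and compares them with the Gelman-Rubin potential scale reduction factor. The choice of diagnostic, the standard normal target, the step size, and the chain lengths are all assumptions made for illustration; none of them comes from this report.

```python
import math
import random
from statistics import mean, variance


def metropolis_chain(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    x, samples = start, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains.

    Values close to 1 suggest the chains are sampling the same
    distribution; values well above 1 suggest they have not yet mixed.
    """
    n = len(chains[0])
    within = mean(variance(chain) for chain in chains)
    between_over_n = variance([mean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


def standard_normal_log_density(x):
    """Toy target (assumed for illustration): log density up to a constant."""
    return -0.5 * x * x


rng = random.Random(0)
starts = [-10.0, -3.0, 3.0, 10.0]  # deliberately dispersed starting points
chains = [
    # Discard the first half of each chain as burn-in.
    metropolis_chain(standard_normal_log_density, s, 4000, 1.0, rng)[2000:]
    for s in starts
]
r_hat = gelman_rubin(chains)
print(f"R-hat = {r_hat:.3f}")
```

A diagnostic of this kind can only flag a lack of convergence; it cannot prove convergence, which is one reason theoretical results on the mixing of Markov chains would be of broad practical value.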
Recommendation: Funding agencies supporting mathematical
research related to the life sciences should support the refine-
ment of general-purpose tools whose broad biological utility
has already been established. Such research might require spe-
cialized review criteria, particularly when the focus is on tool
enhancement rather than breakthrough research.
The committee believes that most advances in the near future in com-
putational biology at all scales will come from adapting established math-
ematical tools to biological problems. Biology is complicated, and what is
needed is insight about which complications can be ignored and which
are essential; it is easier to reach that insight when dealing with well-char-
acterized mathematical tools rather than novel ones that might add com-
plexity. These insights will guide the application of sophisticated, but of-
ten familiar, mathematical tools to extract as much information as pos-
sible from large data sets. In some happy instances, this process will
spawn new mathematics. However, no amount of mathematical sophisti-
cation can overcome the intrinsic complexity of biological systems. The
key will be to achieve steady improvements in our ability to simplify and
approximate these systems without losing their essential characteristics.
While this process of reduction will certainly require researchers with a
good sense of the power and limitations of relevant mathematical tools, it
will predominantly require an intimate knowledge of the living systems
that they are attempting to approximate. By working for the most part
with well-established mathematical tools, the mathematician and the bi-
ologist can focus on what data might be missing or what approaches might
not have been tried, in order to make the problem tractable. It should be
easier to ascertain which features of the complexity can be neglected or
ignored, which are essential, and which approaches can provide the best
input for mathematical analysis.
The range of mathematical sciences methods that have successfully
contributed to biology is very large, as indicated in the rest of this report.
Therefore, recommending that the DOE applied mathematics program
cover those demonstrated areas of mathematics is not a restriction; in fact,
it would require a substantial enlargement of that program’s traditional
scope. Some of the most promising areas are discussed in the chapter
“Crosscutting Themes,” but these should be seen as illustrative, not ex-
clusive. As biology itself proceeds, the range of applicable mathematical
methods might well expand. Openness, or inclusiveness, will be impor-
tant to ensure that the methods of mathematics can contribute most effec-
tively to biology.
The federal agencies have set up processes recently to be more re-
sponsive to tool development, to the more general aspects of infrastruc-
ture support, to the provision of new methods, and to the development of
new instruments, new approaches, or software, along with the more tra-
ditional forms of infrastructure such as equipment. The agencies have also
provided some funding to support what is called discovery science: data
mining or exploratory work aimed at gaining a novel insight rather than
testing a specific hypothesis. Interdisciplinary research, in general, often
requires review processes carefully constructed to permit effective evalu-
ation of novel approaches. More specifically, the plans for generalized
tool development will need similar careful review and a mandate pro-
vided through the call for proposals.
Recommendation: Funding agencies supporting mathematical
research related to the life sciences should give priority to re-
search that addresses intrinsic characteristics of biological sys-
tems that reappear at many levels of biological organization:
high dimensionality, heterogeneity, robustness, and the exist-
ence of multiple spatial and temporal scales.
The committee attempted to identify subdisciplines of mathematics
in which broadly based advances would be particularly likely to enhance
biological research. However, it concluded that since critical advances had
come from nearly every subdiscipline within the mathematical sciences,
any such prognostication would be mere guesswork. The committee be-
lieves that excellent biology research can be achieved only by answering
key questions within that discipline. Specifying a priori the tools to be
developed inverts that goal. However, it is clear that if DOE’s applied
mathematics program is to contribute to computational biology, it should
focus on research that is linked to the intrinsic characteristics of biological
systems that reappear at many levels of biological organization: high di-
mensionality, heterogeneity, robustness, and the existence of multiple spa-
tial and temporal scales. All areas of biology will benefit from improved
mathematical representations of biological systems.
STRUCTURE OF THIS REPORT
Future biologists will use an enormous variety of mathematical tools.
What will be distinctive about their research are the problems they aspire
to solve rather than the tools they use to solve them. For this reason, this
report is organized primarily around biological, rather than mathemati-
cal, themes. Its survey of mathematical challenges in biology, which
ranges from molecular to ecological levels of organization, is necessarily
cursory. However, the report provides an introduction to the diverse chal-
lenges that characterize contemporary applications of mathematics to bi-
ology. The daunting task facing policy makers will be to develop mecha-
nisms that encourage the deep integration of mathematics and biology
needed for sustained progress across this vast, exciting, and rapidly evolv-
ing scientific frontier.
REFERENCES
Alizadeh, F., R.M. Karp, D.K. Weisser, and G. Zweig. 1995. Physical mapping of chromo-
somes using unique probes. J. Comput. Biol. 2: 159-184.
Benzer, S. 1959. On the topology of the genetic fine structure. Proc. Natl. Acad. Sci. U.S.A. 45:
1607-1620.
Lipan, O., and W.H. Wong. 2005. The use of oscillatory signals in the study of genetic
networks. Proc. Natl. Acad. Sci. U.S.A. 10.1073.
Mayr, E. 1982. The Growth of Biological Thought. Cambridge, Mass.: Belknap Press.
Naeem, S., M. Loreau, and P. Inchausti. 2002. Biodiversity and ecosystem functioning: The
emergence of a synthetic ecological framework. Pp. 3-11 in Biodiversity and Ecosystem
Functioning, M. Loreau, S. Naeem, and P. Inchausti, eds. New York: Springer.
Palumbi, S.R., S.D. Gaines, H. Leslie, and R.R. Warner. 2003. New wave: High-tech tools to
help marine reserve research. Front. Ecol. Environ. 1(2): 73-79.
Patil, G.P., and C. Taillie. 2003. Geographic and network surveillance via scan statistics for
critical area detection. Statist. Sci. 18(4): 457-465.
Post, D.E., and R.P. Kendall. 2004. Software project management and quality engineering
practices for complex, coupled multiphysics, massively parallel computational simula-
tions: Lessons learned from ASCI. Int. J. High Perform. Comput. Applic. 18(4): 399-416.
Running, S.W., R.R. Nemani, F.A. Heinsch, M. Zhao, M. Reeves, and H. Hashimoto. 2004. A
continuous satellite-derived measure of global terrestrial primary production. Bioscience
54(6): 547-560.
Shimizu-Sato, S., E. Huq, J.M. Tepperman, and P.H. Quail. 2002. A light switchable gene
promoter system. Nat. Biotechnol. 20(10): 1041-1044.
Turner, D.P., S.V. Ollinger, and J.S. Kimball. 2004. Integrating remote sensing and ecosystem
process models for landscape- to regional-scale analysis of the carbon cycle. Bioscience
54(6): 573-584.
Zeidler, M.P., C. Tan, Y. Bellaiche, S. Cherry, S. Hader, U. Gayko, and N. Perrimon. 2004.
Temperature-sensitive control of protein activity by conditionally splicing inteins. Nat.
Biotechnol. 22(7): 871-876.