Currently Skimming:


Pages 127-150

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 127...
... 8 Crosscutting Themes
The organization of this report around levels of biological organization reflects the committee's view that the interplay between mathematics and biology during the 21st century will be driven by biological problems. Nonetheless, the committee also recognized that this view of the mathematics-biology interface risks the neglect of crosscutting themes, that is, mathematical ideas or areas of productive research activity that cut across levels of biological organization, emerging and re-emerging in diverse biological contexts.
From page 128...
... It illustrates both the progress that has been made during the past decade and the challenges that lie ahead.
Finding Patterns in Gene-Expression Data
Although the small n, large P problem is encountered in many biological contexts, the challenges of interpreting gene-expression data provide a prototypical example that is of substantial current interest.
From page 129...
... Machine-learning tools based on these techniques, designed in collaborations between bioscientists and mathematical scientists, have already come into widespread use. Pattern recognition via supervised and unsupervised learning is based on quantitative, stochastic descriptions of the data, sometimes referred to as associative models.
From page 130...
... Despite clear successes in applying machine learning to gene-expression data, most studies have oversimplified the problem by treating genes as independent variables. Even when coregulation is taken into account (Cho et al., 1998; Eisen et al., 1998)
From page 131...
... However, a good parametric description for expression values has yet to be determined and may not exist. Gene-gene interactions are fundamental to biological processes, and thus gene-expression data are inherently incompatible with independence assumptions.
From page 132...
... Traditional power calculations (Adcock, 1997) do not address the situation posed by gene-expression data: They estimate the confidence of an empirical error estimate based on a given data set, not how the error rate might decrease given more data.
From page 133...
... These methods focus on the dominant structure present in a data set while potentially missing more subtle patterns that might be of equal or greater biological interest. In contrast, there are a number of local, or bottom-up, unsupervised methods that seek to identify and analyze subpatterns in gene expression data: the SPLASH algorithm (Califano, 2000)
From page 134...
... When suitable models exist, this requirement is a strength: Indeed, it is sometimes possible to make valid inferences from a single instance of a biological entity such as a gene, that is, to analyze a small n, large P problem when n = 1. This escape from the small n, large P problem is somewhat illusory since the HMM assumption enables us to use the large number of bases in the single gene to provide us with nearly identically distributed and independent proxy samples.
From page 135...
... nucleotides or amino acids corresponding to specific sequences. All of the parameters of the HMMs governing the emissions of variables from specific states and the transitions between states are probabilities.
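As a minimal sketch of what it means for every HMM parameter to be a probability, the Python below defines a toy two-state model over nucleotides and scores a sequence with the standard forward algorithm. The state names ("coding", "noncoding") and all numerical values are hypothetical and chosen only for illustration; they are not taken from the report.

```python
# Hypothetical two-state HMM over nucleotides; every number is illustrative.
STATES = ("coding", "noncoding")

# Initial-state probabilities.
START = {"coding": 0.5, "noncoding": 0.5}

# Transition probabilities: TRANS[p][s] = P(next state s | current state p).
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.2, "noncoding": 0.8},
}

# Emission probabilities: EMIT[s][x] = P(symbol x | state s).
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def check_probabilities():
    """Return True if every parameter row sums to 1 (is a distribution)."""
    rows = [START, *TRANS.values(), *EMIT.values()]
    return all(abs(sum(row.values()) - 1.0) < 1e-9 for row in rows)


def sequence_likelihood(seq):
    """Forward algorithm: P(seq), summed over all hidden state paths.

    Assumes seq is a non-empty string over the alphabet A, C, G, T.
    """
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        alpha = {
            s: EMIT[s][symbol] * sum(alpha[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())
```

Because each row of parameters is a probability distribution, the likelihoods of all sequences of a fixed length sum to 1, which gives a quick sanity check on any hand-built model of this kind.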
From page 136...
... 136 MATHEMATICS AND 21ST CENTURY BIOLOGY of a new protein from its sequence simply by determining the family to which the protein is most likely to belong. Of course, if the protein does not belong to any of the established families, this approach fails, and one must resort to ab initio methods.
From page 137...
... encoded in these newly discovered DNA sequences was an important problem. The basic structure of an HMM maps well to the gene-prediction problem.
From page 138...
... The use of dynamic Monte Carlo methods in statistics began in the early 1980s, when Geman and Geman (1984) and others introduced them in the context of image analysis.
From page 139...
... The committee describes below some uses of Monte Carlo methods in computational biology and discusses the limitations on current methods and possible directions for future research.
Gibbs Sampling in Motif Finding
The identification of binding sites for transcription factors that regulate when and where a gene may be transcribed is a central problem in molecular biology.
From page 140...
... The study of folding-energy landscapes is generally based on a simplified energy function (for example, effects of entropy in the solvent are incorporated into artificial hydrophobic terms in the energy function) and a greatly simplified conformation space. Even with such simplifications, Monte Carlo methods are often the only way to sample this space.
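To make the idea of Monte Carlo sampling over an energy landscape concrete, here is a minimal Metropolis sketch in Python. It is not the method of any study cited in the report: the one-dimensional double-well `toy_energy` function is a hypothetical stand-in for a folding-energy function, and the step size, temperature, and seed are arbitrary illustrative choices.

```python
import math
import random


def toy_energy(x):
    """Hypothetical double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def metropolis(energy, x0, n_steps, step=0.5, temperature=1.0, seed=0):
    """Metropolis sampling of the Boltzmann distribution exp(-E(x) / T).

    Proposes a uniform move within +/- step of the current position and
    accepts it with probability min(1, exp(-(E_new - E_old) / T)).
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step, step)
        e_new = energy(candidate)
        # Downhill moves are always accepted; uphill moves only sometimes.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
        samples.append(x)  # a rejected move repeats the current state
    return samples
```

A call such as `metropolis(toy_energy, x0=0.0, n_steps=1000)` returns a chain of positions. In a real conformation space the state would be a high-dimensional structure rather than a single number, but the accept/reject rule has the same form.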
From page 141...
... Other methods will follow, just as others went before. The greatest enabler of this process will be research programs and collaborations that confront mathematical scientists with specific problems drawn from across the whole landscape of modern biology.
From page 142...
... The continued involvement of mathematicians, physicists, engineers, chemists, and bioscientists in instrumentation development has great potential to advance the biological sciences. Mathematical scientists are essential partners in these collaborations.
From page 143...
... emphasized this point, stating that the lack of such methods is "probably the greatest challenge facing cryo-EM." The mathematical sciences have a clear role to play in addressing this challenge. Hyperspectral imaging is the final example here of promising technologies that could be incorporated into many types of biological instrumentation.
From page 144...
... Mathematical scientists, and the funding agencies that support them, should be encouraged to take an interest in the full cycle of experimental design, data acquisition, data processing, and data interpretation through which bioscientists are expanding their understanding of the living world. Applications of the mathematical sciences to biology are not yet so specialized as to make this breadth of view impractical.
From page 145...
... 2002. Strong-association-rule mining for large-scale gene-expression data analysis: A case study on human SAGE data.
From page 146...
... 2000. Using Bayesian networks to analyze expression data.
From page 147...
... 2003. Visualizing single molecules inside living cells using total internal reflection fluorescence microscopy.
From page 148...
... 2003. Extracting conserved gene expression motifs from gene expression data.
From page 149...
... 2000. Class prediction and discovery using gene expression data.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.