Suggested Citation:"Efficient Algorithms." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

MAPPING HEREDITY: USING PROBABILISTIC MODELS AND ALGORITHMS TO MAP GENES AND GENOMES

genotype for every family member, including the precise parental chromosome from which each allele was inherited. Given complete information, it is usually easy to estimate the required parameters: for example, the recombination frequency can be estimated by counting recombinant chromosomes, and the penetrance can be estimated by finding the proportion of individuals with a disease-predisposing genotype who manifest the disease. Unfortunately, one typically has only incomplete data Y, from which it is difficult to estimate θ directly. The maximum likelihood estimate θ̂ is the value that makes the observed data Y most likely to have occurred, that is, the value that maximizes the likelihood function L(θ) = P(Y | θ). Using Bayes's Theorem, one can calculate L(θ) by the equation

    L(θ) = Σ_X P(Y | X, θ) P(X | θ),

where the summation is taken over all possible values of the complete data X. The sum is easy to calculate in theory: it decomposes into various terms (corresponding to each individual and each genetic interval) that are conditionally independent given the complete data X. One can then calculate θ̂ by numerical maximization methods.

To determine whether θ̂ is significantly different from a null value θ0 (for example, to see whether an estimated recombination frequency is significantly less than 50 percent), one examines the likelihood ratio Z = L(θ̂) / L(θ0). If Z exceeds some appropriate threshold T, a statistically significant effect has been found.

In principle, virtually any genetic problem can be treated by this approach. In practice, however, two important issues arise: (1) efficient algorithms and (2) statistical significance.

Efficient Algorithms

The number of terms in the sum L(θ) scales roughly as O(e^(cmn)), where m is the number of people in the family, n is the number of genetic markers studied, and c is a constant. Except in the case of the smallest
problems, it is infeasible to enumerate all the terms in the sum. Thus, it is a challenge even to calculate the likelihood L(θ) at a single point, let alone to find the value θ̂ that maximizes the function. Considerable mathematical attention has been devoted to finding efficient ways to calculate L(θ).

The first breakthrough was due to Elston and Stewart (1971), who devised an algorithm for computing the sum in a nested fashion from the youngest generations of a family to the oldest. The algorithm used the fact that the youngest generation depends only on its parents. The running time scales as O(m e^(c'n)), being exponential only in the number of genetic markers. This algorithm has been a mainstay of genetic calculations but becomes cumbersome when geneticists wish to employ ever-denser maps involving more genetic markers.

Subsequently, Lander and Green (1987) developed an alternative algorithm, based on hidden Markov models, for nesting the sum according to the genetic markers used, starting with markers at one end of the map and working across to the other end. These authors also exploited the expectation-maximization algorithm from statistics (a well-known maximum likelihood approach) to aid in the multidimensional search for θ̂. The algorithm scales as O(n e^(c''m)), being linear in the number of markers although exponential in the number of family members. This algorithm has proven very useful for constructing dense genetic linkage maps, which involves studying many markers in families of limited size, but it is not practical for studying large disease pedigrees.

Recently, mathematical geneticists have explored ways to approximate L(θ) by sampling from the sum. Modern techniques such as Gibbs sampling and importance sampling have been introduced in the past few years (Kong, 1991; Kong et al., 1992a,b; Kong et al., 1993; Thompson and Wijsman, 1990).
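The marker-by-marker nesting behind the Lander-Green approach can be illustrated with a small forward pass over a hidden Markov model. The sketch below is a toy version under simplifying assumptions, not the 1987 implementation: the hidden state at each marker is an inheritance vector of one bit per meiosis, each bit flips between adjacent markers with a single recombination probability theta, and the emission table (the probability of the genotype data at each marker given each inheritance vector) is hypothetical input supplied by the caller.

```python
import itertools

def forward_likelihood(emissions, theta, m):
    """Toy Lander-Green-style forward pass.

    emissions[i][v] is an assumed table giving P(data at marker i |
    inheritance vector v), where v is a tuple of one bit per meiosis
    and a bit flips between adjacent markers with probability theta.
    """
    states = list(itertools.product((0, 1), repeat=m))
    # Uniform prior over the 2^m inheritance vectors at the first marker.
    alpha = {v: emissions[0][v] / len(states) for v in states}
    # One sweep across the markers: the cost is linear in the number of
    # markers n but exponential (2^m states per step) in the meioses m.
    for i in range(1, len(emissions)):
        new = {}
        for v in states:
            total = 0.0
            for u in states:
                flips = sum(a != b for a, b in zip(u, v))
                total += alpha[u] * theta ** flips * (1 - theta) ** (m - flips)
            new[v] = total * emissions[i][v]
        alpha = new
    return sum(alpha.values())  # the likelihood L(theta) for this toy model
```

A convenient sanity check: with completely uninformative data (every emission probability equal to 1), the likelihood is 1 for any theta, since the transition probabilities at each step sum to 1.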
These sampling methods exploit the fact that each piece of missing data depends only on local information in the pedigree: the probability distribution of the genotype at genetic marker i in individual j depends only on the genotypes of nearby markers in nearby individuals. Finding better ways to compute the likelihood function remains an important problem from the standpoint of genetics and an excellent test bed for new statistical estimation techniques.
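This local-dependence structure is exactly what a Gibbs sampler exploits. The sketch below is a deliberately simplified illustration (a hypothetical chain of binary inheritance states, with a single change probability theta between neighbors), not the samplers of the cited papers: each state is resampled from its conditional distribution given only its immediate neighbors, so every update touches only local information.

```python
import random

def gibbs_sweep(states, theta, rng):
    """One Gibbs sweep over a toy chain of binary inheritance states.

    The assumed model: adjacent states differ with probability theta,
    so the conditional distribution of state i depends only on its
    immediate neighbors in the chain.
    """
    n = len(states)
    for i in range(n):
        # Conditional weight of each candidate value s: the product of
        # one transition term per neighbor (local dependence).
        weights = []
        for s in (0, 1):
            w = 1.0
            if i > 0:
                w *= theta if s != states[i - 1] else 1 - theta
            if i < n - 1:
                w *= theta if s != states[i + 1] else 1 - theta
            weights.append(w)
        states[i] = 0 if rng.random() < weights[0] / sum(weights) else 1
    return states
```

Repeating such sweeps yields samples from the joint distribution over the missing data, from which the sum defining L(θ) can be approximated; each sweep costs only O(n) local updates rather than a sum over all 2^n configurations.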
