National Academies Press: OpenBook
Suggested Citation:"The Infinitely-Many-Sites Model." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

CALIBRATING THE CLOCK: USING STOCHASTIC PROCESSES TO MEASURE THE RATE OF EVOLUTION

(5.10)

The only case not covered by equation (5.10) is the one in which $a = (0, \ldots, 0, 1)$. In this case the previous event had to be a coalescence, and so

(5.11)

The persistent reader will be able to verify that $P_n(a)$ given by the Ewens sampling formula (5.4) does indeed satisfy equations (5.10) and (5.11).

The Infinitely-Many-Sites Model

The infinitely-many-sites model of Kimura (1969) and Watterson (1975) is the simplest description of the evolution of a population of DNA sequences. The sites in the sequences are completely linked, and each mutation that occurs in the ancestral tree of the sample introduces a new segregating site into the sample. In this process, each new mutation occurs at a site not previously segregating; new mutations arise just once. It follows that at each segregating site, each sequence in the sample may be classified as type 0 (ancestral) or type 1 (mutant). Of course, in practice we do not know which is which. The sequences in the sample may now be described by strings of 0s and 1s. If distinct sequences are treated as alleles, then the sampling theory is reduced to that covered by the Ewens sampling formula.

The number $S_n$ of segregating sites is an important summary statistic for the sample. Since each new mutation produces a segregating site, it follows that $S_n = \mu_n$, the number of mutations in the ancestral tree. The mean and variance of $S_n$ are therefore given by (5.1) and (5.2), respectively. The number of segregating sites has been studied extensively for many variants of the infinitely-many-sites process, including the effects of selection and recombination. Hudson (1991) gives an accessible summary of this work. When there is no recombination, the fundamental results have been established by Watterson (1975), Ethier and Griffiths (1987), and Griffiths (1989).

Watterson (1975) parlayed the moments of $S_n$ into an unbiased estimator of θ, namely,

$$\hat{\theta}_W = S_n \Big/ \sum_{j=1}^{n-1} \frac{1}{j} \qquad (5.12)$$

with variance

$$\operatorname{Var}(\hat{\theta}_W) = \frac{\theta}{a_n} + \frac{\theta^2 b_n}{a_n^2},$$

where $a_n = \sum_{j=1}^{n-1} 1/j$ and $b_n = \sum_{j=1}^{n-1} 1/j^2$. Note that $\hat{\theta}_W$ does not depend on knowing which type at a site is ancestral and does not make full use of the data. For the pyrimidine data, there are 21 segregating sites, giving an approximate 95 percent confidence interval for θ of 4.46 ± 3.10. This should be compared to the estimate of 10.62 ± 6.29 obtained from the Ewens sampling formula.

Now think of the data as an n × s matrix of 0s and 1s, s being the number of segregating sites in the sample. When 0 is known to be ancestral in each site, Griffiths (1987) established that the data are consistent with the infinitely-many-sites model as long as in any set of three rows of the matrix, at most one of the patterns occurs. This is equivalent to the pairwise compatibility condition for binary characters established by Estabrook et al. (1976) and McMorris (1977): two sites are compatible if two or fewer of the patterns 01, 10, 11 occur. When the ancestral state is unknown, an analogous result holds: two sites are compatible if at most three of the patterns 00, 01, 10, 11 occur. This translates into a simple test of whether a given set of binary site data is consistent with the infinitely-many-sites model. If in all pairs of columns at most three of the patterns 00, 01, 10, 11 occur, then there is at least one labeling of the sites that is consistent. McMorris (1977) proved that consistent data remain consistent when the most frequent type is taken as ancestral.

In practice, back mutations and recombination make most molecular data inconsistent with this model. However, it is worthwhile to look for maximal subsets of sites that are consistent, as this provides a way to identify regions of the sequence with simple structure. For the pyrimidine data described in Table 5.1, the maximal consistent set has 14 sites, those in positions 2-8, 11-12, 14-16, and 20-21. The remaining 7 sites have some inconsistencies, attributable to back substitutions, for example. Of the $2^{14} = 16,384$ possible relabelings of the consistent set, just 16 are consistent.

Each of these labelings is associated with a genealogical tree that describes the relationships between the mutations in the coalescent. The precise definition of the (equivalence class of) trees is given in Ethier and Griffiths (1987) and Griffiths (1989). The tree is equivalent to those built using compatibility methods for binary characters; see Felsenstein (1982, pp. 389-393) for a detailed discussion and references.
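
The pairwise compatibility test described above can be sketched in a few lines of Python. This is a minimal illustration, not Griffiths' algorithm: the function names `columns_compatible` and `is_consistent` and the toy matrices are hypothetical, and the data are assumed to be a list of equal-length rows of 0s and 1s (one row per sequence, one column per segregating site).

```python
from itertools import combinations


def columns_compatible(col_a, col_b, ancestral_known=False):
    """Check one pair of binary site columns for compatibility.

    With 0 known to be ancestral, the pair is compatible if two or fewer
    of the patterns 01, 10, 11 occur. With the ancestral state unknown,
    it is compatible if at most three of 00, 01, 10, 11 occur.
    """
    patterns = set(zip(col_a, col_b))
    if ancestral_known:
        return len(patterns - {(0, 0)}) <= 2
    return len(patterns) <= 3


def is_consistent(matrix, ancestral_known=False):
    """Apply the pairwise test to every pair of columns of an n x s matrix."""
    n_sites = len(matrix[0]) if matrix else 0
    columns = [[row[j] for row in matrix] for j in range(n_sites)]
    return all(
        columns_compatible(columns[i], columns[j], ancestral_known)
        for i, j in combinations(range(n_sites), 2)
    )


# Hypothetical toy data (not the pyrimidine data of Table 5.1).
nested = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
all_four_patterns = [
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
]

print(is_consistent(nested, ancestral_known=True))
print(is_consistent(all_four_patterns))
```

The first matrix passes under both versions of the test, while the second contains all four patterns in its single pair of columns and fails under both. Finding a maximal consistent subset of sites, as done for the pyrimidine data, is a harder search problem that this sketch does not attempt.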
The nodes in the tree represent the mutations that have generated the segregating sites, and the tips represent the sequences. A convenient algorithm for finding these trees is provided by Griffiths (1987), who also shows (Griffiths, 1989) how the probability of a tree with a given ancestral labeling can be computed under the infinitely-many-sites model. Griffiths' program PTREE can then be used
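
As a numerical check on Watterson's estimator (5.12) from earlier in this section, the following sketch computes the estimate and an approximate interval from the number of segregating sites. Several things here are assumptions rather than statements from the text: the function name `watterson_estimate` is hypothetical, the variance is evaluated by plugging the estimate in for θ, the interval is taken as the estimate plus or minus two standard deviations, and the sample size n = 63 for the pyrimidine data is not given in this section.

```python
import math


def watterson_estimate(num_segregating, sample_size):
    """Return Watterson's estimate of theta and its plug-in standard deviation.

    a_n and b_n are the partial harmonic sums over j = 1, ..., n - 1.
    The variance theta/a_n + theta^2 * b_n / a_n^2 is evaluated at the
    estimate itself (an assumption; the true theta is unknown).
    """
    a_n = sum(1.0 / j for j in range(1, sample_size))
    b_n = sum(1.0 / j**2 for j in range(1, sample_size))
    theta_hat = num_segregating / a_n
    variance = theta_hat / a_n + theta_hat**2 * b_n / a_n**2
    return theta_hat, math.sqrt(variance)


# 21 segregating sites as quoted in the text; n = 63 is an assumed sample size.
theta_hat, sd = watterson_estimate(21, 63)
print(f"{theta_hat:.2f} +/- {2 * sd:.2f}")
```

Under these assumptions the result agrees with the interval 4.46 ± 3.10 quoted in the text, which suggests the quoted half-width corresponds to two standard deviations.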
