National Academies Press: OpenBook
Suggested Citation: "The Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.


CALIBRATING THE CLOCK: USING STOCHASTIC PROCESSES TO MEASURE THE RATE OF EVOLUTION

The Ewens Sampling Formula

Motivated by the realization that mutations in DNA sequences could lead to an essentially infinite number of alleles at a given locus, Kimura and Crow (1964) advocated modeling the effects of mutation with an infinitely-many-alleles model. In this process, a gene inherits the type of its ancestor if no mutation occurs and inherits a type not currently (or previously) existing in the population if a mutation does occur. In such a process the genes in the sample are thought of as unlabeled, so that the experimenter knows whether two genes are different but records nothing further about the identity of alleles. In this case the natural statistic to record about the sample is its configuration $\mathbf{C}_n \equiv (C_1, C_2, \ldots, C_n)$, where $C_j$ is the number of alleles represented $j$ times. Of course, $C_1 + 2C_2 + \cdots + nC_n = n$, and the number of alleles in the sample is

$$K_n \equiv C_1 + C_2 + \cdots + C_n. \tag{5.3}$$

The sampling distribution of $\mathbf{C}_n$ was found by Ewens (1972):

$$P(\mathbf{C}_n = \mathbf{a}) = \frac{n!}{\theta_{(n)}} \prod_{j=1}^{n} \left(\frac{\theta}{j}\right)^{a_j} \frac{1}{a_j!} \tag{5.4}$$

for $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ satisfying $a_j \geq 0$ for $j = 1, 2, \ldots, n$ and $\sum_{j=1}^{n} j a_j = n$, and where $\theta_{(n)} \equiv \theta(\theta + 1) \cdots (\theta + n - 1)$. From (5.4) it follows that

$$P(K_n = k) = |S_n^k| \, \frac{\theta^k}{\theta_{(n)}}, \quad k = 1, 2, \ldots, n, \tag{5.5}$$

and

$$E(K_n) = \sum_{j=0}^{n-1} \frac{\theta}{\theta + j}, \tag{5.6}$$

$S_n^k$ being the Stirling number of the first kind. From (5.5) and (5.4) it follows that $K_n$ is sufficient for $\theta$, so that the information in the sample relevant for estimating $\theta$ is contained just in $K_n$. This allows us (Ewens, 1972, 1979) to calculate the maximum likelihood (and moment) estimator $\hat{\theta}$ of $\theta$ as the solution of the equation

$$k = \sum_{j=0}^{n-1} \frac{\hat{\theta}}{\hat{\theta} + j}, \tag{5.7}$$

where $k$ is the number of alleles observed in the sample. In large samples, the estimator has variance given approximately by

$$\operatorname{var}(\hat{\theta}) \approx \theta \left( \sum_{j=1}^{n-1} \frac{j}{(\theta + j)^2} \right)^{-1}. \tag{5.8}$$

For the pyrimidine sequence data described above in the "Overview" section, there are $k = 24$ alleles. Solving equation (5.7) for $\hat{\theta}$ gives $\hat{\theta} = 10.62$, with a variance of 9.89. An approximate 95 percent confidence interval for $\theta$ is therefore 10.62 ± 6.29, that is, the estimate plus or minus two standard errors ($2\sqrt{9.89} \approx 6.29$). This example serves to underline the variability inherent in estimating $\theta$ from this model. The pyrimidine region comprises 201 sites, so that the per-site substitution rate is estimated to be 0.053 ± 0.031.

The goodness of fit of the model to the data may be assessed by using the sufficiency of $K_n$ for $\theta$: given $K_n$, the conditional distribution of the allele frequencies is independent of $\theta$. Ewens (1972, 1979) gives further details on this point. To describe alternative goodness-of-fit methods, we return briefly to the probabilistic structure of mutation in the coalescent.
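The quantities in (5.4), (5.7), and (5.8) are simple to evaluate numerically. The following is a minimal Python sketch, not the method used in the chapter: the function names are illustrative, the bisection solver for (5.7) is an assumed choice (the text does not say how the equation was solved), and the sample with n = 10 and k = 4 is hypothetical. Only the final lines use numbers quoted in the text (10.62, 9.89, and 201 sites).

```python
from math import factorial, prod, sqrt


def rising_factorial(theta, n):
    """theta_(n) = theta (theta + 1) ... (theta + n - 1)."""
    return prod(theta + i for i in range(n))


def esf_probability(a, theta):
    """Probability (5.4) of the configuration a = (a_1, ..., a_n)."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    weight = prod((theta / j) ** a_j / factorial(a_j)
                  for j, a_j in enumerate(a, start=1))
    return factorial(n) / rising_factorial(theta, n) * weight


def expected_alleles(theta, n):
    """Right-hand side of (5.7): sum over j = 0..n-1 of theta / (theta + j)."""
    return sum(theta / (theta + j) for j in range(n))


def mle_theta(k, n, tol=1e-10):
    """Solve k = expected_alleles(theta, n) for theta by bisection (assumed solver)."""
    if not 1 < k < n:
        raise ValueError("a finite positive root needs 1 < k < n")
    lo, hi = 0.0, 1.0
    while expected_alleles(hi, n) < k:  # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def approx_variance(theta, n):
    """Large-sample variance (5.8): theta / sum over j = 1..n-1 of j / (theta + j)^2."""
    return theta / sum(j / (theta + j) ** 2 for j in range(1, n))


if __name__ == "__main__":
    # Hypothetical sample: n = 10 genes showing k = 4 distinct alleles.
    n, k = 10, 4
    theta_hat = mle_theta(k, n)
    print(theta_hat, approx_variance(theta_hat, n))

    # Arithmetic quoted in the text: two-standard-error interval and per-site rate.
    est, var, sites = 10.62, 9.89, 201
    half_width = 2 * sqrt(var)
    print(round(half_width, 2), round(est / sites, 3), round(half_width / sites, 3))
    # prints: 6.29 0.053 0.031
```

Bisection is used here because the right-hand side of (5.7) increases monotonically in $\theta$ from 1 to $n$, so a single root exists whenever $1 < k < n$. A library root finder would serve equally well.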
