National Academies Press: OpenBook

Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (1995)

Chapter: Approximations for the Ewens Sampling Formula

« Previous: The Finitely-Many-Sites Models
Suggested Citation:"Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
×
Page 136
Suggested Citation:"Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
×
Page 137
Suggested Citation:"Approximations for the Ewens Sampling Formula." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
×
Page 138

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

CALIBRATING THE CLOCK: USING STOCHASTIC PROCESSES TO MEASURE THE RATE OF EVOLUTION 136 MATHEMATICAL VIGNETTE: APPROXIMATING COMBINATORIAL STRUCTURES Our mathematical vignette takes us from the world of population genetics to that of probabilistic combinatorics. We show how the Ewens sampling formula (ESF), whose origins in population genetics were described above, plays a central role in approximating the probabilistic structure of a class of combinatorial models. This brief account follows Arratia and Tavaré (1994), to which the interested reader is referred for further results. Our first task is to describe the combinatorial content of the ESF itself. Approximations for the Ewens Sampling Formula First, we recall Cauchy's formula for the number N(a) ≡ N(a1,a2,. . .,an) of permutations of n objects that have a1 cycles of length 1, a2 cycles of length 2, . . ., an cycles of length n. (5.21) 1(A) denoting the indicator of the event A. If each of the n! permutations is assumed to be equally likely, then a random permutation has cycle index a with probability (5.22) where Cj ≡ Ci(n) is the number of cycles of size j in the permutation. Comparison with (5.4) shows that (C1,C2,. . .,Cn) has the ESF with

CALIBRATING THE CLOCK: USING STOCHASTIC PROCESSES TO MEASURE THE RATE OF EVOLUTION 137 parameter θ = 1. To give the permutation representation of the ESF for arbitrary θ, we need only suppose that for some θ > 0, P(π) = cθ |n|, π∈ Sn, (5.23) where |π| denotes the number of cycles in the permutation π∈ Sn, the set of permutations of n objects. The parameter c is a normalizing constant, which may be evaluated as follows. The number of permutations in Sn with k cycles is , the absolute value of the Stirling number of the first kind. Hence so that c−1 = θ(n). It follows that under this model, (5.24) In summary, we have shown that θ-biasing a random permutation gives the ESF. The next ingredient in our story is the observation that the law in (5.24) can be represented as the joint law of independent Poisson random variables Z1,Z2,. . .,Zn, having E[Zj] = θ / j, conditional on (5.25) where denotes the law. This follows because

CALIBRATING THE CLOCK: USING STOCHASTIC PROCESSES TO MEASURE THE RATE OF EVOLUTION 138 which agrees with (5.24) apart from a norming constant that does not vary with a1,a2,. . .,an; since both formulas are probabilities, the norming constants must be equal. Equation (5.25) suggests that we might usefully approximate the dependent random variables C1,C2,. . .,Cn, by the independent random variables Z1, Z2,. . .,Zn,. This turns out to be too ambitious, but we can get away with just a little less. For any b ∈ [n] ≡ {1,2,. . .,n}, we can approximate the joint laws of Cb ≡ Cb(n) ≡ (Cl,C2,. . .,Cn) by those of Zb ≡ (Z1,Z2,. . .,Zb), with an error that tends to 0 as n → ∞ as long as b= ο (n), that is, b/n, → 0. As our measure of how well such an approximation might be expected to work, we use total variation distance dTV as a metric on the space of (discrete) probability measures. Three equivalent definitions of the total variation distance db(n) between (the law of) Cb and (the law of) Zb are given below: (5.26) In (5.26), the infimum is taken over all couplings of Cb and Zb on a common probability space, and No ≡ {0,1,2,. . .}. Arratia et al. (1992) use a particular coupling to show that there is a universal constant c = cθ with c (1) = 2 such that

Next: Combinatorial Assemblies »
Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology Get This Book
×
Buy Paperback | $80.00
MyNAP members save 10% online.
Login or Register to save!
Download Free PDF

As researchers have pursued biology's secrets to the molecular level, mathematical and computer sciences have played an increasingly important role—in genome mapping, population genetics, and even the controversial search for "Eve," hypothetical mother of the human race.

In this first-ever survey of the partnership between the two fields, leading experts look at how mathematical research and methods have made possible important discoveries in biology.

The volume explores how differential geometry, topology, and differential mechanics have allowed researchers to "wind" and "unwind" DNA's double helix to understand the phenomenon of supercoiling. It explains how mathematical tools are revealing the workings of enzymes and proteins. And it describes how mathematicians are detecting echoes from the origin of life by applying stochastic and statistical theory to the study of DNA sequences.

This informative and motivational book will be of interest to researchers, research administrators, and educators and students in mathematics, computer sciences, and biology.

  1. ×

    Welcome to OpenBook!

    You're looking at OpenBook, NAP.edu's online reading room since 1999. Based on feedback from you, our users, we've made some improvements that make it easier than ever to read thousands of publications on our website.

    Do you want to take a quick tour of the OpenBook's features?

    No Thanks Take a Tour »
  2. ×

    Show this book's table of contents, where you can jump to any chapter by name.

    « Back Next »
  3. ×

    ...or use these buttons to go back to the previous chapter or skip to the next one.

    « Back Next »
  4. ×

    Jump up to the previous page or down to the next one. Also, you can type in a page number and press Enter to go directly to that page in the book.

    « Back Next »
  5. ×

    To search the entire text of this book, type in your search term here and press Enter.

    « Back Next »
  6. ×

    Share a link to this book page on your preferred social network or via email.

    « Back Next »
  7. ×

    View our suggested citation for this chapter.

    « Back Next »
  8. ×

    Ready to take your reading offline? Click here to buy this book in print or download it as a free PDF, if available.

    « Back Next »
Stay Connected!