CALIBRATING THE CLOCK: USING STOCHASTIC PROCESSES TO MEASURE THE RATE OF EVOLUTION 136

MATHEMATICAL VIGNETTE: APPROXIMATING COMBINATORIAL STRUCTURES

Our mathematical vignette takes us from the world of population genetics to that of probabilistic combinatorics. We show how the Ewens sampling formula (ESF), whose origins in population genetics were described above, plays a central role in approximating the probabilistic structure of a class of combinatorial models. This brief account follows Arratia and Tavaré (1994), to which the interested reader is referred for further results. Our first task is to describe the combinatorial content of the ESF itself.

Approximations for the Ewens Sampling Formula

First, we recall Cauchy's formula for the number N(a) ≡ N(a_1, a_2, \ldots, a_n) of permutations of n objects that have a_1 cycles of length 1, a_2 cycles of length 2, \ldots, a_n cycles of length n:

N(a) = 1\!\left(\sum_{j=1}^{n} j a_j = n\right) n! \prod_{j=1}^{n} \frac{1}{j^{a_j}\, a_j!},   (5.21)

1(A) denoting the indicator of the event A. If each of the n! permutations is assumed to be equally likely, then a random permutation has cycle index a with probability

P(C_1 = a_1, \ldots, C_n = a_n) = \frac{N(a)}{n!} = 1\!\left(\sum_{j=1}^{n} j a_j = n\right) \prod_{j=1}^{n} \left(\frac{1}{j}\right)^{a_j} \frac{1}{a_j!},   (5.22)

where C_j ≡ C_j(n) is the number of cycles of size j in the permutation. Comparison with (5.4) shows that (C_1, C_2, \ldots, C_n) has the ESF with
parameter θ = 1. To give the permutation representation of the ESF for arbitrary θ, we need only suppose that for some θ > 0,

P(\pi) = c\, \theta^{|\pi|}, \qquad \pi \in S_n,   (5.23)

where |\pi| denotes the number of cycles in the permutation \pi \in S_n, the set of permutations of n objects. The parameter c is a normalizing constant, which may be evaluated as follows. The number of permutations in S_n with k cycles is |s(n, k)|, the absolute value of the Stirling number of the first kind. Hence

\sum_{\pi \in S_n} \theta^{|\pi|} = \sum_{k=1}^{n} |s(n, k)|\, \theta^k = \theta(\theta + 1) \cdots (\theta + n - 1) \equiv \theta_{(n)},

so that c^{-1} = \theta_{(n)}. It follows that under this model,

P(C_1 = a_1, \ldots, C_n = a_n) = 1\!\left(\sum_{j=1}^{n} j a_j = n\right) \frac{n!}{\theta_{(n)}} \prod_{j=1}^{n} \left(\frac{\theta}{j}\right)^{a_j} \frac{1}{a_j!}.   (5.24)

In summary, we have shown that θ-biasing a random permutation gives the ESF. The next ingredient in our story is the observation that the law in (5.24) can be represented as the joint law of independent Poisson random variables Z_1, Z_2, \ldots, Z_n, having E[Z_j] = \theta / j, conditional on \sum_{j=1}^{n} j Z_j = n:

\mathcal{L}(C_1, \ldots, C_n) = \mathcal{L}\!\left(Z_1, \ldots, Z_n \;\middle|\; \sum_{j=1}^{n} j Z_j = n\right),   (5.25)

where \mathcal{L} denotes the law. This follows because
P\!\left(Z_1 = a_1, \ldots, Z_n = a_n \;\middle|\; \sum_{j=1}^{n} j Z_j = n\right) = \frac{1\!\left(\sum_{j=1}^{n} j a_j = n\right)}{P\!\left(\sum_{j=1}^{n} j Z_j = n\right)} \prod_{j=1}^{n} e^{-\theta/j}\, \frac{(\theta/j)^{a_j}}{a_j!},

which agrees with (5.24) apart from a norming constant that does not vary with a_1, a_2, \ldots, a_n; since both formulas are probabilities, the norming constants must be equal.

Equation (5.25) suggests that we might usefully approximate the dependent random variables C_1, C_2, \ldots, C_n by the independent random variables Z_1, Z_2, \ldots, Z_n. This turns out to be too ambitious, but we can get away with just a little less. For any b \in [n] \equiv \{1, 2, \ldots, n\}, we can approximate the joint laws of C_b \equiv C_b(n) \equiv (C_1, C_2, \ldots, C_b) by those of Z_b \equiv (Z_1, Z_2, \ldots, Z_b), with an error that tends to 0 as n \to \infty as long as b = o(n), that is, b/n \to 0. As our measure of how well such an approximation might be expected to work, we use total variation distance d_{TV} as a metric on the space of (discrete) probability measures. Three equivalent definitions of the total variation distance d_b(n) between (the law of) C_b and (the law of) Z_b are given below:

d_b(n) = \sup_{A \subseteq \mathbb{N}_0^b} \left| P(C_b \in A) - P(Z_b \in A) \right| = \frac{1}{2} \sum_{a \in \mathbb{N}_0^b} \left| P(C_b = a) - P(Z_b = a) \right| = \inf P(C_b \neq Z_b).   (5.26)

In (5.26), the infimum is taken over all couplings of C_b and Z_b on a common probability space, and \mathbb{N}_0 \equiv \{0, 1, 2, \ldots\}. Arratia et al. (1992) use a particular coupling to show that there is a universal constant c = c_θ with c(1) = 2 such that
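The identities above can be checked numerically for small n. The following Python sketch (illustrative values θ = 2 and n = 8; all function names are my own, not from the text) verifies that the ESF probabilities (5.24) sum to 1 over cycle indices, and that conditioning independent Poisson variables with means θ/j on the event \sum j Z_j = n recovers (5.24), as asserted in (5.25).

```python
from math import factorial, exp

THETA, N = 2.0, 8  # illustrative values; any theta > 0 and small n work


def rising(theta, n):
    """Rising factorial theta_(n) = theta (theta + 1) ... (theta + n - 1)."""
    out = 1.0
    for k in range(n):
        out *= theta + k
    return out


def partitions(n, max_part=None):
    """All partitions of n, as lists of parts in non-increasing order."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest


def cycle_index(parts, n):
    """Convert a partition of n to (a_1, ..., a_n), a_j = multiplicity of j."""
    return [parts.count(j) for j in range(1, n + 1)]


def esf_prob(a, theta):
    """Eq. (5.24): ESF probability of the cycle index a (zero off the simplex)."""
    n = len(a)
    if sum(j * aj for j, aj in enumerate(a, 1)) != n:
        return 0.0
    p = factorial(n) / rising(theta, n)
    for j, aj in enumerate(a, 1):
        p *= (theta / j) ** aj / factorial(aj)
    return p


def poisson_weight(a, theta):
    """Joint mass of independent Z_j ~ Poisson(theta / j) at a."""
    w = 1.0
    for j, aj in enumerate(a, 1):
        mean = theta / j
        w *= exp(-mean) * mean ** aj / factorial(aj)
    return w


# Cycle indices with sum(j a_j) = n correspond to partitions of n.
indices = [cycle_index(p, N) for p in partitions(N)]

# (5.24) defines a probability distribution over cycle indices:
assert abs(sum(esf_prob(a, THETA) for a in indices) - 1.0) < 1e-12

# (5.25): conditioning the Poissons on sum(j Z_j) = n recovers the ESF.
norm = sum(poisson_weight(a, THETA) for a in indices)
for a in indices:
    assert abs(poisson_weight(a, THETA) / norm - esf_prob(a, THETA)) < 1e-12
```

Enumerating partitions rather than all of \mathbb{N}_0^n keeps the check exact: the indicator in (5.21) restricts the support to cycle indices with \sum j a_j = n.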
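For the smallest case b = 1, the total variation distance in (5.26) can be evaluated exactly by enumeration. The sketch below (helper names are my own; θ = 1, i.e. uniform random permutations) computes d_1(n) between C_1, the number of singleton cycles, and Z_1 ~ Poisson(θ), using the half-L1 form of (5.26), and confirms that the distance shrinks as n grows with b fixed.

```python
from math import factorial, exp


def rising(theta, n):
    """Rising factorial theta_(n) = theta (theta + 1) ... (theta + n - 1)."""
    out = 1.0
    for k in range(n):
        out *= theta + k
    return out


def partitions(n, max_part=None):
    """All partitions of n, as lists of parts in non-increasing order."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest


def esf_prob(a, theta):
    """Eq. (5.24) for a cycle index a already satisfying sum(j a_j) = n."""
    n = len(a)
    p = factorial(n) / rising(theta, n)
    for j, aj in enumerate(a, 1):
        p *= (theta / j) ** aj / factorial(aj)
    return p


def d1(n, theta):
    """d_1(n): total variation distance between C_1 under the ESF and
    Z_1 ~ Poisson(theta), via the half-L1 form of (5.26) with b = 1."""
    law_c1 = [0.0] * (n + 1)
    for parts in partitions(n):
        a = [parts.count(j) for j in range(1, n + 1)]
        law_c1[a[0]] += esf_prob(a, theta)
    pois = [exp(-theta) * theta ** k / factorial(k) for k in range(n + 1)]
    total = sum(abs(c - z) for c, z in zip(law_c1, pois))
    total += max(0.0, 1.0 - sum(pois))  # Poisson mass beyond n, where C_1 has none
    return total / 2


# d_1(n) -> 0 as n -> infinity with b = 1 fixed, consistent with b = o(n):
assert d1(16, 1.0) < d1(8, 1.0) < d1(4, 1.0)
```

For θ = 1 this is the classical comparison of the number of fixed points of a uniform random permutation with a Poisson(1) variable; the computed distances fall off very rapidly in n, in line with the coupling bound quoted from Arratia et al. (1992).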