Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS 96 therefore cannot be optimized. We give the statistical distribution of the alignment score for completeness, however. Let s(A,B) be a real valued random variable. Define the score S by and let E(S) denote the expectation of S and Var(S) denote the variance. Clearly, E(S) = nE(s(A,B)) and Var(S) = nVar(s(A, B)). Since S is the sum of independent, identically distributed random variables s(A,B), the central limit theorem implies that for large n S â Normal(nE(s(A, B)), nVar(s(A, B))). If s(A, B) â {0, 1}, then S is binomial (n, p), where p = P{s(A, B) = 1}. Even when the letters are not identically distributed, the limiting distribution is normal under mild assumptions (Chung, 1974). Alignment Unknown The assumptions of the last section are carried over: AlA2. . . An and B1B2. . . Bm are composed of independent and identically distributed letters and s(A, B) is a real valued random variable on pairs of letters. We extend s(·,·) to s(A,â) and s(â,B) so that deletions are included. We assume that the value of s for all deletion scores is smaller than maxs(A,B). An alignment score S is the maximum over all possible alignments
HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS 97 The optimization destroys the classical normal distribution of alignment score, but an easy application of a beautiful theorem known as Kingman's subadditive ergodic theorem gives an interesting result: Theorem 4.1 (Kingman, 1973) For s,t nonnegative integers with 0 ⤠s ⤠t, let Xs,t be a collection of random variables that satisfy (i) Whenever s t u, Xs,u ⤠Xs,t + Xt,u, (ii) The joint distribution of Xs,t} is the same as that of {Xs+l, t+1}, (iii) The expectation gt = EX0,t exists and satisfies gt ⥠â Kt for some constant Then the finite limtââ X0,t / t = λ exists with probability 1 and in the mean. The essential assumption is (ii), the subadditivity condition. To motivate and illustrate this theorem, recall the strong law of large numbers (SLLN), which treats independent, identically distributed (iid) random variables W1,W2,. . . with µ = E(Wj). The SLLN asserts that with probability 1. It is easy to see that additivity holds. Set Of course (i) is satisfied:
HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS 98 Since the Wi are iid, (ii) is evidently true. Finally, g(t) = E(U0,t) = tµ, so that (iii) holds with µ = -K. Therefore the limit exists and is constant with probability 1. Notice that this setup does not allow us to conclude that the limit is µ. This is a price of relaxing the assumption of additivity. Returning to the statistical distribution of alignment score, recall that an alignment score S is the maximum over all possible alignments Define X,, by Then evidently, âXs,u ⥠(âXs,t) + (âXt,u) and Xs,u ⤠Xs,t + Xt,u. We have that gt = E(X0,t) exists since the expectation of a single alignment exists and âX0,t is the maximum of a finite number of alignment scores. The final hypothesis to check is gt ⥠âKt for some constant K and all t > 1. Clearly, E(âX0,t) ⤠t maxs(A, B) so that gt ⥠â(maxs(A, B))t = âKt.