HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS 105

Alignment Unknown

The situation for matching between two sequences is closely related, although the dependence structure becomes more complex. Suppose that the two sequences $A_1 A_2 \cdots A_n$ and $B_1 B_2 \cdots B_m$ are made up of letters chosen independently and uniformly from a $d$-letter alphabet. It must be emphasized that when the letters are not chosen uniformly, Theorem 4.2 still holds but is not as straightforward to apply. In matching DNA, $d = 4$; for protein sequences, $d = 20$.

Let $I = \{(i, j) : 1 \le i \le n - t + 1,\ 1 \le j \le m - t + 1\}$. Define indicator random variables $E_{i,j} = 1$ if $A_i = B_j$ and $E_{i,j} = 0$ otherwise. Let $p = P(E_\alpha = 1) = 1/d$. As in the case of head runs, we need to unclump matches and consider "boundary effects." Let

$$X_{i,j} = E_{i,j} E_{i+1,j+1} \cdots E_{i+t-1,j+t-1} \quad \text{if } i = 1 \text{ or } j = 1,$$

and otherwise

$$X_{i,j} = (1 - E_{i-1,j-1}) E_{i,j} E_{i+1,j+1} \cdots E_{i+t-1,j+t-1}.$$

With $W = \sum_{\alpha \in I} X_\alpha$, calculating $\lambda = E(W)$ yields

$$\lambda = p^t \left[ (n + m - 2t + 1) + (n - t)(m - t)(1 - p) \right]. \qquad (4.5)$$

The first term counts the $n + m - 2t + 1$ boundary positions ($i = 1$ or $j = 1$), each of which contributes $p^t$. Each of the remaining $(n - t)(m - t)$ interior positions also requires a mismatch just before the run, so it contributes $(1 - p)p^t$.

In matching two tRNA sequences, one of length 76 and the other of length 77, would a match of length 9 be unusual? For these parameters ($n = 76$, $m = 77$, $t = 9$, and $p = 1/4$ for a four-letter alphabet), $\lambda \approx 0.0136$, and under the model above, the event has a probability of approximately