HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS

H(A, B; ∞, ∞) ~ b log n. It is natural to ask if there are other growth rates. The answer is presented in Waterman et al. (1987) and Arratia and Waterman (1994), where the following result is proved. Assume both sequences have equal length n. There is a continuous curve in the nonnegative (µ,δ) plane that divides it into two components. When (µ,δ) belongs to F0, the component containing (0,0), the score H grows linearly with sequence length. When (µ,δ) belongs to F∞, the component containing (∞,∞), the growth is logarithmic in sequence length. Along any curve crossing from F0 to F∞ there is a phase transition in the growth of the score H(A, B; µ,δ). This behavior is quite general: Arratia and Waterman (1994) show that it holds with very general penalties for scoring matches, mismatches, and indels. The behavior of H(A, B; µ,δ) when (µ,δ) lies on the line between F0 and F∞ remains an open question.

RNA EVOLUTION REVISITED

How do the results in the previous section apply to our comparisons of 16S rRNA with tRNAs? As we have seen, the matchings of Bloch et al. (1983) were the result of applying a local algorithm, and so we will apply the local algorithm H to the data and study the distribution of scores. The first task is to compare the sequences using the statistic H(A, B; µ,δ) with µ = 0.9 and δ = 2.1. These values have been used in several database searches, and with them the scores from aligning random sequences grow in the logarithmic region. The results of the algorithm applied to our data can be found in Table 4.2. No closed-form Chen-Stein method has been worked out for alignments with indels, so the results are presented as the number of standard deviations (#σ) above the mean value for comparing two random sequences of similar lengths. (See Waterman and Vingron (1994) for recent work on estimating statistical significance.)
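To make the statistic H(A, B; µ,δ) concrete, here is a minimal Python sketch of a local alignment score computed by dynamic programming. It assumes that a match scores +1, a mismatch scores −µ, and each inserted or deleted letter scores −δ; the function name `local_score` and the example sequences are hypothetical and are not taken from the chapter.

```python
def local_score(a: str, b: str, mu: float = 0.9, delta: float = 2.1) -> float:
    """Best local alignment score of a and b.

    Assumed scoring (a sketch, not the chapter's code):
    +1 per match, -mu per mismatch, -delta per inserted or deleted letter.
    """
    prev = [0.0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0.0                    # best score seen in any cell
    for x in a:
        curr = [0.0]              # column 0: empty prefix of b
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (1.0 if x == y else -mu)  # align x with y
            up = prev[j] - delta                            # x against a gap
            left = curr[j - 1] - delta                      # y against a gap
            cell = max(0.0, diag, up, left)                 # 0 restarts the alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    print(local_score("ACGUACGU", "ACGAACGU"))  # one mismatch inside the alignment
    print(local_score("ACGUACGU", "ACGACGU"))   # one indel inside the alignment
```

Only two rows of the table are kept in memory, since the score alone is wanted here; recovering the aligned segments themselves would need the full table or a traceback.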
The estimated mean as a function of the tRNA length is H(A, B; µ = 0.9, δ = 2.1) = 5.04 log n − 30.95,
Table 4.2 Scores and Alignment Statistics

tRNA      Score   #σ      Matches  mms.  Indels
ala-1a    12.2    −0.02   14       2     0
ala-1b    12.2    −0.1    14       2     0
cys       21.0    6.2     40       10    5
asp-1     10.8    −1.1    22       8     2
glu-1     10.9    −0.8    21       9     1
glu-2     12.8    0.6     22       8     1
phe       13.0    0.6     32       10    5
gly-1     9.4     −1.4    15       4     1
gly-2     9.5     −1.2    35       15    6
gly-3     14.4    1.5     41       14    7
his-1     13.2    1.1     28       12    2
ile-1     13.6    0.9     41       26    2
ile-2     14.0    1.3     35       10    6
lys       10.7    −0.5    23       7     3
leu-1     13.8    0.7     49       28    5
leu-2     11.7    −0.7    33       17    3
leu-5     13.4    0.4     36       14    5
met-f     12.0    −0.03   44       20    7
met-m     11.4    −0.2    21       4     3
asn       15.3    2.4     33       13    3
gln-1     11.8    0.1     23       8     2
gln-2     12.1    0.2     26       11    2
arg-1     13.3    0.7     48       23    7
arg-2     12.8    0.3     26       8     3
ser-      11.1    −1.3    29       11    4
ser-3     13.8    0.3     42       18    6
thr-ggt   10.1    −1.3    15       1     2
val-1     11.9    −0.2    22       9     1
val-2a    11.3    −0.7    14       3     0
val-2b    11.3    −0.4    14       3     0
trp       11.0    −0.7    22       10    1
tyr-1     11.7    −0.4    31       17    2
tyr-2     10.9    −0.9    42       19    7

#σ, the number of standard deviations above the mean value (for comparing two random sequences of similar lengths); mms., mismatches; indels, insertions/deletions.
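A #σ value of the kind reported in Table 4.2 can be approximated by simulation: score many pairs of random sequences of similar lengths, then express an observed score in standard deviations above the simulated mean. The Python sketch below illustrates the idea under loud assumptions: letters are drawn independently and uniformly from a four-letter alphabet, matches score +1, and the sequence lengths, trial count, and observed score in the usage example are hypothetical. It is not the procedure used to produce the table. A compact local scorer is included so that the block runs on its own.

```python
import random
import statistics


def local_score(a: str, b: str, mu: float = 0.9, delta: float = 2.1) -> float:
    """Local alignment score; assumed scoring: +1 match, -mu mismatch, -delta indel."""
    prev, best = [0.0] * (len(b) + 1), 0.0
    for x in a:
        curr = [0.0]
        for j, y in enumerate(b, start=1):
            cell = max(0.0,
                       prev[j - 1] + (1.0 if x == y else -mu),
                       prev[j] - delta,
                       curr[j - 1] - delta)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def random_sequence(n: int, rng: random.Random, alphabet: str = "ACGU") -> str:
    """Assumption: independent, uniformly distributed letters."""
    return "".join(rng.choice(alphabet) for _ in range(n))


def sigma_units(observed: float, len_a: int, len_b: int,
                trials: int = 100, seed: int = 0):
    """Return (z, mean, sd): observed score in standard deviations above
    the mean score of `trials` random sequence pairs of the given lengths."""
    rng = random.Random(seed)
    scores = [local_score(random_sequence(len_a, rng), random_sequence(len_b, rng))
              for _ in range(trials)]
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation
    if sd == 0.0:
        raise ValueError("all simulated scores are equal; increase trials or lengths")
    return (observed - mean) / sd, mean, sd


if __name__ == "__main__":
    # Hypothetical lengths and observed score, chosen only to keep the run short.
    z, mean, sd = sigma_units(observed=15.0, len_a=76, len_b=300, trials=100)
    print(f"mean={mean:.2f} sd={sd:.2f} z={z:.2f}")
```

Because extremal scores have a skewed distribution, a count of standard deviations is a rough yardstick and does not translate directly into a normal-theory p-value.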