HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS

$1 - \exp(-\lambda) = 0.135$. A bound on the error may be calculated in a way similar to that for coin tossing. Again $b_2 = 0$, and by breaking the sum for $b_1$ into two sums, one of which is made up of all terms that involve the boundary, we find

$$b_1 < \frac{\lambda^2 (2t + 1)}{(n - t + 1)(m - t + 1)} + 2 \lambda p^t.$$

Hence, the probability above is correct to within 8.5 × 10⁻⁷.

APPLICATION TO RNA EVOLUTION

Now we bring these ideas to bear on our RNA evolution problem. We have a set of 33 tRNA molecules and one 16S rRNA molecule from E. coli. In Bloch et al. (1983), matchings between the 16S rRNA and each of the tRNAs were studied intensively. tRNA evolution is a complex topic, and tRNA/tRNA comparisons were not made in this study. Table 4.1 shows the length of the longest exact match $H_n$ between these sequences, along with estimates of significance, or p-values ($1 - e^{-\lambda_n}$), from our Chen-Stein method. There are no exceptionally good matchings in this list, and so this analysis discounts any deep relationship between the sequences. In fact, the p-values seem unusually large: in the 33 comparisons the minimum p-value is 0.26.

Still, we should not give up the search. One estimate puts the origin of these sequences at 3 billion years ago. We should not expect large segments of sequence to be preserved in every position over such vast amounts of time. Instead, mutations such as substitutions, insertions, and deletions will accumulate, greatly complicating our task. It is possible that the hypothesis of common origin is correct and that so much evolutionary change has taken place that no significant similarity remains. The next section, "Two Behaviors Suffice," examines the results of this search for unusual similarity using more subtle sequence comparison algorithms.
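The statistic used in this comparison, the length of the longest exact match between two sequences, can be computed with a standard dynamic program for the longest common contiguous substring. The sketch below is only an illustration of that statistic: the function name `longest_exact_match` and the two short sequences are hypothetical, not the tRNA or 16S rRNA data, and the chapter does not say how the authors computed the values.

```python
def longest_exact_match(a: str, b: str) -> int:
    """Return the length of the longest contiguous run of identical
    letters shared by sequences a and b (no gaps, no mismatches)."""
    best = 0
    # prev[j] holds the length of the exact match ending at the previous
    # letter of a and at b[j-1].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the diagonal run that ended one step earlier.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences, invented for illustration only.
toy_trna = "GGCUACGUAGCUCAG"
toy_rrna = "AAGCUACGUAGGG"
print(longest_exact_match(toy_trna, toy_rrna))
```

The run time is proportional to the product of the two lengths, which is small for a tRNA of under 100 bases against a single rRNA.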
Table 4.1 Exact Match P-Values

tRNA      GenBank Locus   Length (n)   $H_n$   $1 - e^{-\lambda_n}$   $b_1$
ala-1a    ECOTRA1A        76           9       0.26                   1.87 × 10⁻⁵
ala-1b    ECOTRA1B        76           9       0.26                   1.87 × 10⁻⁵
cys       ECOTRC          74           8       0.69                   2.67 × 10⁻⁴
asp-1     ECOTRD1         77           8       0.71                   2.79 × 10⁻⁴
glu-1     ECOTRE1         76           10      0.71                   1.25 × 10⁻⁶
glu-2     ECOTRE2         76           10      0.71                   1.25 × 10⁻⁶
phe       ECOTRF          76           9       0.26                   1.87 × 10⁻⁵
gly-1     ECOTRG1         74           7       0.99                   3.90 × 10⁻³
gly-2     ECOTRG2         75           6       1.00                   5.70 × 10⁻²
gly-3     ECOTRG3         76           9       0.26                   1.87 × 10⁻⁵
his-1     ECOTRH1         77           9       0.26                   1.89 × 10⁻⁵
ile-1     ECOTRI1         77           9       0.26                   1.89 × 10⁻⁵
ile-2     ECOTRI2         76           10      0.71                   1.25 × 10⁻⁶
lys       ECOTRK          76           6       1.00                   5.78 × 10⁻²
leu-1     ECOTRL1         87           8       0.76                   3.19 × 10⁻⁴
leu-2     ECOTRL2         87           8       0.76                   3.19 × 10⁻⁴
leu-5     ECOTRL5         87           9       0.29                   2.16 × 10⁻⁵
met-f     ECOTRMF         77           9       0.26                   1.89 × 10⁻⁵
met-m     ECOTRMM         77           8       0.71                   2.79 × 10⁻⁴
asn       ECOTRN          76           7       0.99                   4.01 × 10⁻³
gln-1     ECOTRQ1         75           8       0.70                   2.71 × 10⁻⁴
gln-2     ECOTRQ2         75           8       0.70                   2.71 × 10⁻⁴
arg-1     ECOTRR1         76           7       0.99                   4.01 × 10⁻³
arg-2     ECOTRR2         77           7       0.99                   4.07 × 10⁻³
ser-1     ECOTRS1         88           8       0.76                   3.23 × 10⁻⁴
ser-3     ECOTRS3         93           9       0.31                   2.33 × 10⁻⁵
thr-ggt   ECOTRTACU       76           7       0.99                   4.01 × 10⁻³
val-1     ECOTRV1         76           8       0.70                   2.75 × 10⁻⁴
val-2a    ECOTRV2A        77           8       0.71                   2.79 × 10⁻⁴
val-2b    ECOTRV2B        77           9       0.26                   1.89 × 10⁻⁵
trp       ECOTRW          76           7       0.99                   4.01 × 10⁻³
tyr-1     ECOTRY1         85           8       0.75                   3.11 × 10⁻⁴
tyr-2     ECOTRY2         85           8       0.75                   3.11 × 10⁻⁴

Notes: $H_n$ is the length of the longest exact match between the listed tRNA molecule and a 16S rRNA molecule; $1 - e^{-\lambda_n}$ is the p-value (estimate of significance) for the nth tRNA molecule; each entry in the $b_1$ column is the calculated bound on $b_1$.
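To make the p-value and $b_1$ columns concrete, the following sketch evaluates the Poisson approximation $1 - e^{-\lambda}$ and the bound $b_1 < \lambda^2 (2t + 1) / ((n - t + 1)(m - t + 1)) + 2 \lambda p^t$ quoted earlier in the chapter. Several inputs are assumptions, not statements from this section: a letter-match probability of $p = 1/4$ (uniform bases), a 16S rRNA length of $m = 1541$, the threshold $t$ set equal to the observed $H_n$, and the specific form of $\lambda$ used in the code (interior match starts preceded by a mismatch, plus boundary starts). The function name is hypothetical.

```python
import math


def chen_stein_exact_match(n: int, m: int, t: int, p: float = 0.25):
    """Poisson approximation for a longest exact match of length >= t
    between sequences of lengths n and m.

    ASSUMPTION: lam counts interior starts (must follow a mismatch) and
    boundary starts separately; this form is not given in this section.
    """
    interior = (n - t) * (m - t) * (1 - p)
    boundary = n + m - 2 * t + 1
    lam = p**t * (interior + boundary)

    # p-value as defined in Table 4.1: 1 - exp(-lambda).
    p_value = 1.0 - math.exp(-lam)

    # Upper bound on b1 as quoted in the text.
    b1_bound = (lam**2 * (2 * t + 1) / ((n - t + 1) * (m - t + 1))
                + 2 * lam * p**t)
    return lam, p_value, b1_bound


# First row of Table 4.1: tRNA length n = 76, longest match H_n = 9.
# m = 1541 is an assumed 16S rRNA length.
lam, p_value, b1_bound = chen_stein_exact_match(n=76, m=1541, t=9)
print(f"lambda = {lam:.4f}, p-value = {p_value:.2f}, b1 bound = {b1_bound:.2e}")
```

Under these assumptions the result for the first row should land close to the tabulated 0.26 and 1.87 × 10⁻⁵; other choices of $m$ or $p$ would shift the values slightly.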