HEARING DISTANT ECHOES: USING EXTREMAL STATISTICS TO PROBE EVOLUTIONARY ORIGINS

TWO BEHAVIORS SUFFICE

In this section we describe a statistic that provides a link between the sections "Global Sequence Comparison" and "Local Sequence Comparison" of this chapter. This statistic is the score of the best matching intervals between two sequences, where nonidentities in the alignments receive penalties. In the section "Global Sequence Comparisons," we showed that the growth of score of global alignments of random sequences is linear with sequence length. In the section "Local Sequence Comparisons," we showed that the number of long runs of exact matches between random sequences has an approximate Poisson distribution. Below we show that the Poisson distribution implies that, for exact matching, the growth of longest run length is proportional to the logarithm of the product of the sequence lengths. Then we state the result that all optimal alignments of a broad class have a score that has either logarithmic or linear growth, depending on the penalties for nonidentities.

We will consider two sequences $A = A_1 A_2 \cdots A_n$ and $B = B_1 B_2 \cdots B_n$ of equal length $n$. Recall that $p := P(\text{two random letters are identical}) = P(C_\alpha = 1)$. In the case of unknown alignments, $\lambda = E(W)$ is given from equation (4.5) by

$$\lambda = p^t\bigl((n + n - 2t + 1) + (n - t)(n - t)(1 - p)\bigr).$$

For $\lambda \approx 1$, we expect one run of length $t$. Then

$$1 = p^t\bigl((n + n - 2t + 1) + (n - t)(n - t)(1 - p)\bigr) \approx p^t\, n n (1 - p).$$

Solving for $t$ yields $t = \log_{1/p}\bigl(n n (1 - p)\bigr)$. Therefore the length of the longest run of identities grows like $\log_{1/p}(n^2) = 2 \log_{1/p}(n)$.
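The logarithmic growth derived above can be checked numerically. The following Python sketch is an illustration and is not taken from the chapter. It assumes independent, uniformly distributed letters over a four-letter alphabet (so that p = 1/4), and the function names `expected_runs`, `predicted_longest_run`, `longest_common_run`, and `simulate` are hypothetical. It evaluates the expression for λ, computes the predicted run length t = log_{1/p}(nn(1 − p)), and compares that prediction with the average longest run of exact matches found in simulated random sequences.

```python
import math
import random


def expected_runs(n, t, p):
    """lambda = E(W) as given in the text:
    p^t * ((n + n - 2t + 1) + (n - t)(n - t)(1 - p))."""
    return p ** t * ((n + n - 2 * t + 1) + (n - t) * (n - t) * (1 - p))


def predicted_longest_run(n, p):
    """Approximate solution of lambda = 1: t = log_{1/p}(n * n * (1 - p))."""
    return math.log(n * n * (1 - p)) / math.log(1 / p)


def longest_common_run(a, b):
    """Length of the longest run of exact matches between a and b,
    taken over all relative positions (alignment unknown).
    Simple O(len(a) * len(b)) dynamic program over diagonals."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ends on the same diagonal.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def simulate(n, alphabet="ACGT", trials=10, seed=0):
    """Average longest run over `trials` pairs of random sequences.
    Assumption: letters are independent and uniform over `alphabet`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.choice(alphabet) for _ in range(n)]
        b = [rng.choice(alphabet) for _ in range(n)]
        total += longest_common_run(a, b)
    return total / trials


if __name__ == "__main__":
    p = 0.25  # probability that two uniform letters from ACGT match
    for n in (50, 100, 200):
        t = predicted_longest_run(n, p)
        print(n, round(t, 2), round(expected_runs(n, t, p), 2), simulate(n))
```

With p = 1/4, doubling n increases the predicted length by 2 log_4(2) = 1, so the printed predictions should rise by one letter per doubling. The simulated averages need only track that trend, since the formula describes the order of growth rather than an exact mean.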
To relax our stringent requirement of identities, we recall scoring for the alignments as introduced in the section "Global Sequence Comparisons." Extend the sequence $A_1 A_2 \cdots A_n$ to $A_1^* A_2^* \cdots A_L^*$ by inserting gaps "$-$" and similarly extend $B_1 B_2 \cdots B_n$ to $B_1^* B_2^* \cdots B_L^*$. Define

$$S(A, B) = \max \sum_{i=1}^{L} s(A_i^*, B_i^*),$$

where

$$s(a, b) = \begin{cases} 1 & \text{if } a = b, \\ -\mu & \text{if } a \neq b, \\ -\delta & \text{if } a = - \text{ or } b = -, \end{cases}$$

and $\mu \geq 0$, $\delta \geq 0$. The maximum is extended over all ways of inserting gaps and all $L$.

In Smith and Waterman (1981) and Waterman and Eggert (1987), dynamic programming algorithms are presented to compute

$$H(A, B) = H(A, B; \mu, \delta) = \max\{S(I, J) : I \subset A,\ J \subset B\}$$

in time $O(n^2)$. By $I \subset A$, for example, we mean all $I = A_i A_{i+1} \cdots A_j$, where $1 \leq i \leq j \leq n$. This algorithm was designed to study situations like our 16S rRNA/tRNA relationships. We are searching for segmental alignments that are not necessarily perfect matchings but are unusually good matches. After some discussion of the statistical properties of $H(A, B; \mu, \delta)$, we will apply the algorithm to our data.

The statistic $H(A, B; \mu, \delta)$ is for one of the so-called local alignment algorithms. However, when the penalties $\mu$ and $\delta$ are set to 0, the algorithm computes a global alignment. The results in the section "Global Sequence Comparisons" imply that $H(A, B; 0, 0) \sim an$; but when the parameters are set to $\infty$, the results in the section "Local Sequence Comparisons" imply that