Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
SEEING CONSERVED SIGNALS: USING ALGORITHMS TO DETECT SIMILARITIES BETWEEN BIOSEQUENCES 72 sequence comparison. It has been postulated that such a model is natural for biological sequences where gap costs would be expected to have a decreasing marginal penalty as a function of length. For this model, investigators have been able to design algorithms that take O(MN(log N + log M)) time or less (Miller and Myers, 1988; Eppstein et al., 1989). It is also possible to design an algorithm for completely arbitrary gap cost functions. However, such generality comes at a price: the best available algorithm takes O(MN(M + N)) time (Waterman et al., 1976). For this reason and because the more restricted affine and concave models appear adequate to most needs, the general algorithm is rarely used. The Duality Between Similarity and Difference Measures Thus far we have considered the comparison problem to be one of exposing the similarity between two sequences and thus have naturally thought in terms of maximizing the score of alignments. Another natural perspective is to think about how a sequence A may have evolved into sequence B over time. In this context, one seeks alignments that reveal the minimum number of mutational events that might have effected the transformation. In this view, an aligned symbol of B is substituted for its counterpart in A, an unaligned symbol in A is deleted, and an unaligned symbol in B is inserted. For example, in the alignment , the first T in ATTACG is deleted, and the second T in ATATCG is inserted. In the alignment , T is mutated into A, and A is mutated into T. As before, the scoring scheme δ assigns a score to each evolutionary event modeled by a column, but now the interpretation is that δ represents the differences rather than the similarities between symbols. Note that for formal purposes it is assumed that an A mutates into an A in the alignments above at no cost; that is, one chooses δ(A,A) to be 0. Given a scoring scheme δ reflecting an evolutionary or difference-based model, the goal is to find an alignment of minimal score, that is, one that indicates the minimum-scoring set of changes needed to go from one sequence to a related sequence. Let D(A,B) be the score of a minimal