SEEING CONSERVED SIGNALS: USING ALGORITHMS TO DETECT SIMILARITIES BETWEEN BIOSEQUENCES

The unit-cost scoring scheme of Figure 3.1 is not the only possible scheme. Later in this chapter, we will see a much more complex scoring scheme used in the comparison of proteins (20-letter alphabet). In that scheme and other scoring schemes, the scores in the table are real numbers assigned on the basis of various interpretations of empirical evidence. Let us introduce here a formal framework to assist our thinking.

Figure 3.1 Unit-cost scoring scheme.

Consider comparing sequence A = a1a2···aM and sequence B = b1b2···bN, whose symbols range over some alphabet Σ, for example, Σ = {A,C,G,T} for DNA sequences. Let δ(a,b) be the score for aligning a with b, let δ(a,-) be the score of leaving symbol a unaligned in sequence A, and let δ(-,b) be the score of leaving b unaligned in B. Here a and b range over the symbols in Σ and the gap symbol "-".

The score of an alignment is simply the sum of the scores δ assigns to each pair of aligned symbols. For example, the score of

    A T T A - C G
    A - T A T C G

is δ(A,A) + δ(T,-) + δ(T,T) + δ(A,A) + δ(-,T) + δ(C,C) + δ(G,G), which for the scoring scheme of Figure 3.1 equals 5. An optimal alignment under a given scoring scheme is an alignment that yields the highest sum.

Visualizing Alignments: Edit Graphs

Many investigators have found it illuminating to convert the problem of finding similarities into one of finding certain paths in an edit graph.
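
Before turning to edit graphs in detail, it may help to see the scoring framework defined earlier in this section as a few lines of Python. This is only a minimal sketch under stated assumptions. Figure 3.1 is not reproduced in this text, so the function unit_cost below assumes that the unit-cost scheme scores 1 for two identical aligned symbols and 0 for a mismatch or a gap, which is consistent with the worked total of 5. The names GAP, unit_cost, and alignment_score are illustrative and do not come from the chapter, and the gap symbol is written as an ASCII hyphen.

```python
GAP = "-"  # illustrative gap symbol (ASCII hyphen)


def unit_cost(a, b):
    """Assumed unit-cost delta: 1 for a match, 0 for a mismatch or a gap.

    Figure 3.1 is not available here, so these values are an assumption
    chosen to agree with the example total of 5 given in the text.
    """
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else 0


def alignment_score(row_a, row_b, delta=unit_cost):
    """Sum delta over the columns of an alignment given as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot pair a gap with a gap")
        total += delta(a, b)
    return total


# The example alignment from the text, one string per row.
print(alignment_score("ATTA-CG", "A-TATCG"))  # prints 5
```

Because delta is passed in as a parameter, a different scoring scheme, such as a real-valued table for proteins, could be substituted without changing alignment_score.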