proteins can be similar because of evolution from a common precursor, similarity of protein sequences can also be a clue to common function independent of evolutionary considerations. It appears that nature not only conserves the critical parts of a protein's conformation and function, but also reuses such motifs as modular units in fashioning the spectrum of known proteins. One finds strong similarities between segments of proteins that have similar functions. A strong similarity between the v-sis oncogene and a growth-stimulating hormone was the key to discovering that the v-sis oncogene causes cancer by deregulating cell growth. In that case, the similarity involved the entirety of the sequence. In other cases, functionally related proteins are similar only in segments corresponding to active sites or other functionally critical stretches.

Finding Global Similarities

To illustrate the underlying techniques of sequence comparison, we begin with the simple, core problem of finding the best alignment between two entire sequences. Such an alignment is called a global alignment because it aligns the sequences in their entirety, as opposed to a local alignment, which aligns only portions of the sequences.

As an example, consider finding the best global alignment of A = ATTACG and B = ATATCG under the following scoring scheme. A letter aligned with the same letter has a score of 1. A letter aligned with a different letter or with a gap has a score of 0. The total score is the sum of the scores over all aligned positions. A matrix depicting this "unit-cost" scoring scheme is shown in Figure 3.1. Under this unit-cost scheme, the score of an alignment equals the number of identical aligned characters. The obvious alignment, which pairs the two sequences position by position without gaps, has a score of 4: the first two and last two positions match, and the middle two do not. However, because gaps are allowed, a higher score can be achieved, namely 5, which can be shown to be the highest score possible. An optimal alignment, that is, an alignment that achieves this highest score by aligning five symbols, is [alignment image]. In some cases there is exactly one optimal alignment, but in general there can be many. For example, [alignment image] also has a score of 5.
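The example above can be checked with a short Python sketch. It uses the standard dynamic-programming recurrence for global alignment, which is an assumption on our part and not necessarily the procedure the text goes on to develop. The function names score_alignment and global_align, and the use of "-" as the gap symbol, are hypothetical choices made for illustration. The alignment it returns is one optimal alignment and need not be the one pictured in the original figures.

```python
def score_alignment(top, bottom, gap="-"):
    """Unit-cost score: the number of columns holding identical letters."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(top, bottom) if x == y and x != gap)


def global_align(a, b, gap="-"):
    """Return (score, top_row, bottom_row) for one optimal global alignment.

    Scoring is the unit-cost scheme: 1 for identical letters,
    0 for a mismatch or for a letter aligned with a gap.
    """
    m, n = len(a), len(b)
    # best[i][j] = highest score for aligning a[:i] with b[:j]
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = best[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else 0)
            best[i][j] = max(pair, best[i - 1][j], best[i][j - 1])

    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        pair = None
        if i > 0 and j > 0:
            pair = best[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else 0)
        if pair is not None and best[i][j] == pair:
            top.append(a[i - 1])       # letter aligned with letter
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and best[i][j] == best[i - 1][j]:
            top.append(a[i - 1])       # letter of a aligned with a gap
            bottom.append(gap)
            i -= 1
        else:
            top.append(gap)            # gap aligned with a letter of b
            bottom.append(b[j - 1])
            j -= 1
    return best[m][n], "".join(reversed(top)), "".join(reversed(bottom))


# The ungapped, position-by-position alignment from the text.
print(score_alignment("ATTACG", "ATATCG"))  # 4

# One optimal alignment with gaps allowed.
score, top, bottom = global_align("ATTACG", "ATATCG")
print(score)  # 5
print(top)
print(bottom)
```

Because several alignments reach the score of 5, the rows printed depend on the tie-breaking order in the traceback. Changing that order would yield a different, equally optimal alignment.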

