Proteins and DNA molecules are polymers: chains of monomers joined by a common backbone substructure. In DNA there are 4 types of monomers, the nucleotides, each carrying a different side group (its base). In proteins there are 20 types of monomers, the amino acids. With just a few exceptions, the sequence of monomers, that is, the primary structure, of a given protein or DNA strand completely determines the three-dimensional shape of the biopolymer. Because the function of a molecule is determined by the position of its atoms in space, this almost perfect correlation between sequence and structure implies that, in principle, knowing the primary sequence of a biopolymer suffices to know its function.
The primary sequence of a DNA segment is denoted by a string over the four letters A, C, G, and T. Analogously, the primary sequence of a protein is denoted by a string over 20 letters of the alphabet, one for each type of amino acid. In principle, these strings of symbols encode everything one needs to know about the protein or DNA strand in question. If the primary sequences of two proteins are similar, then it is reasonable to conjecture that they perform the same function. Because DNA's principal role is to encode information (including all of an organism's proteins), the similarity of two segments of DNA suggests that they code for similar things.
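The string representation described above can be illustrated with a minimal Python sketch. The names DNA_ALPHABET, PROTEIN_ALPHABET, is_valid, and fraction_identical are hypothetical, introduced only for illustration. The position-by-position identity score is an assumption chosen for simplicity: it works only for sequences of equal length and is not a similarity measure taken from the text.

```python
# Alphabets for the two kinds of primary sequence.
DNA_ALPHABET = frozenset("ACGT")
# One-letter codes of the 20 standard amino acids.
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid(sequence, alphabet):
    """Return True if every symbol of `sequence` belongs to `alphabet`."""
    return all(symbol in alphabet for symbol in sequence)


def fraction_identical(a, b):
    """Fraction of positions at which two equal-length sequences agree.

    A deliberately naive score (an assumption of this sketch): it compares
    position i of `a` only with position i of `b`, so it cannot account
    for inserted or deleted symbols.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)
```

For example, is_valid("GATTACA", DNA_ALPHABET) holds, while fraction_identical("GATTACA", "GACTATA") counts 5 matching positions out of 7.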
Mutation in a DNA or protein sequence is a natural evolutionary process. Errors in the replication of DNA can change the nucleotide at a given position. Less often, a nucleotide is deleted or inserted. If the mutation occurs in a region of DNA that codes for a protein, these changes cause related changes in the protein's primary sequence and, hence, in its shape and activity. The impact of a particular mutation depends on how much the original and new amino acid sequences differ in their physical and chemical properties. Mutations that alter a protein so much that it functions improperly or not at all tend to be lethal to the organism. Nature is therefore biased against mutations in the critical regions central to a protein's function and is more lenient toward changes in other regions.
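The three kinds of mutation just described (a changed, deleted, or inserted nucleotide) can be sketched as operations on a string. This is a minimal illustration: the function names substitute, delete, and insert are hypothetical, and positions are assumed to be 0-based.

```python
def substitute(seq, pos, nucleotide):
    """Replace the symbol at 0-based position `pos` with `nucleotide`."""
    if not 0 <= pos < len(seq):
        raise IndexError("position out of range")
    return seq[:pos] + nucleotide + seq[pos + 1:]


def delete(seq, pos):
    """Remove the symbol at 0-based position `pos`."""
    if not 0 <= pos < len(seq):
        raise IndexError("position out of range")
    return seq[:pos] + seq[pos + 1:]


def insert(seq, pos, nucleotide):
    """Insert `nucleotide` before 0-based position `pos`.

    `pos == len(seq)` appends at the end of the sequence.
    """
    if not 0 <= pos <= len(seq):
        raise IndexError("position out of range")
    return seq[:pos] + nucleotide + seq[pos:]
```

Applied to "GATTACA" at position 2, substitute with "C" gives "GACTACA", delete gives "GATACA", and insert with "C" gives "GACTTACA". A substitution preserves the sequence length, whereas a deletion or insertion shifts every later position by one.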
Similarity of DNA sequences is a clue to common evolutionary origin. If two proteins in two organisms evolved from a common precursor, one will generally find highly similar segments, reflecting strongly conserved critical regions. If the proteins are very recent derivatives, one might expect to see similarity over the entire length of the sequences. While