not amino acid replacements, and, moreover, assumes that all sites are variable. For the first time, this equation recognizes that even two unrelated sequences will still have positions that match. The 3/4 and 4/3 terms reflect that there are four kinds of nucleotides and thus there are three ways in which a second nucleotide may not match the first. To make equation 2 suitable for replacements, the 3/4 and 4/3 should be 19/20 and 20/19, respectively. Note that equation 2, to be technically correct, requires n to be the number of variable sites, although generally n is taken as the length of the sequence, the implicit assumption being that all sites are variable.
Equation 2 assumes that every nucleotide (or every amino acid when modified as stated) is equally likely to be present at a position. The equation can be made more accurate if one knows the frequencies of the elements. If $b = 1 - \sum_{i=1}^{k} p_i^2$, where $p_i$ is the frequency of the $i$th element, $1 \le i \le k$, and $k = 4$ for nucleotides and $k = 20$ for amino acids, then
where b is the probability that two randomly drawn elements do not match.
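The role of b can be illustrated with a short sketch. It computes b from a set of element frequencies and confirms that equal frequencies recover the 3/4 (nucleotide) and 19/20 (amino acid) terms discussed above. The function names, and the symbol d for the observed number of differences, are introduced here for illustration only. The correction in `corrected_changes` is an assumed Jukes-Cantor-style generalization, r = -bn ln(1 - d/(bn)); it is not copied from the equation in the text and should be checked against it.

```python
import math


def mismatch_probability(freqs):
    """Return b = 1 - sum(p_i^2), the chance that two randomly drawn elements differ."""
    if not math.isclose(sum(freqs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def corrected_changes(d, n, b):
    """Assumed form of the correction: r = -b * n * ln(1 - d / (b * n)).

    d: observed number of differences (hypothetical symbol)
    n: number of variable sites
    b: mismatch probability from mismatch_probability()
    """
    ratio = d / (b * n)
    if not 0.0 <= ratio < 1.0:
        raise ValueError("d must satisfy 0 <= d < b * n")
    return -b * n * math.log(1.0 - ratio)


# Equal frequencies reproduce the constants used in equation 2.
b_nucleotide = mismatch_probability([1 / 4] * 4)    # approximately 3/4
b_amino_acid = mismatch_probability([1 / 20] * 20)  # approximately 19/20

# Unequal frequencies lower b, because chance matches become more likely.
b_skewed = mismatch_probability([0.4, 0.3, 0.2, 0.1])
```

Under this assumed form, the corrected estimate always exceeds the observed number of differences, and the gap widens as d approaches bn, the level of difference expected between unrelated sequences.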
The above modifications (equations 2 and 3) improve the estimate of r by recognizing and accounting for additional biological facts. However, there are other biological features that may be important. In particular, it may be important to know whether n is all sites or only a fraction thereof. Moreover, we may need to consider not only that n is less than the length of the sequence but also that the sites that make up n may not be the same in different lineages (e.g., in fungi as in mammals). This is the concept incorporated in the notion of covarions, which is supported by reasonable evidence, derived not only from the observed fitting of Poisson distributions to data such as in Fitch and Markowitz (1970) but also from a test showing that the invariable positions of cytochrome c are, in fact, not the same in the fungi as in the metazoans (Fitch, 1971).
The first parameter determined was the number of covarions in SOD, estimated to be 28. Fixing this number reduces the number of parameters that are free to change in order to get a good fit to the SOD observations. In a complementary vein, the 11 clades for which paleontological dates were estimated constitute a large body of varied data, all of which must be fit to demonstrate that the SOD observations could arise via a perfect clock.
The second parameter fixed was the number of potentially variable codons. This was set rather arbitrarily at 118 by the following logic. Of the 162 codon positions, 44 were unvaried in our data set or were