positions present only in the liver fluke, nematode, and/or bacteria but not in the other sequences. If all 44 unvaried sites were permanently invariable sites within the plant-animal-fungal sequences, then 162 - 44 = 118 were potentially variable.

The third parameter set was the number of alternatives that, on average, are allowed in a variable site. A site that must have a negative charge (aspartate or glutamate) has only two alternatives. Other possible pairs include serine-threonine, phenylalanine-tyrosine, asparagine-glutamine, and lysine-arginine. At the other extreme, there may be sites at which any amino acid can be present, in which case there are 20 alternatives. If all possibly variable sites do vary at some point, then one would expect that (α - 1)/α of them will differ in distant pairwise comparisons (α is the average number of alternatives at variable sites; it is like the 3/4 or 19/20 in Eq. 2 above). If α = 2.5, then one expects 1.5/2.5 = 0.6 of the potentially variable sites to differ when the number of replacements per site is large. The corresponding number of differences may be estimated as 0.6 × 118 = 70.8, a number slightly greater than the average number of differences observed between fungi and metazoans. Thus, we have set α at 2.5, although it may be somewhat low in view of the fact that the number is 3.1 in the Ayala (1986) data.
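The arithmetic in the preceding paragraph can be checked with a short sketch. It is not the simulation program used for Figure 3. It only evaluates the saturation fraction (α - 1)/α and the resulting expected number of differences among the 118 potentially variable sites. The function and constant names are hypothetical, chosen here for illustration.

```python
# Minimal sketch of the saturation arithmetic described in the text.
# All names here are illustrative assumptions, not taken from the original work.

TOTAL_SITES = 162        # aligned positions considered in the text
INVARIABLE_SITES = 44    # sites assumed permanently invariable
POTENTIALLY_VARIABLE = TOTAL_SITES - INVARIABLE_SITES  # 162 - 44 = 118


def saturation_fraction(alpha: float) -> float:
    """Fraction of variable sites expected to differ between two distant
    sequences when each site has `alpha` equally likely alternatives."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a site to be variable")
    return (alpha - 1.0) / alpha


def expected_saturated_differences(alpha: float, n_sites: int) -> float:
    """Expected number of differing sites at saturation."""
    return saturation_fraction(alpha) * n_sites


# alpha = 4 and alpha = 20 recover the familiar 3/4 and 19/20 limits;
# alpha = 2.5 is the value adopted in the text, 3.1 the Ayala (1986) value.
for alpha in (2.0, 2.5, 3.1, 4.0, 20.0):
    frac = saturation_fraction(alpha)
    diffs = expected_saturated_differences(alpha, POTENTIALLY_VARIABLE)
    print(f"alpha={alpha:5.1f}  fraction={frac:.3f}  differences={diffs:.1f}")
```

Under these assumptions, raising α raises the saturation ceiling, which is why the choice of α trades off against the number of potentially variable sites when fitting the observed differences.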

The other parameters are obtained by trying, in a hit-or-miss fashion, various combinations of the persistence and the number of replacements per million years. The results shown in Figure 3 were obtained by setting the persistence at 0.01 and the replacements per million years at 0.6. The persistence needs to be low for the vast majority of potentially variable sites to have been variable and experienced a replacement while variable, that is, to get an average of 67 differences in 118 codons after 600 replacements. A larger α plus a larger persistence could yield a comparable result but would make the short-duration times yield simulated differences that are too small (compared to observed differences) because too many replacements would occur in sites with prior replacements. A larger α and a smaller number of potentially variable codons would also improve the fit, but it does not seem correct to reduce the number of potentially variable codons below the number that had in fact varied at least once among the 67 sequences in the group. Data similar to those shown in Figure 3 were presented by Kwiatowski et al. (1991), who tried to fit them with both a double Poisson and a rectangular parabola. They found, as we do here, that it was difficult to fit all the data well at once.

We present the best fit that we have obtained, but there is no reason to believe that it is optimal. Therefore, we do not wish to assert that we have determined the real values of the parameters. What we do wish to assert, however, is that a reasonable model of the biological processes
