FIG. 1. Distribution of genes with various degrees of codon usage bias, measured by the “effective number of codons” (ENC); a lower ENC indicates greater bias. The number of genes for each species (Drosophila melanogaster, Drosophila pseudoobscura, Drosophila virilis, Escherichia coli, and Saccharomyces cerevisiae) and the mean ENC±SD are shown in brackets.

compared is very high; Ks, the number of synonymous substitutions per site, is greater than 1 for most genes. This indicates that enough evolutionary time has elapsed to radically change codon usage in the absence of constraints. Not only does the level of bias remain conserved, but often the actual pattern does as well. One example is Alcohol dehydrogenase (Adh), which has been sequenced in more than 50 species of Drosophila. Table 1 shows the pattern of codon usage for three amino acids. The subgenera Sophophora and Drosophila diverged from each other about 50 million years ago (7), so the avoidance of particular codons in Adh, namely AUA (isoleucine), GGG (glycine), and UUA (leucine), has persisted for a very long time. It is not the case that Drosophila simply cannot use these codons; many genes do use them, an example being the very closely linked Adh-related (Adhr) gene shown in the lower part of Table 1.

While most genes display evolutionary conservatism for codon bias, other genes do not. Fig. 2 notes a few exceptions of some interest. First, Adh in D. virilis is quite unbiased, having an ENC of about 53, while in D. melanogaster and most other species it is quite biased. (Note that even though its codon usage bias is low over the gene as a whole, D. virilis Adh still avoids the three codons noted in Table 1, so the avoidance of these codons is not simply due to overall bias.) Adhr also varies in bias between species, being nearly totally unbiased in D. melanogaster but displaying quite high bias in D. pseudoobscura (Fig. 2). Conversely, Adh is more biased in D. melanogaster (ENC=31.4) than in D. pseudoobscura (ENC=36.7). These two genes are only a few hundred base pairs apart.
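To make the ENC values quoted above concrete, the following is a minimal sketch of how such a value could be computed from a coding sequence. It assumes that ENC here is Wright's effective number of codons, computed from codon homozygosity within each degeneracy class of the standard genetic code; the function name `effective_number_of_codons` and the handling of sparse data are illustrative choices, not the procedure used for the figures.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")


def effective_number_of_codons(cds):
    """Return an ENC estimate for an in-frame coding sequence (DNA or RNA)."""
    seq = cds.upper().replace("U", "T")
    counts = defaultdict(Counter)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODE.get(codon)  # None for codons with ambiguous bases
        if aa is not None and aa != "*":
            counts[aa][codon] += 1

    # Codon homozygosity F per amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, codon_counts in counts.items():
        k = DEGENERACY[aa]
        n = sum(codon_counts.values())
        if k == 1 or n < 2:
            continue  # Met and Trp are uninformative; n < 2 gives no estimate
        sum_p2 = sum((c / n) ** 2 for c in codon_counts.values())
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Ile is the only threefold amino acid; if it is missing, fall back on
    # the mean of the twofold and fourfold classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("too few codons to estimate ENC")

    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)  # cap at the number of sense codons
```

Under these assumptions the value runs from 20, when each amino acid is encoded by a single codon, to 61, when all synonymous codons are used equally, which matches the reading that a lower ENC means greater bias.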

The Serendipity genes, indicated by points Sry-β and Sry-δ in Fig. 2, are also of some interest. These genes are part of a gene cluster that contains six transcriptional units in an 8-kb stretch of DNA. In Fig. 3 we compare the codon usage bias of these genes between D. melanogaster and D. pseudoobscura. Some genes in this cluster have remained relatively highly biased (e.g., the ribosomal protein gene M(3)99D) and others remain quite unbiased (e.g., janA, janB, and Sry-α). Interspersed are the two Sry genes that shift in level of codon bias between these species. There is evidence that Sry genes are expressed differently in these two species (8), which may be related to their change in level of codon usage bias.

Pattern of Codon Usage Bias. While ENC and related measures indicate the overall bias, it is also instructive to look more closely at the pattern of codon bias. Generally, Drosophila genes with high codon usage bias have G and especially C at silent positions (9, 10). Table 2 shows the base composition at two- and fourfold degenerate synonymous sites for the approximately 10% highest and 10% lowest biased genes in D. melanogaster.
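As an illustration of the kind of tally that underlies such a table, the sketch below counts the bases found at twofold and fourfold degenerate third positions of a coding sequence. The classification rule is an assumption made for the example: a third position is called fourfold degenerate if all four bases there encode the same amino acid under the standard genetic code, and twofold degenerate if exactly two do. The function name `silent_site_composition` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def silent_site_composition(cds):
    """Base frequencies at twofold and fourfold degenerate third positions."""
    seq = cds.upper().replace("U", "T")
    tallies = {2: Counter(), 4: Counter()}
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODE.get(codon)  # None for codons with ambiguous bases
        if aa is None or aa == "*":
            continue
        # Number of third-base choices (observed one included) that keep
        # the same amino acid; 1 (Met, Trp) and 3 (Ile) are not tallied.
        fold = sum(CODE[codon[:2] + b] == aa for b in BASES)
        if fold in tallies:
            tallies[fold][codon[2]] += 1

    result = {}
    for fold, counter in tallies.items():
        total = sum(counter.values())
        result[fold] = {b: counter[b] / total for b in BASES} if total else {}
    return result
```

For example, `silent_site_composition("GCCGCCGCGGAC")` tallies three fourfold sites (two C, one G, from the alanine codons) and one twofold site (C, from the aspartate codon). A gene set with high codon usage bias would, on the pattern described above, be expected to show elevated G and especially C frequencies in both classes.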

Do all amino acids contribute to the codon usage bias of a gene and, if so, do they all show the same pattern (i.e., an increase in C-ending codons)? Comparing the individual amino acid measure, ENC-X, to the overall bias of the gene, we found that all amino acids contribute significantly (P < 0.0001) to the overall bias of a gene, although Asp is a clear outlier with relatively little contribution to overall codon usage bias (unpublished work). We then examined whether the pattern of bias for each amino acid is similar: Table 3 shows the correlation of



Copyright © National Academy of Sciences. All rights reserved.