FIG. 2. Correlation of codon usage bias across species. The correlation coefficient for the upper graph is 0.42 (n=31, P=0.01) and that for the lower graph is 0.42 (n=63, P < 0.001).

codon usage bias (ENC-X) of each fourfold and sixfold degenerate amino acid and the base composition at synonymous sites. As expected, for most amino acids the strongest positive correlation is with C: the use of C increases as bias increases. However, there are exceptions to the preference for C. Val and Leu increase in use of G as bias increases. The pattern is very similar in D. pseudoobscura and D. virilis: a general preference for C, except for Val and Leu, which prefer G in the third codon position. For Ile (threefold degenerate) and most twofold degenerate T/C amino acids, the highest significant positive correlation is for an increase in C as bias increases. The exception is Asp, which shows no significant correlation between its ENC-X and base composition at the wobble position, in agreement with the previous point. For all A/G twofold degenerate amino acids, G increases as bias increases (unpublished work).

Gene Length. As we discuss below, some explanations of codon usage bias may be affected by the length of a gene. Does the length of a gene in D. melanogaster correlate with the degree of codon bias? To answer this, we need a measure of bias that is not itself biased by sample size (i.e., the number of codons in a gene). Wright (4) performed simulation studies on ENC and found little or no detectable bias with sample size; we have confirmed this finding (E.N.M., unpublished data). Fig. 4 summarizes the relationship between gene length and codon usage bias: shorter genes tend to have higher bias than longer genes.
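The sample-size concern can be made concrete with a small sketch of the effective number of codons. The Python below assumes the standard form of Wright's statistic, in which a per-amino-acid homozygosity F = (n Σp_i² − 1)/(n − 1) is averaged within each degeneracy class and combined as ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The function names are hypothetical, and this is not the code used for the analyses reported here. The example counts are the leucine codon counts pooled over the melanogaster group in Table 1, so they illustrate the calculation rather than the bias of any single gene.

```python
def codon_homozygosity(counts):
    """Homozygosity F for one amino acid from its synonymous codon counts.

    F = (n * sum(p_i^2) - 1) / (n - 1); the (n - 1) form is the
    finite-sample correction that limits dependence on codon number n.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two codons are needed to estimate F")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return (n * sum_p2 - 1) / (n - 1)


def effective_number_of_codons(mean_f):
    """ENC from the mean F of each degeneracy class (keys 2, 3, 4, 6).

    Assumes the standard genetic code: 2 single-codon amino acids,
    9 twofold, 1 threefold, 5 fourfold and 3 sixfold amino acids.
    """
    return 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]


# Limiting cases: equal use of all synonyms, and one codon per amino acid.
enc_no_bias = effective_number_of_codons({2: 1 / 2, 3: 1 / 3, 4: 1 / 4, 6: 1 / 6})
enc_max_bias = effective_number_of_codons({2: 1.0, 3: 1.0, 4: 1.0, 6: 1.0})

# Leucine counts (UUA, UUG, CUU, CUC, CUA, CUG) from Table 1, melanogaster group.
f_leu_adh = codon_homozygosity([0, 30, 3, 33, 0, 224])
f_leu_adhr = codon_homozygosity([13, 32, 6, 7, 18, 42])
```

In this sketch the two limiting cases recover the bounds of the statistic, 61 for no bias and 20 for maximal bias, and the leucine homozygosity is higher for Adh than for Adhr, in line with the lower mean ENC reported for Adh in Table 1.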

Recombination. The level of recombination also affects the level of codon usage bias of Drosophila genes: genes in regions of low recombination tend to have low bias (11). This is attributed to the fact that selection can act more effectively at single loci or on individual nucleotide positions when recombination is high, the so-called Hill-Robertson (12) effect.

Causes

Mutation Bias. There is evidence that mutation bias may affect codon usage in warm-blooded vertebrates that have mosaic genomes consisting of long stretches of A+T-rich DNA interspersed with long stretches of G+C-rich DNA. This isochore structure, as it is termed (13), is thought to be due to regional differences in mutation bias (14, 15). The observation is that genes in A+T-rich isochores tend to have A+T predominantly at silent sites, while genes in G+C-rich isochores have G+C more often at silent sites (16, 17). This is shown by a correlation between base content of introns and the exons of the same gene.

Table 1. Codon usage for Adh and Adhr

                                                      No. of times codon used
                                                      Isoleucine     Glycine             Leucine
Subgenus    Group         No. of species  Mean ENC    AUU  AUC  AUA  GGU  GGC  GGA  GGG  UUA  UUG  CUU  CUC  CUA  CUG

Adh
Sophophora  melanogaster  9               31.8±3.2     72  136    0   51   81   36    1    0   30    3   33    0  224
            obscura       7               41.5±5.8     64   93    3   37   87    9    0    1   22    6   16    1  125
            willistoni    6               45.9±0.7     79   53    0   67   33    8    0    2   99    7   13    3   32
Idiomyia    “Hawaiians”   10              43.8±1.7    113  115    1   66  112    6    0    0   49   43   26   23  103
Drosophila  repleta       9               44.6±2.7     77  144    4   35   89   29    2    2   23   16   36    6  124
            virilis       8               53.5±1.7     92  100    3   35   64   41    4    0   18   35    8   15  112

Adhr
Sophophora  melanogaster  6               57.1±2.0     42   33   23   28   12   58   11   13   32    6    7   18   42
            obscura       7               49.5±3.9     48   49   21   45   46   34   14    7   20    6   15   16   84

Numbers in main body are numbers of times each codon is used in that group of species. ENC is effective number of codons, defined in the text, and is presented ±SD.
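
As a rough illustration of how the counts in Table 1 can be related to mean ENC, the following Python sketch computes, for each Adh species group, the fraction of isoleucine codons that are AUC and its Pearson correlation with mean ENC. The values are transcribed from Table 1; the helper `pearson_r` and the choice of Pearson's coefficient are assumptions made for illustration, not the analysis reported in the text. Because a lower ENC indicates stronger bias, a preference for C at higher bias would appear as a negative correlation.

```python
import math

# (group, mean ENC, AUU, AUC, AUA) for Adh, transcribed from Table 1.
ADH_ILE = [
    ("melanogaster", 31.8, 72, 136, 0),
    ("obscura", 41.5, 64, 93, 3),
    ("willistoni", 45.9, 79, 53, 0),
    ("Hawaiians", 43.8, 113, 115, 1),
    ("repleta", 44.6, 77, 144, 4),
    ("virilis", 53.5, 92, 100, 3),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


mean_enc = [row[1] for row in ADH_ILE]
# Fraction of all isoleucine codons in the group that end in C (AUC).
auc_fraction = [row[3] / (row[2] + row[3] + row[4]) for row in ADH_ILE]

r = pearson_r(mean_enc, auc_fraction)
print(f"r between mean ENC and AUC fraction: {r:.2f}")
```

With these six groups the coefficient is negative, which agrees in direction with a preference for C at higher bias. With so few pooled groups, which also share phylogenetic history, no statistical significance should be attached to the value.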


