separate gene clusters, each of which spans more than one megabase (Mb) of DNA in the human genome.

The purpose of this paper is to present major findings of our recent study on this subject. The immune system is one of the most complicated genetic systems in vertebrates, and detailed results will be published elsewhere. In this paper, we will be concerned primarily with the evolution of MHC and VH gene families.


MHC molecules in vertebrates can be divided into two groups: class I and class II molecules. The class I MHC molecule consists of an α chain and a β2-microglobulin (β2m). The α chain is encoded by a class I MHC gene, whereas β2m is produced by a gene that lies outside the MHC. The class I α chain has three extracellular domains (α1, α2, α3), a transmembrane portion, and a cytoplasmic tail. The α3 domain associates noncovalently with β2m. The class II MHC molecule consists of noncovalently associated α and β chains, which are encoded by class II α-chain (A) and β-chain (B) loci, respectively. Each chain is composed of two extracellular domains (designated as α1 and α2 in the α chain and β1 and β2 in the β chain), a transmembrane portion, and a cytoplasmic tail.

In humans, class I and class II genes are located on chromosome 6 and form two separate clusters (Fig. 2). The class I MHC consists of three highly expressed and highly polymorphic loci, A, B, and C (classical class I or class Ia loci), and 25–50 nonclassical (class Ib) loci, including pseudogenes. Nondefective class Ib genes are usually monomorphic and expressed in limited tissues, and their function is not well understood, so the definition of class Ib genes is somewhat vague (23). However, the recently discovered class Ib gene HLA-HH seems to have some important function, because mutants of this gene apparently cause the genetic disease hemochromatosis (21). The human class II gene cluster contains six major gene regions: DP, DN, DM, DO, DQ, and DR. Each of the DP, DM, DQ, and DR regions contains at least one functional α-chain gene and at least one functional β-chain gene. The α-chain and β-chain genes in the regions DP, DQ, etc., are designated as DPA1, DPA2, DPB1, DPB2, DQA1, etc. The class II gene cluster also includes many poorly expressed genes and pseudogenes.

FIG. 2. Simplified genomic organizations of the human and mouse MHC genes. Only relatively well characterized genes are presented here, and there are many other genes or pseudogenes in both organisms. The recently discovered class Ib gene HLA-HH (21) is located about 4 Mb away from gene F on the telomeric side. (The original authors used the gene symbol HLA-H, but we changed it to HLA-HH to avoid confusion with an already established Ib locus with the same name.) The number of genes also varies with haplotype in both class I and class II regions. Open boxes refer to polymorphic or classical loci, whereas closed boxes stand for monomorphic or nonclassical loci. Class III genes or other genes unrelated to class I and class II genes are not shown. Class II genes A and B refer to class II α- and β-chain genes, respectively. The MHCs in humans and mice are often called HLA and H2, respectively. The gene maps in this figure are based on information from Trowsdale (22) and other sources.

The mouse MHC genes have also been studied extensively. The mouse class Ia genes are not orthologous with the human class Ia genes (24–26), and therefore different gene symbols are used for them (Fig. 2). In fact, different orders of mammals generally seem to have nonorthologous class Ia genes. The number of class Ia genes in mammals is usually 1–3, but there are often a large number of class Ib genes. By contrast, class II genes from different orders of mammals usually have orthologous relationships, but the genes from birds and amphibians are not orthologous with the mammalian genes, with a few possible exceptions (27).

Polymorphism. The hallmark of MHC genes is the extremely high degree of polymorphism within loci, the extent of polymorphism being the highest among all vertebrate genetic loci (28). The mechanism of maintenance of this polymorphism has been debated for the last 30 years, and it still remains controversial (15, 19, 29). The hypotheses proposed to explain the polymorphism include those of maternal-fetal incompatibility, mating preference, overdominant selection, frequency-dependent selection due to minority advantage, and interlocus gene conversion. This problem has been discussed by Hughes and Nei several times (15, 16, 30, 31), and we will not repeat the discussion here. However, we would like to mention that in our view the simplest explanation is heterozygote advantage or overdominant selection. In this hypothesis, heterozygotes for a locus have a selective advantage over homozygotes, because heterozygotes can cope with two different types of foreign antigens, whereas homozygotes can deal with only one type. Since there are several different functional MHC loci, heterozygotes for all these loci should have a substantial selective advantage over homozygotes. Evidence supporting overdominant selection is also increasing (19, 31).
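To illustrate why heterozygote advantage can maintain polymorphism, the following minimal sketch iterates the standard one-locus, two-allele selection recursion. The two-allele setting, the fitness scheme (1 - s, 1, 1 - t), and the values of s and t are simplifying assumptions chosen for illustration only; they are not estimates for any MHC locus, where many alleles usually segregate.

```python
def next_freq(p, s, t):
    """Frequency of allele A after one generation of selection.

    Assumed (hypothetical) genotype fitnesses:
    AA = 1 - s, Aa = 1, aa = 1 - t, with s, t > 0 (heterozygote advantage).
    """
    q = 1.0 - p
    # Mean fitness of the population under random mating.
    w_bar = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    # Allele A comes from AA homozygotes and half of the Aa heterozygotes.
    return (p * p * (1.0 - s) + p * q) / w_bar


def iterate(p0, s, t, generations):
    """Apply the selection recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


def equilibrium(s, t):
    """Stable internal equilibrium frequency of allele A under overdominance."""
    return t / (s + t)


if __name__ == "__main__":
    s, t = 0.1, 0.2  # illustrative selection coefficients, not empirical values
    for p0 in (0.01, 0.99):
        print(p0, iterate(p0, s, t, 2000), equilibrium(s, t))
```

Starting from either a rare or a nearly fixed allele, the frequency approaches t / (s + t), so neither allele is lost in this deterministic model.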

In recent years a number of authors (19, 32) presented evidence that new alleles can be created by interallelic recombination at the B locus. However, interallelic recombination cannot produce new alleles unless there are abundant polymorphic alleles in the population, and the MHC polymorphism seems to be maintained primarily by point mutation and overdominant selection (15). Another interesting discovery in recent years is the relatively high degree of polymorphism at a class Ib locus, MICA (33). The function of this gene is unknown, but the average heterozygosity per nucleotide site (nucleotide diversity) for the three extracellular domains (exons 2, 3, and 4) is 0.011. Although this is lower than that for class Ia loci (0.04–0.08), it is considerably higher than the values (0.0002–0.007) for other nuclear genes (15). The reason for this high degree of polymorphism is unclear, but it is possibly caused by a hitchhiking effect of overdominant selection operating at the B locus, which is closely linked with this locus.
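Nucleotide diversity, as used above, can be computed from a sample of aligned allele sequences as the average proportion of sites that differ between two sequences. The sketch below is a minimal illustration of that calculation; the function name and the toy sequences are hypothetical and are not MICA data, and the sketch assumes gap-free alignments without ambiguous bases.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatched sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    # Average over pairs, then express per nucleotide site.
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    toy_alleles = ["AAAA", "AAAT", "AATT"]  # hypothetical aligned alleles
    print(nucleotide_diversity(toy_alleles))
```

For the three toy alleles the pairwise differences are 1, 2, and 1 over 4 sites, giving a diversity of 4/12. Real estimates such as those cited in the text are based on many alleles and much longer coding regions.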

In the past, population geneticists have been primarily interested in the extent of polymorphism within loci. In the MHC, however, there is a substantial amount of polymorphism due to gene duplication, insertion, or deletion. For example, in the class II DRB region of humans there are at least five different haplotypes, and the number of genes per haplotype varies from 2 to 5 (22). Furthermore, there seem to be at least 9 distinct gene copies in this region, and only one gene (DRB1) is shared by all haplotypes. The mouse population is also known to have many different haplotypes, and the type and number of class I genes vary considerably with haplotype. For example, class Ia locus D is missing in most haplotypes, and the number of class Ib genes varies considerably with haplotype (34).

Intralocus vs. Interlocus Variation Within Species. One way of studying the significance of interlocus recombination or gene conversion is to compare the interlocus and intralocus genetic variation within species. This can be done by constructing a phylogenetic tree for alleles from different loci. If there is any kind of interlocus genetic exchange, one would expect that alleles from each polymorphic locus do not necessarily form a monophyletic cluster in the phylogenetic tree. Fig. 3 shows the phylogenetic tree for different alleles from human MHC (HLA) class

Copyright © National Academy of Sciences. All rights reserved.