4 Mapping

The genes that specify the biological heritage of each human being are arranged along chromosomes in a nearly invariant order. Consequently, simple one-dimensional maps can specify the genetic organization of the human, as well as other species. Some applications of these maps have already been described. In this chapter, the committee provides a more detailed view of the types of chromosome maps, their uses, and technical problems affecting their construction.

In considering current and future uses of maps in genetics, it is important to recognize that the exploration of the human genome is at an early stage. The roles of maps in human genetics may be expected to change with time. Over the next few years, maps will largely be used to guide the search for the DNA sequences responsible for particular genetic diseases and in genetic counseling. As systematic studies of the structure and function of the human genome expand, the role of maps in organizing information and planning new types of research will increase in importance. It would be impossible, for example, to organize systematic DNA sequencing of the human genome without precise maps of the regions to be sequenced.

Even when extensive sequence data become available, maps will remain indispensable for organizing a wide variety of genetic data, including the sequences themselves. The continued value of chromosome maps has been demonstrated for viruses whose genomes have been completely mapped and sequenced. Researchers who study such viruses keep detailed maps of the viral genome within reach at all times, but consult the sequence data less frequently. Genetic linkage maps and physical maps (even when incomplete), as well as partial sequences, have been
of value in research on Escherichia coli (a bacterium) and Drosophila (a fly). In the latter, maps have been of critical importance in guiding investigators and have provided direction to the regions of interest that need to be sequenced. A similar future awaits maps of the human genome: These maps will not only be critical tools during the coming decades of discovery, but will also form a permanent part of the basic description of humankind's genetic endowment.

Early Cytological Mapping Efforts Depended on Examining Chromosomes Under the Light Microscope

All types of mapping involve measuring the positions of easily observed landmarks. Until recently, the only useful physical landmarks along human chromosomes have been cytogenetic bands. When cultured human cells are treated with suitable drugs during cell division, the chromosomes are easily viewed through the light microscope as wormlike shapes. Several staining procedures developed in the late 1960s and early 1970s imprint reproducible patterns of light and dark bands on chromosomes (George, 1970). The banding pattern is believed to reflect a periodicity in the spacing of certain types of DNA sequences along chromosomes. From a mapping standpoint, this banding is important in that it allows human chromosomes to be individually recognized by light microscopy and allows an average chromosome to be subdivided into 10 to 20 regions.

Banding patterns provide the basis for a physical map of the chromosomes, often referred to as a cytogenetic map. In clinical genetics, examination of the banding patterns has led to diagnosis of such conditions as Down syndrome, a genetic disease usually caused by the presence of an extra copy of chromosome 21 (Lejeune et al., 1959). Since the late 1960s, it has been possible to assign many genes to locations on the cytogenetic map by the techniques of somatic cell genetics (Weiss and Green, 1967).
In these techniques, rodent and human cells are fused to form hybrid cells that can be grown in culture. These cells generally lose all but one or a few human chromosomes, but different human chromosomes, or parts thereof, are retained in different cell lines. Chromosome banding is used to determine which portions of the human genome have been retained in particular cell lines. Consistent co-retention of a region of the genome and a human biochemical trait allows the genetic determinant of that trait to be assigned to a position on the cytogenetic map. More than 1,000 genes and other DNA sequences have now been assigned to positions on the cytogenetic map (McKusick, 1986). Mapping activities have provided an important focus for international
activities in human genetics, including studies by laboratories in at least 12 countries on 4 continents. International gene-mapping workshops have been organized every year or two since 1973. The ninth workshop was held in Paris in September 1987.

The Current Revolution in Genome Mapping Is Based on the Use of Recombinant-DNA Techniques

The systematic application of recombinant-DNA technology to chromosome mapping began in approximately 1980. Since that time, it has become apparent that recombinant-DNA techniques can potentially create chromosome maps with an accuracy and level of detail that only a few years ago seemed unachievable. It is no exaggeration to say that current maps of human chromosomes compare in quality to the navigational charts that guided the explorers of the New World. Another decade of special effort directed toward mapping the human genome could yield maps comparable to the best modern maps of the earth's surface.

No single application of recombinant-DNA technology is responsible for creating this historic opportunity for progress in human genetics. Instead, the revolution in chromosome mapping has developed on several fronts, all of which are spin-offs of the extraordinary advances in DNA experimentation that took place during the 1970s. Methods of cloning DNA molecules from any organism into microbial cells, of cleaving molecules at specific sites, and of separating DNA fragments that differ only slightly in size have all contributed to present mapping capabilities. Also of major importance are DNA-probe techniques that allow a particular DNA sequence, usually obtained from a DNA clone, to be used to detect other DNA molecules with similar or identical sequences in uncloned DNA that is extracted from human or other cells.
Whether chromosome mapping is being done at the level of the chromosomal DNA molecule (physical mapping) or by following the pattern in which portions of chromosomes are passed through pedigrees (genetic linkage mapping), the experimental face of chromosome mapping has changed beyond recognition since 1980. Nevertheless, the scale of current activity is small relative to the amount of work that must be done to study the unexplored territory in the human genome. Only a major special effort directed toward systematic mapping of the human chromosomes will allow this revolution to produce, within a decade or less, a comprehensive, detailed map of the human genome.
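The co-retention reasoning of somatic cell genetics described earlier in this chapter (a trait is assigned to the part of the genome whose presence tracks the trait across hybrid cell lines) can be illustrated with a minimal sketch. The panel below is entirely hypothetical: the cell-line names, the retained chromosomes, and the trait scores are invented for illustration, and real panels must also cope with chromosome fragments and scoring errors.

```python
# Hypothetical hybrid panel: the human chromosomes retained by each
# rodent-human hybrid cell line (illustrative data, not from the text).
panel = {
    "line_A": {2, 7, 21},
    "line_B": {7, 11},
    "line_C": {2, 11, 21},
    "line_D": {21},
    "line_E": {2, 7},
}

# Whether the human biochemical trait is detected in each line (hypothetical).
trait = {
    "line_A": True,
    "line_B": False,
    "line_C": True,
    "line_D": True,
    "line_E": False,
}


def concordant_chromosomes(panel, trait):
    """Return chromosomes whose retention matches the trait in every line."""
    candidates = set().union(*panel.values())
    return sorted(
        chrom
        for chrom in candidates
        if all((chrom in retained) == trait[line]
               for line, retained in panel.items())
    )


assigned = concordant_chromosomes(panel, trait)
print(assigned)
```

Requiring perfect concordance is the simplest possible rule; it is chosen here only to keep the logic visible.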
FUNDAMENTALS OF GENOME MAPPING

Physical Maps Describe Chromosomal DNA Molecules, Whereas Genetic Linkage Maps Describe Patterns of Inheritance

Physical maps specify the distances between landmarks along a chromosome. Ideally, the distances are measured in nucleotides, so that the map provides a direct description of a chromosomal DNA molecule. The most important landmarks in physical mapping are the cleavage sites of restriction enzymes. The maps can be calibrated in nucleotides by measuring the sizes of the DNA fragments produced when a chromosomal DNA molecule is cleaved with a restriction enzyme.

Restriction mapping has not yet been extended to DNA molecules as large as human chromosomes. Physical maps of human chromosomes are now based largely on the banding patterns along chromosomes as observed in the light microscope. One can only estimate the number of nucleotides represented by a given interval on the map; furthermore, the amount of DNA present in different bands of the same size may not be constant since there are likely to be regional variations in the extent to which chromosomes condense during cell division. Nonetheless, cytogenetic maps are considered to be physical maps because they are based on measurements of actual distance.

In contrast, genetic linkage maps describe the arrangement of genes and DNA markers on the basis of the pattern of their inheritance. Genes that tend to be inherited together (i.e., linked) are close together on such maps, and those inherited independently of one another are distant. Genes from different chromosomes are inherited independently and thus are always unlinked. Genes on the same chromosome can be tightly or loosely linked or unlinked, as reflected in the probability that they will be separated from one another during sperm or egg production.
The genes can be separated if the chromosome breaks and exchanges parts with the other member of the chromosome pair, a process known as crossing over or genetic exchange. The farther apart two genes are on the chromosome, the more frequently such an exchange will occur between them.

Exchange is a complex genetic process that accompanies the formation of sperm cells in the male and egg cells in the female. Unlike other cells, which contain two copies of each chromosome (except for the special case of the X and Y chromosomes in males), sperm and egg cells contain only a single copy of each chromosome. A particular sperm or egg cell, however, does not simply receive a
precise copy of one of the two parental versions of each chromosome: Instead, each sperm or egg receives a unique composite of the two versions, produced by the series of cutting and splicing events that constitute genetic exchange. Indeed, the great variety of individual chromosomes that can be produced by exchange and independent assortment is responsible for much of the genetic individuality of different humans.

The order of genes on a chromosome measured by linkage maps is the same as the order in physical maps, but there is no constant scale factor that relates physical and genetic distances. This variation in scale exists because the process of exchange does not occur equally at all places along a chromosome. Nor does exchange take place at the same rate in the two sexes; hence, as maps become more accurate, there will have to be separate genetic linkage maps for males and females.

Because they describe the arrangement of genes at the most fundamental level, physical maps are gaining in importance relative to genetic linkage maps in most areas of biological research. They can never displace genetic linkage maps, however, which are distinctive in their ability to map traits that can be recognized only in whole organisms. Disease genes are particularly important illustrations of this point. Huntington's disease and cystic fibrosis, for example, have catastrophic effects on patients, but cannot be recognized in the types of cultured cells that are suitable for genetic studies. Only by studying the patterns in which these diseases are inherited in affected families has it been possible to localize the defective genes on chromosome maps. Because of the unique ability of genetic linkage mapping to define and localize disease genes, increasing the number of genetic markers available for this type of mapping should receive major emphasis in any overall program to map the human genome.
A type of physical map that provides information on the approximate location of expressed genes is a complementary DNA (cDNA) map. A gene that is expressed will produce messenger RNA (mRNA) molecules in those cells in which the gene is active (Figure 2-3). The physical mapping of expressed genes (exons) is possible by using the DNA prepared from messenger RNA in the process called reverse transcription (in which an enzyme synthesizes a complementary strand of DNA by copying an RNA molecule that serves as a template). The availability of cDNAs permits the localization of genes of unknown function, including genes that are expressed only in differentiated tissues, such as the brain, and at particular stages of development and differentiation. Because they are expressed, they are likely to be the biologically most interesting part of the genome and therefore can
usefully be the focus for early sequencing. In addition, knowledge of their map locations provides a set of likely candidate genes to test once the approximate location of a gene that is altered in a particular disorder has been mapped by genetic linkage techniques.

To this point, about 4,100 expressed gene loci have been identified by all methods (McKusick, 1986). Identification of the rest of the 50,000 to 100,000 genes in the haploid genome will come eventually with complete sequencing, but can be greatly facilitated in the immediate future by the cDNA map. This map contains information of great biological and medical significance simply because it represents the expressed portion of the genome.

The Development of Ordered Collections of DNA Clones Is an Important Adjunct to Physical Mapping

In theory, sensitive DNA-probe technologies make it possible to construct physical maps while cloning only a small fraction of the genome that is being mapped. In practice, however, this approach is suitable only for the coarsest level of physical mapping. At higher resolutions, most physical mapping is likely to be carried out on collections of DNA clones that have been ordered according to their positions in the original genome. The individual clones are especially useful because they provide an inexhaustible source of the DNA from each genomic region. The vectors used for DNA cloning can be plasmids, bacterial viruses, modified bacterial viruses called cosmids, or artificial yeast chromosomes. All of these types of DNA molecules are characterized by the ability to replicate exactly as autonomous units inside suitable host cells.

Having ordered clone collections is also a prerequisite to most methods of sequencing the genome since the clones would provide the actual DNA fragments that would be purified and prepared for DNA sequencing.
Both Physical and Genetic Linkage Maps Can Be Constructed with Various Degrees of Resolution and Connectivity

All types of mapping presuppose an inherent trade-off between the level of detail, or resolution, in a map and the extent to which the map provides a convenient overview of the mapping objective (its connectivity). An atlas of street maps for all the major cities in a state, for example, has high resolution but low connectivity. Separate maps must be presented for each city since a fully connected map of the whole state at the same resolution used for the street maps would be too big to be useful.

As a practical matter, constructing maps that combine high resolution and high connectivity is difficult. This technical challenge is likely to be the dominant problem in the systematic physical mapping of the human genome. The nature of the difficulty can be appreciated by analogy with conventional cartography. Suppose, for example, that the only two sources of data available for mapping the United States were satellite pictures of multistate regions and local property surveys. An adequate set of overlapping satellite pictures would allow construction of a fully connected, low-resolution map, whereas the local surveys would provide detailed maps of small regions. It would be extremely difficult, however, to relate the two types of data. In principle, this problem could be solved by painstakingly piecing together the local-survey maps until they covered regions large enough to discern on the satellite pictures. In practice, however, accuracy would suffer as the survey maps were pieced together, since regions such as lakes and deserts would disrupt connectivity. In general, the only powerful solution to this type of problem lies in the development of mapping methods that can achieve a series of intermediate resolutions.

In chromosome mapping, cytogenetic maps of the banding patterns seen in the light microscope correspond to the satellite pictures, whereas restriction-site maps correspond to the local surveys. Even the most extensive restriction-site maps of local regions of human chromosomes do not yet cover even a single band on the cytogenetic map. Prospects for filling in intermediate levels of the resolution hierarchy are good, but these techniques are still being developed. Ultimately, the DNA sequence will represent the physical map of the human genome at the highest possible resolution.
Nonetheless, as the analogy with conventional cartography suggests, sequencing cannot stand alone: It must anchor at the high-resolution end a program of mapping at a whole series of resolutions.

GENETIC LINKAGE MAPPING

Restriction Fragment Length Polymorphisms Are Convenient Landmarks for Genetic Linkage Mapping

Human beings differ from one another at many points in their genomes: Some of these differences account for differences in traits such as eye color, blood type, height at maturity, or susceptibility to a particular disease. Most differences, however, have few or no consequences in terms of the appearance or function of the individual. Nonetheless, they can still be detected since they cause subtle differences in proteins or, at a minimum, in the DNA sequence. The
phenomenon of multiple genetic variants at a particular site in the genome is called polymorphism. With the advent of recombinant-DNA methods and, more particularly, DNA-probe technology, a versatile type of polymorphism called restriction fragment length polymorphism (RFLP) has come to dominate human genetic linkage mapping (Botstein et al., 1980; White et al., 1985). RFLPs are DNA-sequence polymorphisms that result in variations in the local restriction map at particular sites in the genome. These variations are readily detected in small amounts of DNA extracted from blood samples. The inheritance of RFLPs can be followed through families by analyzing DNA from parents and children.

Because (with the exception of the X and Y chromosomes in males) each of us has two versions of each chromosome, we have two versions of each gene and DNA sequence, one inherited from each of our parents. Thus, polymorphic DNA sequences such as genes or RFLPs can be present in one person in two different forms. In such a case, the person is said to be heterozygous, carrying two different forms, called alleles, of the polymorphic gene or sequence. Heterozygosity allows investigators to track genes through families and to detect linkage.

An ideal genetic marker is one that exists in so many distinct forms that every individual is heterozygous, and unrelated individuals are heterozygous for different forms. In this case, the marker can be traced unambiguously from grandparent to parent to child in every family group studied, allowing the inheritance of linked genes in the family to be traced accurately and efficiently. Actual RFLPs do not approach this ideal, but a newly discovered type of molecular marker comes much closer. These VNTRs (variable number tandem repeats) are short repeated regions that vary in length and may exist in a dozen (rather than just two) identifiable forms.
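The advantage of a marker with many forms over a two-form marker can be made concrete with a short calculation. The sketch below uses the standard population-genetics expression for expected heterozygosity, H = 1 - sum(p_i^2), which assumes random mating; the formula and the equal allele frequencies are assumptions introduced for illustration and are not taken from the text.

```python
def expected_heterozygosity(allele_freqs):
    """Chance that a random individual carries two different alleles,
    assuming random mating: H = 1 - sum(p_i ** 2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical markers with equally frequent alleles.
two_allele = expected_heterozygosity([0.5, 0.5])          # two-form marker
twelve_allele = expected_heterozygosity([1 / 12] * 12)    # twelve-form marker

print(f"two-form marker:    {two_allele:.3f}")
print(f"twelve-form marker: {twelve_allele:.3f}")
```

Under these assumptions, at most half of all individuals are heterozygous for a two-form marker, whereas about eleven in twelve are heterozygous for a marker with a dozen equally common forms. Unequal allele frequencies lower both figures.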
Genetic Linkage Mapping Requires the Study of Many People in Large Family Groups

Two genes that are close to one another on a chromosome show tight linkage: The particular alleles of the two genes that a person inherits from one of his or her parents are almost always passed on together to that person's children. However, two genes that are farther apart but still on the same chromosome are more likely to be separated by exchange during sperm or egg production. The probability of such an exchange increases with the physical distance between the genes, thereby accounting for the observation that genes are ordered in the same way by genetic linkage and by physical mapping. To measure the degree of exchange between two genes, the
frequency of co-inheritance of parental allele combinations must be measured on a statistically significant sample. From a practical standpoint, detection of linkage requires the measurement of the allele combinations passed from one generation to the next by at least 10 sperm or egg cells, meaning that at least five offspring must be examined from a fully informative mating (i.e., both parents heterozygous at both sites with all parental alleles distinguishable from one another). However, an accurate measurement of the extent of linkage requires the examination of even more people.

The unit of distance in genetic linkage mapping is called the centimorgan (cM), in honor of the great American geneticist Thomas Hunt Morgan. By definition, two sites that are spaced by 1 cM have a 1 percent probability of being separated by exchange during sperm or egg production. Averaged over the whole genome, 1 cM on the genetic linkage map corresponds to approximately 1 million nucleotide pairs, although the relation between genetic and physical distances varies considerably.

Great progress has been made in genetic linkage mapping with RFLPs since the concept was introduced in 1980 (Botstein et al., 1980). Hundreds of RFLPs have been described, and many maps of whole chromosomes and portions of chromosomes have been published (Drayna and White, 1985). The major laboratories engaged in RFLP mapping have formed a highly effective collaboration centered around the Centre d'Étude du Polymorphisme Humain (CEPH) in Paris (Marx, 1985; Dausset, 1986). In CEPH, collaborating investigators are provided with DNA from cultured cells derived from the lymphocytes of the members of 40 families having an average of approximately eight children each, as well as both parents and all four grandparents. This collection comprises approximately 600 progeny chromosome sets.
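The figures quoted above can be tied together with a little arithmetic. The sketch below is illustrative only. It uses the conventional LOD score (log10 odds of linkage versus no linkage) and the customary threshold of 3, neither of which is named in the text, as one plausible reading of why about 10 informative sperm or egg cells are needed to detect linkage. The recombinant counts are hypothetical, and the count of two parental gametes per child is an interpretation of "progeny chromosome sets."

```python
import math


def recombination_fraction(recombinants, meioses):
    """Fraction of scored meioses in which two sites were separated."""
    return recombinants / meioses


def lod_score(recombinants, meioses, theta):
    """log10 odds of linkage at recombination fraction theta versus no
    linkage (theta = 0.5), for phase-known, fully informative meioses."""
    nonrecombinants = meioses - recombinants
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants works.
    likelihood = (theta ** recombinants) * ((1 - theta) ** nonrecombinants)
    return math.log10(likelihood / 0.5 ** meioses)


# Ten meioses with no recombinants just pass the customary threshold of 3.
lod_10 = lod_score(0, 10, 0.0)

# Hypothetical count: 6 recombinants among 600 scored meioses.
theta = recombination_fraction(6, 600)
distance_cm = 100 * theta            # reasonable only for short distances
approx_bp = distance_cm * 1_000_000  # genome-wide average quoted in the text

# CEPH panel: 40 families x about 8 children x 2 parental gametes per child.
ceph_meioses = 40 * 8 * 2

print(f"LOD for 10 nonrecombinant meioses: {lod_10:.2f}")
print(f"distance: {distance_cm:.1f} cM, roughly {approx_bp:,.0f} nucleotide pairs")
print(f"meioses in a CEPH-sized panel: {ceph_meioses}")
```

On this reading, a panel of roughly 600 meioses would be expected to show only about six exchanges in any 1 cM interval, which is why finer maps call for more families.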
By agreement among the collaborators, RFLPs that are mapped with any of the CEPH families are analyzed throughout all the families for which they are informative. Consequently, information is steadily accumulating about the positions of recombination events in all the progeny chromosomes in the collection. The data are pooled and distributed at regular intervals to all interested investigators. This international collaboration has greatly speeded human genetic linkage mapping and lowered the entry barriers for new investigators who are interested in joining the effort. In fact, the large demand for this material makes it important to increase the number of cultured cells chosen from families that are especially useful for linkage studies.

A genetic linkage map of the entire human genome at an average resolution of about 10 cM was recently reported (Donis-Keller et al., 1987). Current technology seems to allow construction of an
RFLP map with an average resolution of 1 cM within the next several years. This increase in resolution would require the mapping of several thousand RFLPs on a set of families larger than the current CEPH collection.

Recent innovations in human linkage mapping now allow three-point and higher multipoint mapping to be performed. This makes mapping more efficient and more like the Drosophila mapping that has been so productive. Maps will also be of primary importance in areas such as genetic counseling and in disease research. Obtaining markers on both sides of the genes of interest will provide more reliable information. Genetic linkage maps of humans will require special statistical and computer techniques because humans, unlike experimental animals, often have few siblings. Computers also make it possible to do linkage analysis of complex pedigrees.

RFLPs Are Useful for Interrelating Physical and Genetic Linkage Maps

Genetic linkage mapping allows those genes with no known cellular or molecular effects to be located on the human genome. On the other hand, physical maps describe the DNA molecules present in chromosomes. RFLP markers can easily be localized on either type of map. Not only can RFLP markers be placed on the genetic linkage map in family studies, but also, because the probes that are used to recognize RFLPs are themselves DNA molecules, their positions on a physical map can be determined in a variety of straightforward ways. Exact alignment between the genetic linkage and physical maps of the human genome at a large number of sites is therefore possible. This will greatly facilitate finding the actual DNA sequences that correspond to a gene once such a gene is localized on the genetic linkage map. In addition, making maps continuous across entire chromosomes will be easier by genetic linkage mapping, whereas maps of higher resolution (finer than a million nucleotides) will be easier to achieve by physical mapping.
The more points at which the two maps can be exactly aligned, the greater the opportunity to take advantage of this complementarity, which will help solve the connectivity problem that arises when making maps of high resolution.

A Reference RFLP Map for the Human Would Be a Critical Tool for Studying Inherited Diseases

RFLP mapping provides a powerful, comprehensive approach to the study of inherited diseases. Ideally, the centerpiece of this approach
would be a reference RFLP map, at 1 cM resolution, determined from normal families. Once completed, the project of constructing such a map would provide human geneticists with a permanent archive of several thousand DNA probes that would detect polymorphisms throughout the genome at an average spacing of 1 million nucleotides. To apply this resource to the study of a particular inherited disease, an investigator would test DNA samples from families afflicted by that disease with a uniformly spaced subset of perhaps 5 percent of these probes. Once rough linkage was tentatively detected, typically with a recombination frequency of 10 percent between the mutant gene that caused the disease and the polymorphism that was detected by the probe, the linkage could be rapidly confirmed and the position of the disease gene refined by follow-up analyses conducted with more closely spaced probes, selected to cover the region of interest thoroughly. Because the same RFLP polymorphisms are not segregating in all families, more sites are required than might seem necessary. For this reason more reference pedigrees are needed. In addition, research in highly polymorphic sites and ways of detecting them should be encouraged.

At present, genetic linkage mapping with RFLPs is often begun with essentially random probe collections; once weak linkage is detected, the refinement of the position of the disease gene is extremely laborious since new sets of probes must be developed. Nonetheless, when major resources are directed to the study of particular diseases, such as cystic fibrosis and Huntington's disease, progress can be impressive. Only a few years ago, nothing was known about the position in the genome of the gene responsible for either of these diseases, and no compelling evidence existed that either was caused by mutations in the same gene in different afflicted families.
Now, as a result of the RFLP approach, both genes have been mapped with great precision and shown to have a common genetic basis in most or all cases (Gusella et al., 1983; White, 1986). Equally important, the RFLP approach, because of its ability to interrelate genetic linkage and physical mapping, has laid the groundwork for locating and analyzing the actual DNA sequences responsible for the diseases by coupled strategies of physical mapping and cloning, starting with the DNA clones used to probe for the linked RFLPs.

Generalization of this strategy to the large variety of known inherited disorders could be expected to advance our understanding of basic human biology as well as to direct improvements in the diagnosis and treatment of many diseases. The reference RFLP map for the human and its associated collection of well-tested DNA probes would dramatically improve the efficiency of this research, allow the study
of diseases in smaller family groups, and improve the practicality of studying diseases that are caused by alterations in more than one gene. The study of multigenic disorders could ultimately revolutionize medicine, since there are likely to be multigenic genetic predispositions to such common disorders as cancer, heart disease, and schizophrenia.

MAKING PHYSICAL MAPS

Medium-Resolution Mapping of Restriction Sites Is Facilitated by New Methods of Preparing and Separating Large DNA Molecules

At low resolution, cytogenetic mapping of banded chromosomes is already advanced. At high resolution, methods such as restriction-site mapping and DNA sequencing of clones are well established. Major issues of efficiency must be considered in applying these methods to the human genome, but, in principle, there are no major obstacles. However, until recently, the middle range contained a serious gap between the highest resolution achievable in cytogenetic mapping with the light microscope (10 million nucleotides) and the lowest resolution achievable by restriction-site mapping (10,000 nucleotides).

At present, prospects of bridging this 1,000-fold gap in resolution to connect the two types of maps by increasing the resolution of cytogenetic mapping are limited. Until recently, two substantial obstacles existed to bridging it from the other direction by extending restriction-site mapping to lower resolutions (and thus longer distances). The first obstacle was a lack of restriction enzymes that cleave human DNA infrequently enough to produce the very large DNA fragments needed for low-resolution mapping. The second was an inability to separate and measure the sizes of DNA fragments appreciably larger than 20,000 nucleotides. During the past 5 years, major progress has been made toward solving both of these problems. Restriction enzymes have been discovered that cleave DNA into fragments with average sizes ranging from 100,000 to 1 million nucleotides.
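A rough sense of why cutting frequency depends on the enzyme can be had from an idealized model, sketched below. It assumes random DNA in which the four bases are equally frequent and independent, so a specific recognition sequence of n bases is expected about once every 4^n nucleotides. This model and the function name are assumptions introduced here; real human DNA departs from the model, so the output is only a rough guide and does not reproduce the fragment sizes quoted above.

```python
def expected_site_spacing(site_length, base_freq=0.25):
    """Average distance between occurrences of one specific recognition
    sequence in random DNA with equal, independent base frequencies."""
    return 1 / base_freq ** site_length


six_base_site = expected_site_spacing(6)    # 4**6 nucleotides
eight_base_site = expected_site_spacing(8)  # 4**8 nucleotides

# Gap between cytogenetic and restriction-site resolution quoted in the text.
resolution_gap = 10_000_000 / 10_000

print(f"six-base site:   one cut per {six_base_site:,.0f} nucleotides")
print(f"eight-base site: one cut per {eight_base_site:,.0f} nucleotides")
print(f"resolution gap:  {resolution_gap:,.0f}-fold")
```

Even in this simple model, lengthening the recognition sequence by two bases makes sites sixteen times rarer, which is the direction of change needed to produce very large fragments.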
In addition, a method known as pulsed-field gel electrophoresis, which allows the separation of DNA fragments as large as 10 million nucleotides, has been introduced (Schwartz and Cantor, 1984).

Now that it is possible to generate, separate, and measure large DNA fragments, a variety of ways of constructing restriction-site cleavage maps exist. Cleaving a DNA genome infrequently at specific
sites with appropriate restriction enzymes produces many large DNA fragments of different sizes. These fragments can then be separated from each other by electrophoresis through agarose gels. The DNA bands that result can be seen either by direct DNA staining or by nucleic-acid hybridization with appropriate DNA probes. (The latter technique takes advantage of the specificity of complementary base-pairing between two DNA strands, which allows one highly radioactive DNA molecule (the DNA probe) to be used to find its one complementary partner in a mixture that contains millions of other DNA molecules.)

Although these methods allow different fragments of chromosomes to be separated and their contents of probe sequences to be determined, they provide no information regarding the order of these fragments along the chromosome. However, the 50 to 500 different large fragments produced from each human chromosome can be ordered by an extension of such analyses. One way involves cutting the genome at two distinct sets of sites with two different restriction enzymes, a procedure that generates two families of large DNA fragments that overlap. The fragments that are neighbors in the genome can then be identified with appropriate DNA probes since two overlapping fragments will hybridize to the same probe.

In another method, only a single restriction enzyme is used to produce the large DNA fragments. In addition, however, a set of small DNA probes, called linking probes, is generated by selectively cloning the short segments of DNA that surround each of the cleavage sites for the restriction enzyme used to make the large fragments. Because linking probes contain sequences from both sides of a particular restriction site, each should hybridize to two different large fragments when used as a DNA probe, thereby demonstrating that these particular large fragments are neighbors in the genome (Poustka and Lehrach, 1986).
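The linking-probe idea lends itself to a simple illustration: if each probe identifies one pair of neighboring fragments, the fragments can be chained into an order. The sketch below is a minimal version under strong assumptions (a linear molecule, a complete and error-free set of probes, each probe hitting exactly two fragments). The probe and fragment names are hypothetical, and the function is not a published method.

```python
from collections import defaultdict


def order_fragments(linking_hits):
    """Order large fragments along a linear chromosome from linking-probe data.

    linking_hits maps each linking probe to the pair of fragments it
    hybridizes to; each such pair is taken to be adjacent in the genome.
    """
    neighbors = defaultdict(set)
    for frag_a, frag_b in linking_hits.values():
        neighbors[frag_a].add(frag_b)
        neighbors[frag_b].add(frag_a)

    # On a linear molecule, the two end fragments have only one neighbor.
    ends = sorted(f for f, adj in neighbors.items() if len(adj) == 1)
    if len(ends) != 2:
        raise ValueError("data do not describe a single linear chain")

    # Walk from one end to the other, never stepping back.
    order, previous, current = [], None, ends[0]
    while current is not None:
        order.append(current)
        following = [f for f in neighbors[current] if f != previous]
        previous, current = current, (following[0] if following else None)

    if len(order) != len(neighbors):
        raise ValueError("some fragments could not be placed in the chain")
    return order


# Hypothetical hybridization results: probe -> the two fragments it detects.
hits = {
    "probe1": ("F3", "F1"),
    "probe2": ("F4", "F2"),
    "probe3": ("F1", "F4"),
}
fragment_order = order_fragments(hits)
print(fragment_order)
```

The result fixes the order only up to reversal, since nothing in the data says which end of the chain is which. A circular chromosome, which has no end fragments, would need a different starting rule.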
The largest DNA molecule that has been mapped with restriction enzymes that cleave DNA infrequently is the single chromosome of E. coli (4.7 million nucleotides) (Kohara et al., 1987; Smith et al., 1987). The average spacing of the mapped sites is approximately 200,000 nucleotides. Progress in achieving an E. coli map at higher resolution, largely by analyzing ordered sets of DNA clones, is also proceeding rapidly. The smallest human chromosome is 10 times as large as the E. coli chromosome. Although difficult to construct, its physical map could be determined by methods that are generally similar to those applied to E. coli. In principle, such an effort would best be carried out after the human chromosomes were separated from each other, to prevent the DNA fragments of the other chromosomes from complicating the analysis of the one chromosome of interest. In recent years, progress
in chromosome-separation technology has been impressive, but expert opinion remains divided as to whether the final samples are pure enough and contain enough DNA to have a major impact on physical mapping projects. The chromosomes to be separated are isolated from human cells undergoing division, a stage of the cell's life cycle when the chromosomes are condensed and stable. They can be separated according to size by flow cytometry, a method in which the amount of DNA present in condensed chromosomes is analyzed while the chromosomes flow one by one through a small tube. Computer-controlled systems allow each individual chromosome to be diverted to a designated collection tube depending on its DNA content. DNA samples prepared from chromosomes separated in this way have already served as an important source for producing clone collections that are highly enriched for the DNA sequences of a particular human chromosome.

High-Resolution Mapping of Restriction Sites Will Require the Use of Ordered Collections of DNA Clones

The purification of human chromosomes can only moderately decrease the complexity of the DNA samples used for mapping. In contrast, cloning techniques offer large decreases in complexity: through chromosome separation, the complexity of the samples can be reduced 10- to 100-fold, whereas cosmid cloning reduces the complexity of individual samples 100,000-fold. Furthermore, unlike separated samples of human chromosomes, DNA clones will replicate in microbial hosts, thereby allowing the production of as much DNA as needed. For these reasons, doing as much physical mapping as possible on cloned DNA has overwhelming advantages. Particularly for high-resolution mapping, the preferred source of DNA samples for physical mapping will be ordered collections of DNA clones, that is, sets of cloned DNA fragments that have been sufficiently analyzed that they can be arranged to reflect the order of their corresponding DNA fragments on the original chromosomes.
Since the clones are usually generated in a way that produces cloned DNA fragments that start and stop at random sites along the chromosome, each member of the collection will normally overlap extensively with several neighbors, and the entire collection will have considerable redundancy (i.e., any segment of the chromosome will be represented in several different clones).

Fingerprinting Methods Can Be Used to Order DNA Clones

Preparing an ordered-clone collection involves cloning DNA fragments as molecules that can replicate in a microbial host, determining
the order of these fragments in the genome, and propagating the fragments in pure form to make them widely available for subsequent analysis. Much can already be done in these respects, and the prospects for rapid advancement of technical capabilities are good. The properties of the cloned DNA fragments can then be used to reconstruct their original order in the genome. For a set of random clones, some clones will partially overlap the region of the genome covered by other clones. A characteristic of the overlapping region can be measured, such as the detailed pattern of cutting by a set of restriction enzymes. This analysis is performed for a large number of clones individually, and then a computer search of the patterns is used to place clones in order (neighboring clones are those that share part of their patterns). This method is called fingerprinting, since the identifying DNA characteristics of each cloned segment are analogous to a fingerprint of the DNA fragment. Fingerprinting methods have recently been used successfully to order large numbers of cloned DNA segments in yeast, E. coli, and nematode genomes (Coulson et al., 1986; Olson et al., 1986; Daniels and Blattner, 1987; Kohara et al., 1987). In principle, this method should provide an efficient way to group DNA clones into contiguous regions that cover 90 percent or more of the genome. A common problem, however, is that the matching of contiguous segments proceeds rapidly at first and then slows. Finishing the process by using DNA-probe techniques to find the clones needed to fill in the map then becomes time-consuming and tedious.
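The computer search at the heart of fingerprinting can be illustrated with a small Python sketch. It is a deliberately simplified toy, not the algorithm used in any of the cited studies: each fingerprint is assumed to be a set of exactly measured restriction-fragment sizes, two clones are called neighbors when they share at least `min_shared` sizes, and the names `build_contigs` and `min_shared` are hypothetical.

```python
from itertools import combinations


def build_contigs(fingerprints, min_shared=2):
    """Group clones into contiguous sets based on shared fingerprint bands.

    fingerprints maps a clone name to a set of restriction-fragment sizes
    (in nucleotides). Clones sharing at least min_shared sizes are treated
    as overlapping. Returns a sorted list of sorted clone-name lists.
    """
    # Union-find structure: every clone starts in its own group.
    parent = {name: name for name in fingerprints}

    def find(name):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Compare every pair of clones and merge groups that overlap.
    for clone_a, clone_b in combinations(fingerprints, 2):
        shared = fingerprints[clone_a] & fingerprints[clone_b]
        if len(shared) >= min_shared:
            parent[find(clone_a)] = find(clone_b)

    # Collect the clones belonging to each group.
    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy data: clone4 shares no bands with the others, leaving a gap.
clones = {
    "clone1": {1200, 3400, 560, 8100},
    "clone2": {3400, 8100, 2250, 940},
    "clone3": {2250, 940, 4700},
    "clone4": {6100, 1500, 720},
}
print(build_contigs(clones))
```

In the toy data, three clones chain together through pairwise overlaps while the fourth stands alone, a small-scale picture of the gaps that remain once the easy matches are exhausted. Real fragment sizes carry measurement error, so a practical comparison would need a size tolerance and a statistical test for chance matches, both omitted here.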
The unexpectedly large number of gaps has two principal explanations: (1) not all overlapping segments are being recovered because of biases inherent in the DNA cloning procedures used, and (2) the fingerprint information collected for the overlapping DNA segments lacks sufficient precision to distinguish all DNA fragments from each other unambiguously. Progress in both areas may be expected as a wider variety of cloning systems are explored and more sophisticated fingerprinting methods are developed. For example, alternatives to the use of restriction enzyme cutting patterns as the fingerprint are being explored (Poustka et al., 1986).

The Optimal Method for Preparing Ordered Collections of DNA Clones Is Not Yet Clear

Although the general principles of working with ordered collections of DNA clones are well established, the technology is in a state of flux. A promising recent development is the demonstration that yeast can be used as host cells for cloning large human DNA segments.
Several laboratories have shown that DNA fragments as long as 500,000 nucleotides can be cloned as artificial chromosomes in yeast. These fragments are 10 times the size of the fragments that can be cloned with current bacterial-host systems (Burke et al., 1987). Further development of systems for cloning large DNA molecules will greatly enhance the efficiency of ordering DNA fragments. For example, it should be possible to prepare DNA clone collections by using a single restriction enzyme that cuts DNA infrequently; this procedure would generate a single family of large DNA fragments that are then cloned. This family would be much less complex than the collection of randomly cut clones required for the fingerprinting method. A second set of short DNA clones that specifically includes all the rare restriction sites that were cut to make the large fragments could then be used as linking probes to establish the continuity between adjacent large fragments, thereby allowing the large fragments to be ordered along the genome. The cDNA clones representing the transcribed regions of the genome represent an alternative source of probes that could be used to demonstrate the adjacency of large cloned fragments. Because cDNA clones are made by reverse transcription of mRNAs, they lack the intron sequences that interrupt the exons in the genomic DNA. The exons that have been joined together in the cDNAs will often be encoded by the DNA from more than one large genomic fragment, so that DNA probes prepared from cDNAs can be used to order the fragments from adjacent portions of the genome. This method has the advantage that the cDNA clones are themselves of special interest since they represent the portion of the genome that is selectively expressed in cells. Still another source of useful probes would be a set of RFLP DNA probes that have been ordered by genetic linkage analyses of standard families.
An RFLP map with a 1-cM resolution would provide markers separated by 1 million nucleotides, on average. If a DNA clone collection of human genome fragments that averaged several million nucleotides in size could be constructed, it could be readily ordered with these markers. For certain methods at least, the task of ordering the DNA clones obtained from the human genome is complicated by the considerable repetition of DNA sequences in the genomes of higher organisms. These sequences are largely absent from the E. coli, nematode, and yeast genomes from which ordered clone collections have thus far been prepared. Additional problems are expected from the instability of selected clones observed when E. coli serves as the host for cloned DNA; it is too early to know whether these problems will also apply
to the newer yeast cloning systems. For all these reasons, it is uncertain which cloning and linking methods will prove most effective for a human genome project. Further methodological developments could even supplant all present methods.

IMMEDIATE APPLICATIONS OF CHROMOSOME MAPS

A number of important applications of chromosome maps could be pursued even while the various mapping activities are progressing. We have already discussed how even a partial map can be expected to facilitate the isolation of specific human disease genes. Maps will also support early sequencing efforts. The lower-resolution physical maps will provide a framework within which to organize the highly fragmentary sequence data that will be generated by these initial sequencing efforts, while the ordered-clone collections will provide the actual fragments that are subcloned for final sequencing (see Chapter 5). Chromosome maps can also be usefully applied to begin a systematic assignment of expressed genes to map positions. Most DNA in the human genome is either not part of an expressed gene or is in one of the many intervening sequences (introns) that separate the protein-coding portions of expressed genes. As previously stated, the cloning of cDNA produces only the coding DNA sequences present in expressed genes (the exons and not the introns). It is possible to make large collections of cDNA clones derived from the genes that are expressed in particular tissues or at a particular stage of development and differentiation and to embark on the systematic assignment of each expressed gene to a map position on the chromosomes. Methods are being developed to avoid the standard problem with cDNA, which is that genes expressed at a low level are often missed, whereas genes expressed at a high level produce much mRNA and therefore are obtained repeatedly as cDNA clones.
These methods aim at producing "normalized" cDNA libraries, in which each expressed DNA sequence is equally represented. Initially, the map assignments for the expressed genes could be based on the existing cytogenetic map and could be carried out by somatic cell genetic techniques, as well as by in situ hybridization of cDNAs to chromosomes. As the physical mapping and sequencing of the genome proceeded, it would require relatively little effort to refine these map assignments.

CONCLUSIONS AND RECOMMENDATIONS

Methods for physical and genetic linkage mapping have developed steadily and impressively over the past three decades. Today, low
resolution genetic linkage maps and cytogenetic maps exist for much of the human genome. During the past few years, these maps have led to the identification of genes or chromosome segments involved in several human diseases. These advances underscore the extent of past progress in genome mapping and the promise that it holds for contributing to improved human health.

Recent Breakthroughs Have Set the Stage for Large-Scale Mapping

Breakthroughs in mapping methods during the past several years have made it possible to construct chromosome maps of unprecedented completeness, accuracy, and detail. These breakthroughs include the development of techniques that have allowed 100-fold larger DNA molecules to be separated and manipulated than previously possible. In addition, new and powerful methods for following the inheritance of arbitrary segments of chromosomes through human pedigrees are available. Both physical and genetic linkage mapping have been invigorated by these developments, and important synergism has arisen between these two approaches to genomic mapping. Consequently, the goal of developing complete physical and genetic linkage maps of the human genome in a relatively short time is now realistic. These maps would be useful in their own right and would pave the way toward constructing the ultimate human map: the complete DNA sequence of the human genome. The task of making a human genome map will by no means be easy. The longest complete physical map that has been constructed to date is for the E. coli chromosome. This map is only about 1/600 the size of the human genome. The E. coli mapping benefited from an enormous base of knowledge on the bacterium accumulated during 40 years of intensive study. For example, approximately 1,000 genes have been assigned to positions around the E. coli chromosome, whereas a comparable region of the human genome, on average, contains a single known gene.
Even after the genetic linkage mapping is completed at the 1-million nucleotide resolution recommended in this report, an E. coli-sized region of the human genome would contain only a handful of genetic markers. Thus, constructing a physical map of even the smallest human chromosome with today's technology would require a substantial effort. It is anticipated that the most difficult aspect of the physical mapping will be the achievement of long-range connectivity. Although it is likely that a large proportion of the human genome could be mapped at a resolution of a few thousand nucleotides simply by relying on the fingerprinting of overlapping DNA clones, so many gaps would likely be left that the connectivity achievable by this approach would
be poor. The committee believes that the utility of the physical map will increase dramatically as its connectivity improves. Consequently, attaining high connectivity in the physical map should be a major priority of the overall human genome project. Because the technology needed for genetic linkage mapping with RFLPs is more advanced than that for physical mapping, an immediate emphasis should be placed on completing the genetic linkage map. A project with the goal of attaining a fully connected map with an average resolution of 1 cM is strongly recommended. This goal would require that a few thousand new RFLPs be identified and mapped by classic linkage analysis on DNA samples from a set of three-generation families. Such an effort, which could begin immediately, would be expected to require several years to complete and to cost approximately $40 million.

Different Mapping Methods Should Proceed in Parallel

A critical feature in all mapping is that the results from different methods are additive and corroborative. For example, the restriction-site maps, the cDNA maps, and ordered DNA clone collections go hand in hand since each helps construct the other. The use of one of these maps to study human disease also requires a genetic linkage map. In turn, efforts to construct linkage maps at higher resolutions will be assisted by the existence of corresponding physical maps. Thus, no single strategy is best overall. All types of mapping need to be coordinated as part of a human genome project. The natural tendency of researchers to press forward with the detailed analysis of chromosomal regions of particular interest should be encouraged. The committee specifically recommends against a centrally imposed plan to proceed from lower to higher resolution as is implicit, for example, in proposals to complete the entire physical map before initiating pilot sequencing projects.
Such sequencing projects will no doubt begin with the sequencing of large chromosomal regions of particular biological interest.

The Improvement of Physical Mapping Techniques Should Be Closely Coupled to Actual Attempts to Map Large Genomes

Experience teaches that the practical problems facing large-scale mapping efforts become clear only when attempts are made to apply new methods to actual map production. Many approaches that seem ideal in theory fail for reasons that cannot be foreseen. In addition, the day-to-day press of practical problems drives the development of
useful new technology. Thus, the committee recommends that actual mapping efforts be supported now on a substantial scale. Nonetheless, a major initial focus of most laboratories involved in physical mapping projects is likely to be the development of techniques. Despite recent advances, many limitations on physical mapping methods still exist. For example, DNA fragments as large as 10 million nucleotides can be handled, but only with considerable difficulty, and such large fragments cannot yet be cloned. Ordered DNA clone collections have been started, but not completed, for several organisms with genomes that are at most 1/50 the size of the human genome. Advanced technology, such as handling larger DNA pieces, can expedite the preparation of such clone collections. In addition, the stability of the cloned DNA fragments is a major concern, since once the effort is devoted to constructing an ordered DNA clone collection, one should be able to count on it as a permanent resource for future studies.

Specific Improvements That Will Facilitate Map Construction and Usefulness Can Be Identified

In each aspect of mapping, major improvements in technology seem likely to emerge over the next few years. These improvements, which should be major initial goals of the human genome project, will include increased DNA size range, increased resolution, diminished cost, and improved accuracy. Some of the specific target areas include improving or creating methods for:
· Physically separating intact human chromosomes.
· Isolating and immortalizing identified fragments of human chromosomes in cultured cell lines.
· Cloning complementary DNA from low-abundance messenger RNA and obtaining "normalized" cDNA libraries.
· Cloning large DNA fragments.
· Purifying large DNA fragments.
· Separating large DNA fragments with higher resolution.
· Ordering the adjacent DNA fragments in a DNA clone bank, including mathematical and statistical work that would aid in map construction.
· Automating various steps in DNA mapping, including DNA purification and hybridization analysis, and handling of many different DNA samples simultaneously.
· Data recording, storage, and analysis, with attention to the mathematical and statistical problems of optimizing physical mapping
and sequence assembly and to the application of statistical methods of database quality control.

In addition, expanded collections of CEPH-like, three-generation families from which DNA could be distributed for genetic linkage studies will be important in facilitating map construction. Because the technology is still in its infancy, support should be directed to those research groups judged to have the greatest ability to develop technology, rather than to routine production centers staffed mainly by technicians.

REFERENCES

Botstein, D., R. L. White, M. Skolnick, and R. W. Davis. 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32:314-331.
Burke, D. T., G. F. Carle, and M. V. Olson. 1987. Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236:806-812.
Coulson, A., J. Sulston, S. Brenner, and J. Karn. 1986. Toward a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Natl. Acad. Sci. U.S.A. 83:7821-7825.
Daniels, D. L., and F. R. Blattner. 1987. Mapping using gene encyclopaedias. Nature 325:831-832.
Dausset, J. 1986. Le centre d'etude du polymorphisme humain. Presse Med. 15:1801-1802.
Donis-Keller, H., P. Green, C. Helms, S. Cartinhour, B. Weiffenbach, K. Stephens, T. P. Keith, D. W. Bowden, D. R. Smith, E. S. Lander, D. Botstein, G. Akots, K. S. Rediker, T. Gravius, V. A. Brown, M. B. Rising, C. Parker, J. A. Powers, D. E. Watt, E. A. Kauffman, A. Bricker, P. Phipps, H. Muller-Kahle, T. R. Fulton, S. Ng, J. W. Schumm, J. C. Braman, R. G. Knowlton, D. F. Barker, S. M. Crooks, S. E. Lincoln, M. J. Daly, and J. Abrahamson. 1987. A genetic linkage map of the human genome. Cell 51:319-337.
Drayna, D., and R. White. 1985. The genetic linkage map of the human X chromosome. Science 230:753-758.
George, K. P. 1970. Cytochemical differentiation along human chromosomes. Nature 226:80-81.
Gusella, J. F., N. S. Wexler, P. M. Conneally, S. L. Naylor, M. A. Anderson, E. R. Tanzi, P. C. Watkins, K. Ottina, M. R. Wallace, A. Y. Sakaguchi, A. B. Young, I. Shoulson, E. Bonilla, and J. B. Martin. 1983. A polymorphic DNA marker genetically linked to Huntington's disease. Nature 306:234-235.
Kohara, Y., K. Akiyama, and K. Isono. 1987. The physical map of the whole Escherichia coli chromosome. Cell 50:495-508.
Lejeune, J., M. Gauthier, and R. Turpin. 1959. Les chromosomes humains en culture de tissues. C. R. Hebd. Seances Acad. Sci. 248:602-603.
Marx, J. L. 1985. Putting the human genome on the map. Science 239:150-151.
McKusick, V. A. 1986. Mendelian Inheritance in Man: Catalogs of Autosomal Dominant, Autosomal Recessive, and X-Linked Phenotypes, 7th ed. Johns Hopkins University Press, Baltimore.
Olson, M. V., J. E. Dutchik, M. Y. Graham, G. M. Brodeur, C. Helms, M. Frank, M. MacCollin, R. Scheinman, and T. Frank. 1986. Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl. Acad. Sci. U.S.A. 83:7826-7830.
Poustka, A., and H. Lehrach. 1986. Jumping libraries and linking libraries: The next generation of molecular tools in mammalian genetics. Trends Genet. 2:174-179.
Poustka, A., T. Pohl, D. P. Barlow, G. Zehetner, A. Craig, F. Michaels, E. Ehrich, A.-M. Frischauf, and H. Lehrach. 1986. Molecular approaches to mammalian genetics. Cold Spring Harbor Symp. Quant. Biol. 51:131-139.
Schwartz, D. C., and C. R. Cantor. 1984. Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis. Cell 37:67-75.
Smith, C. L., J. F. Econome, A. Schutt, S. Klco, and C. R. Cantor. 1987. A physical map of the Escherichia coli K12 genome. Science 236:1448-1453.
Weiss, M. C., and H. Green. 1967. Human-mouse hybrid cell lines containing partial complements of human chromosomes and functioning human genes. Proc. Natl. Acad. Sci. U.S.A. 58:1104-1111.
White, R. 1986. The search for the cystic fibrosis gene. Science 234:1054-1055.
White, R., M. Leppert, D. T. Bishop, D. Barker, J. Berkowitz, C. Brown, P. Callahan, T. Holm, and L. Jerominski. 1985. Construction of linkage maps with DNA markers for human chromosomes. Nature 313:101-105.