Copyright © 2009. National Academy of Sciences. All rights reserved.
4
Mapping
The genes that specify the biological heritage of each human being
are arranged along chromosomes in a nearly invariant order. Conse-
quently, simple one-dimensional maps can specify the genetic orga-
nization of the human, as well as other species. Some applications of
these maps have already been described. In this chapter, the committee
provides a more detailed view of the types of chromosome maps,
their uses, and technical problems affecting their construction.
In considering current and future uses of maps in genetics, it is
important to recognize that the exploration of the human genome is
at an early stage. The roles of maps in human genetics may be
expected to change with time. Over the next few years, maps will
largely be used to guide the search for the DNA sequences responsible
for particular genetic diseases and in genetic counseling. As systematic
studies of the structure and function of the human genome expand,
the role of maps in organizing information and planning new types of
research will increase in importance. It would be impossible, for
example, to organize systematic DNA sequencing of the human
genome without precise maps of the regions to be sequenced. Even
when extensive sequence data become available, maps will remain
indispensable to a wide variety of genetic data, including the sequences
themselves. The continued value of chromosome maps has been
demonstrated for viruses whose genomes have been completely
mapped and sequenced. Researchers who study such viruses keep
detailed maps of the viral genome within reach at all times, but consult
the sequence data less frequently. Genetic linkage maps and physical
maps (even when incomplete), as well as partial sequences, have been
of value in research on Escherichia coli (a bacterium) and Drosophila
(a fly). In the latter, maps have been of critical importance in guiding
investigators and have provided direction to the regions of interest
that need to be sequenced. A similar future awaits maps of the human
genome: These maps will not only be critical tools during the coming
decades of discovery, but will also form a permanent part of the basic
description of humankind's genetic endowment.
Early Cytological Mapping Efforts Depended on Examining
Chromosomes Under the Light Microscope
All types of mapping involve measuring the positions of easily
observed landmarks. Until recently, the only useful physical landmarks
along human chromosomes have been cytogenetic bands. When
cultured human cells are treated with suitable drugs during cell division,
the chromosomes are easily viewed through the light microscope as
wormlike shapes. Several staining procedures developed in the late
1960s and early 1970s imprint reproducible patterns of light and dark
bands on chromosomes (George, 1970). The banding pattern is believed
to reflect a periodicity in the spacing of certain types of DNA sequences
along chromosomes. From a mapping standpoint, this banding is
important in that it allows human chromosomes to be individually
recognized by light microscopy and allows an average chromosome
to be subdivided into 10 to 20 regions. Banding patterns provide the
basis for a physical map of the chromosomes, often referred to as a
cytogenetic map. In clinical genetics, examination of the banding
patterns has led to diagnosis of such conditions as the Down syndrome,
a genetic disease usually caused by the presence of an extra copy of
chromosome 21 (Lejeune et al., 1959).
Since the late 1960s, it has been possible to assign many genes to
locations on the cytogenetic map by the techniques of somatic cell
genetics (Weiss and Green, 1967). In these techniques, rodent and
human cells are fused to form hybrid cells that can be grown in
culture. These cells generally lose all but one or a few human
chromosomes, but different human chromosomes, or parts thereof,
are retained in different cell lines. Chromosome banding is used to
determine which portions of the human genome have been retained
in particular cell lines. Consistent co-retention of a region of the
genome and a human biochemical trait allows the genetic determinant
of that trait to be assigned to a position on the cytogenetic map.
More than 1,000 genes and other DNA sequences have now been
assigned to positions on the cytogenetic map (McKusick, 1986).
Mapping activities have provided an important focus for international
MAPPING AND SEQUENCING THE HUMAN GENOME
activities in human genetics, including studies by laboratories in at
least 12 countries on 4 continents. International gene-mapping workshops
have been organized every year or two since 1973. The ninth
workshop was held in Paris in September 1987.
The Current Revolution in Genome Mapping Is
Based on the Use of Recombinant-DNA Techniques
The systematic application of recombinant-DNA technology to
chromosome mapping began in approximately 1980. Since that time,
it has become apparent that recombinant-DNA techniques can poten-
tially create chromosome maps with an accuracy and level of detail
that only a few years ago seemed unachievable. It is no exaggeration
to say that current maps of human chromosomes compare in quality
to the navigational charts that guided the explorers of the New World.
Another decade of special effort directed toward mapping the human
genome could yield maps comparable to the best modern maps of the
earth's surface.
No single application of recombinant-DNA technology is responsible
for creating this historic opportunity for progress in human genetics.
Instead, the revolution in chromosome mapping has developed on
several fronts, all of which are spin-offs of the extraordinary advances
in DNA experimentation that took place during the 1970s. Methods
of cloning DNA molecules from any organism into microbial cells, of
cleaving molecules at specific sites, and of separating DNA fragments
that differ only slightly in size have all contributed to present mapping
capabilities. Also of major importance are DNA-probe techniques
that allow a particular DNA sequence, usually obtained from a DNA
clone, to be used to detect other DNA molecules with similar or
identical sequences in uncloned DNA that is extracted from human
or other cells. Whether chromosome mapping is being done at the
level of the chromosomal DNA molecule (physical mapping) or by
following the pattern in which portions of chromosomes are passed
through pedigrees (genetic linkage mapping), the experimental face of
chromosome mapping has changed beyond recognition since 1980.
Nevertheless, the scale of current activity is small relative to the
amount of work that must be done to study the unexplored territory
in the human genome. Only a major special effort directed toward
systematic mapping of the human chromosomes will allow this
revolution to produce, within a decade or less, a comprehensive,
detailed map of the human genome.
FUNDAMENTALS OF GENOME MAPPING
Physical Maps Describe Chromosomal DNA Molecules,
Whereas Genetic Linkage Maps Describe Patterns of Inheritance
Physical maps specify the distances between landmarks along a
chromosome. Ideally, the distances are measured in nucleotides, so
that the map provides a direct description of a chromosomal DNA
molecule. The most important landmarks in physical mapping are the
cleavage sites of restriction enzymes. The maps can be calibrated in
nucleotides by measuring the sizes of the DNA fragments produced
when a chromosomal DNA molecule is cleaved with a restriction
enzyme.
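
As a minimal sketch of how cleavage sites calibrate a map in nucleotides, the following Python function computes the fragment sizes expected when a linear DNA molecule is cut at known positions. The function name, the molecule length, and the site positions are hypothetical values chosen for illustration; they are not taken from this report.

```python
def fragment_sizes(molecule_length, cut_sites):
    """Return fragment sizes (in nucleotides) for a linear molecule cut at each site."""
    # Sort the sites and add both ends of the molecule as boundaries.
    boundaries = [0] + sorted(cut_sites) + [molecule_length]
    # Each fragment spans two consecutive boundaries.
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Hypothetical 50,000-nucleotide molecule with three cleavage sites.
sizes = fragment_sizes(50_000, [12_000, 20_000, 41_000])
print(sizes)
```

In practice the inference runs in the opposite direction: the fragment sizes are what is measured, and the site positions are deduced from them. The sketch shows only the relationship between the two.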
Restriction mapping has not yet been extended to DNA molecules
as large as human chromosomes. Physical maps of human chromo-
somes are now based largely on the banding patterns along chromo-
somes as observed in the light microscope. One can only estimate
the number of nucleotides represented by a given interval on the map;
furthermore, the amount of DNA present in different bands of the
same size may not be constant since there are likely to be regional
variations in the extent to which chromosomes condense during cell
division. Nonetheless, cytogenetic maps are considered to be physical
maps because they are based on measurements of actual distance.
In contrast, genetic linkage maps describe the arrangement of genes
and DNA markers on the basis of the pattern of their inheritance.
Genes that tend to be inherited together (i.e., linked) are close together
on such maps, and those inherited independently of one another are
distant. Genes from different chromosomes are inherited indepen-
dently and thus are always unlinked. Genes on the same chromosome
can be tightly or loosely linked or unlinked, as reflected in the
probability that they will be separated from one another during sperm
or egg production. The genes can be separated if the chromosome
breaks and exchanges parts with the other member of the chromosome
pair, a process known as crossing over or genetic exchange. The farther
apart two genes are on the chromosome, the more frequently such
an exchange will occur between them.
Exchange is a complex genetic process that accompanies the
formation of sperm cells in the male and egg cells in the female.
Unlike other cells, which contain two copies of each chromosome
(except for the special case of the X and Y chromosomes in males),
sperm and egg cells contain only a single copy of each chromosome.
A particular sperm or egg cell, however, does not simply receive a
precise copy of one of the two parental versions of each chromosome:
Instead, each sperm or egg receives a unique composite of the two
versions, produced by the series of cutting and splicing events that
constitute genetic exchange. Indeed, the great variety of individual
chromosomes that can be produced by exchange and independent
assortment is responsible for much of the genetic individuality of
different humans.
The order of genes on a chromosome measured by linkage maps is
the same as the order in physical maps, but there is no constant scale
factor that relates physical and genetic distances. This variation in
scale exists because the process of exchange does not occur equally
at all places along a chromosome. Nor does exchange take place at
the same rate in the two sexes; hence, as maps become more accurate,
there will have to be separate genetic linkage maps for males and
females.
Because they describe the arrangement of genes at the most
fundamental level, physical maps are gaining in importance relative
to genetic linkage maps in most areas of biological research. They
can never displace genetic linkage maps, however, which are distinc-
tive in their ability to map traits that can be recognized only in whole
organisms. Disease genes are particularly important illustrations of
this point. Huntington's disease and cystic fibrosis, for example, have
catastrophic effects on patients, but cannot be recognized in the types
of cultured cells that are suitable for genetic studies. Only by studying
the patterns in which these diseases are inherited in affected families
has it been possible to localize the defective genes on chromosome
maps. Because of the unique ability of genetic linkage mapping to
define and localize disease genes, increasing the number of genetic
markers available for this type of mapping should receive major
emphasis in any overall program to map the human genome.
A type of physical map that provides information on the approximate
location of expressed genes is a complementary DNA (cDNA) map.
A gene that is expressed will produce messenger RNA (mRNA)
molecules in those cells in which the gene is active (Figure 2-3). The
physical mapping of expressed genes (exons) is possible by using the
DNA prepared from messenger RNA in the process called reverse
transcription (in which an enzyme synthesizes a complementary strand
of DNA by copying an RNA molecule that serves as a template). The
availability of cDNAs permits the localization of genes of unknown
function, including genes that are expressed only in differentiated
tissues, such as the brain, and at particular stages of development
and differentiation. Because they are expressed, they are likely to be
the biologically most interesting part of the genome and therefore can
usefully be the focus for early sequencing. In addition, knowledge of
their map locations provides a set of likely candidate genes to test
once the approximate location of a gene that is altered in a particular
disorder has been mapped by genetic linkage techniques.
To this point, about 4,100 expressed gene loci have been identified
by all methods (McKusick, 1986). Identification of the rest of the
50,000 to 100,000 genes in the haploid genome will come eventually
with complete sequencing, but can be greatly facilitated in the
immediate future by the cDNA map. This map contains information
of great biological and medical significance simply because it represents
the expressed portion of the genome.
The Development of Ordered Collections of DNA Clones
Is an Important Adjunct to Physical Mapping
In theory, sensitive DNA-probe technologies make it possible to
construct physical maps while cloning only a small fraction of the
genome that is being mapped. In practice, however, this approach is
suitable only for the coarsest level of physical mapping. At higher
resolutions, most physical mapping is likely to be carried out on
collections of DNA clones that have been ordered according to their
positions in the original genome. The individual clones are especially
useful because they provide an inexhaustible source of the DNA from
each genomic region. The vectors used for DNA cloning can be
plasmids, bacterial viruses, modified bacterial viruses called cosmids,
or artificial yeast chromosomes. All of these types of DNA molecules
are characterized by the ability to replicate exactly as autonomous
units inside suitable host cells. Having ordered clone collections is
also a prerequisite to most methods of sequencing the genome since
the clones would provide the actual DNA fragments that would be
purified and prepared for DNA sequencing.
Both Physical and Genetic Linkage Maps Can Be Constructed with
Various Degrees of Resolution and Connectivity
All types of mapping presuppose an inherent trade-off between the
level of detail, or resolution, in a map and the extent to which the
map provides a convenient overview of the mapping objective (its
connectivity). An atlas of street maps for all the major cities in a
state, for example, has high resolution but low connectivity. Separate
maps must be presented for each city since a fully connected map of
the whole state at the same resolution used for the street maps would
be too big to be useful.
As a practical matter, constructing maps that combine high resolution
and high connectivity is difficult. This technical challenge is
likely to be the dominant problem in the systematic physical mapping
of the human genome. The nature of the difficulty can be appreciated
by analogy with conventional cartography. Suppose, for example,
that the only two sources of data available for mapping the United
States were satellite pictures of multistate regions and local property
surveys. An adequate set of overlapping satellite pictures would allow
construction of a fully connected, low-resolution map, whereas the
local surveys would provide detailed maps of small regions. It would
be extremely difficult, however, to relate the two types of data. In
principle, this problem could be solved by painstakingly piecing
together the local-survey maps until they covered regions large enough
to discern on the satellite pictures. In practice, however, accuracy
would suffer as the survey maps were pieced together, since regions
such as lakes and deserts would disrupt connectivity. In general, the
only powerful solution to this type of problem lies in the development
of mapping methods that can achieve a series of intermediate reso-
lutions.
In chromosome mapping, cytogenetic maps of the banding patterns
seen in the light microscope correspond to the satellite pictures,
whereas restriction-site maps correspond to the local surveys. Even
the most extensive restriction-site maps of local regions of human
chromosomes do not yet cover even a single band on the cytogenetic
map. Prospects for filling in intermediate levels of the resolution
hierarchy are good, but these techniques are still being developed.
Ultimately, the DNA sequence will represent the physical map of the
human genome at the highest possible resolution. Nonetheless, as the
analogy with conventional cartography suggests, sequencing cannot
stand alone: It must anchor at the high-resolution end a program
of mapping at a whole series of resolutions.
GENETIC LINKAGE MAPPING
Restriction Fragment Length Polymorphisms Are Convenient
Landmarks for Genetic Linkage Mapping
Human beings differ from one another at many points in their
genomes: Some of these differences account for differences in traits
such as eye color, blood type, height at maturity, or susceptibility to
a particular disease. Most differences, however, have few or no
consequences in terms of the appearance or function of the individual.
Nonetheless, they can still be detected since they cause subtle
differences in proteins or, at a minimum, in the DNA sequence. The
phenomenon of multiple genetic variants at a particular site in the
genome is called polymorphism. With the advent of recombinant-
DNA methods and, more particularly, DNA-probe technology, a
versatile type of polymorphism called restriction fragment length
polymorphism (RFLP) has come to dominate human genetic linkage
mapping (Botstein et al., 1980; White et al., 1985). RFLPs are DNA-sequence
polymorphisms that result in variations in the local restriction
map at particular sites in the genome. These variations are readily
detected in small amounts of DNA extracted from blood samples.
The inheritance of RFLPs can be followed through families by
analyzing DNA from parents and children. Because (with the exception
of the X and Y chromosomes in males) each of us has two versions
of each chromosome, we have two versions of each gene and DNA
sequence, one inherited from each of our parents. Thus, polymorphic
DNA sequences such as genes or RFLPs can be present in one person
in two different forms. In such a case, the person is said to be
heterozygous, carrying two different forms, called alleles, of the
polymorphic gene or sequence. Heterozygosity allows investigators
to track genes through families and to detect linkage. An ideal genetic
marker is one that exists in so many distinct forms that every individual
is heterozygous, and unrelated individuals are heterozygous for
different forms. In this case, the marker can be traced unambiguously
from grandparent to parent to child in every family group studied,
allowing the inheritance of linked genes in the family to be traced
accurately and efficiently. Actual RFLPs don't approach this ideal,
but a newly discovered type of molecular marker comes much closer.
These VNTRs (variable number tandem repeats) are short repeated
regions that vary in length and may exist in a dozen (rather than just
two) identifiable forms.
Genetic Linkage Mapping Requires the Study of
Many People in Large Family Groups
Two genes that are close to one another on a chromosome show
tight linkage: The particular alleles of the two genes that a person
inherits from one of his or her parents are almost always passed on
together to that person's children. However, two genes that are farther
apart but still on the same chromosome are more likely to be separated
by exchange during sperm or egg production. The probability of such
an exchange increases with the physical distance between the genes,
thereby accounting for the observation that genes are ordered in the
same way by genetic linkage and by physical mapping.
To measure the degree of exchange between two genes, the
frequency of co-inheritance of parental allele combinations must be
measured on a statistically significant sample. From a practical
standpoint, detection of linkage requires the measurement of the allele
combinations passed from one generation to the next by at least 10
sperm or egg cells, meaning that at least five offspring must be
examined from a fully informative mating (i.e., both parents heterozygous
at both sites with all parental alleles distinguishable from one
another). However, an accurate measurement of the extent of linkage
requires the examination of even more people. The unit of distance
in genetic linkage mapping is called the centimorgan (cM), in honor
of the great American geneticist Thomas Hunt Morgan. By definition,
two sites that are spaced by 1 cM have a 1 percent probability of
being separated by exchange during sperm or egg production. Averaged
over the whole genome, 1 cM on the genetic linkage map corresponds
to approximately 1 million nucleotide pairs, although the relation
between genetic and physical distances varies considerably.
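
The centimorgan definition above can be written as a small calculation. This is only a sketch under stated assumptions: the function names are hypothetical, the linear rule of 1 percent per centimorgan is a reasonable approximation only over short distances, and the cap at 50 percent reflects the standard observation that unlinked sites are separated half the time (a general genetics fact, not a figure from this report). The scale of 1 million nucleotide pairs per centimorgan is the genome-wide average quoted in the text and, as the text notes, varies considerably from region to region.

```python
NUCLEOTIDE_PAIRS_PER_CM = 1_000_000  # genome-wide average quoted in the text


def recombination_probability(distance_cm):
    """Approximate probability that two sites are separated by exchange.

    Assumes the linear rule of 1 percent per centimorgan (short distances
    only) and caps the result at 0.5, the value for unlinked sites.
    """
    return min(distance_cm / 100.0, 0.5)


def approximate_physical_distance(distance_cm):
    """Rough physical distance in nucleotide pairs, using the average scale."""
    return distance_cm * NUCLEOTIDE_PAIRS_PER_CM


print(recombination_probability(1))        # sites 1 cM apart
print(approximate_physical_distance(10))   # sites 10 cM apart
```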
Great progress has been made in genetic linkage mapping with
RFLPs since the concept was introduced in 1980 (Botstein et al.,
1980). Hundreds of RFLPs have been described, and many maps of
whole chromosomes and portions of chromosomes have been published
(Drayna and White, 1985). The major laboratories engaged in
RFLP mapping have formed a highly effective collaboration centered
around the Centre d'Étude du Polymorphisme Humain (CEPH) in Paris
(Marx, 1985; Dausset, 1986). In CEPH, collaborating investigators
are provided with DNA from cultured cells derived from the lympho-
cytes of the members of 40 families having an average of approximately
eight children each, as well as both parents and all four grandparents.
This collection comprises approximately 600 progeny chromosome
sets. By agreement among the collaborators, RFLPs that are mapped
with any of the CEPH families are analyzed throughout all the families
for which they are informative.
Consequently, information is steadily accumulating about the po-
sitions of recombination events in all the progeny chromosomes in
the collection. The data are pooled and distributed at regular intervals
to all interested investigators. This international collaboration has
greatly speeded human genetic linkage mapping and lowered the entry
barriers for new investigators who are interested in joining the effort.
In fact, the large demand for this material makes it important to
increase the number of cultured cells chosen from families that are
especially useful for linkage studies.
A genetic linkage map of the entire human genome at an average
resolution of about 10 cM was recently reported (Donis-Keller et
al., 1987). Current technology seems to allow construction of an
RFLP map with an average resolution of 1 cM within the next several
years. This increase in resolution would require the mapping of several
thousand RFLPs on a set of families larger than the current CEPH
collection.
Recent innovations in human linkage mapping now allow three-
point and higher multipoint mapping to be performed. This makes
mapping more efficient and more like the Drosophila mapping that
has been so productive. Maps will also be of primary importance in
areas such as genetic counseling and in disease research. Obtaining
markers on both sides of the genes of interest will provide more
reliable information.
Genetic linkage maps of humans will require special statistical and
computer techniques because humans, unlike experimental animals,
often have few siblings. Computers also make it possible to do linkage
analysis of complex pedigrees.
RFLPs Are Useful for Interrelating Physical and
Genetic Linkage Maps
Genetic linkage mapping allows those genes with no known cellular
or molecular effects to be located on the human genome. On the other
hand, physical maps describe the DNA molecules present in chromosomes.
RFLP markers can easily be localized on either type of
map. Not only can RFLP markers be placed on the genetic linkage
map in family studies, but also, because the probes that are used to
recognize RFLPs are themselves DNA molecules, their positions on
a physical map can be determined in a variety of straightforward
ways. Exact alignment between the genetic linkage and physical maps
of the human genome at a large number of sites is therefore possible.
This will greatly facilitate finding the actual DNA sequences that
correspond to a gene once such a gene is localized on the genetic
linkage map. In addition, making maps continuous across entire
chromosomes will be easier by genetic linkage mapping, whereas
maps of higher resolution (finer than a million nucleotides) will be
easier to achieve by physical mapping. The more points at which the
two maps can be exactly aligned, the greater the opportunity to take
advantage of this complementarity, which will help solve the connectivity
problem that arises when making maps of high resolution.
A Reference RFLP Map for the Human Would Be a
Critical Tool for Studying Inherited Diseases
RFLP mapping provides a powerful, comprehensive approach to
the study of inherited diseases. Ideally, the centerpiece of this approach
would be a reference RFLP map, at 1 cM resolution, determined from
normal families. Once completed, the project of constructing such a
map would provide human geneticists with a permanent archive of
several thousand DNA probes that would detect polymorphisms
throughout the genome at an average spacing of 1 million nucleotides.
To apply this resource to the study of a particular inherited disease,
an investigator would test DNA samples from families afflicted by a
particular inherited disease with a uniformly spaced subset of perhaps
5 percent of these probes. Once rough linkage was tentatively detected,
typically with a recombination frequency of 10 percent between the
mutant gene that caused the disease and the polymorphism that was
detected by the probe, the linkage could be rapidly confirmed and the
position of the disease gene refined by follow-up analyses conducted
with more closely spaced probes, selected to cover the region of
interest thoroughly. Because the same RFLP polymorphisms are not
segregating in all families, more sites are required than might seem
necessary. For this reason more reference pedigrees are needed. In
addition, research in highly polymorphic sites and ways of detecting
them should be encouraged.
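
The arithmetic behind the screening strategy above can be sketched as follows. The calculation assumes, for illustration only, that the reference probes are evenly spaced at 1 cM and that recombination frequency is roughly 1 percent per centimorgan over these distances; the function and parameter names are hypothetical.

```python
def screening_subset(full_spacing_cm, percent_used):
    """Spacing of a uniformly spaced probe subset and the worst-case distance to a gene."""
    # Using only a given percentage of the probes widens the spacing in proportion.
    subset_spacing_cm = full_spacing_cm * 100 / percent_used
    # A gene can lie at most halfway between two adjacent probes.
    max_distance_cm = subset_spacing_cm / 2
    return subset_spacing_cm, max_distance_cm


spacing_cm, worst_case_cm = screening_subset(full_spacing_cm=1.0, percent_used=5)
print(spacing_cm, worst_case_cm)
```

Under these assumptions, a 5 percent subset has a spacing of 20 cM, so a disease gene lies within 10 cM of some probe. That is consistent with the recombination frequency of about 10 percent expected at first detection.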
At present, genetic linkage mapping with RFLPs is often begun
with essentially random probe collections; once weak linkage is
detected, the refinement of the position of the disease gene is extremely
laborious since new sets of probes must be developed. Nonetheless,
when major resources are directed to the study of particular diseases
(such as cystic fibrosis and Huntington's disease), progress can be
impressive. Only a few years ago, nothing was known about the
position in the genome of the gene responsible for either of these
diseases, and no compelling evidence existed that either was caused
by mutations in the same gene in different afflicted families. Now, as
a result of the RFLP approach, both genes have been mapped with
great precision and shown to have a common genetic basis in most
or all cases (Gusella et al., 1983; White, 1986). Equally important,
the RFLP approach, because of its ability to interrelate genetic linkage
and physical mapping, has laid the groundwork for locating and
analyzing the actual DNA sequences responsible for the diseases by
coupled strategies of physical mapping and cloning, starting with the
DNA clones used to probe for the linked RFLPs.
Generalization of this strategy to the large variety of known inherited
disorders could be expected to advance our understanding of basic
human biology as well as to direct improvements in the diagnosis and
treatment of many diseases. The reference RFLP map for the human,
and its associated collection of well-tested DNA probes, would
dramatically improve the efficiency of this research, allow the study
of diseases in smaller family groups, and improve the practicality of
studying diseases that are caused by alterations in more than one
gene. The study of multigenic disorders could ultimately revolutionize
medicine, since there are likely to be multigenic genetic predispositions
to such common disorders as cancer, heart disease, and schizophrenia.
MAKING PHYSICAL MAPS
Medium-Resolution Mapping of Restriction Sites Is Facilitated by New
Methods of Preparing and Separating Large DNA Molecules
At low resolution, cytogenetic mapping of banded chromosomes is
already advanced. At high resolution, methods such as restriction-
site mapping and DNA sequencing of clones are well established.
Major issues of efficiency must be considered in applying these
methods to the human genome, but, in principle, there are no major
obstacles. However, until recently, the middle range contained a
serious gap between the highest resolution achievable in cytogenetic
mapping with the light microscope (10 million nucleotides) and the
lowest resolution achievable by restriction-site mapping (10,000 nu-
cleotides).
At present, prospects of bridging this 1,000-fold gap in resolution
to connect the two types of maps by increasing the resolution of
cytogenetic mapping are limited. Until recently, two substantial
obstacles existed to bridging it from the other direction by extending
restriction-site mapping to lower resolutions (and thus longer dis-
tances). The first obstacle was a lack of restriction enzymes that
cleave human DNA infrequently enough to produce the very large
DNA fragments needed for low-resolution mapping. The second was
an inability to separate and measure the sizes of DNA fragments
appreciably larger than 20,000 nucleotides. During the past 5 years,
major progress has been made toward solving both of these problems.
Restriction enzymes have been discovered that cleave DNA into
fragments with average sizes ranging from 100,000 to 1 million
nucleotides. In addition, a method known as pulsed-field gel electro-
phoresis, which allows the separation of DNA fragments as large as
10 million nucleotides, has been introduced (Schwartz and Cantor,
1984).
Now that it is possible to generate, separate, and measure large
DNA fragments, a variety of ways of constructing restriction-site
cleavage maps exist. Cleaving a DNA genome infrequently at specific
sites with appropriate restriction enzymes produces many large DNA
fragments of different sizes. These fragments can then be separated
from each other by electrophoresis through agarose gels. The DNA
bands that result can be seen either by direct DNA staining or by
nucleic-acid hybridization with appropriate DNA probes. (The latter
technique takes advantage of the specificity of complementary base-pairing
between two DNA strands, which allows one highly radioactive
DNA molecule, the DNA probe, to be used to find its one complementary
partner in a mixture that contains millions of other DNA
molecules.) Although these methods allow different fragments of
chromosomes to be separated and their contents of probe sequences
to be determined, they provide no information regarding the order of
these fragments along the chromosome. However, the 50 to 500
different large fragments produced from each human chromosome
can be ordered by an extension of such analyses. One way involves
cutting the genome at two distinct sets of sites with two different
restriction enzymes, a procedure that generates two families of large
DNA fragments that overlap. The fragments that are neighbors in the
genome can then be identified with appropriate DNA probes since
two overlapping fragments will hybridize to the same probe. In another
method, only a single restriction enzyme is used to produce the large
DNA fragments. In addition, however, a set of small DNA probes,
called linking probes, is generated by selectively cloning the short
segments of DNA that surround each of the cleavage sites for the
restriction enzyme used to make the large fragments. Because linking
probes contain sequences from both sides of a particular restriction
site, each should hybridize to two different large fragments when used
as a DNA probe, thereby demonstrating that these particular large
fragments are neighbors in the genome (Poustka and Lehrach, 1986).
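The linking-probe logic described above can be sketched in a few lines of Python. This is only an illustration under idealized assumptions: the fragment names, probe names, and the function name order_fragments are hypothetical, and the sketch assumes a single linear chromosome in which every cleavage site has exactly one error-free linking probe.

```python
from collections import defaultdict

def order_fragments(probe_hits):
    """Chain large fragments into a linear order.

    probe_hits maps each linking probe to the pair of large fragments
    it hybridizes to (the fragments on either side of one cleavage site).
    Assumes one linear chromosome with no missing or ambiguous probes.
    """
    neighbors = defaultdict(set)
    for left, right in probe_hits.values():
        neighbors[left].add(right)
        neighbors[right].add(left)

    # The two end fragments each have only one neighbor.
    ends = sorted(f for f, n in neighbors.items() if len(n) == 1)
    if len(ends) != 2:
        raise ValueError("expected exactly two end fragments")

    order = [ends[0]]
    previous = None
    # Walk from one end; at most len(neighbors) - 1 steps are needed.
    for _ in range(len(neighbors) - 1):
        candidates = neighbors[order[-1]] - {previous}
        if not candidates:
            break
        previous = order[-1]
        order.append(candidates.pop())
    return order

# Hypothetical hybridization results: probe -> the two fragments it detects.
hits = {
    "probe1": ("C", "A"),
    "probe2": ("B", "D"),
    "probe3": ("A", "B"),
}
print(order_fragments(hits))  # ['C', 'A', 'B', 'D']
```

In practice, missing probes or probes that detect more than two fragments would break this simple chain, so a real analysis would need to tolerate incomplete and ambiguous data.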
The largest DNA molecule that has been mapped with restriction
enzymes that cleave DNA infrequently is the single chromosome of
E. coli (4.7 million nucleotides) (Kohara et al., 1987; Smith et al.,
1987). The average spacing of the mapped sites is approximately
200,000 nucleotides. Progress in achieving an E. coli map at higher
resolution, largely by analyzing ordered sets of DNA clones, is also
proceeding rapidly.
The smallest human chromosome is 10 times as large as the E. coli
chromosome. Although difficult to construct, its physical map could
be determined by methods that are generally similar to those applied
to E. coli. In principle, such an effort would best be carried out after
the human chromosomes were separated from each other, to prevent
the DNA fragments of the other chromosomes from complicating the
analysis of the one chromosome of interest. In recent years, progress
in chromosome-separation technology has been impressive, but expert
opinion remains divided as to whether the final samples are pure
enough and contain enough DNA to have a major impact on physical
mapping projects. The chromosomes to be separated are isolated from
human cells undergoing division, a stage of the cell's life cycle when
the chromosomes are condensed and stable. They can be separated
according to size by flow cytometry, a method in which the amount
of DNA present in condensed chromosomes is analyzed while the
chromosomes flow one by one through a small tube. Computer-
controlled systems allow each individual chromosome to be diverted
to a designated collection tube depending on its DNA content. DNA
samples prepared from chromosomes separated in this way have
already served as an important source for producing clone collections
that are highly enriched for the DNA sequences of a particular human
chromosome.
High-Resolution Mapping of Restriction Sites Will Require the
Use of Ordered Collections of DNA Clones
The purification of human chromosomes can only moderately
decrease the complexity of the DNA samples used for mapping. In
contrast, cloning techniques offer large decreases in complexity:
Through chromosome separation, the complexity of the samples can
be reduced 10- to 100-fold, whereas cosmid cloning reduces the
complexity of individual samples 100,000-fold. Furthermore, unlike
separated samples of human chromosomes, DNA clones will replicate
in microbial hosts, thereby allowing the production of as much DNA
as needed. For these reasons, doing as much physical mapping as
possible on cloned DNA has overwhelming advantages. Particularly
for high-resolution mapping, the preferred source of DNA samples
for physical mapping will be ordered collections of DNA clones, each a
set of cloned DNA fragments that have been sufficiently analyzed
that they can be arranged to reflect the order of their corresponding
DNA fragments on the original chromosomes. Since the clones are
usually generated in a way that produces cloned DNA fragments that
start and stop at random sites along the chromosome, each member
of the collection will normally overlap extensively with several
neighbors, and the entire collection will have considerable redundancy
(i.e., any segment of the chromosome will be represented in several
different clones).
Fingerprinting Methods Can Be Used to Order DNA Clones
Preparing an ordered-clone collection involves cloning DNA frag-
ments as molecules that can replicate in a microbial host, determining
the order of these fragments in the genome, and propagating the
fragments in pure form to make them widely available for subsequent
analysis. Much can already be done in these respects, and the prospects
for rapid advancement of technical capabilities are good. The prop-
erties of the cloned DNA fragments can then be used to reconstruct
their original order in the genome. For a set of random clones, some
clones will partially overlap the region of the genome covered by
other clones. A characteristic of the overlapping region can be
measured, such as the detailed pattern of cutting by a set of restriction
enzymes. This analysis is performed for a large number of clones
individually, and then a computer search of the patterns is used to
place clones in order (neighboring clones are those that share part of
their patterns). This method is called fingerprinting, since the identi-
fying DNA characteristics of each cloned segment are analogous to a
fingerprint of the DNA fragment.
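The computer search described above can be illustrated with a minimal Python sketch. It assumes each fingerprint is simply a list of restriction-fragment sizes and that two clones overlap when they share at least a chosen number of sizes; the clone names, sizes, thresholds, and function names are all hypothetical, and actual mapping projects have used more elaborate statistical criteria.

```python
from itertools import combinations

def shared_fragments(fp_a, fp_b, tolerance=0):
    """Count fragment sizes in fp_a that match some size in fp_b."""
    return sum(any(abs(a - b) <= tolerance for b in fp_b) for a in fp_a)

def build_contigs(fingerprints, min_shared=2, tolerance=0):
    """Group clones whose fingerprints share at least min_shared fragments."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Follow parent links to the representative clone of the group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_fragments(fingerprints[a], fingerprints[b], tolerance) >= min_shared:
            parent[find(a)] = find(b)  # merge the two groups

    contigs = {}
    for name in fingerprints:
        contigs.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in contigs.values())

# Hypothetical fingerprints: fragment sizes (in nucleotides) for each clone.
clones = {
    "clone1": [1200, 3400, 5100, 800],
    "clone2": [5100, 800, 2600, 4300],
    "clone3": [2600, 4300, 950, 7000],
    "clone4": [1500, 6100, 3300],
}
print(build_contigs(clones))  # [['clone1', 'clone2', 'clone3'], ['clone4']]
```

Here clone1, clone2, and clone3 join into one contiguous group through pairwise shared fragments, while clone4 shares nothing with the others and remains unplaced.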
Fingerprinting methods have recently been used successfully to
order large numbers of cloned DNA segments in yeast, E. coli, and
nematode genomes (Coulson et al., 1986; Olson et al., 1986; Daniels
and Blattner, 1987; Kohara et al., 1987). In principle, this method
should provide an efficient way to group DNA clones into contiguous
regions that cover 90 percent or more of the genome. A common
problem, however, is that the matching of contiguous segments
proceeds rapidly at first and then slows. Finishing the process by
using DNA-probe techniques to find the clones needed to fill in the
map then becomes time-consuming and tedious. The unexpectedly
large number of gaps has two principal explanations: (1) Not all
overlapping segments are being recovered because of biases inherent
in the DNA cloning procedures used, and (2) the fingerprint information
collected for the overlapping DNA segments lacks sufficient precision
to distinguish all DNA fragments from each other unambiguously.
Progress in both areas may be expected as a wider variety of cloning
systems are explored and more sophisticated fingerprinting methods
are developed. For example, alternatives to the use of restriction
enzyme cutting patterns as the fingerprint are being explored (Poustka
et al., 1986).
The Optimal Method for Preparing Ordered Collections of
DNA Clones Is Not Yet Clear
Although the general principles of working with ordered collections
of DNA clones are well established, the technology is in a state of
flux. A promising recent development is the demonstration that yeast
can be used as host cells for cloning large human DNA segments.
Several laboratories have shown that DNA fragments as long as
500,000 nucleotides can be cloned as artificial chromosomes in yeast.
These fragments are 10 times the size of the fragments that can be
cloned with current bacterial-host systems (Burke et al., 1987). Further
development of systems for cloning large DNA molecules will greatly
enhance the efficiency of ordering DNA fragments. For example, it
should be possible to prepare DNA clone collections by using a single
restriction enzyme that cuts DNA infrequently; this procedure would
generate a single family of large DNA fragments that are then cloned.
This family would be much less complex than the collection of
randomly cut clones required for the fingerprinting method. A second
set of short DNA clones that specifically includes all the rare restriction
sites that were cut to make the large fragments could then be used as
linking probes to establish the continuity between adjacent large
fragments, thereby allowing the large fragments to be ordered along
the genome.
The cDNA clones representing the transcribed regions of the genome
represent an alternative source of probes that could be used to
demonstrate the adjacency of large cloned fragments. Because cDNA
clones are made by reverse transcription of mRNAs, they lack the
intron sequences that interrupt the exons in the genomic DNA. The
exons that have been joined together in the cDNAs will often be
encoded by the DNA from more than one large genomic fragment,
so that DNA probes prepared from cDNAs can be used to order the
fragments from adjacent portions of the genome. This method has the
advantage that the cDNA clones are themselves of special interest
since they represent the portion of the genome that is selectively
expressed in cells.
Still another source of useful probes would be a set of RFLP DNA
probes that have been ordered by genetic linkage analyses of standard
families. An RFLP map with a 1-cM resolution would provide markers
separated by 1 million nucleotides, on average. If a DNA clone
collection of human genome fragments that averaged several million
nucleotides in size could be constructed, it could be readily ordered
with these markers.
For certain methods at least, the task of ordering the DNA clones
obtained from the human genome is complicated by the considerable
repetition of DNA sequences in the genomes of higher organisms.
These sequences are largely absent from the E. coli, nematode, and
yeast genomes from which ordered clone collections have thus far
been prepared. Additional problems are expected from the instability
of selected clones observed when E. coli serves as the host for cloned
DNA; it is too early to know whether these problems will also apply
to the newer yeast cloning systems. For all these reasons, it is
uncertain which cloning and linking methods will prove most effective
for a human genome project. Further methodological developments
could even supplant all present methods.
IMMEDIATE APPLICATIONS OF CHROMOSOME MAPS
A number of important applications of chromosome maps could be
pursued even while the various mapping activities are progressing.
We have already discussed how even a partial map can be expected
to facilitate the isolation of specific human disease genes. Maps will
also support early sequencing efforts. The lower resolution physical
maps will provide a framework within which to organize the highly
fragmentary sequence data that will be generated by these initial
sequencing efforts, while the ordered-clone collections will provide
the actual fragments that are subcloned for final sequencing (see
Chapter 5).
Chromosome maps can also be usefully applied to begin a systematic
assignment of expressed genes to map positions. Most DNA in the
human genome is either not part of an expressed gene or is in one of
the many intervening sequences (introns) that separate the protein-
coding portions of expressed genes. As previously stated, the cloning
of cDNA produces only the coding DNA sequences present in
expressed genes (the exons and not the introns). It is possible to make
large collections of cDNA clones derived from the genes that are
expressed in particular tissues or at a particular stage of development
and differentiation and to embark on the systematic assignment of
each expressed gene to a map position on the chromosomes. Methods
are being developed to avoid the standard problem with cDNA, which
is that genes expressed at a low level are often missed, whereas genes
expressed at a high level produce much mRNA and therefore are
obtained repeatedly as cDNA clones. These methods aim at producing
"normalized" cDNA libraries, in which each expressed DNA sequence
is equally represented.
Initially, the map assignments for the expressed genes could be
based on the existing cytogenetic map and could be carried out by
somatic cell genetic techniques, as well as by in situ hybridization of
cDNAs to chromosomes. As the physical mapping and sequencing of
the genome proceeded, it would require relatively little effort to refine
these map assignments.
CONCLUSIONS AND RECOMMENDATIONS
Methods for physical and genetic linkage mapping have developed
steadily and impressively over the past three decades. Today, low
resolution genetic linkage maps and cytogenetic maps exist for much
of the human genome. During the past few years, these maps have
led to the identification of genes or chromosome segments involved
in several human diseases. These advances underscore the extent of
past progress in genome mapping and the promise that it holds for
contributing to improved human health.
Recent Breakthroughs Have Set the Stage for Large-Scale Mapping
Breakthroughs in mapping methods during the past several years
have made it possible to construct chromosome maps of unprecedented
completeness, accuracy, and detail. These breakthroughs include the
development of techniques that have allowed 100-fold larger DNA
molecules to be separated and manipulated than previously possible.
In addition, new and powerful methods for following the inheritance
of arbitrary segments of chromosomes through human pedigrees are
available. Both physical and genetic linkage mapping have been
invigorated by these developments, and important synergism has
arisen between these two approaches to genomic mapping. Conse-
quently, the goal of developing complete physical and genetic linkage
maps of the human genome in a relatively short time is now realistic.
These maps would be useful in their own right and would pave the
way toward constructing the ultimate human map: the complete DNA
sequence of the human genome.
The task of making a human genome map will by no means be
easy. The longest complete physical map that has been constructed
to date is for the E. coli chromosome. This map is only about 1/600 the size
of the human genome. The E. coli mapping benefited from an enormous
base of knowledge on the bacterium accumulated during 40 years of
intensive study. For example, approximately 1,000 genes have been
assigned to positions around the E. coli chromosome, whereas a
comparable region of the human genome, on average, contains a single
known gene. Even after the genetic linkage mapping is completed at
the 1-million nucleotide resolution recommended in this report, an E.
coli-sized region of the human genome would contain only a handful
of genetic markers. Thus, constructing a physical map of even the
smallest human chromosome with today's technology would require
a substantial effort.
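A rough calculation using only the round figures quoted in this chapter gives a sense of the scale involved. The variable and function names are illustrative, and the sketch assumes evenly spaced markers and cleavage sites, which real genomes will not provide.

```python
# Round figures quoted in this chapter (estimates, not measurements).
ECOLI_GENOME_NT = 4_700_000                 # E. coli chromosome size
LINKAGE_MARKER_SPACING_NT = 1_000_000       # recommended linkage-map resolution
SMALLEST_HUMAN_CHROMOSOME_NT = 10 * ECOLI_GENOME_NT  # "10 times as large"

def expected_count(region_nt, spacing_nt):
    """Expected number of evenly spaced items in a region of given size."""
    return region_nt / spacing_nt

# Genetic markers expected in an E. coli-sized stretch of human DNA.
markers_in_ecoli_sized_region = expected_count(
    ECOLI_GENOME_NT, LINKAGE_MARKER_SPACING_NT
)

# Large restriction fragments average 100,000 to 1 million nucleotides.
min_fragments = expected_count(SMALLEST_HUMAN_CHROMOSOME_NT, 1_000_000)
max_fragments = expected_count(SMALLEST_HUMAN_CHROMOSOME_NT, 100_000)

print(markers_in_ecoli_sized_region, min_fragments, max_fragments)
```

Under these assumptions, an E. coli-sized region carries about five linkage markers, and the smallest human chromosome yields roughly 50 to 500 large fragments, in line with the ranges given earlier in the chapter.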
It is anticipated that the most difficult aspect of the physical mapping
will be the achievement of long-range connectivity. Although it is
likely that a large proportion of the human genome could be mapped
at a resolution of a few thousand nucleotides simply by relying on
the fingerprinting of overlapping DNA clones, so many gaps would
likely be left that the connectivity achievable by this approach would
be poor. The committee believes that the utility of the physical map
will increase dramatically as its connectivity improves. Consequently,
attaining high connectivity in the physical map should be a major
priority of the overall human genome project.
Because the technology needed for genetic linkage mapping with
RFLPs is more advanced than that for physical mapping, an immediate
emphasis should be placed on completing the genetic linkage map. A
project with the goal of attaining a fully connected map with an
average resolution of 1 cM is strongly recommended. This goal would
require that a few thousand new RFLPs be identified and mapped by
classic linkage analysis on DNA samples from a set of three-generation
families. Such an effort, which could begin immediately, would be
expected to require several years to complete and to cost approxi-
mately $40 million.
Different Mapping Methods Should Proceed in Parallel
A critical feature in all mapping is that the results from different
methods are additive and corroborative. For example, the restriction-
site maps, the cDNA maps, and ordered DNA clone collections go
hand in hand since each helps construct the other. The use of one of
these maps to study human disease also requires a genetic linkage
map. In turn, efforts to construct linkage maps at higher resolutions
will be assisted by the existence of corresponding physical maps.
Thus, no single strategy is best overall. All types of mapping need to
be coordinated as part of a human genome project.
The natural tendency of researchers to press forward with the
detailed analysis of chromosomal regions of particular interest should
be encouraged. The committee specifically recommends against a
centrally imposed plan to proceed from lower to higher resolution as
is implicit, for example, in proposals to complete the entire physical
map before initiating pilot sequencing projects. Such sequencing
projects will no doubt begin with the sequencing of large chromosomal
regions of particular biological interest.
The Improvement of Physical Mapping Techniques Should Be Closely
Coupled to Actual Attempts to Map Large Genomes
Experience teaches that the practical problems facing large-scale
mapping efforts become clear only when attempts are made to apply
new methods to actual map production. Many approaches that seem
ideal in theory fail for reasons that cannot be foreseen. In addition,
the day-to-day press of practical problems drives the development of
useful new technology. Thus, the committee recommends that actual
mapping efforts be supported now on a substantial scale.
Nonetheless, a major initial focus of most laboratories involved in
physical mapping projects is likely to be the development of techniques.
Despite recent advances, many limitations on physical mapping
methods still exist. For example, DNA fragments as large as 10 million
nucleotides can be handled, but only with considerable difficulty, and
such large fragments cannot yet be cloned. Ordered DNA clone
collections have been started, but not completed, for several organisms
with genomes that are at most 1/50 the size of the human genome.
Advanced technology, such as handling larger DNA pieces, can
expedite the preparation of such clone collections. In addition, the
stability of the cloned DNA fragments is a major concern, since once
the effort is devoted to constructing an ordered DNA clone collection,
one should be able to count on it as a permanent resource for future
studies.
Specific Improvements That Will Facilitate Map Construction and
Usefulness Can Be Identified
In each aspect of mapping, major improvements in technology seem
likely to emerge over the next few years. These improvements, which
should be major initial goals of the human genome project, will include
increased DNA size range, increased resolution, diminished cost, and
improved accuracy. Some of the specific target areas include improving
or creating methods for:
· Physically separating intact human chromosomes.
· Isolating and immortalizing identified fragments of human chro-
mosomes in cultured cell lines.
· Cloning complementary DNA from low-abundance messenger
RNA and obtaining "normalized" cDNA libraries.
· Cloning large DNA fragments.
· Purifying large DNA fragments.
· Separating large DNA fragments with higher resolution.
· Ordering the adjacent DNA fragments in a DNA clone bank,
including mathematical and statistical work that would aid in map
construction.
· Automating various steps in DNA mapping, including DNA
purification and hybridization analysis, and handling of many different
DNA samples simultaneously.
· Data recording, storage, and analysis, with attention to the
mathematical and statistical problems of optimizing physical mapping
and sequence assembly and to the application of statistical methods
of database quality control.
In addition, expanded collections of CEPH-like, three-generation
families from which DNA could be distributed for genetic linkage
studies will be important in facilitating map construction.
Because the technology is still in its infancy, support should be
directed to those research groups judged to have the greatest ability
to develop technology, rather than to routine production centers
staffed mainly by technicians.
REFERENCES
Botstein, D., R. L. White, M. Skolnick, and R. W. Davis. 1980. Construction of a genetic
linkage map in man using restriction fragment length polymorphisms. Am. J. Hum.
Genet. 32:314-331.
Burke, D. T., G. F. Carle, and M. V. Olson. 1987. Cloning of large segments of exogenous
DNA into yeast by means of artificial chromosome vectors. Science 236:806-812.
Coulson, A., J. Sulston, S. Brenner, and J. Karn. 1986. Toward a physical map of the
genome of the nematode Caenorhabditis elegans. Proc. Natl. Acad. Sci. U.S.A.
83:7821-7825.
Daniels, D. L., and F. R. Blattner. 1987. Mapping using gene encyclopaedias. Nature
325:831-832.
Dausset, J. 1986. Le centre d'etude du polymorphisme humain. Presse Med. 15:1801-1802.
Donis-Keller, H., P. Green, C. Helms, S. Cartinhour, B. Weiffenbach, K. Stephens, T. P.
Keith, D. W. Bowden, D. R. Smith, E. S. Lander, D. Botstein, G. Akots, K. S.
Rediker, T. Gravius, V. A. Brown, M. B. Rising, C. Parker, J. A. Powers, D. E. Watt,
E. A. Kauffman, A. Bricker, P. Phipps, H. Muller-Kahle, T. R. Fulton, S. Ng, J. W.
Schumm, J. C. Braman, R. G. Knowlton, D. F. Barker, S. M. Crooks, S. E. Lincoln,
M. J. Daly, and J. Abrahamson. 1987. A genetic linkage map of the human genome.
Cell 51:319-337.
Drayna, D., and R. White. 1985. The genetic linkage map of the human X chromosome.
Science 230:753-758.
George, K. P. 1970. Cytochemical differentiation along human chromosomes. Nature
226:80-81.
Gusella, J. F., N. S. Wexler, P. M. Conneally, S. L. Naylor, M. A. Anderson, E. R. Tanzi,
P. C. Watkins, K. Ottina, M. R. Wallace, A. Y. Sakaguchi, A. B. Young, I. Shoulson,
E. Bonilla, and J. B. Martin. 1983. A polymorphic DNA marker genetically linked to
Huntington's disease. Nature 306:234-235.
Kohara, Y., K. Akiyama, and K. Isono. 1987. The physical map of the whole Escherichia
coli chromosome. Cell 50:495-508.
Lejeune, J., M. Gauthier, and R. Turpin. 1959. Les chromosomes humains en culture de
tissus. C. R. Hebd. Seances Acad. Sci. 248:602-603.
Marx, J. L. 1985. Putting the human genome on the map. Science 239:150-151.
McKusick, V. A. 1986. Mendelian Inheritance in Man: Catalogs of Autosomal Dominant,
Autosomal Recessive, and X-Linked Phenotypes, 7th ed. Johns Hopkins University
Press, Baltimore.
Olson, M. V., J. E. Dutchik, M. Y. Graham, G. M. Brodeur, C. Helms, M. Frank, M.
MacCollin, R. Scheinman, and T. Frank. 1986. Random-clone strategy for genomic
restriction mapping in yeast. Proc. Natl. Acad. Sci. U.S.A. 83:7826-7830.
Poustka, A., and H. Lehrach. 1986. Jumping libraries and linking libraries: The next
generation of molecular tools in mammalian genetics. Trends Genet. 2:174-179.
Poustka, A., T. Pohl, D. P. Barlow, G. Zehetner, A. Craig, F. Michaels, E. Ehrich, A.-
M. Frischauf, and H. Lehrach. 1986. Molecular approaches to mammalian genetics.
Cold Spring Harbor Symp. Quant. Biol. 51:131-139.
Schwartz, D. C., and C. R. Cantor. 1984. Separation of yeast chromosome-sized DNAs
by pulsed field gradient gel electrophoresis. Cell 37:67-75.
Smith, C. L., J. F. Econome, A. Schutt, S. Klco, and C. R. Cantor. 1987. A physical map
of the Escherichia coli K12 genome. Science 236:1448-1453.
Weiss, M. C., and H. Green. 1967. Human-mouse hybrid cell lines containing partial
complements of human chromosomes and functioning human genes. Proc. Natl. Acad.
Sci. U.S.A. 58:1104-1111.
White, R. 1986. The search for the cystic fibrosis gene. Science 234:1054-1055.
White, R., M. Leppert, D. T. Bishop, D. Barker, J. Berkowitz, C. Brown, P. Callahan, T.
Holm, and L. Jerominski. 1985. Construction of linkage maps with DNA markers for
human chromosomes. Nature 313:101-105.