Copyright © 2009. National Academy of Sciences. All rights reserved.
Introduction
All living organisms are composed of cells, each no wider than a
human hair. Each of our cells contains the same complement of DNA
constituting the human genome (Figure 1-1). The DNA sequence of
every person's genome is the blueprint for his or her development
from a single cell to a complex, integrated organism that is composed
of more than 10^13 (10 million million) cells. Encoded in the DNA
sequence are fundamental determinants of those mental capacities
(learning, language, memory) essential to human culture. Encoded
there as well are the mutations and variations that cause or increase
susceptibility to many diseases responsible for much human suffering.
Unprecedented advances in molecular and cellular biology, in
biochemistry, in genetics, and in structural biology, occurring at an
accelerating rate over the past decade, define this as a unique and
opportune moment in our history: For the first time we can envision
obtaining easy access to the complete sequence of the 3 billion
nucleotides in human DNA and deciphering much of the information
contained therein. Converging developments in recombinant DNA
technology and genetics make obtaining a complete ordered DNA
clone collection indexed to the human genetic linkage map a realistic
immediate goal. Even determination of the complete nucleotide
sequence is attainable, although ambitious. The DNA in the human
genome is remarkably stable, as it must be to provide a reliable
blueprint for building a new organism. For this reason, obtaining
complete genetic linkage and physical maps and deciphering the
sequence will provide a permanent base of knowledge concerning all
human beings, a base whose utility for all activities of biology and
medicine will increase with future analysis, research, and
experimentation.
Even the complete sequence of DNA in the human genome will
not by itself explain human biology. It will, however, serve as a great
resource, an essential data bank, facilitating future research in
mammalian biology and medicine. Humans, like all living organisms, are
composed largely of proteins. In humans these are roughly estimated
to number 100,000 different kinds. In general, each gene codes for the
production of a single protein, and a gene and its protein can be
related to each other by means of the genetic code. Therefore,
scientists will be able to turn to the DNA sequence of the human
genome and obtain detailed information on both the structure and
function of any gene or protein of interest. In addition, all genes and
proteins will be classified into large family groups that provide valuable
clues to their functions. In this way, many previously unknown human
genes and proteins will become available for biochemical,
physiological, and medical studies. The knowledge gained will have a major
impact on health care and disease prevention; it will also raise
challenging issues regarding rational, wise, and ethical uses of science
and technology.
GENOMES, GENES, AND GENOMIC MAPS
To understand the importance of knowledge about the human
genome, one must first understand the genome's functions.
Genomes Consist of DNA Molecules That Contain Many Genes
The genome of all living organisms consists of DNA, a very long
two-stranded chemical polymer (Figure 2-1). Each DNA strand is
composed of four different units, called nucleotides, that are linked
end to end to form a long chain (Figure 2-2). These four nucleotides
are symbolized as A, G, C, and T, which stand for the four bases
(adenine, guanine, cytosine, and thymine) that are parts of the
nucleotides. One DNA molecule, which together with some associated
proteins constitutes a chromosome, differs from another in its length
and in the order of its nucleotides. Each DNA molecule contains
many genes, which are its functional units. These genes are arranged
in a defined order along the DNA molecule. Most genes code for
protein molecules (enzymes or structural elements) that determine
the characteristics of a cell. In bacteria, the coding sequences of a
gene are continuous strings of nucleotides, but in mammals the coding
segments in a gene (called exons) are generally separated from one
MAPPING AND SEQUENCING THE HUMAN GENOME
another by noncoding segments (called introns) (Figure 2-3). Often
each exon will encode a different structural region (or domain) of a
larger protein molecule. Many exons have been found to be part of a
family of related coding sequences that are used in the construction
of many different genes (Doolittle et al., 1986). Because of the many
introns in mammalian genes, a single gene is often more than
10,000 nucleotides long, and genes that span 100,000 nucleotides are
not uncommon (Table 2-1).
For the information in the coding sequences of a gene to be
expressed, the DNA of a gene must first be transcribed into an RNA
molecule (Figure 2-3). Before the RNA strand leaves the cell's nucleus,
the intron sequences are cut out of this RNA strand by a process
called RNA splicing, thereby bringing the exon sequences into
contiguity. Then the RNA can be translated into a protein molecule
according to the genetic code (every group of three nucleotides codes
for one amino acid). Nucleotide sequences adjacent to the coding
sequences in each gene encode regulatory signals for activating or
inactivating transcription of the gene. Gene activity is a dynamic
process; at any given time and in any given cell type, only a subset
of genes is active. These active genes determine the course of
embryological development and the characteristics of cells and
organisms.
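The path from gene to protein described above (transcription, RNA splicing, translation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy gene, its two intron sequences, the exon coordinates, and all function names are invented for the example. The exon sequence is borrowed from the start of the beta-globin coding region, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand):
    """Primary RNA transcript: same sequence as the coding strand, with U for T."""
    return coding_strand.replace("T", "U")

def splice(transcript, exon_spans):
    """Remove introns by joining the exon spans (start, end) in order."""
    return "".join(transcript[start:end] for start, end in exon_spans)

def translate(mrna):
    """Read codons three nucleotides at a time until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical toy gene: three exons separated by two invented introns.
EXONS = ["ATGGTGCAC", "CTGACTCCT", "GAGGAGAAGTAA"]
INTRONS = ["GTAAGTTTTCAG", "GTGAGTCCCTAG"]
gene = EXONS[0] + INTRONS[0] + EXONS[1] + INTRONS[1] + EXONS[2]
exon_spans = [(0, 9), (21, 30), (42, 54)]  # positions of the exons in `gene`

mrna = splice(transcribe(gene), exon_spans)
protein = translate(mrna)
print(mrna)     # AUGGUGCACCUGACUCCUGAGGAGAAGUAA
print(protein)  # MVHLTPEEK
```

Each letter of the output stands for one amino acid in the standard one-letter notation. A real gene would also need its regulatory signals and splice sites to be recognized, which this sketch takes as given.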
The Human Genome Is Composed of 24
Different Types of DNA Molecules
Human DNA is packaged into physically separate units called
chromosomes. Humans are diploid organisms, containing two sets of
genetic information, one set inherited from the mother and one from
the father. Thus, each somatic cell has 22 pairs of chromosomes
called autosomes (one member of each pair from each parent) and
two sex chromosomes (an X and a Y chromosome in males and two
X chromosomes in females). Each chromosome contains a single very
long, linear DNA molecule. In the smallest human chromosomes this
DNA molecule is composed of about 50 million nucleotide pairs; the
largest chromosomes contain some 250 million nucleotide pairs.
The diploid human genome is thus composed of 46 DNA molecules
of 24 distinct types. Because human chromosomes exist in pairs that
are almost identical, only 3 billion nucleotide pairs (the haploid genome)
need to be sequenced to gain complete information concerning a
representative human genome. The human genome is thus said to
contain 3 billion nucleotide pairs, even though most human cells
contain 6 billion nucleotide pairs.
TABLE 2-1 The Size of Some Human Genes
                                     Gene Size          mRNA Size
                                     (in thousands      (in thousands      Number of
Gene                                 of nucleotides)    of nucleotides)    Introns
Small
  Alpha globin                       0.8                0.5                2
  Beta globin                        1.5                0.6                2
  Insulin                            1.7                0.4                2
  Apolipoprotein E                   3.6                1.2                3
  Parathyroid                        4.2                1.0                [illegible]
  Protein kinase C                   11                 1.4                7
Medium
  Collagen I
    Pro-alpha-1(I)                   18                 5                  50
    Pro-alpha-2(I)                   38                 5                  50
  Albumin                            25                 2.1                14
  High-mobility group
    CoA reductase                    25                 4.2                [illegible]
  Adenosine deaminase                32                 1.5                [illegible]
  Factor IX                          34                 2.8                [illegible]
  Catalase                           34                 1.6                12
  Low-density lipoprotein
    receptor                         45                 5.5                17
Large
  Phenylalanine hydroxylase          90                 2.4                12
  Factor VIII                        186                9                  25
  Thyroglobulin                      300                8.7                36
Very large
  Duchenne muscular dystrophy        >2,000             ~17                ~50
a Table provided by Victor McKusick.
[Figure 2-3 diagram. Labels: chromosome region containing a gene; noncoding region, including a regulatory region; exons and introns; transcription; primary RNA transcript; RNA splicing; messenger RNA (mRNA); translation; protein.]
FIGURE 2-3 How genes are expressed in human cells. Each gene can specify the synthesis
of a particular protein. Whether a gene is off or on depends on signals that act on the
regulatory region of the gene. When the gene is on, the entire gene is transcribed into a
large RNA molecule (primary RNA transcript). This RNA molecule carries the same genetic
information as the region of DNA from which it is transcribed because its sequence of
nucleotides is determined by complementary nucleotide pairing to the DNA during RNA
synthesis. The RNA quickly undergoes a reaction called RNA splicing that removes all of
its intron sequences and joins together its coding sequences (its exons). This produces a
messenger RNA (mRNA) molecule. The RNA chain is then used to direct the sequence of
a protein (translation) according to the genetic code in which every three nucleotides (a
codon) specifies one subunit (an amino acid) in the protein chain.
DNA is a double helix: Each nucleotide on a strand of DNA has a
complementary nucleotide on the other strand. The information on
one DNA strand is therefore redundant to that on the other (that is,
because of complementary base pairing (Figure 2-2A), one can in
principle determine the nucleotide sequence of one strand from the
other). However, it is currently necessary to determine the sequences
of the nucleotides on the two DNA strands separately to achieve the
desired accuracy of any DNA sequence, with the sequence of each
strand being used as a check on the other. For this reason, a total of
6 billion nucleotides must actually be sequenced to order the 3 billion
nucleotide pairs in the haploid human genome.
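The idea of using each strand as a check on the other can be illustrated with a short Python sketch. It assumes, as is the usual convention, that both strands are written in the 5'-to-3' direction, so the second strand must be reversed as well as complemented before it is compared with the first. The function names and the short example sequence are hypothetical.

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return strand.translate(COMPLEMENT)[::-1]

def mismatches(read_top, read_bottom):
    """Positions (0-based, top-strand coordinates) where two independent
    strand readings disagree with complementary base pairing."""
    predicted_top = reverse_complement(read_bottom)
    if len(predicted_top) != len(read_top):
        raise ValueError("reads must cover the same region")
    return [i for i, (a, b) in enumerate(zip(read_top, predicted_top)) if a != b]

top = "CCATGGTGCACCTGACTCCT"
bottom = reverse_complement(top)      # an error-free reading of the other strand
print(mismatches(top, bottom))        # []

bad_bottom = "C" + bottom[1:]         # simulate one misread nucleotide
print(mismatches(top, bad_bottom))    # [19]
```

A disagreement only shows that one of the two readings is wrong; deciding which one requires reading the region again.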
The average size of a protein molecule allows one to predict that
there are approximately 1,000 nucleotide pairs of coding sequence per
gene. Since humans are thought to have about 100,000 genes, a total
of about 100 million nucleotide pairs of coding DNA must be present
in the human genome. That this is only about 3 percent of the total
size of the genome leads one to conclude that less than 5 percent of
the human genome codes for proteins. The vast bulk of human DNA
lies between genes and in the introns. Some of the noncoding DNA
plays a role in regulating gene activity, while other portions are
believed to be important for organizing the DNA into chromosomes
and for chromosome replication (Alberts et al., 1983; Lewin, 1987).
The function of most noncoding regions of the human genome,
however, is unknown; much of this DNA may have no function at
all.
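The estimate of the protein-coding fraction can be reproduced as simple arithmetic. The figures below are the rounded values quoted in the text (about 1,000 coding nucleotide pairs per gene, about 100,000 genes, 3 billion nucleotide pairs in the haploid genome); the variable names are only illustrative.

```python
HAPLOID_GENOME_BP = 3_000_000_000   # nucleotide pairs in the haploid genome
ESTIMATED_GENES = 100_000           # rough gene count used in the text
CODING_BP_PER_GENE = 1_000          # average coding sequence per gene

coding_bp = ESTIMATED_GENES * CODING_BP_PER_GENE
coding_fraction = coding_bp / HAPLOID_GENOME_BP

print(f"{coding_bp:,} coding nucleotide pairs")       # 100,000,000 coding nucleotide pairs
print(f"{coding_fraction:.1%} of the haploid genome")  # 3.3% of the haploid genome
```

Because every input is a rough estimate, the result is best read as an order of magnitude, which is why the text states the conclusion as "less than 5 percent".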
The Human Genome Can Be Mapped in Many Different Ways
It would be enormously useful to determine the order and spacing
of all the genes that make up the genome. Such information is said
to constitute a gene or genome map. Since there are 24 different DNA
molecules in the human genome, a complete human gene map consists
of 24 maps, each in the linear form of the DNA molecule itself.
One type of useful genome map is the messenger RNA (mRNA) or
exon map. Cellular enzymes transcribe or copy all of an organism's
genes into mRNAs so that the functions of the genes can be expressed.
Complementary DNA (cDNA) of all the mRNAs present in an organism
can be synthesized enzymatically with reverse transcriptase. These
cDNAs can then be cloned and used to locate the corresponding
genes on a chromosome map. In this way, the genes can be mapped
in the absence of knowledge of their function. Another type of genome
map would consist of an ordered set of the overlapping DNA clones
that constitute an entire chromosome. Both the exon map and the
ordered set of DNA clones are usually referred to as physical maps.
Alternatively, the position of a gene can be mapped by following the
effect of the expression of the gene on the cells containing it. Here,
a map is constructed on the basis of the frequencies of coinheritance
of two or more genetic markers. This type of map is referred to as a
genetic linkage map. The distinction between physical and genetic
linkage maps is discussed in detail in Chapter 4.
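As a rough illustration of how coinheritance frequencies can order markers, the Python sketch below uses invented counts for three hypothetical markers (M1, M2, M3). It assumes only that markers lying closer together on a chromosome are inherited together more often; it is not the mapping procedure described in Chapter 4.

```python
def coinheritance_frequency(together, apart):
    """Fraction of scored offspring that inherited both markers together."""
    total = together + apart
    if total == 0:
        raise ValueError("no offspring scored")
    return together / total

# Hypothetical counts per marker pair: (inherited together, inherited apart).
observations = {
    ("M1", "M2"): (90, 10),
    ("M2", "M3"): (80, 20),
    ("M1", "M3"): (72, 28),
}

frequencies = {
    pair: coinheritance_frequency(*counts) for pair, counts in observations.items()
}

# The pair coinherited least often is taken to mark the two ends of the map;
# the remaining marker is placed between them.
markers = {m for pair in observations for m in pair}
most_distant_pair = min(frequencies, key=frequencies.get)
middle = (markers - set(most_distant_pair)).pop()
marker_order = [most_distant_pair[0], middle, most_distant_pair[1]]

print(frequencies)
print(marker_order)  # ['M1', 'M2', 'M3']
```

With only three markers the ordering is trivial; real linkage maps combine many markers and families and must allow for statistical uncertainty in each frequency.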
Maps of the human genome can be made at many different scales,
or levels of resolution. Low-resolution physical maps have been
derived from the distinctive patterns of bands that are observed along
each chromosome by light microscopy of stained chromosomes. Genes
have been physically associated with particular bands or clusters of
bands in a number of ways. These associations permit genes to be
mapped only approximately since a given gene might be assigned to
a region of about 10 million nucleotides containing several hundred
genes. Its exact position on the chromosome must be determined by
more precise methods.
Maps of higher resolution are based on sites in the DNA cut by
special proteins called restriction enzymes. Each enzyme recognizes
a specific short sequence of four to eight nucleotides (a restriction
site) and cuts the DNA chain at one point within the sequence (Watson
et al., 1983). Since dozens of different sequences are recognized by
one or another enzyme, and these sequences are closely spaced
throughout the genome, high-resolution physical maps can be
constructed by determining the relative location of different restriction
sites precisely. Of particular value in human gene mapping are
restriction sites that are highly variable (or polymorphic) in the
population. DNA lacking a specific restriction site yields a larger
restriction fragment when cut by the enzyme than DNA containing
the site; hence, the designation restriction fragment length
polymorphism (RFLP). Hundreds of polymorphic restriction sites have so far
been identified and mapped in the human genome. Some
disease-related genes have already been localized by determining the frequency
of coinheritance of RFLPs and genetic diseases (Gusella et al., 1983).
Examples of these diseases include cystic fibrosis, Duchenne muscular
dystrophy, Alzheimer's disease, and neurofibromatosis. Identifying a
much larger number of useful polymorphic restriction sites should
make it possible to map disease-related genes precisely enough to
greatly facilitate the isolation of any human gene.
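The origin of a restriction fragment length polymorphism can be shown with a small Python sketch. It uses the recognition site of the enzyme EcoRI (GAATTC, cut after the first G) as a concrete example; the two short allele sequences are invented, only one strand is modeled, and the function name is hypothetical.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut `dna` at every occurrence of `site` and return the fragment lengths.

    `cut_offset` is the position of the cut within the site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    lengths = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = dna.find(site, pos + 1)
    lengths.append(len(dna) - start)
    return lengths

SITE = "GAATTC"
# Hypothetical allele carrying three restriction sites.
allele_with_site = (
    "ATATAT" + SITE + "CCGGCCGGCC" + SITE + "TTGGTTGGTTGGTT" + SITE + "ATAT"
)
# Same region with a single-nucleotide change that destroys the middle site.
allele_without_site = (
    "ATATAT" + SITE + "CCGGCCGGCC" + "GAGTTC" + "TTGGTTGGTTGGTT" + SITE + "ATAT"
)

print(digest(allele_with_site))     # [7, 16, 20, 9]
print(digest(allele_without_site))  # [7, 36, 9]
```

The two neighboring fragments of 16 and 20 nucleotides merge into a single 36-nucleotide fragment when the middle site is lost, which is the length difference an RFLP analysis detects.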
The map based on a collection of ordered clones of genomic
fragments has a special value (see Chapter 4). In such a map, not
only are the genomic positions of restriction fragments known, but
each fragment is available as a clone that can be propagated and
distributed to interested researchers. Such clones are immensely
valuable because they serve as the starting point for gene isolation,
for functional analyses, and for the determination of nucleotide
sequences.
The ultimate, highest resolution map of the human genome is the
nucleotide sequence, in which the identity and location of each of 3
billion nucleotide pairs is known (see Chapter 5). Only such a sequence
reveals all or nearly all the information in the human genome. A
number of specific regions of human DNA have already been analyzed
in this way, providing information about the structure of genes and
their encoded proteins in both normal and abnormal individuals and
about sequences that regulate gene expression (Figure 2-4). At present,
however, the nucleotide sequence of substantially less than 0.1 percent
of the human genome is known. This includes the sequence containing
0.5 percent of our genes.
CCCTGTGGAGCCACACCCTAGGGTTGGCCA
ATCTACTCCCAGGAGCAGGGAGGGCAGGAG
CCAGGGCTGGGCATAAAAGTCAGGGCAGAG
CCATCTATTGCTTACATTTGCTTCTGACAC
AACTGTGTTCACTAGCAACTCAAACAGACA
CCATGGTGCACCTGACTCCTGAGGAGAAGT
CTGCCGTTACTGCCCTGTGGGGCAAGGTGA
ACGTGGATGAAGTTGGTGGTGAGGCCCTGG
GCAGGTTGGTATCAAGGTTACAAGACAGGT
TTAAGGAGACCAATAGAAACTGGGCATGTG
GAGACAGAGAAGACTCTTGGGTTTCTGATA
GGCACTGACTCTCTCTGCCTATTGGTCTAT
TTTCCCACCCTTAGGCTGCTGGTGGTCTAC
CCTTGGACCCAGAGGTTCTTTGAGTCCTTT
GGGGATCTGTCCACTCCTGATGCTGTTATG
GGCAACCCTAAGGTGAAGGCTCATGGCAAG
AAAGTGCTCGGTGCCTTTAGTGATGGCCTG
GCTCACCTGGACAACCTCAAGGGCACCTTT
GCCACACTGAGTGAGCTGCACTGTGACAAG
CTGCACGTGGATCCTGAGAACTTCAGGGTG
AGTCTATGGGACCCTTGATGTTTTCTTTCC
CCTTCTTTTCTATGGTTAAGTTCATGTCAT
AGGAAGGGGAGAAGTAACAGGGTACAGTTT
AGAATGGGAAACAGACGAATGATTGCATCA
GTGTGGAAGTCTCAGGATCGTTTTAGTTTC
TTTTATTTGCTGTTCATAACAATTGTTTTC
TTTTGTTTAATTCTTGCTTTCTTTTTTTTT
CTTCTCCGCAATTTTTACTATTATACTTAA
TGCCTTAACATTGTGTATAACAAAAGGAAA
TATCTCTGAGATACATTAAGTAACTTAAAA
AAAAACTTTACACAGTCTGCCTAGTACATT
ACTATTTGGAATATATGTGTGCTTATTTGC
ATATTCATAATCTCCCTACTTTATTTTCTT
TTATTTTTAATTGATACATAATCATTATAC
ATATTTATGGGTTAAAGTGTAATGTTTTAA
TATGTGTACACATATTGACCAAATCAGGGT
AATTTTGCATTTGTAATTTTAAAAAATGCT
TTCTTCTTTTAATATACTTTTTTGTTTATC
TTATTTCTAATACTTTCCCTAATCTCTTTC
TTTCAGGGCAATAATGATACAATGTATCAT
GCCTCTTTGCACCATTCTAAAGAATAACAG
TGATAATTTCTGGGTTAAGGCAATAGCAAT
ATTTCTGCATATAAATATTTCTGCATATAA
ATTGTAACTGATGTAAGAGGTTTCATATTG
CTAATAGCAGCTACAATCCAGCTACCATTC
TGCTTTTATTTTATGGTTGGGATAAGGCTG
GATTATTCTGAGTCCAAGCTAGGCCCTTTT
GCTAATCATGTTCATACCTCTTATCTTCCT
CCCACAGCTCCTGGGCAACGTGCTGGTCTG
TGTGCTGGCCCATCACTTTGGCAAAGAATT
CACCCCACCAGTGCAGGCTGCCTATCAGAA
AGTGGTGGCTGGTGTGGCTAATGCCCTGGC
CCACAAGTATCACTAAGCTCGCTTTCTTGC
TGTCCAATTTCTATTAAAGGTTCCTTTGTT
CCCTAAGTCCAACTACTAAACTGGGGGATA
TTATGAAGGGCCTTGAGCATCTGGATTCTG
CCTAATAAAAAACATTTATTTTCATTGCAA
TGATGTATTTAAATTATTTCTGAATATTTT
ACTAAAAAGGGAATGTGGGAGGTCAGTGCA
TTTAAAACATAAAGAAATGATGAGCTGTTC
AAACCTTGGGAAAATACACTATATCTTAAA
CTCCATGAAAGAAGGTGAGGCTGCAACCAG
CTAATGCACATTGGCAACAGCCCCTGATGC
CTATGCCTTATTCATCCCTCAGAAAAGGAT
TCTTGTAGAGGCTTGATTTGCAGGTTAAAG
TTTTGCTATGCTGTATTTTACATTACTTAT
TGTTTTAGCTGTCCTCATGAATGTCTTTTC
FIGURE 2-4 The DNA sequence of the human
gene for beta-globin (a protein of 146 amino acids
that forms part of the hemoglobin molecule that
carries oxygen in the blood). The sequence of only
one of the two DNA strands is given since the other
one has a precisely complementary sequence. The
sequence should be read from left to right in successive
lines down the page, as if it were normal text.
The human genome is about 2 million times as long
as this small gene of 1,500 nucleotides (see Table 2-1),
which contains three exons and two introns (the
boundaries between exons and introns are not indicated
here). Reprinted, with permission, from Alberts
et al. (1989).
MEDICAL IMPLICATIONS OF DETAILED
HUMAN GENOME MAPS
Advances in molecular genetics made over the past two decades
are already having a major impact on medical research and clinical
care. The ability to clone and analyze individual genes and to deduce
the amino acid sequences of encoded proteins has greatly increased
our understanding of genetic disorders, the immune system, endocrine
abnormalities, coronary artery disease, infectious diseases, and
cancer. A few proteins produced on a commercial scale by recombinant
DNA methods are available for therapeutic use or in clinical trials,
and many more are in earlier developmental stages. Recent progress
in determining the genetic basis for such neurological and behavioral
disorders as Huntington's disease (Gusella et al., 1983), Alzheimer's
disease (St George-Hyslop et al., 1987), and manic-depressive illness
(Egeland et al., 1987) promises new insights into these common and
serious conditions. Higher resolution maps of the human genome will
accelerate progress in understanding disease pathogenesis and in
developing new approaches to diagnosis, treatment, and prevention
in many areas of medicine. In Chapter 3 the potential medical impact
of a detailed human genomic map is discussed further.
IMPLICATIONS FOR BASIC BIOLOGY
The generation of a physical map of the human genome and the
determination of its nucleotide sequence will provide an important
research tool for basic biology. This is especially true because we
expect a human genome project to support mapping and sequencing
investigations that are carried out concurrently in other extensively
studied organisms, including the Escherichia coli bacterium, the lower
eucaryote Saccharomyces cerevisiae (a yeast), the nematode worm
Caenorhabditis elegans, the fruit fly Drosophila melanogaster, the
mouse Mus musculus, and possibly also a plant such as maize or
Arabidopsis. Analyzing these genomes will approximately double the
total amount of DNA to be mapped and sequenced. But the additional
effort will make it possible to take genes that have been
identified in humans and test their function in other organisms that are
experimentally accessible and for which powerful genetic techniques exist. It will
thereby be possible to firmly establish the exact role of these genes
in important biological processes. Conversely, proteins that are
discovered to be of special interest in any of these other organisms
can be immediately identified by amino acid homology in the human,
thereby enabling investigators to conduct well-focused studies of the
function of the corresponding human protein and its gene. The
extensive DNA sequence and functional comparisons that are
generated will also represent an invaluable resource for evolutionary
biologists. These and other implications for basic biology are discussed
in greater detail in Chapter 3.
EXPECTED TECHNOLOGICAL DEVELOPMENTS
GENERATED BY A HUMAN GENOME PROJECT
AND THEIR IMPACT ON BIOLOGICAL RESEARCH
The process of mapping and sequencing the human genome is likely
to have important spin-offs in the form of new technologies with
broad applicability in both basic and applied biological research. For
example, efficient methods for mapping complex genomes are still
being developed, and a human genome project would accelerate this
process. Such methods include improvements in the production,
separation, and cloning of large pieces of DNA and methods for
constructing an ordered set of genomic clones (see Chapter 4). This
methodology will be directly applicable to the development of a
physical map of the genomes of many experimentally and commercially
important animals and plants.
Similarly, an effort to sequence the human genome will require
much more efficient nucleotide sequencing technology than now exists
(see Chapter 5). These improvements will greatly reduce the time
spent on DNA sequencing in individual research laboratories. In the
future, the development of institution-wide or regional sequencing
facilities equipped with highly automated instruments could serve a
large number of scientists, freeing them to concentrate on more
advanced stages of their research problems.
Finally, the generation of a detailed map of the human genome will
require new computer-based methods for collecting, storing, and
analyzing the large amount of information expected (see Chapter 6).
These methods can easily be adapted to handling analogous data from
other organisms. Scientists will thus have immediately available
through computer networks an enormous store of biological
information supported by methods for using it, such as clone collections;
these resources are likely to have a major beneficial impact on the
way that individual scientists do research.
IMPACT ON THE RESEARCH BY SMALL GROUPS
One of the key features and attractions of biomedical research today
is that it is based primarily on the efforts of small, independent groups
of scientists. The major advances of the past decades can be traced
to the creativity of these groups, or even to single individuals, often
near the beginnings of their careers. Mapping and sequencing the
human genome, on the other hand, is likely to require organizational
arrangements on a considerably larger scale than is customary in
other biological research. Some see this as a threat to the independence
of individual investigators. In the committee's view, however, a
mapping and sequencing project should have as its primary goal an
increase in the power and range of the research potential of small
groups of individuals.
The complete nucleotide sequences of the genomes of the several
organisms of major experimental interest will provide a critical
reference data base for interpreting and studying the many human
genes that will be discovered. To take just one example, an individual
cancer researcher who discovers a new oncogene in a human tumor
will have immediate access by computer search to all the proteins
that are likely to have a related function in lower organisms. Since
these genes can be experimentally manipulated in ways that are
impossible in humans, the function of the corresponding gene can be
determined much more readily in a fruit fly, a nematode worm, or a
yeast cell. The results are certain to provide important insights into
human cancer that could not be obtained by direct research on
humans. Conversely, researchers interested primarily in yeast cells
will benefit from the information about yeast genes that can be derived
from studies of their homologues that are initially conducted in
another organism.
Even among researchers whose efforts are confined exclusively to
humans, small group efforts will be encouraged. The human genome
map and an ordered set of human DNA clones will be available as a
resource for the use of all investigators, enabling them to concentrate
on the most interesting parts of their research. In addition, new areas
of research are likely to emerge as a result of this resource, particularly
in relation to human health. In short, the committee believes that the
mapping and sequencing project will make an important contribution
to primary research conducted by small groups of independent
investigators, extending their reach into currently inaccessible
problems.
A project to map and sequence the human genome has many
different components. In the following sections of this report, we
examine implications for medicine and science (Chapter 3), mapping
(Chapter 4), sequencing (Chapter 5), data handling and analysis
(Chapter 6), implementation and management strategies (Chapter 7),
and commercial, legal, and ethical implications (Chapter 8).
REFERENCES
Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson. 1983. Molecular
Biology of the Cell. Garland, New York. 1146 pp.
Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson. 1989. Molecular
Biology of the Cell, 2nd edition. Garland, New York, in press.
Doolittle, R. F., D. F. Feng, M. S. Johnson, and M. A. McClure. 1986. Relationships of
human protein sequences to those of other organisms. Cold Spring Harbor Symp.
Quant. Biol. 51:447-455.
Egeland, J. A., D. S. Gerhard, D. L. Pauls, J. N. Sussex, K. K. Kidd, C. Allen, A. M.
Hostetter, and D. E. Housman. 1987. Bipolar affective disorders linked to DNA
markers on chromosome 11. Nature 325:783-787.
Gusella, J. F., N. S. Wexler, P. M. Conneally, S. L. Naylor, M. A. Anderson, R. E. Tanzi,
P. C. Watkins, K. Ottina, M. R. Wallace, A. Y. Sakaguchi, A. B. Young, I. Shoulson,
E. Bonilla, and J. B. Martin. 1983. A polymorphic DNA marker genetically linked to
Huntington's disease. Nature 306:234-238.
Lewin, B. 1987. Genes, 3rd ed. John Wiley & Sons, New York. 737 pp.
St George-Hyslop, P. H., R. E. Tanzi, R. J. Polinsky, J. L. Haines, L. Nee, P. C. Watkins,
R. H. Myers, R. G. Feldman, D. Pollen, D. Drachman, J. Growdon, A. Bruni, J.-F.
Foncin, D. Salmon, P. Frommelt, L. Amaducci, S. Sorbi, S. Piacentini, G. D. Stewart,
W. J. Hobbs, P. M. Conneally, and J. F. Gusella. 1987. The genetic defect causing familial
Alzheimer's disease maps on chromosome 21. Science 235:885-890.
Watson, J. D., J. Tooze, and D. T. Kurtz. 1983. Recombinant DNA: A Short Course. W.
H. Freeman, San Francisco.