CHAPTER 3 IMPLICATIONS FOR MEDICINE AND SCIENCE
MEDICAL USES
A Map of the Human Genome Will Greatly Facilitate the Identification of Specific Disease Genes
Humankind is afflicted by more than 3,000 known inherited disorders. Taken together, these disorders affect every organ, system, and tissue in the human body. Some cause disease even before birth, whereas others are observed only in adulthood. Some are common, others rare. Although their overall impact on human health is enormous, until recently our understanding of the vast majority of these disorders has been meager. Even today we have identified the responsible gene in fewer than 3 percent of all known inherited disorders. In nearly all of these cases the disease gene codes for a known protein. For diseases in which the responsible protein has been identified, it is now routinely possible, with recombinant DNA methods, to clone the gene and begin to understand the genetic defect. In this way we have learned much about conditions such as thalassemia, sickle cell anemia, hemophilia, Tay-Sachs disease, and familial hypercholesterolemia. However, most disorders result from mutations in genes whose protein products have not been defined. In these situations, identification of a DNA segment that is regularly altered (by deletion, rearrangement, or point mutation) in a given disorder provides clues to identifying the disease gene. So far, the genes for three disorders (Duchenne muscular dystrophy, retinoblastoma, and chronic granulomatous disease) have been successfully identified in this manner. This approach is also making possible an ongoing search for the genes relevant to such conditions as cystic fibrosis, Huntington's disease, and familial Alzheimer's disease. These are but a small subset of the numerous Mendelian disorders for which direct genetic analysis offers the best hope of identifying the responsible genes.
The availability of various types of maps of the human genome would greatly facilitate the search for genes related to specific inherited diseases. A detailed genetic linkage map based on RFLPs would permit rapid assignment of disease loci to subchromosomal regions, perhaps at a resolution of 1 million nucleotides. The availability of DNA clone collections and a restriction map of the genome would then allow efficient comparative analysis of DNAs from normal and affected individuals to pinpoint with higher resolution the area in which the relevant gene resides. Finally, a DNA sequence of the genome would allow all putative genes in the region to be identified and would also provide a data base for evaluating sequences obtained in samples of DNA from patients. Although more complicated to execute, similar approaches could be applied to the more common multigenic disorders, i.e., those for which more than one gene may be responsible. Examples include hypertension, some forms of cancer, diabetes, schizophrenia, mental retardation, and neural tube defects. Thus, the availability of a map and sequence would greatly accelerate the identification of disease genes and permit investigators to focus more rapidly on the nature of the gene products and their cellular roles.
Disease Genes Promise to Provide Important Insights into Human Biology
An understanding of normal physiology and biochemistry has often been gained through the study of single gene disorders for which protein products have been characterized. For example, elucidation of many pathways of intermediary metabolism resulted from the examination of cells from patients in whom a single enzyme activity was abolished. Similarly, the study of individual mutant genes encoding uncharacterized products is certain to illuminate new biochemical and cellular mechanisms related both to normal human physiology and to the development of disease. The rapid identification of disease genes will enable investigators to examine in detail the protein products of such genes and their roles in cellular biology. When few clues to pathophysiology exist (e.g., neurofibromatosis, polycystic kidney disease, or retinitis pigmentosa), this strategy will provide new insights into pathogenesis. The implications of such research are likely to be extensive. In many instances, examination of an apparently rare situation may lead to a clearer understanding of normal mechanisms that may be adversely affected in other ways in more common diseases. For instance, studies of the recently isolated gene responsible for the rather rare childhood tumor known as retinoblastoma should increase our understanding of more common cancers (Dryja et al., 1986; Friend et al., 1986), and studies of the genes involved in an apparently uncommon type of Alzheimer's disease may explain more general features of aging (St George-Hyslop et al., 1987).
Specific Medical Applications
An improved capacity to identify genes related to disease will have an immediate impact on the diagnosis, treatment, and prevention of genetic disorders. As more disease genes are isolated, DNA-based diagnosis will become more common and the potential for somatic cell gene therapy will increase. Furthermore, the availability of molecular probes for specific gene loci will permit detection of the carriers of disease-associated genes. This ability will enable parents to identify the extent to which their offspring may be at risk for a genetic defect. In addition, the identification and characterization of disease genes will lead (and already has led for many genetic disorders) to improved prenatal diagnosis of serious conditions by direct DNA analysis. Finally, the ability to determine whether individuals are carriers for specific gene defects will facilitate various epidemiological investigations of the risks associated with specific environmental factors, occupational settings, or drugs.
Toward an Understanding of Cancer
Cancer results from the unregulated growth of cells. What has been learned over the past decade or so, largely through the application of molecular genetic tools, is that deregulation of growth is caused by specific genetic abnormalities, i.e., mutations in growth-related genes that are either inherited or acquired during life. Inherited defects generally confer increased susceptibility to a particular form of cancer, for example, retinoblastoma, cancer of the colon, certain kidney tumors, and malignant melanoma. Only in retinoblastoma has the susceptibility gene been identified. The search for the responsible genes in other instances is in its early stages and will be greatly facilitated by detailed RFLP and DNA clone maps and the nucleotide sequence. With the susceptibility genes in hand, it will be possible, by testing an individual's DNA, to identify those who need special surveillance for precancerous or early cancerous changes so that appropriate treatment can be applied at an early stage of disease. It may also become possible to counter the effects of inherited susceptibility more directly once the physiological effects of the various genes are understood. In recent years much has been learned about acquired genetic abnormalities related to cancer. During one's lifetime, the DNA in somatic cells undergoes mutation, either spontaneously or as induced by environmental mutagens. These mutations involve changes in nucleotides, rearrangements, duplications, or deletions. Some of these changes occur in genes that regulate growth. Several dozen genes are now known that, when mutated in specific ways or overexpressed, deregulate cell proliferation. Some of these abnormal genes (called oncogenes) have been found in human cancer cells and seem to contribute to their tumorigenic properties. In several instances the proteins encoded by oncogenes have been shown to be altered forms of cell growth stimulators or the cellular receptors for growth stimulators. Other oncogenes encode proteins that are involved in the response of cells to growth stimulators.
As a result of these findings, primary questions regarding cell growth and human cancer have come into sharp focus: What normal human proteins are involved in cell growth and how do they act? How do changes in one or more of these proteins cause cells to grow into tumors and to spread to distant organs? What genetic mechanisms underlie these changes? What is the spectrum of oncogenes or metastasis genes present in human tumors? The availability of a map and sequence of the human genome and of the genomes of simpler organisms will help answer these questions. It will facilitate the isolation of genes that are homologous to known growth-related genes and the identification of previously undiscovered genes that play a role in cell growth and development. The characterization of the genes and proteins that regulate cell growth and are responsible for neoplasia and metastasis of tumor cells is likely to lead to more sensitive diagnostic and prognostic tests and to new approaches to the control of cancer.
IMPLICATIONS FOR BASIC BIOLOGY
What Aspects of Genome Organization Are Important for Genome Function?
The principles of genome organization are poorly understood. Human chromosomes contain functional segments that are not genes. Specific segments are essential for the duplication of the chromosomes before cell division and for ensuring that the correct complement of chromosomes segregates into the two daughter cells. The nature of these segments within a chromosome and the mechanism by which they carry out their functions are poorly understood in mammals. A physical map of the human genome will provide the basis for experimentation into the identity and role of these and other elements. The study of genome organization, that is, the order in which genes occur along a chromosome and their relations to various other components, will be enhanced by the existence of a physical map. For example, we do not know in most cases whether the order of genes on a given chromosome is important to their function. Is there a selective advantage to the organism in maintaining the proximity of genes that are expressed together? Limited studies comparing the overall organization of genes in the chromosomes of humans and mice suggest that the organization of large blocks of genes has often been conserved, but it is not known whether this is important to their function (Sawyer and Hozier, 1986). By comparing the physical maps of a variety of organisms, it will become apparent which segments are conserved in their gene order across species and therefore are likely to have functional significance. The detailed comparison of corresponding mouse and human DNA sequences is likely to be of special importance. Sufficient time (an estimated 70 million years) has elapsed since the divergence of mice and humans from a common mammalian ancestor for those chromosomal regions whose nucleotide sequence is not crucial for the function of the organism to differ extensively as a result of random events that change nucleotide sequences. Thus, a comparison of mouse and human sequences can reveal those regions of our chromosomes with crucial functions that are reflected as conserved (i.e., common) nucleotide sequences.
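The comparison described above can be illustrated with a minimal sketch, assuming the two sequences have already been aligned to equal length (with "-" marking gaps). The function names, window size, step, and identity threshold are hypothetical choices made for illustration and are not taken from the report.

```python
def window_identity(aligned_a, aligned_b, window=10, step=5):
    """Return (start, fraction_identical) for each sliding window.

    Assumes both inputs are pre-aligned strings of equal length;
    gap characters ("-") never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        a = aligned_a[start:start + window]
        b = aligned_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        results.append((start, matches / window))
    return results


def conserved_windows(aligned_a, aligned_b, window=10, step=5, threshold=0.9):
    """Return start positions of windows whose identity meets the threshold."""
    return [
        start
        for start, identity in window_identity(aligned_a, aligned_b, window, step)
        if identity >= threshold
    ]


if __name__ == "__main__":
    # Toy data: the first ten positions are identical, the rest differ.
    human = "ACGTACGTAC" + "GGGTTTAAAC"
    mouse = "ACGTACGTAC" + "CCCAAATTTG"
    print(conserved_windows(human, mouse))
```

Windows that stay above the threshold are only candidates for functional importance; real comparisons would also need an alignment step and a statistical model of the expected background divergence, both of which this sketch omits.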
Evolutionary biologists believe that changes in most of these sequences have occurred at one time or another during evolution, but because the changes were deleterious, the mutant individuals who carried such changes were eliminated from the population by natural selection. Included among the conserved sequences will be the exons of important proteins as well as the sequences in genes that regulate gene expression. Other conserved sequences whose function cannot be anticipated will no doubt be discovered in this way; their identification should eventually provide many new insights into the functions of both genes and genomes.
Many New Human Genes and Proteins Will Be Identified
Only a small percentage of the human genes involved in normal development and disease have been identified to date. Mapping and sequencing the human genome will result in the identification of a large number of new genes and their encoded proteins. As one benefit, the physical map will help pinpoint the position of human genes that have been mapped to specific chromosomal locations but have not yet been isolated. Moreover, genetic studies of the mouse have revealed mutations in many genes that cause interesting pathological defects, but little is known about these genes except their location on the genetic map of mice. By knowing the specific correspondence between the physical maps of humans and mice, the corresponding gene can be identified and studied in both organisms. There are also computer-based methods for detecting genes when the only information available is a long stretch of continuous nucleic acid sequence (Staden and McLachlan, 1982). These methods have been improving dramatically, and a human genome project will stimulate further improvement in existing computer-based tools. At present, the identification of genes and their protein products relies on several methods. First, the exons within a DNA sequence can often be predicted by identifying those segments that contain open reading frames (regions of nucleotide sequence without the "stop codons" that terminate protein synthesis) and also have codon usage biases (the preferential use of one of several codons that specify a particular amino acid) that are consistent with other genes in that organism. Moreover, there are conserved sequences that always flank an intron. As a second approach, genes often share homologies with one another on the basis of common evolutionary history; these homologies have been successfully exploited in a number of areas, for example, to identify related family members of lymphokines, to find new receptor proteins for neurotransmitters, and to find genes that may play important roles in pattern formation in development. Many sequence motifs that encode protein domains with a similar function have been identified, such as the common domain found in all protein kinases. These have been useful in predicting the function of unidentified gene products from their amino acid sequences. As increasing numbers of new proteins are isolated and functionally characterized, the data base available for such comparisons will be greatly increased. Many proteins contain domains that have been used over and over again in the construction of related proteins.
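The first approach described above (open reading frames combined with codon usage bias) can be sketched as follows. This is an illustrative simplification, not the method of Staden and McLachlan (1982): it scans only the forward strand, assumes the standard genetic code's three stop codons, and scores codon usage with a simple log-ratio against a uniform background. All function names and parameters are hypothetical.

```python
import math
from collections import Counter

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_codons=3):
    """Yield (frame, start, end) for stop-free codon runs on the forward strand."""
    seq = seq.upper()
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    yield frame, run_start, pos
                run_start = pos + 3  # next run begins after the stop codon
            pos += 3
        if (pos - run_start) // 3 >= min_codons:  # run reaching the sequence end
            yield frame, run_start, pos


def codon_frequencies(coding_seqs):
    """Estimate codon frequencies from known in-frame coding sequences."""
    counts = Counter()
    for s in coding_seqs:
        s = s.upper()
        counts.update(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_bias_score(segment, ref_freqs, floor=1e-3):
    """Mean log-ratio of reference codon frequency to a uniform 1/64 background.

    Positive scores mean the segment favors codons common in the reference
    genes; `floor` (an arbitrary choice) avoids log(0) for unseen codons.
    """
    segment = segment.upper()
    codons = [segment[i:i + 3] for i in range(0, len(segment) - 2, 3)]
    if not codons:
        return 0.0
    return sum(
        math.log(max(ref_freqs.get(c, 0.0), floor) * 64) for c in codons
    ) / len(codons)


if __name__ == "__main__":
    dna = "ATGGCTGCTGCTTAAGG"
    ref = codon_frequencies(["GCTGCTGCTGCC"])  # toy reference set
    for frame, start, end in open_reading_frames(dna):
        print(frame, start, end, round(codon_bias_score(dna[start:end], ref), 3))
```

A candidate exon would be a stop-free run whose score is high relative to the other two frames. Practical gene finders also use the conserved sequences that flank introns, which this sketch ignores.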
Therefore, it should eventually be possible to discover a great deal about the structure and function of a protein from the amino acid sequence derived from its gene. Because exons coincide in many instances with protein domains, knowledge of the exon-intron structure of a gene can also provide insights into both the structure and function of the protein.
How Do Organisms Evolve?
To gain a deep understanding of organisms we must understand how they evolved, and much of the evolutionary history of humans is present in our genomes. If we knew the complete DNA sequences of humans and other organisms, we should be able to trace the origins of most of our genes. However, because all mammals are constructed from similar sets of proteins, the building blocks that are used to construct a human and a whale are very much the same. The many differences between mammalian species are therefore believed to depend largely on differences in the regulatory signals that control the timing, level, and cell specificity of gene expression. Thus, the orderly development of the human embryo requires that specific gene sets be activated at exactly the right place and time as new cell types arise from multi-potential stem cells. This process is controlled at least in part by regulatory DNA sequences located near the genes. In many cases, these sequences will be homologous among those genes that are coactivated. The sequence analysis of the human genome, and its comparison with the sequence of other mammalian genomes such as the mouse, should allow us to identify regulatory DNA sequences. Moreover, one can hope to begin to understand not only the rules that govern gene regulation but also the changes that have occurred during evolution that have differentiated the human organism from our mammalian relatives. In summary, the acquisition of the map and sequence of the human genome will expand our understanding of many basic questions in biology. To maximize this impact, it will be necessary to pursue the analysis of genomes of organisms that can be experimentally manipulated. Thus, for example, the function of regulatory sequences detected in humans can be tested by experiments in the mouse in which transgenic animals can be constructed with appropriately engineered genes. Because many crucial insights may be gained from such comparative studies, experiments in several other organisms will inevitably be required to test the function of potentially important human genes.
REFERENCES
Dryja, T. P., J. M. Rapaport, J. M. Joyce, and R. A. Petersen. 1986. Molecular detection of deletions involving band q14 of chromosome 13 in retinoblastomas. Proc. Natl. Acad. Sci. U.S.A. 83:7391-7394.
Friend, S. H., R. R. Bernards, S. Rogelj, R. A. Weinberg, J. M. Rapaport, D. M. Albert, and T. P. Dryja. 1986. A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature 323:643-646.
Sawyer, J. R., and J. C. Hozier. 1986. High resolution of mouse chromosomes: Banding conservation between man and mouse. Science 232:1632-1635.
Staden, R., and A. D. McLachlan. 1982. Codon preference and its use in identifying protein coding regions in long DNA sequences. Nucleic Acids Res. 10:141-156.
St George-Hyslop, P. H., R. E. Tanzi, R. J. Polinsky, J. L. Raines, L. Nee, P. C. Watkins, R. H. Myers, R. G. Feldman, D. Pollen, D. Drachman, J. Growdon, A. Bruni, J.-F. Foncin, D. Salmon, P. Frommelt, L. Amaducci, S. Sorbi, S. Piacentini, G. D. Stewart, W. J. Hobbs, P. M. Conneally, and J. F. Gusella. 1987. The genetic defect causing familial Alzheimer's disease maps on chromosome 21. Science 235:885-890.