1 Executive Summary
Humans have long been intrigued by the forces that shape them and other organisms. What blueprint dictates blue eyes, brown hair, or the form of a flower? More than 100 years ago Gregor Mendel discovered that such inherited traits are controlled by cellular units that later became known as genes. In recent years, our understanding of these genes has been greatly increased by knowledge of the molecular biology of DNA, the giant molecule from which genes are formed. It is now feasible to obtain the ultimate description of genes and DNA, since recently developed techniques enable us to map (locate) the genes in the DNA of any organism and then to sequence (order) each of the DNA units, known as nucleotides, that constitute the genes. As more of our genes are mapped and their DNA sequenced, we will have an increasingly useful resource: an essential data base that will facilitate research in biochemistry, physiology, cell biology, and medicine. This data base will have a major impact on health care and disease prevention as well as on our understanding of cells and organisms.
The concept of organizing a large project to map and sequence the DNA in the genes and in the intergenic regions that connect them (the entire human DNA complement, or genome) has received increasing attention worldwide. Several countries have expressed interest in launching such a project. To evaluate what the United States should be doing in this area, the Board on Basic Biology of the National Research Council's Commission on Life Sciences established the Committee on Mapping and Sequencing the Human Genome, whose findings are reported in this document.
MAPPING AND SEQUENCING THE HUMAN GENOME
In this report the committee explores how, when, and why we should map and sequence the DNA in the human genome. In studying these issues, the committee reached the following conclusions:
· Acquiring a map, a sequence, and an increased understanding of the human genome merits a special effort that should be organized and funded specifically for this purpose. Such a special effort in the next two decades will greatly enhance progress in human biology and medicine.
· The technical problems associated with mapping and sequencing the human and other genomes are sufficiently great that a scientifically sound program requires a diversified, sustained effort to improve our ability to analyze complex DNA molecules. Although the needed capabilities do not yet exist, the broad outlines of how they could be developed are clear. Prospects are therefore good that the required advanced DNA technologies would emerge from a focused effort that emphasizes pilot projects and technological development. Once established, these technologies would not only make the complete analysis of the human and other genomes feasible, but would also make major contributions to many other areas of basic biology and biotechnology.
· Important early goals of the effort should be to acquire a high-resolution genetic linkage map of the human genome, a collection of ordered DNA clones, and a series of complementary physical maps of increasing resolution. The ultimate goal would be to obtain the complete nucleotide sequence of the human genome, starting from the materials in the ordered DNA clone collection. Attaining this goal would require major (but achievable) advances in DNA handling and sequencing technologies.
· A comparative genetic approach is essential for interpreting the information in the human genome. Therefore, intensive studies of those organisms that provide particularly useful models for understanding human gene structure, function, and evolution must be carried out in parallel.
· The mapping and sequencing effort should begin primarily as a series of competing, peer-reviewed programs emphasizing technology development. Funding should include both grants to individuals and grants to medium-sized multidisciplinary groups of scientists and engineers. Because the technology required to meet most of the project's goals needs major improvement, the committee specifically recommends against establishing one or a few large sequencing centers at present.
· The human genome project should differ from present ongoing research inasmuch as the component subprojects should have the potential to improve by 5- to 10-fold increments the scale or efficiency of mapping, sequencing, analyzing, or interpreting the information in the human genome.
· Progress toward all the above goals will require the establishment of well-funded centralized facilities, including a stock center for the cloned DNA fragments generated in the mapping and sequencing effort and a data center for the computer-based collection and distribution of large amounts of DNA sequence information. The committee suggests that the groups supplying these services be selected through open competition.
On the basis of these conclusions, the committee recommends the following:
· In view of the importance and magnitude of the task, a rapid scale-up to $200 million of additional funding per year is recommended. These additional funds should not be diverted from the current federal research budget for biomedical sciences.
A majority of the committee recommends:
· A single federal agency should serve as the lead agency for the project. This agency would receive and administer the funds for the project, would be responsible for the operation of the stock center and data center, and would administer the peer review system used to determine the recipients of funds. It should work closely with a Scientific Advisory Board in developing and implementing a high standard of peer review. The Scientific Advisory Board, composed primarily of expert scientists knowledgeable in relevant fields, would provide advice not only on peer review, but also on quality control, international cooperation, coordination of efforts of the laboratories in the project, and the operations of the stock and data centers.
An outline of the major issues presented in this report follows, with genome mapping, genome sequencing, the handling of information and materials, and strategies for implementation and management of a human genome project discussed in turn. An outline of the human genome and its central role in human biology is shown in Figure 1-1.
FIGURE 1-1 The human genome, from nucleotides of four different kinds that pair in specific ways, through genes and chromosomes, to the roughly 3 billion nucleotide pairs of the genome (panel text illegible in this scan). Adapted from an illustration by [illegible] Isensee for [illegible], September 3, 1986, with permission from the publisher.
GENOME MAPPING
The two main types of human genome maps are genetic linkage maps and physical maps. Genetic linkage maps are made mainly by studying families and measuring the frequency with which two different traits are inherited together, or linked. Physical maps are derived mainly from chemical measurements made on the DNA molecules that form the human genome. These maps can be of several different types and include restriction maps and ordered DNA clone collections, as well as lower-resolution maps of expressed genes or anonymous (function unknown) DNA segments that are mapped by somatic cell hybridization or by in situ chromosome hybridization. All these maps share the common goal of placing information about human genes in a systematic linear order according to their relative positions along each chromosome. Knowing the location of genes and the corresponding genetic traits they produce allows us to discover patterns of genomic organization with important functional consequences and to compare humans with other mammals.
Detailed maps of the human genome should quickly lead to major human health benefits. For example, identifying the genes or regions of DNA involved in diseases such as hereditary forms of cancer, Alzheimer's disease, manic-depressive illness, Huntington's disease, and cystic fibrosis should make it possible to develop new methods of diagnosis and treatment. Equally important, the better understanding of human biology that would follow from these studies would contribute broadly to the treatment of most diseases.
The committee believes that full-scale mapping, both genetic linkage and physical, should begin immediately. Current mapping efforts are being carried out gene by gene. Each gene is only a small part of the entire complement of DNA, and the methods involved therefore require the equivalent of repeatedly finding a needle in a haystack. In contrast, in an effort to map the entire human genome, each of the many DNA segments obtained by cloning the human genome will initially be kept as relevant to the project.
These segments then represent pieces of a puzzle that is solved by ordering each DNA segment according to its position in the genome. The cost of obtaining any particular DNA clone in such a collection of ordered DNA clones is relatively small. As a result, a project of this type will quickly pay for itself by saving the enormous aggregate costs incurred when each laboratory must find its own DNA clones.
Several recent breakthroughs in mapping methods make obtaining the type of detailed data needed in human genome maps a realistic goal. These breakthroughs range from vastly improved methods for physical mapping that rely on new techniques for separating and manipulating DNA molecules to much more accurate mathematical methods for analyzing genetic linkage data on the basis of restriction fragment length polymorphisms (RFLPs). A great deal of synergism exists between genetic linkage and physical mapping methods. Because of the simultaneous advances in both techniques, there is a real possibility that a detailed physical and genetic linkage map of the human genome could be constructed in a relatively short time. This map would be extremely useful in its own right and would set the stage for constructing the ultimate physical map: the complete DNA sequence of the human genome.
The committee concluded that the development and refinement of techniques should be emphasized early in the mapping part of the project. Despite recent advances, physical mapping methods need improvement. For example, DNA fragments as long as 10 million nucleotides (about 1/300 of the total human genome) can be handled only with considerable difficulty, and such large fragments cannot yet be cloned. Ordered DNA clone collections have been started, but not completed, for several organisms with genomes that are at most 1/50 the size of the human genome. Advanced technology, such as the handling of larger DNA molecules and the development of new cloning vectors for them, will expedite the preparation of such clone collections. Thus, much of the effort in the next few years should be devoted to refining existing mapping techniques and developing even more powerful new ones. The committee believes that most support should go to groups that are attempting to map large genomes, with support for different mapping methods proceeding in parallel.
Improved methods for the following would facilitate map construction and usefulness:
· Separating intact human chromosomes.
· Separating and immortalizing identified fragments of human chromosomes.
· Cloning complementary copies of expressed genes, called complementary DNAs (cDNAs), especially those that represent rare cell-, tissue-, and development-specific messenger RNAs.
· Cloning very large DNA fragments.
· Purifying very large DNA fragments, including higher-resolution methods for separating such fragments.
· Ordering the adjacent DNA fragments in a DNA clone collection.
· Automating the various steps in DNA mapping, including those of DNA purification and hybridization analysis, and developing novel methods that allow many DNA samples to be handled simultaneously.
GENOME SEQUENCING
The nucleotide sequence of the genome is the physical map at the highest level of resolution. It provides the information that constitutes the genetic complement of an individual. For the human, a total of about 3 billion (3 × 10⁹) nucleotides must be ordered; simply printing out such a DNA sequence would require nearly a million pages in a book like this. To obtain this critical resource in a timely fashion a special effort must be mounted, but, because of the high cost and slow rate of DNA sequencing with current technology, sequencing of the entire genome should not be initiated at present. Instead, the committee believes that two general types of effort should be encouraged to increase the efficiency of DNA sequencing.
First, pilot projects should be conducted with a goal of sequencing approximately 1 million continuous nucleotides (5 to 10 times as many as in the largest continuous regions that have been sequenced to date). Such projects will provide an opportunity to implement and test improvements of existing technology as they occur and will also provide a practical impetus for technological developments. They will also reveal where the most serious problems in data analysis are likely to arise in still larger projects. For example, will repetitive sequences or cloning artifacts complicate the assembly of a unique, contiguous sequence? How will new genes be identified correctly? Only by attempting relatively large-scale nucleotide sequencing will we gain insight into these issues.
Second, improvements in existing sequencing technology and the development of entirely new technologies should be vigorously encouraged. This would include applications of automation and robotics at all steps in sequencing. It is useful to think in terms of trying to achieve 5- to 10-fold incremental improvements in the scale and speed of DNA sequencing.
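The scale figures in the sequencing discussion can be checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions that are not in the report: one printed letter per nucleotide, a hypothetical page holding 3,300 letters (the report says only "nearly a million pages"), and a framing of the 5- to 10-fold improvements as successive multiplications starting from a 1-million-nucleotide pilot project. The helper names `pages_to_print` and `fold_steps_needed` are hypothetical.

```python
# Back-of-the-envelope checks of the genome-scale figures in this section.
# Assumptions (not from the report): one printed letter per nucleotide and
# a hypothetical page capacity of 3,300 letters.

GENOME_NUCLEOTIDES = 3_000_000_000  # about 3 x 10^9 nucleotides (from the text)
PILOT_NUCLEOTIDES = 1_000_000       # pilot-project target (from the text)
CHARS_PER_PAGE = 3_300              # hypothetical page capacity


def pages_to_print(nucleotides: int, chars_per_page: int) -> int:
    """Pages needed to print one letter per nucleotide, rounded up."""
    return -(-nucleotides // chars_per_page)  # integer ceiling division


def fold_steps_needed(start: int, target: int, fold: int) -> int:
    """Count successive `fold`-times scale-ups needed to go from start to target."""
    if fold < 2 or start < 1:
        raise ValueError("fold must be >= 2 and start must be >= 1")
    steps = 0
    scale = start
    while scale < target:
        scale *= fold  # one incremental improvement in scale
        steps += 1
    return steps


if __name__ == "__main__":
    print("pages:", pages_to_print(GENOME_NUCLEOTIDES, CHARS_PER_PAGE))
    for fold in (5, 10):
        steps = fold_steps_needed(PILOT_NUCLEOTIDES, GENOME_NUCLEOTIDES, fold)
        print(f"{fold}-fold steps from pilot to genome:", steps)
```

Under these assumptions, printing the genome takes 909,091 pages, and going from a 1-million-nucleotide project to the full 3 × 10⁹ nucleotides takes four successive 10-fold steps or five successive 5-fold steps.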
To derive the major benefits from a human genome sequence, it will be necessary to have an extensive data base of DNA sequences from the mouse (whose genome is about the same size as that of the human) and from simpler organisms with much smaller genomes, such as bacteria, yeast, Drosophila melanogaster (a fruit fly), and Caenorhabditis elegans (a nematode worm). This information would allow the counterparts of important human genes to be readily identified in organisms in which their functional roles are generally easier to study. In addition, many genes will first be found to be important in these other organisms and will lead to corollary human studies. Comparative sequence analysis with an organism such as the mouse is a powerful technique for distinguishing those elements of a nucleotide sequence that are important (and therefore conserved during evolution) from those that are not. To succeed, therefore, this project must not be restricted to the human genome; rather, it must include an extensive sequence analysis of the genomes of selected other species.
A mechanism of quality control is needed for the groups that are contributing large amounts of sequence information. For example, a unit could be established to redetermine a small fraction of the sequence submitted by each sequencing unit, thereby providing an independent check on the accuracy of the sequences being obtained by that unit.
INFORMATION AND MATERIALS HANDLING
Considerable data will be generated from the mapping and sequencing project. Unless this information is effectively collected, stored, analyzed, and provided in an accessible form to the general research community worldwide, it will be of little value. This project will also require an unprecedented sharing of materials among the laboratories involved. Because all sequences and materials generated by these publicly funded projects should, and indeed must, be made freely available, two different types of centralized facilities will be needed: (1) information centers to collect and distribute mapping and sequencing data and (2) centers to collect and distribute materials such as DNA clones and human cell lines.
For an information center to cope effectively with the vast amount of DNA sequence data, all such data must be provided to the center in electronic or magnetic form. The information center must also be effectively linked by a computer network to all the users of the data. An initial analysis of these data should be carried out by the central facility to help in classifying the data for future research accessibility. Both at the information center and in other laboratories, extensive research in methods of sequence data analysis should be encouraged.
A facility for collecting and distributing materials should be organized to handle the cloned DNA fragments generated and mapped in the many different laboratories involved. This facility would store the appropriate DNA clones, index them according to some agreed-on plan, and then redistribute them to all laboratories that request them.
The facility might also be involved in the routine conversion of large human DNA fragments, cloned as artificial chromosomes, into more readily accessible bacterial virus or cosmid DNA clone collections. It may also need to fingerprint all the DNA clones by a single method to provide a standard indexing procedure.
IMPLEMENTATION STRATEGIES
Much of the concern that has been expressed about a project to map and sequence the human genome stems from its high projected cost and the potential changes that may result in the infrastructure of the current biological research community. The committee examined the cost of the project and concluded that an annual budget of $200 million over the next 15 years would not be excessive when compared with the value of the results that would be produced. The expenditure of $200 million per year on the project would represent roughly 3 percent of the total annual U.S. government expenditure on biological research. It would thus leave the crucial task of functional studies to traditionally supported biological research.
All decisions for funding should be based on peer review by those expert in the methods involved. This does not mean that funding would be allocated only to individual investigators, inasmuch as multidisciplinary research centers of modest size, as well as an information center and a materials handling unit, will be required. Some groups may be more appropriately funded by contracts than by grants. However, the committee believes that these contracts should be awarded only after an open, peer-reviewed competition.
Genome mapping, both genetic linkage and physical, is already under way and should be intensified, although a major portion of the initial monies should be devoted to improving technologies. Large-scale sequencing should be deferred until technical improvements make this effort appropriate. This recommendation is based on the realization that the human genome is orders of magnitude larger than the genome of any other organism that has yet been mapped or sequenced. To cope with this vastly greater size, it seems advisable to establish a special competitive program that focuses on improving in 5- to 10-fold increments the scale or efficiency of mapping, sequencing, analyzing, or interpreting the biological information in the human genome. The actual mapping of the human genome should begin now.
In contrast, while a variety of pilot projects should be encouraged, a large-scale sequencing effort on the human genome should begin only after the technology is developed and an adequate quality control procedure is established.
A human genome project of this type need not threaten the existing biological research community, for several reasons. First, the money ought not to be provided at the expense of currently funded biological research. Second, it ought to be distributed by peer review. Third, by including selected other organisms required for the interpretation of the human genome map and sequence, the project should not mislead the public into placing a false emphasis on the uniqueness of human materials for understanding ourselves. Fourth, this project ought to include work by both small research laboratories and larger multidisciplinary centers formed by juxtaposing several small research groups having different expertise. Since individual investigators working in small groups have been the source of nearly all the major methodological breakthroughs that have driven the modern revolution in biology, the proposed organization ensures that our extraordinarily successful pattern of doing biology will be preserved.
In multidisciplinary centers, 3 to 10 research groups, each with an outstanding independent scientific director and a different but related focus, are envisioned as sharing equipment and personnel in core facilities and collaborating to accomplish a larger goal than any single group could readily achieve on its own. These centers could efficiently coordinate the large number of different experimental and computer capabilities needed for the development of techniques as well as work out optimal strategies for producing actual mapping and sequencing data.
The committee does not believe that one or a few large production centers for mapping or sequencing should be established at this time. Strong technical and intellectual advantages are obtained by distributing mapping and sequencing work among smaller multidisciplinary centers and individual research laboratories. One major advantage is that the resulting competition will stimulate research. Another is that it allows the most successful units to be identified so that the available resources can be directed to them. Moreover, the dispersal of the groups will allow close interactions to be established with a large number of other biological scientists. These interactions will be essential both for the intellectual contributions derived from other scientists and for enabling the new techniques developed in this project to be applied quickly and efficiently to a wide variety of important biological problems.
MANAGEMENT STRATEGY
For the human genome project to be of maximum value, the committee believes that it needs to be well organized and coordinated. To this end, a majority of the committee members feels that the project should be sited within one of three federal agencies: the National Institutes of Health, the Department of Energy, or the National Science Foundation. This lead agency would receive a specific appropriation for the project and be responsible for disbursing funds through a peer-review process. It would be responsible for operating the stock center and the data center and for coordinating the efforts of the many laboratories involved, and it would serve as an information clearinghouse. It would also handle the many other administrative details that will arise in a project of this magnitude.
Although the lead agency would have the ultimate responsibility for funding and policy decisions, it should draw on the advice and expertise of a Scientific Advisory Board (SAB). The SAB would be made up predominantly of scientists with expertise in the methods and goals of the project. The major responsibilities of the SAB would include:
· To facilitate coordination of the efforts of the many laboratories that are expected to participate in this effort.
· To help assure the accessibility of all information and materials generated in the project by advising on the oversight of the data center and the stock center and recommending contracts where appropriate. It would oversee formation of standard terminologies and reporting formats so that the large body of information to be obtained can be readily communicated and analyzed by the entire scientific community.
· To monitor the quality of research by helping to assure a uniform standard of peer review.
· To suggest mechanisms for strict quality controls on the sequence and mapping data collected.
· To promote international cooperation, serving as a liaison to projects outside the United States regardless of their funding sources.
· To make recommendations concerning the establishment of large sequencing endeavors, thereby balancing focus with breadth.
· To publish periodic reports stating progress, problems, and recommendations for research.
The committee strongly believes that a project to map and sequence the human genome should be undertaken. It is aware of the ethical, social, and legal implications of such an effort, but feels that they can be adequately addressed. This project would greatly increase our understanding of human biology and allow rapid progress to occur in the diagnosis and ultimate control of many human diseases.
As visualized, it would also lead to the development of a wide range of new DNA technologies and produce the maps and sequences of the genomes of a number of experimentally accessible organisms, providing central information that will be important for increasing our understanding of all biology.