What Will It Take to Sequence an Individual’s Genome for Under $1,000 in Less Than 10 Years?
WORKING GROUP DESCRIPTION
Following the sequencing of the human genome, the next challenge is to ascertain how differences among the genomes of individual persons contribute to differences in disease susceptibility and severity and in the efficacy of medical therapies. An individualized approach to medicine requires the ready acquisition of information about the sequence of individual persons’ genomes with reliable technology at acceptable costs. Thus, the ability to sequence an individual person’s genome for under $1,000 within the next 10 years has been set as a goal.
What are the critical technological roadblocks and fundamental biological showstoppers? The traditional method of sequencing involves gel-based technology with optical detection. The estimated cost of using such a technology for each entire genome would be approximately $20 million. However, a variety of new sequencing methods is under development. For example, Shendure and colleagues (2005) have developed a new technology that uses color-coded beads (~1 µm in diameter), a microscope, and a camera to replicate thousands of oligo strands of DNA, each strand on its own tiny bead. Fourteen million such beads can be packed into an area the size of a dime. Each camera frame can analyze many beads, each of which carries one of four dye colors. The flow of the beads is computer controlled, and the camera records the dye color and hence the sequence. It is estimated that the cost per base with this new technology is ~1/9 that of conventional sequencing. With continued development of novel technologies (Shendure et al., 2005; Margulies et al., 2005) that can reduce the cost of sequencing by successive factors of 10, the goal of $1,000 should be achievable. With a reference sequence to guide analysis of individual genomes, assembly is straightforward.
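As a rough illustration of the arithmetic behind “factors of 10,” the sketch below starts from the ~$20 million conventional estimate, applies the ~1/9 cost ratio quoted for bead-based sequencing, and counts how many further 10-fold reductions would bring a genome under $1,000. The helper name `tenfold_steps_needed` is hypothetical, and treating the 1/9 per-base ratio as a per-genome ratio is an assumption.

```python
# Hedged arithmetic sketch using figures from the paragraph above.
# Assumption: the ~1/9 per-base cost ratio applies to the whole genome.
CONVENTIONAL_COST = 20_000_000  # dollars per genome, gel-based estimate
BEAD_COST_FRACTION = 1 / 9      # bead-based cost relative to conventional
TARGET = 1_000                  # dollars per genome


def tenfold_steps_needed(start: float, target: float) -> int:
    """Count successive 10-fold reductions needed to bring `start` to or below `target`."""
    steps = 0
    cost = start
    while cost > target:
        cost /= 10
        steps += 1
    return steps


bead_cost = CONVENTIONAL_COST * BEAD_COST_FRACTION
print(f"Bead-based estimate: ${bead_cost:,.0f}")
print(f"10-fold reductions still needed: {tenfold_steps_needed(bead_cost, TARGET)}")
```

On these numbers, roughly four more order-of-magnitude improvements would be required, which is why the text looks to a succession of new technologies rather than to a single advance.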
Is the goal indeed so far-fetched, or can it be reached for each individual by analyzing the cluster (macro-level) variations from the baseline of a generic human genome? The average variation in genome sequence across humans is expected to be 0.1 percent or less, and variation in the coding regions is much lower. The single nucleotide polymorphism and haplotype mapping projects are beginning to catalog the variations across humans that might give rise to pathology. Initially, efforts to define disease-causing variants will use single-nucleotide polymorphism (SNP) markers, enhanced in power by the HapMap, but this will shift to whole-genome analysis as sequencing costs drop. The challenge then will be to identify the causative changes among the many differences revealed. Comparative sequence analysis is rapidly identifying the estimated 5 percent of the genome that is well conserved and appears to be functional, thus substantially narrowing the search.
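A back-of-the-envelope sketch of the scale of this search, assuming a 3-billion-base genome, the 0.1 percent upper bound on divergence quoted above, and (our simplifying assumption) variants spread uniformly across the genome. Because conserved sequence is under selection and tends to carry fewer variants, the count it yields for conserved regions is an upper-end illustration, not an estimate.

```python
# Hedged sketch: how many differences from a reference, and how many of those
# would fall in the ~5 percent of the genome that is conserved?
# Assumption: variants are distributed uniformly (real conserved regions carry fewer).
GENOME_BASES = 3_000_000_000
DIVERGENCE = 0.001         # "0.1 percent or less"
CONSERVED_FRACTION = 0.05  # "estimated 5 percent of the genome"


def expected_differences(genome_bases: int, divergence: float) -> int:
    """Upper-bound count of single-base differences from the reference."""
    return round(genome_bases * divergence)


total = expected_differences(GENOME_BASES, DIVERGENCE)
in_conserved = round(total * CONSERVED_FRACTION)
print(f"Differences from reference: ~{total:,}")
print(f"In conserved regions (uniform assumption): ~{in_conserved:,}")
```

Restricting attention to conserved sequence shrinks the candidate list by about a factor of 20 under this assumption, which is the narrowing the paragraph describes.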
What will we do with the information for diseases?
Awareness of a predisposition to specific diseases would provide an opportunity to adopt lifestyles more conducive to healthy living.
Such awareness would also enable periodic medical monitoring along with risk assessments (genomic triage).
Within a decade, gene therapy and other therapeutic methods are likely to begin taking shape as targeted therapeutics. Identification of specific gene defects through selective sequencing can aid focused efforts in gene therapy.
REFERENCES
Margulies, M., M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y. J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, G. P. Irzyk, S. C. Jando, M. L. Alenquer, T. P. Jarvie, K. B. Jirage, J. B. Kim, J. R. Knight, J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley, and J. M. Rothberg. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature. Published online before print, Jul. 31. Online at www.nature.com/nature/journal/vaop/ncurrent/abs/nature03959.html, accessed 2/2/2006.
Pevzner, P. A., H. Tang, and G. Tesler. 2004. De novo repeat classification and fragment assembly. Genome Research 14(9):1786-1796. Erratum in Genome Research 14(12):2510.
Shendure, J., G. J. Porreca, N. B. Reppas, X. Lin, J. P. McCutcheon, A. M. Rosenbaum, M. D. Wang, K. Zhang, R. D. Mitra, and G. M. Church. 2005. Accurate multiplex polony sequencing of an evolved bacterial genome. Science. Published online in Science Express. Online at www.sciencemag.org/cgi/content/abstract/1117389v1, accessed 2/2/2006.
WORKING GROUP SUMMARY
Summary written by:
Leah Moore Eisenstadt, Graduate Student, Science Journalism, Boston University
Working group members:
Steven Brenner, Associate Professor, Plant and Microbial Biology, University of California, Berkeley
Siobhan Dolan, Obstetrics, Gynecology and Women’s Health, Albert Einstein College of Medicine and March of Dimes Birth Defects Foundation
Leah Moore Eisenstadt, Graduate Student, Science Journalism, Boston University
Mark Guyer, Director, Division of Extramural Research, The National Human Genome Research Institute
Leonid Kruglyak, Professor of Ecology and Evolutionary Biology, Lewis-Sigler Institute for Integrative Genomics, Princeton University
Babak Parviz, Assistant Professor, Electrical Engineering, University of Washington
Holger Schmidt, Assistant Professor, Electrical Engineering, University of California, Santa Cruz
Eric Topol, Provost and Chief Academic Officer, Cleveland Clinic Foundation
Victor Ugaz, Assistant Professor, Department of Chemical Engineering, Texas A&M University
Huntington Willard, Director and Professor, Institute for Genome Sciences and Policy, Duke University
Shaying Zhao, Assistant Professor, Biochemistry and Molecular Biology, and Institute of Bioinformatics, The University of Georgia
On April 14, 2003, scientists announced that the Human Genome Project was essentially complete—two years ahead of schedule. After 13 years of work, the three billion base pairs of the human genome were known in great detail and with high accuracy. The Washington Post said that the project “revealed in exquisite detail the genetic blueprint underlying all human life.” The Boston Globe deemed the genome “the last milestone for one of the modern era’s grandest scientific endeavors.”
The project cost $3 billion to complete, and despite today’s technology, which has advanced significantly since the start of the project in 1990, it would cost a whopping $20 million to do it again. Lowering the cost of sequencing would make the genome both a useful research tool and a potential avenue to personalized clinical management and treatment. Working group 12, therefore, was given the task of answering the question: What will it take to sequence an individual’s genome for under $1,000 in fewer than 10 years?
When the group first gathered, the team of biologists, engineers, and institute directors debated what its task actually was. Some members interpreted the question as calling for technical answers, while others saw it merely as a conversation starter for discussing the future of genome sequencing technology.
Why do we need the $1,000 genome?
One issue raised early, and one that persisted throughout the sessions, was whether a $1,000 genome is necessary at all. A cheaper genome could one day inform patients of their genetic risk for cancer, or it could allow researchers to identify the next avian flu. Because the group’s initial task referred to an individual’s genome, clinical outcomes were on everyone’s mind. The group, however, saw the research uses of the $1,000 genome as much more promising, and much sooner. A cheaper genome could allow scientists to characterize all the differences between a given human genome and a reference genome and help identify possible genetic sources of disease. Being able to classify bacterial or viral pathogens quickly and very cheaply in the field would also be a powerful tool.
The usefulness to research is a “no brainer,” said Eric Topol, of the Cleveland Clinic Foundation, but the value for clinical practice was another question entirely. Siobhan Dolan, a physician with Albert Einstein College of Medicine and the March of Dimes, agreed: “What do the genome bases mean for health care?” Dolan was concerned with the clinical uses of widely available, low-cost genomic information, and reflected on her experience with newborn screening. “You get a printout, give it to the parents, and say, ‘This is what you have, go!’” The group also noted that few diseases are monogenic in origin, which complicates the relationship between genotype and phenotype. For diseases that result from the complex interplay of several genetic mutations and epigenetic factors, the path from identifying the mutations in the lab to a treatment in the clinic may not be smooth. Others acknowledged the cart-before-the-horse problem of generating genetic information before knowing what to do with it, but some members of the group suggested leaving sociology out of the discussion.
Even if the $1,000 genome were available, the group wondered, would the clinical outcomes be worth it? Screening for a few relevant SNPs might give a patient as much clinically useful information as the whole genome, at a much lower price. Because the human genome has only recently been sequenced, however, there may be important areas of the genome that a screen limited to SNPs would miss. “Who can declare what is junk DNA?” asked Hunt Willard, director of the Institute for Genome Sciences and Policy at Duke University, adding, “I’d be afraid to bet my health on it.” After discussing the utility of the complete genome versus an incomplete one, the group decided that cost-effective sequencing of complete genomes would be vital for research in the upcoming years, potentially yielding information about previously overlooked areas of the genome.
What does the $1,000 genome mean?
One step these experts took was to redefine which genome’s cost should be reduced to $1,000. Babak Parviz, assistant professor of electrical engineering at the University of Washington, noted that improving human health would require the genomes of both humans and pathogens, small and large. Some microbes and parasites have genomes of 3 to 25 megabases, while the human genome is 3 gigabases long. Technology flexible enough to sequence genomes of any size would therefore be best. Ideally, the cost of sequencing would scale with genome size, lowering the cost of yeast or bacterial genomes to $0.30 per megabase.
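To make “scalable” concrete, the sketch below prices genomes of several sizes at the $0.30-per-megabase rate mentioned above. Purely linear, per-megabase pricing is our assumption (real runs would likely carry fixed overheads), and the ~12 Mb yeast entry is added for illustration; the function name `scaled_cost` is hypothetical.

```python
# Hedged sketch: genome cost under a purely per-megabase pricing model.
# Assumption: no fixed per-run cost, so price scales linearly with genome size.
RATE_PER_MB = 0.30  # dollars per megabase

GENOMES_MB = {
    "small microbe (3 Mb)": 3,
    "yeast (~12 Mb)": 12,
    "large parasite (25 Mb)": 25,
    "human (3,000 Mb)": 3_000,
}


def scaled_cost(size_mb: float, rate_per_mb: float = RATE_PER_MB) -> float:
    """Sequencing cost in dollars for a genome of `size_mb` megabases."""
    return size_mb * rate_per_mb


for name, size in GENOMES_MB.items():
    print(f"{name:>24}: ${scaled_cost(size):,.2f}")
```

At this rate a 3-gigabase human genome comes to about $900, just under the $1,000 target, and a 3-megabase microbe to under a dollar, in line with the “$1 bacterial genome” discussed next.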
The working group’s task was questioned by Leonid Kruglyak, professor of ecology and evolutionary biology at the Lewis-Sigler Institute for Integrative Genomics at Princeton University. “It’s not the main goal to resequence genomes over and over,” he said, referring to the narrow focus of the working group’s question at hand. “The main goal is to get cheaper sequencing in general.” Mark Guyer, director of the Division of Extramural Research at The National Human Genome Research Institute, agreed. “Focusing on the $1,000 genome may be too oversimplifying,” he said. “What we’re talking about is cheap data generation.” The group concurred, and chose to redefine the $1,000 genome as a metaphor for improving technology for sequencing and for capturing genomic variation. The improved technology, in turn, would spur reduction in costs for sequencing other genomes, such as the $1 bacterial genome and the $5 parasite genome.
Another point of discussion was the accuracy and completeness of the $1,000 genome. One extreme is a genome with over 90 percent accuracy across 80 percent of the genome, tens of thousands of false positives, and no information about structural variation. The other extreme is a letter-perfect sequence produced by technology that also captures variations in structure, copy number, translocations, and inversions. Thus, the full clinical utility will depend not merely on the $1,000 needed for a complete genome sequence but also on the cost of additional technologies to assess genome and chromosome structure (currently assessed by karyotyping), as well as the copy number of each segment of the genome (currently assessed by methods such as array-based comparative genomic hybridization). “I want the latter, not the cheapo version,” said Eric Topol. “Maybe structural things and epigenetics are important in health and disease; we don’t know how much simple sequence is tied to that.”
How do we get to the $1,000 genome?
When the Human Genome Project began in 1990, sequencing cost $10 per base pair. The project used sequencing methods based on those developed by Frederick Sanger, in which DNA is copied by a polymerase in the presence of chain-terminating nucleotides and the resulting fragments are separated by gel electrophoresis. During the genome project, the cost of sequencing dropped 2,000-fold, now approaching $0.10 per base pair. To bring the total cost down to $1,000, the cost needs to drop another 20,000-fold, which is not impossible compared with some other remarkable advances in technology. “Compared to advances in semiconductors,” said Babak Parviz, “this is not science fiction.” Electronic chips can have billions of devices and sell for $200, but it took almost four decades to go from 4,000 transistors per device in the 1960s to billions today.
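A worked comparison of the two fold-changes quoted above, assuming steady exponential improvement (an idealization) and reading “billions” as 10^9 transistors and “almost four decades” as 40 years. It converts each fold-change into a number of doublings (or halvings) and the implied time per step; `ratio` and `doublings` are hypothetical helper names.

```python
import math

# Hedged sketch: express each improvement as doublings/halvings and the time per step.
# Assumptions: steady exponential progress; "billions" = 1e9; four decades = 40 years.


def ratio(larger: float, smaller: float) -> float:
    """Fold-change between two quantities."""
    return larger / smaller


def doublings(fold: float) -> float:
    """Number of successive doublings (or halvings) that produce `fold`."""
    return math.log2(fold)


# Sequencing: ~$20 million per genome today down to $1,000 within 10 years.
seq_fold = ratio(20_000_000, 1_000)
seq_doublings = doublings(seq_fold)
print(f"Sequencing: {seq_fold:,.0f}-fold = {seq_doublings:.1f} halvings;"
      f" one halving every {10 / seq_doublings:.2f} years")

# Semiconductors: ~4,000 transistors per device up to ~1e9 over ~40 years.
chip_fold = ratio(1e9, 4_000)
chip_doublings = doublings(chip_fold)
print(f"Chips: {chip_fold:,.0f}-fold = {chip_doublings:.1f} doublings;"
      f" one doubling every {40 / chip_doublings:.2f} years")
```

Under these assumptions the sequencing goal needs about 14 halvings in 10 years (one every ~0.7 years), whereas the chip example needed about 18 doublings over 40 years (one every ~2.2 years). The total improvement is comparable, but the required pace is roughly three times faster.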
The group discussed the source of the genome’s current sticker shock. Shaying Zhao, a University of Georgia molecular biologist, suggested that much of the cost lies in the fluorescent dye used in conventional sequencing. Bringing the cost down by so many orders of magnitude will take a revolutionary, radical change, she said. “[But] even if you improve the dye to decrease cost,” Zhao added, “a $1,000 genome is still impossible with the current methods.” Ron Davis, a geneticist and biochemist from Stanford University, who visited the group during the second two-hour session, said that sequencing costs are split evenly among labor, reagents, and instrumentation.
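Zhao’s point follows from Davis’s cost breakdown by simple arithmetic, much like Amdahl’s law: cutting one component cannot reduce the total below the other components. The sketch assumes an exactly even three-way split of the ~$20 million baseline and treats dye as part of reagents; the function `cost_after_cuts` and its `cuts` argument are hypothetical.

```python
# Hedged sketch: total cost when individual components are made cheaper.
# Assumptions: exactly even split among labor, reagents, instrumentation;
# dye counts as a reagent; baseline is the ~$20 million figure quoted earlier.
BASELINE = 20_000_000
SHARES = {"labor": 1 / 3, "reagents": 1 / 3, "instrumentation": 1 / 3}


def cost_after_cuts(baseline: float, shares: dict, cuts: dict) -> float:
    """Total cost when each component's share is divided by its fold reduction.

    `cuts` maps component -> fold reduction (1 = unchanged); unlisted components
    stay unchanged, and float('inf') removes that component's cost entirely.
    """
    total = 0.0
    for part, share in shares.items():
        total += baseline * share / cuts.get(part, 1)
    return total


# Even free reagents (including dye) leave two-thirds of the cost in place.
free_reagents = cost_after_cuts(BASELINE, SHARES, {"reagents": float("inf")})
print(f"Free reagents: ${free_reagents:,.0f}")

# Reaching $1,000 needs a ~20,000-fold cut in every component.
uniform = cost_after_cuts(BASELINE, SHARES, {part: 20_000 for part in SHARES})
print(f"Uniform 20,000-fold cut: ${uniform:,.0f}")
```

Even eliminating reagent costs entirely would leave about $13 million per genome on these assumptions, which is why the group looked for changes that lower labor and instrument costs as well.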
New and exciting technologies that could revolutionize sequencing are being developed by private companies and academic researchers. Microelectrophoretic devices use microfabricated wafers containing 384 capillary lanes. DNA is injected at the perimeter of the wafer and runs toward the center, where the bases are detected by confocal fluorescence scanning. Another new technology is nanopore sequencing, in which single-stranded DNA molecules pass single file through a nanopore in a lipid bilayer. As different polynucleotides pass through the pore, the ionic current through the open channel drops, but the pores must be improved to achieve single-base resolution.
Another new method is sequencing by hybridization, in which the differential hybridization of oligonucleotide probes reveals the sequence. In one type of hybridization sequencing, the DNA is immobilized and serial hybridizations are carried out with short probes; the differential binding of specific probes to the DNA identifies the sequence. In another type, used by the companies Affymetrix and Perlegen, the probes are immobilized on arrays, or chips. For each position to be read, the chip carries four features whose probes differ only at the middle base, which is A, C, G, or T. Labeled sample DNA is hybridized to the chip, and the sequence is determined by measuring which feature gives the strongest signal. Hybridization technology faces two challenges in approaching $1,000 for a whole genome: avoiding cross-hybridization of probes to incorrect targets, and the need for sample preparation, such as PCR amplification.
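The array read-out described above can be sketched as a toy base caller: for each position, compare the four feature intensities and call the base whose feature is brightest. The intensities, the `min_ratio` threshold, and the “N” no-call rule are hypothetical illustrations of how cross-hybridization makes some calls ambiguous. They are not any vendor’s actual algorithm.

```python
# Toy base caller for a resequencing-style chip.
# Assumptions (hypothetical): one 4-value intensity tuple per position in A, C, G, T
# order, and a "no call" when the top signal is not clearly above the runner-up.
from typing import Sequence

BASES = "ACGT"


def call_bases(intensities: Sequence[Sequence[float]], min_ratio: float = 1.5) -> str:
    """Return one base per position, or 'N' when the strongest feature does not
    exceed the second strongest by `min_ratio` (e.g., due to cross-hybridization)."""
    calls = []
    for quad in intensities:
        ranked = sorted(zip(quad, BASES), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        calls.append(base if top >= min_ratio * second else "N")
    return "".join(calls)


signals = [
    [900, 40, 35, 50],   # strong A
    [30, 60, 850, 45],   # strong G
    [400, 380, 20, 25],  # A and C nearly tied -> ambiguous
    [25, 30, 40, 910],   # strong T
]
print(call_bases(signals))  # AGNT
```

The ambiguous third position shows in miniature why cross-hybridization is listed as a key obstacle: when two probes bind nearly equally, the strongest-signal rule cannot make a confident call.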
Cyclic-array sequencing methods, such as fluorescent in situ sequencing, pyrosequencing, and single-molecule methods, exploit the power of parallel sequencing. These methods use repeated cycles of polymerase extension, or synthesis, adding one nucleotide at each step. The magnitude of the signal during each cycle can be used to infer the order of bases in each sequence. Except for single-molecule methods, these approaches involve amplification steps that are spatially isolated. One company, 454 Corporation, scaled up pyrosequencing by running thousands of parallel picoliter-volume PCR amplifications. One difficulty with cyclic-array methods is reading consecutive runs of the same base. The relative amount of signal may be used to reveal the length of such runs; another solution involves reversible terminators, which would enable simultaneous use of all four bases. Another cyclic-array method eliminates the need for amplification by directly sequencing single molecules. Companies developing this technology include Solexa, Genovoxx, Nanofluidics, and Helicos. But the group recognized that significant public and private funding, in addition to a biological driver, would need to be in place for these companies to develop new technology.
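The idea of using “relative amounts of signal” to read runs of the same base can be sketched as decoding a pyrosequencing-style flowgram. Nucleotides are flowed one at a time in a fixed cycle, and each flow’s signal is rounded to the nearest whole number of incorporated bases. The flow order `TACG` and the signal values are hypothetical, and real base callers model noise far more carefully than simple rounding.

```python
# Hedged sketch of flowgram decoding.
# Assumptions (hypothetical): fixed flow order "TACG", and signal roughly
# proportional to the number of identical bases incorporated in each flow.
FLOW_ORDER = "TACG"


def decode_flowgram(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Turn per-flow signal intensities into bases by rounding each signal to
    the nearest whole number of incorporated bases."""
    seq = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        run_length = max(0, round(signal))
        seq.append(base * run_length)
    return "".join(seq)


flows = [1.02, 0.05, 2.10, 0.96, 0.03, 2.95, 0.08, 1.01]
print(decode_flowgram(flows))  # TCCGAAAG
```

This illustrates why long runs are hard. If measurement noise grows with run length, the rounding step distinguishes 2 from 3 bases far more reliably than 7 from 8. Reversible terminators sidestep the problem by limiting each cycle to one base.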
Mark Guyer told the group that the National Human Genome Research Institute is spending $25 million per year in grants to (1) reduce the cost of sequencing by two orders of magnitude, to $100,000, within 5 years and (2) reduce it to $1,000 over the next 10 years. Ron Davis said, “I think it’s technically feasible in 10 years.” Robert Waterston, who worked on the first human genome, told the group, “It’s going to be done.” He noted that Solexa would release a machine next year that it says will give 10-fold coverage of the genome for $100,000. That price tag will only get smaller with time.
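A quick sketch of what “10-fold coverage” implies, assuming a 3-gigabase genome and reads placed uniformly at random. Under that idealization, known as the Lander-Waterman (Poisson) model, the expected fraction of bases covered by no read at all is e^(-c) for coverage c. The model ignores repeats and sequencing bias, so real gaps would be larger. The $100,000 price is the figure quoted above.

```python
import math

# Hedged sketch of 10-fold coverage under the Poisson (Lander-Waterman) idealization.
# Assumptions: 3-gigabase genome; reads placed uniformly at random; no repeats or bias.
GENOME_BASES = 3_000_000_000
COVERAGE = 10
PRICE = 100_000  # dollars, figure quoted for the announced machine

raw_bases = GENOME_BASES * COVERAGE
# Expected fraction of bases covered by zero reads is e^(-coverage).
uncovered_fraction = math.exp(-COVERAGE)
uncovered_bases = GENOME_BASES * uncovered_fraction

print(f"Raw sequence needed: {raw_bases / 1e9:.0f} gigabases")
print(f"Cost per raw megabase: ${PRICE / (raw_bases / 1e6):.2f}")
print(f"Expected uncovered bases: ~{uncovered_bases:,.0f}")
```

On these assumptions, 10-fold coverage means generating about 30 gigabases of raw sequence at roughly $3.33 per raw megabase, while still leaving on the order of 136,000 bases unsequenced by chance alone.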
The genome isn’t everything
If given the capability of a $1,000 genome tomorrow, Hunt Willard said, he would spend a decade doing basic and clinical research and eventually translate that into clinical care. Others concurred, aware of the hurdles of using genomic information in the clinic, and the group agreed that a cost-effective human genome would not be a panacea for health care. Eric Topol said, “No matter if you had the whole ball of wax, it’s only a small piece of the story, not including the interaction of genes or environment.” Hunt Willard responded, “But you could do more research faster with the $1,000 genome.”
Even when genomic information can inform medical decisions, physicians may be reluctant to use it. “It’s the technology/culture divide,” Willard said. “If you delivered this information [to clinicians] tomorrow, people would drop it like it’s a hot potato; they’d have no idea what to do with it in the current healthcare climate.” Yet the healthcare climate is not the only obstacle: science does not yet know what all the information means. Genomics is not ready for prime time in routine medical practice. But Topol voiced the core of what motivates and inspires researchers to improve sequencing technology: “If you can cut through that divide, the opportunities for understanding the biological basis of disease is extraordinary.”