advance both biomedical research and clinical care. Meanwhile, the magnitude of the challenges posed by the sheer scientific complexity of the molecular influences on health and disease is becoming apparent and suggests the need for powerful new research resources. All these changes provide an opportunity for the biomedical science and clinical communities to come together to improve both the discovery of new knowledge and health-care delivery. As discussed in this chapter, the Committee concluded that this opportunity could best be exploited through a major, long-term commitment to create an Information Commons, a Knowledge Network of Disease, and a New Taxonomy.

BIOLOGY HAS BECOME A DATA-INTENSIVE SCIENCE

Advances in DNA-sequencing technology powerfully illustrate biology’s conversion to a data-intensive science. The first papers describing practical methods of DNA sequencing were published in 1977 (Maxam and Gilbert 1977; Sanger et al. 1977). These methods required radioisotopic labeling of DNA, hand-crafting of large electrophoretic gels, and considerable expertise with biochemical and recombinant-DNA techniques. Although the impact of these early DNA-sequencing methods on biological discovery was profound, the total amount of sequence deposited in GenBank, the central depository for such data, did not pass one billion base pairs (one-third of the length of a single human genome) until 1997 (NCBI 2011a), and it reached this landmark only after a first generation of automated instruments came into widespread use (Favello et al. 1995). Since then, more than 300 billion base pairs (Benson et al. 2011) have been deposited, illustrating the continuing explosion of genomic data over the last 20 years.
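
To give a rough sense of the scale behind these figures, the short calculation below converts the GenBank totals into human-genome equivalents and estimates an implied doubling time. It is only a sketch under stated assumptions: the 3-billion-base-pair genome length is inferred from the "one-third" remark above, the 300 billion figure is treated as a 2011 total, and growth is assumed to be steadily exponential, which the text does not claim. The function names are illustrative.

```python
import math

# Figures quoted in the text above.
BASE_PAIRS_1997 = 1e9    # GenBank passed one billion base pairs in 1997
BASE_PAIRS_2011 = 300e9  # "more than 300 billion"; used here as an approximate 2011 total

# Assumption: a human genome is about 3 billion base pairs, as implied by
# "one billion base pairs (one-third of the length of a single human genome)".
HUMAN_GENOME_BP = 3e9


def genome_equivalents(base_pairs, genome_size=HUMAN_GENOME_BP):
    """Express a quantity of sequence as a number of human-genome lengths."""
    return base_pairs / genome_size


def doubling_time_years(start_value, end_value, years_elapsed):
    """Doubling time implied by steady exponential growth (a simplification)."""
    return years_elapsed * math.log(2) / math.log(end_value / start_value)


if __name__ == "__main__":
    print(f"1997 total: {genome_equivalents(BASE_PAIRS_1997):.2f} genome equivalents")
    print(f"2011 total: {genome_equivalents(BASE_PAIRS_2011):.0f} genome equivalents")
    years = 2011 - 1997
    t_double = doubling_time_years(BASE_PAIRS_1997, BASE_PAIRS_2011, years)
    print(f"Implied doubling time: about {t_double:.1f} years")
```

Under these assumptions, the deposited sequence grew from about one-third of a genome length to roughly 100 genome lengths, which corresponds to a doubling of the archive somewhat faster than once every two years.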

The National Human Genome Research Institute estimated that the total cost of obtaining a single human-genome sequence in 2001 was $95 million (Wetterstrand 2011; see Figure 2-1). Costs subsequently dropped exponentially, following the trajectory known in electronics as Moore’s Law (here, a 50 percent reduction in cost every two years), until the spring of 2007, at which point the estimated cost of a single human-genome sequence was still nearly $10 million. At that point, the introduction of a second generation of automated DNA-sequencing instruments, based on massively parallel, miniaturized analysis, led to a collapse in costs far faster than the Moore’s Law projection. The most recent update, in January 2011, estimates the cost of a complete-genome sequence at $21,000, and the cost is still dropping rapidly, with a “$1000 genome” becoming a realistic target within a few years (Wolinsky 2007; MITRE Corporation 2010; Mardis 2011). While whole-genome sequencing remains expensive by the standards of most clinical laboratory tests, the trend line leaves little doubt that costs will drop into the range of many routine clinical tests within a few years. Whole-genome sequencing will soon become cheaper than many of the specific genetic tests that are widely ordered today and ultimately


