test tube. Increasingly, however, biologists are spending all of their time working in silico, that is, making discoveries entirely on computers using data from gene sequences.
In June 1999, a federal advisory committee urged the National Institutes of Health (NIH) to train biologists in computing. The advisory panel, chaired by David Botstein of Stanford University and Larry Smarr of the University of Illinois at Urbana-Champaign, recommended that NIH establish up to 20 new centers, at a cost of $8 million per center per year, to teach computer-based biomedical research. At full scale, that amounts to $160 million per year out of NIH’s budget for training computational biologists.
Quoting Nature magazine, “It’s sink or swim as a tidal wave of data approaches,” Dr. Boguski said the need for computational biologists was urgent. The information landscape in genomics research is vast, and Dr. Boguski said he would describe that landscape and the staffing needs that must be met to negotiate it successfully.
One measure of the growth of scientific information is the growth in the MEDLINE database at the NIH’s National Library of Medicine. The database contains over 10 million peer-reviewed journal articles and is growing by 400,000 articles per year. Within the subset of molecular biology and genetics articles, the growth rate is considerably faster than for the biomedical literature as a whole. Another measure of the growth in biological information is the number of DNA sequences. Rapid DNA sequencing technology was invented in 1975, but until it was automated in 1985 and the Human Genome Project took off, the growth of DNA sequences was modest. Since the early 1990s, however, the growth rate has been extremely steep.
Comparing the growth in articles on DNA sequences with the growth in DNA sequences themselves, Dr. Boguski identified a serious gap for the biomedical research community. From 1975 through 1995, more papers were published on DNA sequences than there were sequences. Since 1995, roughly five years after the inception of the Human Genome Project, an enormous gap has opened up: there are now more genes than articles about those genes. The challenge for biology today is to bridge the gap between the number of genes being discovered and the techniques available to classify them and understand their function.
One technology that seeks to bridge the gap is functional genomics. Functional genomics refers to
The development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making use of the information and reagents provided by the genome projects. It is characterized by the high throughput or large-scale experimental methodologies combined with statistical and computational analysis of the results. The fundamental strategy in a functional genomics approach is to expand the scope of biological investigation from studying single genes or proteins to studying all genes or proteins in a systematic fashion.