FIGURE 5-1 Growth in the size of the international sequence databases since their inception in 1982. The graph shows the size of the databases, in numbers of nucleotides, at regular intervals. (Data from the European Bioinformatics Institute.)

computational methods to predict which sequences constitute genes, that is, which sequences actually code for RNA and proteins. This is far from a solved problem even for “complete” genomes, and it will be even more difficult for the fragmentary sequences typically obtained in metagenomics projects. Two databases, the Protein Information Resource and Swiss-Prot, were established as community resources for protein sequence data, in 1984


