Currently Skimming:

Maintaining the Integrity of Databases
Pages 17-22

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 17...
... "I work on genome annotation, which, broadly speaking, is the analysis and management of genomic data to predict and archive various kinds of biologic features, particularly genes, biologic signals, sequence characteristics, and gene products." Presented with a gene of unknown function, a gene annotator will look for other genes with similar sequences to try to predict what the new gene does. "What we would like to end up with," Overton explained, "is a report about a genomic sequence that has various kinds of data attached to it, such as experimental data, gene predictions, and similarity to sequences from various databases.
From page 18...
... "Using a rule base that included a set of syntactic rules written as grammar," Overton said, "we went through all the GenBank entries for eukaryotic genes and came up with a compact representation of the syntactic rules that describe eukaryotic genes." If a GenBank entry was not "grammatical" according to this set of syntactic rules, the system would recognize that there must be an error and often could fix it.
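The idea of checking an entry against a grammar can be illustrated with a minimal sketch. The grammar below is a hypothetical toy (promoter, exons separated by introns, then a polyadenylation signal), not the rule base Overton describes, and the feature names and the `is_grammatical` helper are assumptions made for illustration. It only detects ungrammatical entries; it does not attempt the repair step mentioned in the excerpt.

```python
import re

# Hypothetical toy grammar, NOT the rule base described in the chapter:
#   gene -> promoter exon (intron exon)* polyA_signal
# Each feature type is encoded as one letter so the grammar can be
# written as a regular expression over the encoded feature string.
CODES = {"promoter": "P", "exon": "E", "intron": "I", "polyA_signal": "A"}
GENE_GRAMMAR = re.compile(r"PE(IE)*A")


def is_grammatical(features):
    """Return True if the ordered feature list matches the toy gene grammar."""
    try:
        encoded = "".join(CODES[name] for name in features)
    except KeyError:
        # A feature type outside the grammar's vocabulary cannot be parsed.
        return False
    return GENE_GRAMMAR.fullmatch(encoded) is not None


# Usage: two adjacent exons with no intron between them violate the grammar.
well_formed = ["promoter", "exon", "intron", "exon", "polyA_signal"]
malformed = ["promoter", "exon", "exon", "polyA_signal"]
print(is_grammatical(well_formed), is_grammatical(malformed))
```

A real annotation grammar would need a richer formalism than a regular expression; the point here is only that an entry whose features cannot be parsed is flagged as containing an error.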
From page 19...
... "At GenBank, data are entered by the biologists who determine a sequence. They are not trained annotators; but when they deposit the nucleic acid sequence in GenBank, they are required to add various other information beyond the sequence data.
From page 20...
... Knowledge Bus is developing databases that incorporate ontologic theories, that is, theories about the nature of and relationships among the various types of objects in a database. In other words, the databases "know" a good deal about the nature of the data that they contain and what to expect from them, so they can identify various errors simply by noting that the data do not perform as postulated.
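A rough sketch of what it could mean for a database to "know" what to expect from its data follows. The object types, the expectations attached to them, and the `find_violations` helper are all hypothetical; they are not taken from Knowledge Bus's system, whose design the excerpt does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Expectation:
    """One thing the database 'postulates' about objects of a given type."""
    description: str
    holds: Callable[[dict], bool]


# Hypothetical expectations per object type (illustrative only).
EXPECTATIONS: Dict[str, List[Expectation]] = {
    "reaction": [
        Expectation("has at least one substrate",
                    lambda r: len(r.get("substrates", [])) > 0),
        Expectation("has at least one product",
                    lambda r: len(r.get("products", [])) > 0),
    ],
    "enzyme": [
        Expectation("catalyzes at least one reaction",
                    lambda r: len(r.get("catalyzes", [])) > 0),
    ],
}


def find_violations(record: dict) -> List[str]:
    """Return descriptions of every expectation the record fails to meet."""
    kind = record.get("type")
    return [e.description
            for e in EXPECTATIONS.get(kind, [])
            if not e.holds(record)]


# Usage: a reaction record with no products is flagged.
record = {"type": "reaction", "substrates": ["glucose", "ATP"], "products": []}
print(find_violations(record))
```

Keeping the expectations as data, separate from the records themselves, is one plausible way to let the same checks run over entries pulled from many sources.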
From page 21...
... "The rule," Andersen explained, "just explains that if the normal glucose phosphorylation has a certain free energy, then the one catalyzed by hexokinase will have the same free energy." (Free energy, a concept from thermodynamics, is related to the amount of work performed, or able to be performed, by a system.) Suppose that the system pulls in, perhaps from one or more databases on the Internet, experimentally determined values for the free energy of glucose phosphorylation and of glucose phosphorylation catalyzed by hexokinase, and suppose further that they do not agree.
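The rule Andersen describes amounts to an equality constraint between two reported values. A minimal sketch of such a check is shown here; the function name, the tolerance, and the numbers in the usage lines are assumptions chosen for illustration, not values or behavior taken from the system in the excerpt.

```python
def check_catalysis_rule(uncatalyzed_kj_mol, catalyzed_kj_mol,
                         tolerance_kj_mol=0.5):
    """Check that a catalyzed reaction reports the same free energy change
    as the uncatalyzed reaction.

    Returns None when the two values agree within the tolerance, otherwise
    a message describing the disagreement. The tolerance is an assumed
    allowance for experimental error, not a value from the source.
    """
    difference = abs(uncatalyzed_kj_mol - catalyzed_kj_mol)
    if difference <= tolerance_kj_mol:
        return None
    return (f"free energies disagree by {difference:.1f} kJ/mol: "
            f"at least one source value is suspect")


# Usage with made-up values standing in for data from two sources.
print(check_catalysis_rule(-16.7, -16.7))
print(check_catalysis_rule(-16.7, -20.0))
```

A disagreement of this kind does not say which value is wrong, only that the two cannot both be right under the rule.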


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.