Suggested Citation: "COMPARING ONE SEQUENCE AGAINST A DATABASE." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.


SEEING CONSERVED SIGNALS: USING ALGORITHMS TO DETECT SIMILARITIES BETWEEN BIOSEQUENCES

memory that can be loaded with the scores for δ(a_i, ?), δ(−, ?), and δ(a_i, −), where ? is any symbol in the underlying alphabet ψ. The beauty of the systolic array is that it can compare A against a stream of B sequences, processing each symbol of the target sequences in constant time. With current technology, chips of this kind operate at rates of 3 million to 4 million symbols per second. A systolic array of 1,000 of these simple processors computes an aggregate of 3 billion to 4 billion dynamic programming entries per second.

COMPARING ONE SEQUENCE AGAINST A DATABASE

The current GENBANK database (Benson et al., 1993) of DNA sequences contains approximately 191 million nucleotides of sequence in about 183,000 sequence entries, and the PIR database (Barker et al., 1993) of protein sequences contains about 21 million amino acids of data in about 71,000 protein entries. Whenever a new DNA or protein sequence is produced in a laboratory, it is now routine practice to search these databases to see if the new sequence shares any similarities with existing entries. In the event that the new sequence is of unknown function, an interesting global or local similarity to an already-studied sequence may suggest possible functions. Thousands of such searches are performed every day.

In the case of protein databases, each entry is for a protein between 100 and 1,500 amino acids long, the average length being about 300. The entries in DNA databases have tended to be for segments of an organism's DNA that are of interest, such as stretches that code for proteins. These segments vary in length from 100 to 10,000 nucleotides. The limited length here is not intrinsic to the object, as it is in the case of proteins, but results from limitations in the technology and the cost of obtaining long DNA sequences. In the early 1980s the longest consecutive stretches being sequenced were up to 5,000 nucleotides long. Today the sequences of some viruses of length 50,000 to 100,000 have been determined. Ultimately, what we will have is the entire sequence of DNA in a chromosome (100 million to 10 billion nucleotides), and entries in the database will simply be annotations describing interesting parts of these massive sequences.
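
To make the database search described above concrete, the following is a minimal sketch, not the method of any particular search program. It scores a query against every entry using a local-alignment dynamic programming recurrence and ranks the entries by score. The function names (local_score, search_database, dp_entries), the toy entries, and the scoring values (+1 match, -1 mismatch, -2 per gap symbol) are all assumptions chosen for illustration; a real search would use a scoring scheme δ suited to the data.

```python
from typing import Dict, List, Tuple


def local_score(a: str, b: str, match: int = 1,
                mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a against b.

    Uses the standard local-alignment recurrence with a linear gap cost.
    The scoring values are illustrative assumptions, not from the text.
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                   # start a new local alignment
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # a[i-1] aligned with a gap
                          curr[j - 1] + gap)   # b[j-1] aligned with a gap
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: Dict[str, str],
                    top: int = 3) -> List[Tuple[int, str]]:
    """Score the query against every entry; return the top (score, name) hits."""
    hits = [(local_score(query, seq), name) for name, seq in database.items()]
    hits.sort(key=lambda h: (-h[0], h[1]))   # highest score first, then name
    return hits[:top]


def dp_entries(query_len: int, database_len: int) -> int:
    """Number of dynamic programming entries for a full comparison."""
    return query_len * database_len


if __name__ == "__main__":
    # Hypothetical toy database; real entries are far longer.
    toy_db = {"e1": "CCCC", "e2": "TTGATTACATT", "e3": "GATCACA"}
    for score, name in search_database("GATTACA", toy_db):
        print(name, score)
    # Work for an average-length protein (300) against about 21 million residues.
    print(dp_entries(300, 21_000_000))
```

Under these assumptions, comparing a protein of average length 300 against the roughly 21 million amino acids of PIR requires about 6.3 billion dynamic programming entries, which is on the order of two seconds at the aggregate rate of 3 billion to 4 billion entries per second quoted above for a 1,000-processor systolic array. The estimate ignores data transfer and other overheads.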
