National Academies Press: OpenBook
« Previous: Approximate Pattern Matching
Suggested Citation:"Parallel Computing." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
×
Page 79
Suggested Citation:"Parallel Computing." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
×
Page 80

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

SEEING CONSERVED SIGNALS: USING ALGORITHMS TO DETECT SIMILARITIES BETWEEN BIOSEQUENCES 79 N, there are algorithms that solve the text searching problem in O(P+N), O(PN), and O(PN3) time, depending on whether the pattern is a simple sequence, a regular expression, or context-free language, respectively. Fusing the concept of exact pattern matching and sequence comparison gives rise to the class of approximate pattern matching problems. Given a pattern, a database, a scoring scheme, and a threshold, one seeks all substrings of the database that align to some sequence denoted by the pattern with score better than the threshold. In essence, one is looking for substrings that are within a given similarity neighborhood of an exact match to the pattern. Within this framework, the similarity search problem is an approximate pattern matching problem where the pattern is a simple sequence. We showed earlier that this problem can be solved in O(PN) time. For the case of regular expressions, the approximate match problem can also be solved in O(PN) time (Myers and Miller, 1989), and, for context-free languages, an O(PN3) algorithm is known (Myers, 1994a). While the cost of searching for approximate matches to context-free languages is prohibitive, searching for approximate matches to regular expressions is well within reason and finds applications in searching for matches to structural patterns that occur in proteins. Parallel Computing The basic problem of comparing sequences has resisted better than quadratic, O(MN) time algorithms. This has led several investigators to study the use of parallel computers to achieve greater efficiency. As stated above, the S-matrix can be computed in any order consistent with the data dependencies of the fundamental recurrence. One naturally thinks of a row-by-row or column-by-column evaluation, but we pointed out as a third alternative that one could proceed in order of antidiagonals. Let antidiagonal k be the set of entries {(i j): i + j = k}. Note that to compute antidiagonal k, one only needs antidiagonals k − 1 and k − 2. The critical observation for parallel processing is that each entry in this antidiagonal can be computed independently of the other entries in the antidiagonal, a fact not true of the row-by-row and column-by-column

SEEING CONSERVED SIGNALS: USING ALGORITHMS TO DETECT SIMILARITIES BETWEEN BIOSEQUENCES 80 evaluation procedures. For large SIMD (single-instruction, multiple-data) machines, a processor can be assigned to each entry in a fixed antidiagonal and compute its result independently of the others. With O(M) processors, each antidiagonal can be computed in constant time, for a total of O(N) total elapsed time. Note that total work, which is the product of processors and time per processor, is still O(MN). The improvement in time stems from the use of more processors, not from an intrinsically more efficient algorithm. This observation about antidiagonals has been used to design custom VLSI (very large scale integration) chips configured in what is called a systolic array. The ''array" consists of a vector of processors, each of which is identical, performs a dedicated computation, and communicates only with its left and right neighbors, making it easy to lay out physically on a silicon wafer. For sequence comparisons, processor i computes the entries for row i and contains three registers that we will call L(i), V(i), and U(i). At the completion of the kth step, the processors contain antidiagonals k and k − 1 in their L and V registers, respectively, and the characters of B flow through their U registers. That is, L(i)k = S(i, k − i − 1), V(i)k = S(i, k − i), and U(i)k = bk−i , where X(i)k denotes the value of register X at the end of the kth step. It follows from the basic recurrence for S-values that the following recurrences correctly express the values of the registers at the end of step k + 1 in terms of their values at the end of step k: These recurrences reveal that to accomplish step k + 1, processor i − 1 must pass its register values to processor i and each processor must have just enough hardware to perform three additions and a three-term maximum. Moreover, each processor must have a (2|ψ|+1)-element

Next: COMPARING ONE SEQUENCE AGAINST A DATABASE »
Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology Get This Book
×
 Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology
Buy Paperback | $80.00
MyNAP members save 10% online.
Login or Register to save!
Download Free PDF

As researchers have pursued biology's secrets to the molecular level, mathematical and computer sciences have played an increasingly important role—in genome mapping, population genetics, and even the controversial search for "Eve," hypothetical mother of the human race.

In this first-ever survey of the partnership between the two fields, leading experts look at how mathematical research and methods have made possible important discoveries in biology.

The volume explores how differential geometry, topology, and differential mechanics have allowed researchers to "wind" and "unwind" DNA's double helix to understand the phenomenon of supercoiling. It explains how mathematical tools are revealing the workings of enzymes and proteins. And it describes how mathematicians are detecting echoes from the origin of life by applying stochastic and statistical theory to the study of DNA sequences.

This informative and motivational book will be of interest to researchers, research administrators, and educators and students in mathematics, computer sciences, and biology.

READ FREE ONLINE

  1. ×

    Welcome to OpenBook!

    You're looking at OpenBook, NAP.edu's online reading room since 1999. Based on feedback from you, our users, we've made some improvements that make it easier than ever to read thousands of publications on our website.

    Do you want to take a quick tour of the OpenBook's features?

    No Thanks Take a Tour »
  2. ×

    Show this book's table of contents, where you can jump to any chapter by name.

    « Back Next »
  3. ×

    ...or use these buttons to go back to the previous chapter or skip to the next one.

    « Back Next »
  4. ×

    Jump up to the previous page or down to the next one. Also, you can type in a page number and press Enter to go directly to that page in the book.

    « Back Next »
  5. ×

    To search the entire text of this book, type in your search term here and press Enter.

    « Back Next »
  6. ×

    Share a link to this book page on your preferred social network or via email.

    « Back Next »
  7. ×

    View our suggested citation for this chapter.

    « Back Next »
  8. ×

    Ready to take your reading offline? Click here to buy this book in print or download it as a free PDF, if available.

    « Back Next »
Stay Connected!