National Academies Press: OpenBook
« Previous: K-Best Alignments
Suggested Citation:"Approximate Pattern Matching." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.
×
Page 78

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

SEEING CONSERVED SIGNALS: USING ALGORITHMS TO DETECT SIMILARITIES BETWEEN BIOSEQUENCES 78 scores in order or give a color-coded visualization of the edit graph that colors edges according to the score of the best path utilizing the edge. Another interesting variation is to explore the range of solutions not by enumerating near-optimal answers, but by studying the range of optimal answers produced by parametrically varying aspects of the underlying scoring scheme (Waterman et al., 1992). Approximate Pattern Matching A variation on the local alignments problem discussed above is the approximate match problem. For this problem, imagine that A is a very long sequence and B a comparatively short query sequence. The problem is to find substrings of A, called match sites, that align with the entirety of B with a score greater than some user- specified threshold. An example might be to find all locations in a chromosome's DNA sequence (A) where a particular DNA sequence element (B) or some sequence like it occurs. It is not hard to see that this problem is equivalent to finding sufficiently high scoring paths that begin at a vertex in row 0 and end at row N of the edit graph for A and B. By simply permitting 0 to be a term in the computation of S-values in row 0 and checking values in row N, one obtains the desired modification of the basic dynamic programming algorithm. The problem is taken to another level by generalizing B, the query, from a sequence to a pattern (that describes a set of sequences). This variation is called approximate pattern matching. Computer scientists working on text-searching applications have long studied the problem of finding exact matches to a pattern in a long text. That is, given a pattern as a query, and a text as a database, one seeks substrings of the database text that match the pattern (exactly). Pattern types that have been much studied include the cases of a simple sequence, a regular expression, and a context-free language. Such patterns are notations that denote a possibly infinite set of sequences, each of which is said to (exactly) match the pattern. For example, the regular expression A(T|C)G* denotes the set of sequences that start with an A followed by a T or a C and then zero or more G's, that is, the set {AT, AC, ATG, ACG, ATGG, ACGG, ATGGG,. . .}. Assuming the pattern takes P symbols to specify and the text is of length

Next: Parallel Computing »
Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology Get This Book
×
Buy Paperback | $80.00
MyNAP members save 10% online.
Login or Register to save!
Download Free PDF

As researchers have pursued biology's secrets to the molecular level, mathematical and computer sciences have played an increasingly important role—in genome mapping, population genetics, and even the controversial search for "Eve," hypothetical mother of the human race.

In this first-ever survey of the partnership between the two fields, leading experts look at how mathematical research and methods have made possible important discoveries in biology.

The volume explores how differential geometry, topology, and differential mechanics have allowed researchers to "wind" and "unwind" DNA's double helix to understand the phenomenon of supercoiling. It explains how mathematical tools are revealing the workings of enzymes and proteins. And it describes how mathematicians are detecting echoes from the origin of life by applying stochastic and statistical theory to the study of DNA sequences.

This informative and motivational book will be of interest to researchers, research administrators, and educators and students in mathematics, computer sciences, and biology.

  1. ×

    Welcome to OpenBook!

    You're looking at OpenBook, NAP.edu's online reading room since 1999. Based on feedback from you, our users, we've made some improvements that make it easier than ever to read thousands of publications on our website.

    Do you want to take a quick tour of the OpenBook's features?

    No Thanks Take a Tour »
  2. ×

    Show this book's table of contents, where you can jump to any chapter by name.

    « Back Next »
  3. ×

    ...or use these buttons to go back to the previous chapter or skip to the next one.

    « Back Next »
  4. ×

    Jump up to the previous page or down to the next one. Also, you can type in a page number and press Enter to go directly to that page in the book.

    « Back Next »
  5. ×

    To search the entire text of this book, type in your search term here and press Enter.

    « Back Next »
  6. ×

    Share a link to this book page on your preferred social network or via email.

    « Back Next »
  7. ×

    View our suggested citation for this chapter.

    « Back Next »
  8. ×

    Ready to take your reading offline? Click here to buy this book in print or download it as a free PDF, if available.

    « Back Next »
Stay Connected!