National Academies Press: OpenBook

Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (1995)

Chapter: Chapter 3 Seeing Conserved Signals: Using Algorithms to Detect Similarities between Biosequences

Suggested Citation: "Chapter 3 Seeing Conserved Signals: Using Algorithms to Detect Similarities between Biosequences." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.


Chapter 3
Seeing Conserved Signals: Using Algorithms to Detect Similarities between Biosequences

Eugene W. Myers
University of Arizona

The sequence of amino acids in a protein determines its three-dimensional shape, which in turn confers its function. Segments of the protein that are critical to its function resist evolutionary pressures because mutations of such segments are often lethal to the organism. These critical "active sites" tend to be conserved over time and so can be found in many organisms and proteins that have similar function. Analogously, functionally important segments of an organism's DNA tend to be conserved and to recur as common motifs. In this chapter, the author introduces algorithms for comparing DNA and protein sequences to reveal similar regions. Particular attention is given to the problem of searching a large database of catalogued sequences for regions similar to a newly determined sequence of unknown function.

Since the advent of deoxyribonucleic acid (DNA) sequencing technologies in the late 1970s, the amount of data about the protein and DNA sequence of humans and other organisms has been growing at an exponential rate. It is estimated that by the turn of the century there will be terabytes of such biosequence information, including DNA sequences of entire human chromosomes. Databases of these sequences will contain a wealth of information about the nature of life at the molecular level if we can decipher their meaning.

Proteins and DNA sequences are polymers consisting of a chain of monomers with a common backbone substructure that links them together. In the case of DNA, there are 4 types of monomers, the nucleotides, each having a different side chain. For proteins, there are 20 types of monomers, the amino acids. With just a few exceptions, the sequence of monomers, that is, the primary structure, of a given protein or DNA strand completely determines the three-dimensional shape of the biopolymer. Because the function of a molecule is determined by the position of its atoms in space, this almost perfect correlation between sequence and structure implies that to know the function of a biopolymer, it in principle suffices to know its primary sequence.

The primary sequence of a DNA segment is denoted by a string consisting of the four letters A, C, G, and T. Analogously, the primary sequence of a protein is denoted by a string consisting of 20 letters of the alphabet, one for each type of amino acid. In principle, these strings of symbols encode everything one needs to know about the protein or DNA strand in question. If the primary sequences of two proteins are similar, then it is reasonable to conjecture that they perform the same function. Because DNA's principal role is one of encoding information (including all of an organism's proteins), the similarity of two segments of DNA suggests that they code similar things.

Mutation in a DNA or protein sequence is a natural evolutionary process. Errors in the replication of DNA can cause a change in the nucleotide at a given position. Less often, a nucleotide is deleted or inserted. If the mutation occurs in a region of DNA that codes for protein, these changes cause related changes in the primary sequence and, hence, the shape and activity of the protein.
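
The string view of sequences and the three kinds of mutation just described (a changed, deleted, or inserted nucleotide) can be illustrated with a minimal Python sketch. This is only an illustration and not code from the chapter: the function names, the example sequence "GATTACA", and the use of the standard one-letter amino acid codes for the 20-letter protein alphabet are all assumptions made here.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.

# DNA is written over a 4-letter alphabet.
DNA_ALPHABET = set("ACGT")
# Assumption: proteins use the standard one-letter codes for the 20 amino acids.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid(seq: str, alphabet: set) -> bool:
    """Return True if every symbol of seq belongs to the given alphabet."""
    return all(symbol in alphabet for symbol in seq)


def substitute(seq: str, pos: int, new: str) -> str:
    """Replace the symbol at 0-based position pos with new."""
    return seq[:pos] + new + seq[pos + 1:]


def delete(seq: str, pos: int) -> str:
    """Remove the symbol at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq: str, pos: int, new: str) -> str:
    """Insert new immediately before 0-based position pos."""
    return seq[:pos] + new + seq[pos:]


original = "GATTACA"
print(is_valid(original, DNA_ALPHABET))  # True
print(substitute(original, 2, "C"))      # GACTACA
print(delete(original, 2))               # GATACA
print(insert(original, 2, "C"))          # GACTTACA
```

A substitution leaves the length of the string unchanged, whereas a deletion or an insertion shifts every later position by one. A comparison method therefore has to allow for all three kinds of change rather than simply matching symbols position by position.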
The impact of a particular mutation depends on the degree to which the original and new amino acid sequences differ in their physical and chemical properties. Mutations that result in proteins that are so altered that they function improperly or not at all tend to be lethal to the organism. Nature is biased against mutations in those critical regions central to a protein's function and is more lenient toward changes in other regions.

Similarity of DNA sequences is a clue to common evolutionary origin. If two proteins in two organisms evolved from a common precursor, one will generally find highly similar segments, reflecting strongly conserved critical regions. If the proteins are very recent derivatives, one might expect to see similarity over the entire length of the sequences. While


As researchers have pursued biology's secrets to the molecular level, mathematical and computer sciences have played an increasingly important role—in genome mapping, population genetics, and even the controversial search for "Eve," hypothetical mother of the human race.

In this first-ever survey of the partnership between the two fields, leading experts look at how mathematical research and methods have made possible important discoveries in biology.

The volume explores how differential geometry, topology, and differential mechanics have allowed researchers to "wind" and "unwind" DNA's double helix to understand the phenomenon of supercoiling. It explains how mathematical tools are revealing the workings of enzymes and proteins. And it describes how mathematicians are detecting echoes from the origin of life by applying stochastic and statistical theory to the study of DNA sequences.

This informative and motivational book will be of interest to researchers, research administrators, and educators and students in mathematics, computer sciences, and biology.
