National Academies Press: OpenBook

Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (1995)

Chapter: Assembling Physical Maps by "Fingerprinting" Random Clones

Suggested Citation:"Assembling Physical Maps by "Fingerprinting" Random Clones." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

MAPPING HEREDITY: USING PROBABILISTIC MODELS AND ALGORITHMS TO MAP GENES AND GENOMES

Returning to the problem of colon cancer, we applied this analysis to the mouse genome, which has C ≈ 20 chromosomes and genetic length G ≈ 16. By genetic mapping, we found a striking region on mouse chromosome 4 for which $Z_{\max} = 4.3$. The nominal significance level of the statistic is $p = 1.7 \times 10^{-5}$. After correcting for searching over an entire genome (by multiplying by $2G(Z_{\max})^2$), the significance level is p ≈ 0.01. This suggests that there is indeed a modifying gene in this region of chromosome 4. On the strength of this analysis, several additional crosses were arranged to confirm this result. With more than 300 animals analyzed, the results are now unambiguous: the corrected significance level is now $< 10^{-10}$, and it appears that a single copy of the suppressing form of the gene can decrease tumor number at least twofold. Experiments are now under way to clone the gene, in order to learn its role in reducing colon cancer in genetically predisposed mice. With luck, it may suggest ways to do the same in humans.

PHYSICAL MAPPING

Assembling Physical Maps by "Fingerprinting" Random Clones

Genetic mapping is only the first step toward positional cloning of a gene. Once a gene has been determined to lie between two genetic markers, the geneticist must produce a physical map consisting of overlapping clones spanning the chromosomal region between the two flanking markers. Traditionally, physical maps have been produced by the process of chromosomal walking: one starts with clone C1 containing one of the genetic markers, uses C1 as a probe to find an overlapping clone C2, uses C2 as a probe to find C3, and so on until the region has been spanned (Figure 2.6). Chromosomal walking is an inherently serial procedure, and each step may take several weeks (due to the laboratory procedures involved in making and using a probe).
This tedious process could be eliminated if one simply constructed a complete physical map of overlapping clones spanning the entire genome. The idea is more practical than it may seem at first glance. Whereas chromosomal walking proceeds serially, a physical map of an entire genome can be constructed in parallel. The idea is to describe each clone C by an easily determined fingerprint F(C), which can be thought of as a set of "attributes" of C. If two clones have substantial overlap, their fingerprints should be similar. Conversely, if two clones have very similar fingerprints, they are likely to overlap. In principle, one should be able to construct a physical map by fingerprinting a large collection of clones and using computer analysis to compare the fingerprints and recognize the overlaps.

Figure 2.6 Schematic diagram illustrating chromosome walking. One starts by isolating a clone C1 containing the initial starting point. C1 is then used as a probe to isolate overlapping clones, such as C2. The process is iterated to obtain successive steps in the walk. Although at each step one isolates clones extending in either direction, only those clones extending the walk to the right are shown in the diagram.

The choice of a fingerprinting method depends principally on laboratory considerations; certain types of clones are more amenable to certain types of analysis. Given a large collection of random subclones taken from a genome G, possible fingerprints include the following:

• Complete DNA sequence. For very small genomes such as those of viruses, it is practical to reassemble the genome from very short subclones of length ~300 to 500 base pairs. For such short subclones, the best fingerprint is the complete DNA sequence of the subclone. It turns out to be relatively easy to sequence such short subclones in one laboratory step, and the resulting sequence provides the most complete possible fingerprint of the clone. Using this information, one can attempt to find the overlaps and piece together the sequence. In fact, this is a widely used technique, referred to as "shotgun" sequencing (Figure 2.7).
However, the method is effective only for genomes of length < 100,000 base pairs. For larger genomes (such as the genome of even the simplest bacterium), it is difficult to analyze enough subclones to ensure that the entire genome is covered (see the discussion of the coverage problem below). Moreover, the ability to reassemble the sequence is stymied by the frequent occurrence of repeat sequences, which hamper the recognition of overlaps. Nonetheless, shotgun sequencing of small subclones is the method of choice for sequencing moderate-sized DNA fragments.

Figure 2.7 Schematic diagram illustrating "shotgun" DNA sequencing assembly. To obtain the sequence of a larger piece of DNA, one determines the sequence of random subclones and pieces together the complete pieces based on the overlaps. In practice, the subclones are considerably larger than those shown (typically 300 to 500 base pairs) and the overlaps used in assembling the sequence are much larger.

• Restriction map. Larger genomes must be analyzed by studying larger subclones. Such subclones are typically too large to be conveniently sequenced. Instead, restriction maps can provide a useful fingerprint. Restriction maps show the positions of recognition sites at which particular restriction enzymes cut. For example, the restriction enzyme EcoRI cleaves at the sequence GAATTC. In effect, a restriction map is an ordered list of the restriction fragments in a clone. To make a restriction map, one can use the method of partial digestion: one radioactively labels one end of a clone, adds a restriction enzyme briefly so that only a random selection of the sites are cut, and measures the lengths of the resulting fragments (Figure 2.8). Restriction maps can be efficiently constructed for clones of moderate size (up to about 50,000 base pairs), although the procedure can be tedious and exacting. If two clones have restriction maps that share several consecutive fragments, it is a good bet that they overlap. With this strategy, Kohara and colleagues (1987) constructed a complete physical map of the bacterium Escherichia coli with a genome of 4.6 million base pairs using phage clones containing fragments of about 15,000 base pairs.

Figure 2.8 Schematic diagram illustrating restriction mapping of a DNA fragment by partial digestion. The DNA fragment at the top has several sites (denoted by E) that can be cleaved by the restriction enzyme EcoRI. A large collection of molecules of this DNA fragment is radioactively labeled at one end (denoted by a star) and then exposed briefly to the restriction enzyme. The period of exposure is sufficiently brief that the enzyme can cleave only about one site per molecule, resulting in a collection of radioactively labeled fragments terminating at the various E sites. The length of these fragments (and thus the positions of the E sites) can be determined by gel electrophoresis of the fragments and subsequent exposure of the gel to x-ray film.

• Restriction fragment sizes. Rather than constructing an ordered list of the restriction fragments, one can construct an unordered list. This turns out to be technically simpler, because one need not carefully control the rate of cutting as in partial digestion. Clones can instead be digested to completion and the fragment lengths measured. Although the unordered list contains less information, it can still provide an adequate fingerprint. For example, Olson and colleagues (1986) used this approach to construct a physical map of the yeast Saccharomyces cerevisiae with a genome of 13 million base pairs.

• Content of sequence tagged sites. For very large genomes such as the human genome with $3 \times 10^9$ base pairs, it is necessary to work with large subclones of length > 100,000 base pairs. For such large subclones, a different fingerprinting strategy has gained favor in recent years. The method is based on sequence tagged sites (STSs), which are very short unique sequences taken from the genome which can be easily assayed by the polymerase chain reaction (PCR). The fingerprint of a clone is the list of STSs contained within it; the data form an incidence matrix of clones by STSs (Figure 2.9). Clones containing even a single unique STS in common should overlap. As an aside, the determination of which clones contain a given STS is typically made using a combinatorial pool scheme that avoids having to test each STS against each clone (Green and Olson, 1990). Using this approach, Foote et al. (1992) and Chumakov et al. (1992) constructed the first complete maps of human chromosomes (Y and 21, respectively).

Regardless of the experimental details of the fingerprinting scheme, there are two key mathematical issues pertinent to the construction of a physical map:

1. Algorithms for map assembly. Given the fingerprinting data, what algorithm should be used for constructing a physical map? This question is closely related to graph theory: given information about adjacency among clones inferred from their fingerprints, one must reconstruct the underlying geometry of the physical map.

2. Statistics of coverage. How many clones must be studied to yield a map covering virtually the entire genome? This question belongs to probability theory: assuming that subclones are distributed randomly across the genome, one needs to know the distribution of gaps (uncovered regions or undetected overlaps) in the map.
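
The two issues above can be illustrated with a minimal Python sketch; it is not the method used by any of the groups cited. For assembly, it assumes the STS-content rule stated earlier (clones sharing at least one STS overlap) and groups clones into contigs with a union-find structure; ordering the clones within a contig is not attempted. For coverage, it assumes equal-length clones placed uniformly at random, so that a given base is missed by all N clones with probability about exp(-c), where c = N × (clone length) / (genome length); end effects and undetected overlaps are ignored. The function names, clone names, STS names, and the count of 1,500 clones are hypothetical; only the E. coli genome size and clone length come from the text.

```python
import math
from collections import defaultdict


def assemble_contigs(clone_sts):
    """Group clones into contigs, treating any shared STS as evidence of overlap.

    clone_sts maps a clone name to the set of STSs it contains
    (one row of the clone-by-STS incidence matrix).
    """
    parent = {clone: clone for clone in clone_sts}

    def find(clone):
        # Follow parent links to the group representative, halving paths as we go.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    # Invert the incidence data: which clones contain each STS?
    clones_with = defaultdict(list)
    for clone, sts_set in clone_sts.items():
        for sts in sts_set:
            clones_with[sts].append(clone)

    # Merge all clones that share an STS into one group.
    for clones in clones_with.values():
        root = find(clones[0])
        for other in clones[1:]:
            parent[find(other)] = root

    contigs = defaultdict(list)
    for clone in clone_sts:
        contigs[find(clone)].append(clone)
    return sorted(sorted(members) for members in contigs.values())


def expected_coverage(n_clones, clone_len, genome_len):
    """Expected fraction of the genome covered by at least one clone,
    assuming uniform random placement (Poisson approximation)."""
    redundancy = n_clones * clone_len / genome_len
    return 1.0 - math.exp(-redundancy)


# Hypothetical toy fingerprints: four clones, five STSs.
toy = {
    "A": {"sts1", "sts2"},
    "B": {"sts2", "sts3"},
    "C": {"sts4"},
    "D": {"sts4", "sts5"},
}
print(assemble_contigs(toy))  # [['A', 'B'], ['C', 'D']]

# Genome size and clone length from the E. coli example; 1,500 clones is an assumed count.
print(round(expected_coverage(1500, 15_000, 4_600_000), 3))
```

Under these assumptions, a redundancy near 5 leaves under 1% of the genome uncovered. The number of gaps in a real map would likely be higher, since overlaps too short to show up in the fingerprints go undetected.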

As researchers have pursued biology's secrets to the molecular level, mathematical and computer sciences have played an increasingly important role—in genome mapping, population genetics, and even the controversial search for "Eve," hypothetical mother of the human race.

In this first-ever survey of the partnership between the two fields, leading experts look at how mathematical research and methods have made possible important discoveries in biology.

The volume explores how differential geometry, topology, and differential mechanics have allowed researchers to "wind" and "unwind" DNA's double helix to understand the phenomenon of supercoiling. It explains how mathematical tools are revealing the workings of enzymes and proteins. And it describes how mathematicians are detecting echoes from the origin of life by applying stochastic and statistical theory to the study of DNA sequences.

This informative and motivational book will be of interest to researchers, research administrators, and educators and students in mathematics, computer sciences, and biology.
