National Academies Press: OpenBook

Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (1995)

Chapter: Excursion: Susceptibility to Colon Cancer in Mice and the Large Deviation Theory of Diffusion Processes

Suggested Citation:"Excursion: Susceptibility to Colon Cancer in Mice and the Large Deviation Theory of Diffusion Processes." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.


MAPPING HEREDITY: USING PROBABILISTIC MODELS AND ALGORITHMS TO MAP GENES AND GENOMES

Statistical Significance

In many genetic situations, one may search for a disease gene by testing for its presence at many locations along the genome. When multiple comparisons are done, the threshold for statistical significance must be higher than the threshold for a single comparison. But how high should the threshold be? In principle, looking for the presence of a gene at every position along a continuous line involves infinitely many tests, although nearby points are clearly correlated. Surprisingly, the answer to this threshold question turns out to depend on relatively recent results from the theory of large deviations of diffusion processes. This idea is elaborated on in the next section, based on an example from recent work in our laboratory on susceptibility to colon cancer.

Excursion: Susceptibility to Colon Cancer in Mice and the Large Deviation Theory of Diffusion Processes

Colon cancer is one of the most prevalent malignancies in Western societies, with an estimated 145,000 new cases and 60,000 deaths per year in the United States alone. Although environmental factors such as diet can markedly influence the incidence of the disease, genetic factors are known to play a key role. Some families show striking clusters of colon cancer, with aggregations far beyond what could be explained by chance alone. Among such colon cancer families, there is a distinctive subtype called familial adenomatous polyposis (FAP), in which affected individuals develop a large number of intestinal growths called polyps that can become tumors. Genetic mapping studies (Bodmer et al., 1987; Leppert et al., 1987) showed that FAP was genetically linked to a region on the long arm of human chromosome 5; subsequently, physical mapping studies led to the isolation of the responsible gene, named APC (Groden et al., 1991; Kinzler et al., 1991; Nishisho et al., 1991).
One way to study the role of APC in tumorigenesis is to turn to biochemistry, in an effort to understand the cellular components with which the protein product interacts. Another way is to turn back to genetics for further insight. One observation about FAP families is that individuals inheriting precisely the same APC mutation may be affected to different degrees. What is the reason for the variability in the manifestation of the disease? Is it due to environment or to the effects of other genes? If the latter, then finding such modifying genes could shed light on the process by which colon cancer develops.

By the usual scientific serendipity, animal studies turned out to hold an important clue. In 1990, William Dove's laboratory at the University of Wisconsin was performing mutagenesis experiments and identified a mouse that spontaneously developed colon tumors (Moser et al., 1990). The dominantly acting mutation responsible for the trait was named Min (for multiple intestinal neoplasia). After considerable genetic mapping and cloning, Dove and his colleagues showed that Min was in fact a mutation in the mouse version of the APC gene (Su et al., 1992). The Min mouse thus provided a model of human colon cancer and, in particular, a way to look for other genes that might suppress the development of colon tumors.

The Min mutation is usually maintained in a heterozygous state on a mouse strain called B6, and such B6 Min/+ mice typically develop about 30 intestinal tumors and die by 3 to 4 months of age. When Dove and his colleagues crossed this mouse to another mouse strain called AKR, they got a surprising result: the F1 Min/+ progeny developed many fewer colon tumors. On average, the F1 mice developed about six tumors and most did not die from them. Somehow, the AKR strain must have contributed alleles at one or more genes that substantially modified the effects of Min. Dove's laboratory and our laboratory decided to collaborate to try to map the modifying genes (Dietrich et al., 1993). A backcross was arranged in which the F1 progeny were mated back to the more susceptible B6 strain (Figure 2.4).
For any modifier locus, 50 percent of the progeny should inherit one copy of the suppressing allele from the AKR strain (that is, have genotype AB) and 50 percent should be homozygous for the nonsuppressing allele from B6 (that is, have genotype BB). Each animal inheriting the Min mutation was scored for its phenotype by dissecting the intestine and counting the number of tumors and for its genotype by typing the mice for a dense map of DNA polymorphisms that had been constructed in our laboratory (Dietrich et al., 1992). The complete data for animal i consist of two parts: a phenotype $\phi_i$ and a function $g_i(x)$ indicating the genotype, which is either AB or BB, at each position x along the chromosome (Figure 2.5). Actually, the problem is slightly more complicated because one can only observe the genotype at the locations of the DNA polymorphisms studied. However, for this discussion, the map can be assumed to be so dense that the data are essentially continuous. It can also be assumed that the number n of progeny is very large.

Figure 2.4 Distribution of colon tumors caused by the Min mutation. Mice from the B6 strain carrying the genotype Min/+ develop about 30 tumors on average. When these mice are crossed to the AKR strain, the resulting F1 progeny develop only about 6 tumors. When the F1 progeny are crossed back to the B6 strain, the resulting backcross progeny show a wide distribution in tumor number. (A) Design of cross. (B) Scatterplot of tumor numbers from different generations in the cross.

Figure 2.5 Schematic representation of data for genetic analysis of quantitative traits in a backcross. Every offspring (i = 1, 2, ..., n) has a phenotype that is a continuous variable $\phi$ and a genotype at every position in the genome. The genotype $g_i(x)$ at position x has two possible states in a backcross (homozygous or heterozygous, encoded as 0 or 1 and represented by black or white in the figure). The figure illustrates the case where the phenotype might depend on two quantitative trait loci (QTL1 and QTL2), according to a linear model $\phi = a_1 g_1 + a_2 g_2 + \varepsilon$, where $g_j$ is the genotype at QTLj, the $a_j$ are constants, and $\varepsilon$ is a normal random variable.

At every position x along the chromosome, the animals can be divided into two sets according to their genotype: $AB(x) = \{\text{animal } i \mid g_i(x) = AB\}$ and $BB(x) = \{\text{animal } i \mid g_i(x) = BB\}$. If a major modifier gene occurs at location $x^*$, then the animals in $AB(x^*)$ should have many fewer tumors than the animals in $BB(x^*)$. One could thus perform a t-test (the usual two-sample t statistic based on the number of tumors per animal in the two groups) at every position along the chromosome to find a region where the t-statistic Z exceeds some critical threshold T.

How high a threshold is needed to ensure statistical significance, if one scans the entire genome? If there is no modifying gene anywhere along a given chromosome, the t-statistic Z(x) at any given point x should be normally distributed with mean 0 and variance 1. It is thus easy to determine the appropriate significance level for the single test at x. But we need to know about the distribution of $\max_x Z(x)$, where the maximum is taken over the entire chromosome.

This question belongs to the field of Gaussian processes. A family of random variables $\{Z(x),\ a \le x \le b\}$ is called a Gaussian process if for each n = 1, 2, ... and each $x_1 < x_2 < \cdots < x_n$, the random variables $Z(x_1), Z(x_2), \ldots, Z(x_n)$ are jointly normally distributed. A Gaussian process is specified by its mean $\mu(t) = E(Z(t))$ and its covariance $C(s,t) = \mathrm{cov}(Z(s), Z(t))$. An important example is the "Ornstein-Uhlenbeck process," in which $\mu(t) = 0$ and $C(s,t) = e^{-\beta|s-t|}$. The Ornstein-Uhlenbeck process arises naturally in physics, because it describes the behavior of a particle undergoing Brownian motion trapped in a potential well.

In recent years, Gaussian processes have been a subject of considerable mathematical interest, and the large deviation theory has been worked out for many cases, including the Ornstein-Uhlenbeck process. Interestingly, it is not hard to show that the statistic Z(x) in our genetic example also follows an Ornstein-Uhlenbeck process with $\beta = 2$. (The mean is 0, and the covariance follows essentially from the Haldane mapping function mentioned above.)
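The covariance claim can be checked numerically. Under Haldane's mapping function (crossovers with no interference), two loci d morgans apart recombine with probability (1 − e^(−2d))/2, so backcross genotypes at the two loci have correlation e^(−2d); for large n the t-statistics at the two loci inherit approximately the same correlation, which is the Ornstein-Uhlenbeck form with rate 2. The following Python sketch simulates this and also carries out the position-by-position t-test scan described above. All function names, the marker positions, and the sample size are hypothetical choices made for illustration; none of them come from the study.

```python
import math
import random


def simulate_backcross_genotypes(n_animals, positions_morgans, rng):
    """Return one genotype row per animal: 0 = BB, 1 = AB at each position.

    Assumes Haldane's model: adjacent positions d morgans apart recombine
    with probability (1 - exp(-2 d)) / 2, independently for each interval.
    """
    gaps = [b - a for a, b in zip(positions_morgans, positions_morgans[1:])]
    thetas = [0.5 * (1.0 - math.exp(-2.0 * d)) for d in gaps]
    rows = []
    for _ in range(n_animals):
        g = rng.randint(0, 1)  # AB and BB are equally likely at the first position
        row = [g]
        for theta in thetas:
            if rng.random() < theta:  # a recombination flips the genotype
                g = 1 - g
            row.append(g)
        rows.append(row)
    return rows


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))


def scan_t_statistics(phenotypes, genotypes):
    """Z at each position, comparing BB animals with AB animals.

    Z is positive where AB animals have fewer tumors than BB animals.
    """
    z = []
    for j in range(len(genotypes[0])):
        bb = [p for p, row in zip(phenotypes, genotypes) if row[j] == 0]
        ab = [p for p, row in zip(phenotypes, genotypes) if row[j] == 1]
        z.append(two_sample_t(bb, ab))
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    positions = [0.0, 0.1, 0.5]  # hypothetical marker positions, in morgans
    geno = simulate_backcross_genotypes(20000, positions, rng)
    first = [row[0] for row in geno]
    for j, x in enumerate(positions[1:], start=1):
        col = [row[j] for row in geno]
        # Empirical genotype correlation next to the predicted exp(-2 d)
        print(x, correlation(first, col), math.exp(-2.0 * x))
```

The simulation exploits the fact that, without interference, the genotype along the chromosome is a two-state Markov chain, so each interval can be handled independently. With real data the genotypes are observed only at markers, and groups at a position could be small, which this sketch does not guard against.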
Using recent mathematical results (Feingold et al., 1993; Lander and Botstein, 1989), one can thus show that, for large t,

$P\{\max_{0 \le x \le G} Z(x) \ge t\} \sim (C + 2Gt^2)(1 - \Phi(t))$,

where $\Phi(t)$ is the standard normal cumulative distribution function, C is the number of chromosomes, and G is the length of the genome in morgans. In short, the probability of exceeding threshold t somewhere along a genome of length G is larger by a factor of about $2Gt^2$ than the probability of exceeding it at a single point.
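To see what this approximation implies in practice, one can solve it numerically for the threshold t that gives a chosen genome-wide false positive rate. The Python sketch below does this by bisection. The function names are hypothetical, and the values C = 20 and G = 16 morgans are illustrative round numbers assumed for the example, not figures taken from the text; the formula itself is only an approximation for large t.

```python
import math


def normal_upper_tail(t):
    """1 - Phi(t) for the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def genome_wide_tail(t, n_chromosomes, genome_morgans):
    """Large-t approximation (C + 2 G t^2) (1 - Phi(t)) to P{max Z(x) >= t}."""
    return (n_chromosomes + 2.0 * genome_morgans * t * t) * normal_upper_tail(t)


def threshold_for(alpha, n_chromosomes, genome_morgans, lo=1.5, hi=10.0):
    """Find t with genome_wide_tail(t) = alpha by bisection.

    The approximation is decreasing in t for t >= sqrt(2), so a sign
    change on [lo, hi] brackets a unique solution.
    """
    f_lo = genome_wide_tail(lo, n_chromosomes, genome_morgans)
    f_hi = genome_wide_tail(hi, n_chromosomes, genome_morgans)
    if not (f_hi < alpha < f_lo):
        raise ValueError("alpha is not bracketed on [lo, hi]")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if genome_wide_tail(mid, n_chromosomes, genome_morgans) > alpha:
            lo = mid  # tail still too large: the threshold must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Assumed, illustrative genome: 20 chromosomes, 16 morgans in total.
    t_genome = threshold_for(0.05, n_chromosomes=20, genome_morgans=16.0)
    print("genome-wide threshold:", t_genome)
    print("pointwise tail at that threshold:", normal_upper_tail(t_genome))
```

Under these assumed values the genome-wide threshold comes out well above the familiar one-sided single-test cutoff of about 1.645, which is the practical content of the multiplicative factor in the formula.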
