THE SECRETS OF LIFE: A MATHEMATICIAN'S INTRODUCTION TO MOLECULAR BIOLOGY 7

• In the cross of the F1 generation (Aa) to the pure-breeding wrinkled strain (aa), the offspring were a 1:1 mixture of Aa:aa, according to which allele was inherited from the F1 parent.
• In the cross between two F1 parents (Aa), the offspring were a 1:2:1 mixture of AA:Aa:aa, according to the binomial selection of alleles from the two parents.

It is striking that the existence of genes was deduced in this abstract, mathematical way. Probability and statistics were an intrinsic part of early genetics, and they have remained so. Of course, Mendel did not have formal statistical analysis at his disposal, but he managed to grasp the key concepts intuitively. Incidentally, the famous geneticist and statistician R.A. Fisher analyzed Mendel's data many years later and concluded that they fit statistical expectation a bit too well; Mendel probably discarded some outliers as likely experimental errors. It was almost 35 years before biologists had an inkling of where these hypothetical genes resided in the cell (in the chromosomes) and almost 100 years before they understood their biochemical nature.

MOLECULAR BIOLOGY

As suggested in Figure 1.1, the biochemical and the genetic approaches were virtually disjoint: the biochemist primarily studied proteins, whereas the geneticist primarily studied genes. Much like the great unifications in mathematics, molecular biology emerged from the recognition that the two apparently unrelated fields were, in fact, complementary perspectives on the same subject. The first clues came from the study of mutant microorganisms whose gene defects left them unable to synthesize certain key macromolecules. Biochemical study of these mutants showed that each lacked a specific enzyme. These experiments led to the hypothesis that genes must somehow "encode" enzymes.
This (Nobel Prize-winning) notion was dubbed the "one gene-one enzyme" hypothesis, although today it has been modified to "one gene-one protein." Of course, the mystery remained: How do genes encode proteins? The answer depended on finding the biochemical nature of the gene itself, thereby uniting the fields. To purify the gene as a biochemical entity, one needed a test-tube assay for heredity, something that might seem impossible. Fortunately, scientific serendipity provided a solution. In a famous series of bacteriological studies, Griffith showed in 1928 that certain properties (such as pathogenicity) could be transferred from dead bacteria to live bacteria. Avery et al. (1944) were able to successively fractionate the dead bacteria so as to purify the elusive "transforming principle," the material that could confer new heredity on bacteria. The conclusion was that the gene appeared to be made of DNA.

The notion of DNA as the material of heredity came as a surprise to most biochemists. DNA was known to be a linear polymer of four building blocks called nucleotides (named for their bases adenine, thymine, cytosine, and guanine, and abbreviated A, T, C, and G) joined by a sugar-phosphate backbone. However, most knowledgeable scientists reckoned that the polymer was a boring, repetitive structural molecule that functioned as some sort of scaffold for more important components. In the days before computers, it was not apparent how a linear polymer might encode information. If DNA contained the genes, its structure became a key issue. In their legendary work in 1953, Watson and Crick correctly inferred the structure of most DNA and, in so doing, explained the main secret of heredity.
While some viruses have single-stranded DNA, the DNA of humans and of most other forms of life consists of two antiparallel chains (strands) in the form of a double helix, in which the bases pair up in a specific way (Figure 1.5) so that the sequence of one chain completely specifies the sequence of the other: an A on one chain always pairs with a T on the other, and a G with a C. The two sequences are complementary. This redundancy is the basis for the replication of living organisms: the two strands of the double helix unwind, and each serves as a template for the synthesis of a complete double helix that is passed on to a daughter cell. Replication is carried out by enzymes called DNA polymerases. Mutations are changes in the nucleotide sequence of DNA. Mutations can be induced by external
forces such as sunlight and chemical agents, or can occur as random copying errors during replication.

Figure 1.5 The DNA double helix consists of antiparallel helical strands, with complementary bases (G-C and A-T).

There remained the question of how the 4-letter alphabet of DNA could "encode" the instructions for the 20-letter alphabet of protein sequences. Biochemical studies over the next decade showed that genes correspond to specific stretches of DNA along a chromosome (much like individual files on a hard disk). Each such stretch can be expressed at particular times or under particular circumstances. Typically, gene expression begins with transcription of the DNA sequence into a messenger molecule made of ribonucleic acid (RNA) (Figure 1.6A). Transcription is carried out by enzymes called RNA polymerases. RNA is structurally similar to DNA and consists of four building blocks, the nucleotides denoted A, U, C, and G, with U (uracil) playing the role of T. The messenger RNA (mRNA) is copied from the DNA of a gene according to the usual base-pairing rules (a U in RNA corresponds to an A in DNA, an A to a T, a G to a C, and a C to a G). The mRNA is single-stranded and serves only as an unstable intermediate, carrying information from the cell nucleus (where the DNA resides) to the cytoplasm (where protein synthesis occurs). The mRNA is then translated into a protein by a remarkable molecular machine called the ribosome.
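The base-pairing rules for replication and transcription are simple enough to state as look-up tables. The Python sketch below is illustrative only: the function names partner_strand, replicate, and transcribe are hypothetical, and it assumes every strand is written 5' to 3' using only the letters A, C, G, and T, so the antiparallel partner strand is the complement read backward.

```python
# Watson-Crick pairing in DNA: A pairs with T, G pairs with C.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Pairing used in transcription: the RNA base laid down opposite each DNA base
# (U, not T, is placed opposite an A).
RNA_OPPOSITE_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the complementary DNA strand, also written 5'->3'.

    The strands are antiparallel, so the partner is the complement read backward.
    """
    return "".join(DNA_PAIR[base] for base in reversed(strand))


def replicate(top: str, bottom: str) -> list[tuple[str, str]]:
    """Unwind a duplex and use each old strand as the template for a new partner.

    Each daughter duplex keeps one parental strand.
    """
    return [(top, partner_strand(top)), (partner_strand(bottom), bottom)]


def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') built by pairing RNA bases against a DNA template strand."""
    return "".join(RNA_OPPOSITE_DNA[base] for base in reversed(template))


if __name__ == "__main__":
    coding = "ATGGCCTAA"                 # hypothetical example strand
    template = partner_strand(coding)
    print(template)                      # TTAGGCCAT
    print(replicate(coding, template))   # two copies of the original duplex
    print(transcribe(template))          # AUGGCCUAA
```

Because partner_strand(partner_strand(s)) == s, either strand fully determines the duplex, which is the redundancy the text describes. The mRNA produced from the template strand matches the other DNA strand with T replaced by U.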
Figure 1.6 After messenger RNA is transcribed from the DNA sequence of a gene, it is translated into protein by a remarkable molecular device called the ribosome. (A) Ribosomes read the RNA bases and write a corresponding amino acid sequence. The correct amino acid is brought into juxtaposition with the correct nucleotide triplet through the mediation of an adapter molecule known as transfer RNA. (B) The table showing the correspondence between triplets of bases and amino acids is called the genetic code. Reprinted from Recombinant DNA: A Short Course by Watson, Tooze, and Kurtz (1994). Copyright 1994 James D. Watson, John Tooze, and David T. Kurtz. Used with permission of W.H. Freeman and Company.
B

FIRST POSITION   SECOND POSITION                 THIRD POSITION
(5' END)         U       C       A       G       (3' END)
U                Phe     Ser     Tyr     Cys     U
                 Phe     Ser     Tyr     Cys     C
                 Leu     Ser     Stop    Stop    A
                 Leu     Ser     Stop    Trp     G
C                Leu     Pro     His     Arg     U
                 Leu     Pro     His     Arg     C
                 Leu     Pro     Gln     Arg     A
                 Leu     Pro     Gln     Arg     G
A                Ile     Thr     Asn     Ser     U
                 Ile     Thr     Asn     Ser     C
                 Ile     Thr     Lys     Arg     A
                 Met     Thr     Lys     Arg     G
G                Val     Ala     Asp     Gly     U
                 Val     Ala     Asp     Gly     C
                 Val     Ala     Glu     Gly     A
                 Val     Ala     Glu     Gly     G

Note: Given the position of the bases in a codon, it is possible to find the corresponding amino acid. For example, the codon (5') AUG (3') on mRNA specifies methionine, whereas CAU specifies histidine. UAA, UAG, and UGA are termination signals. AUG is part of the initiation signal, and it codes for internal methionines as well.
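The table can be turned directly into a look-up dictionary. The Python sketch below is illustrative: it encodes the standard genetic code shown in the table as a 64-letter string of one-letter amino acid codes ("*" for stop), and the names GENETIC_CODE and translate are hypothetical. It makes the simplifying assumption that translation starts at the first AUG in the message, which ignores the other signals cells use to choose a start site.

```python
BASES = "UCAG"

# One letter per codon, codons ordered by first, then second, then third base,
# each position running through U, C, A, G. "*" marks a stop codon.
CODE = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# The 64-entry look-up table, e.g. GENETIC_CODE["AUG"] == "Met".
GENETIC_CODE = {
    first + second + third: THREE_LETTER[CODE[16 * i + 4 * j + k]]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> list[str]:
    """Read codons contiguously, without overlap, from the first AUG up to a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[pos:pos + 3]]
        if amino_acid == "Stop":
            break  # termination signal: the stop codon itself adds no amino acid
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    print(GENETIC_CODE["AUG"], GENETIC_CODE["CAU"])  # Met His
    print(translate("GGAUGCAUUUUUAAGC"))             # ['Met', 'His', 'Phe']
```

In the example, reading starts at the AUG two bases in, so the leading GG is skipped; the chain then ends at UAA, matching the note that UAA, UAG, and UGA are termination signals.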
The ribosome "reads" the linear sequence of the mRNA and "writes" (i.e., creates) a corresponding linear sequence of amino acids of the encoded protein. Translation follows a three-letter code: each group of three letters, called a codon, specifies a particular amino acid according to a look-up table called the genetic code (Figure 1.6B). There are 4 × 4 × 4 = 64 different codons, more than enough for 20 amino acids plus stop signals, so most amino acids are specified by more than one codon. The codons are read in contiguous, nonoverlapping fashion from a defined starting point, called the translational start site. Finally, the newly synthesized amino acid chain spontaneously folds into its three-dimensional structure. (For a recent discussion of protein folding, see Sali et al., 1994.)

The details of the genetic code were worked out through elegant biochemical tricks, which were necessary because chemists could synthesize only random RNA chains with defined proportions of the different bases. With some combinatorial reasoning, this proved to be sufficient. For example, if the ribosome is given an mRNA with the sequence UUUUU..., it makes a protein chain consisting only of the amino acid phenylalanine (Phe). Thus UUU must encode phenylalanine. By examining more complex mixtures, researchers soon worked out the entire genetic code. Molecular biology provides the third leg of the triangle, relating genetics and biochemistry (Figure 1.7).

Figure 1.7 Molecular biology connected the disciplines of genetics and biochemistry by showing how genes encoded proteins.
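The counting behind this discussion can be checked directly. The Python sketch below is illustrative, and its function names are hypothetical. min_codon_length shows why codons need at least three letters: 4^1 = 4 and 4^2 = 16 are fewer than 20 amino acids, while 4^3 = 64 suffices. codon_frequencies computes the expected triplet frequencies in a long random RNA chain, assuming each base is incorporated independently in the stated proportions. The U:C = 3:1 mixture in the example is a hypothetical input, not a reconstruction of a specific historical experiment.

```python
from fractions import Fraction
from itertools import product


def min_codon_length(alphabet_size: int, n_meanings: int) -> int:
    """Smallest word length k with alphabet_size**k >= n_meanings (alphabet_size >= 2)."""
    k = 1
    while alphabet_size ** k < n_meanings:
        k += 1
    return k


def codon_frequencies(base_fractions: dict[str, Fraction]) -> dict[str, Fraction]:
    """Expected frequency of each triplet in a long random RNA chain.

    Assumes each position is filled independently with the given base proportions,
    so a triplet's frequency is the product of its three base proportions.
    """
    return {
        "".join(triplet): base_fractions[triplet[0]]
        * base_fractions[triplet[1]]
        * base_fractions[triplet[2]]
        for triplet in product(base_fractions, repeat=3)
    }


def composition_frequencies(freqs: dict[str, Fraction]) -> dict[str, Fraction]:
    """Pool triplets that share a base composition (e.g. UUC, UCU, and CUU)."""
    pooled: dict[str, Fraction] = {}
    for codon, freq in freqs.items():
        key = "".join(sorted(codon))
        pooled[key] = pooled.get(key, Fraction(0)) + freq
    return pooled


if __name__ == "__main__":
    print(min_codon_length(4, 20))                 # 3
    print(codon_frequencies({"U": Fraction(1)}))  # only UUU, as with poly-U
    mix = codon_frequencies({"U": Fraction(3, 4), "C": Fraction(1, 4)})
    print(mix["UUU"], mix["UUC"], mix["CCC"])      # 27/64 9/64 1/64
    print(composition_frequencies(mix))
```

Pooling by composition reflects what this kind of experiment can and cannot reveal: the amino acids that appear indicate the base composition of their codons (for example, two U's and one C), but not the order of the bases within the codon.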