
Chapter 7—
Unwinding the Double Helix:
Using Differential Mechanics to Probe Conformational Changes in DNA

Craig J. Benham
Mount Sinai School of Medicine

The two strands of DNA are usually bound together in a double helix. However, many key biological processes, including DNA replication and gene expression, require unwinding of the double helix. Such unwinding requires the input of energy, a large part of which is stored in the form of supercoiling of a chromosome or chromosomal region. Given a supercoiled DNA molecule, where along its sequence will unwinding occur? In this chapter, the author shows how basic principles of statistical mechanics, together with some delicate numerical estimates, can be applied to predict the sites of supercoil-induced unwinding. The mathematical predictions are abundantly confirmed by experimental data and, when applied to new situations, they suggest novel insights about gene regulation.

Deoxyribonucleic acid (DNA) usually occurs in the familiar Watson-Crick B-form double helix, in which the two strands of the DNA duplex are held together by hydrogen bonds between their complementary bases. Many important biological processes, however, involve separating the strands of the DNA duplex in order to gain access to the information encoded in the sequence of bases within individual strands. In transcription, the first step in gene expression, the DNA base pairing within the gene must be temporarily disrupted to allow an RNA molecule with a sequence complementary to one of the strands of the gene to be constructed. In DNA replication, the two original strands of a parent DNA molecule replicate to form two complete molecules, with each strand serving as a template for the synthesis of its complement. To




accomplish this, the strands of the parent molecule must separate to provide access to these templates.

The regulation of important physiological processes is extremely precise and complex. In addition to many other layers of control, the strand separations required for specific functions must be carefully regulated to occur at the precise positions needed for each activity, and only at times when that activity is to be initiated.

Because DNA prefers to remain in the B-form under normal conditions, strand separation requires the expenditure of (free) energy. The energy required for strand separation depends upon the sequence of base pairs being separated. Because A·T base pairs are held by only two hydrogen bonds whereas G·C pairs are held by three, it is energetically less costly to separate the former pairs than the latter. For this reason, strand separations tend to be concentrated in A+T-rich regions of the DNA. As we will see in this chapter, this provides the sequence dependence necessary to control the sites of separation.

Controlling the occurrence of separations can be accomplished by modulating the amount of energy stored in the DNA molecule itself. This is done by changing the topological constraints on the molecule. DNA in living organisms is topologically constrained into domains within which the linking number is fixed. Enzymes can change this linking number, placing the DNA in a higher energy state in which pure B-form DNA is less favored and partial strand separation is thermodynamically more achievable. (The topology and geometry of superhelicity, which is the jargon name for this process, have been described by White in Chapter 6.)

In order to illuminate the role of strand separation in DNA functions, one needs accurate theoretical methods for predicting how a particular DNA sequence will behave as its linking number is varied. This chapter describes methods that have been developed to make such predictions.
The results of sample calculations are shown, and the insights that they provide regarding specific DNA activities are sketched. The global and topological nature of the constraints imposed on DNA causes behavior that exhibits many unusual and surprising features.

DNA Superhelicity: Mathematics and Biology

DNA in living cells is held in topological domains whose linking numbers can be individually regulated. In practice there are two types of domains. Small DNA molecules can occur as closed circles, whereas larger DNA molecules are formed into a series of loops by periodic attachments to a protein scaffold in a way that precludes local rotations at the attachment site. This arrangement constrains the portion of DNA between adjacent attachment sites to be a topological domain analogous to a closed circle. For simplicity we consider a closed circular duplex DNA molecule as the paradigm of the topological domain. (Closed circles are also the molecules of choice for experiments in this field.)

The two strands that make up the DNA duplex each have a chemical orientation induced by the directionality of the bonds that join neighboring bases. This is called the 5'-3' orientation because each phosphate group in a strand joins the 5' carbon of one sugar to the 3' carbon of the next. This orientation must be the same for every phosphate group within a strand, which imparts a directionality to the strand as a whole. The two strands of the B-form duplex are oriented so their 5'-3' directions are antiparallel. In consequence, a duplex DNA molecule can be closed into a circle only by joining together the ends of each individual strand. Circularization by joining the ends of one strand to those of the other to form a Möbius strip is forbidden because the bonds required would violate the conservation of 5'-3' directionality. Hence a closed circular DNA molecule is composed of two interlinked, circular (antiparallel) strands.

Circularization fixes the linking number of the resulting molecule; the linking number is the number of times that either strand links through the closed circle formed by the other strand. (Topological domains formed by periodic attachments have a functionally equivalent constraint.)
The fixing of the linking number Lk within a topological domain provides a global constraint that topologically couples its secondary and tertiary structures according to White's (1988) formula

\[ \mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr}. \tag{7.1} \]

Although Lk is fixed in a topological domain, both Tw and Wr may still vary, provided they do so in a complementary manner.

Cutting one DNA strand in a domain releases the topological constraint of constant Lk, allowing the domain to find its most relaxed state. The two resulting ends may rotate freely, relaxing any torsional deformation imposed on the molecule. Writhing deformations can be converted to twist and then removed by this rotational relaxation. The sum of the twist and writhe in this relaxed state determines a relaxed linking number Lk_0. Note that, while the linking number Lk of a circular DNA molecule must be an integer, the relaxed linking number Lk_0 need not be integral.

Stresses are imposed on a topological domain whenever its linking number Lk differs from the relaxed value. The resulting linking difference α = Lk − Lk_0 must be accommodated by twisting and/or writhing deformations:

\[ \alpha = \Delta\mathrm{Tw} + \Delta\mathrm{Wr}. \tag{7.2} \]

Topological domains in living systems are commonly found in a negatively superhelical state, in which the imposed linking number is smaller than its relaxed value, so α < 0. Negative superhelicity provides a mechanism for driving strand separation. Because the separated strands are less twisted than the B-form, they localize some of the linking deficiency as a decrease of twist at the transition site, thereby allowing the rest of the domain to relax a corresponding amount. Since strand separations require energy, they are disfavored in unconstrained or relaxed molecules. However, in a negatively superhelical domain, local strand separations are energetically favored to occur at equilibrium whenever the topological strain energy that is thus relieved exceeds the energetic cost of locally disrupting the base pairing between strands.

The linking differences imposed on topological domains in vivo are carefully regulated.
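As a rough numerical illustration of the linking difference defined above, the following sketch estimates a relaxed linking number and the resulting linking difference for a circular plasmid. It is only a sketch under stated assumptions: the helical repeat of 10.4 base pairs per turn, the approximation Lk_0 ≈ N/A (which ignores any writhe in the relaxed state), the function names, and the example linking number of 401 are all illustrative choices that do not come from the chapter. Only the plasmid length (pBR322, N = 4,363 base pairs) is taken from the text.

```python
# Assumed helical repeat of relaxed B-form DNA, in base pairs per turn.
HELICAL_REPEAT = 10.4


def relaxed_linking_number(n_bp, helical_repeat=HELICAL_REPEAT):
    """Approximate Lk_0 as N / A, assuming negligible writhe when relaxed."""
    return n_bp / helical_repeat


def linking_difference(lk, n_bp, helical_repeat=HELICAL_REPEAT):
    """Linking difference alpha = Lk - Lk_0 (negative means underlinked)."""
    return lk - relaxed_linking_number(n_bp, helical_repeat)


n_bp = 4363   # length of pBR322, as quoted in the chapter
lk = 401      # hypothetical integer linking number of one closed circle

lk0 = relaxed_linking_number(n_bp)
alpha = linking_difference(lk, n_bp)
print(f"Lk_0  = {lk0:.2f} turns (need not be an integer)")
print(f"alpha = {alpha:.2f} turns (negative: underlinked)")
```

Because Lk must be an integer while Lk_0 generally is not, the linking difference of a closed circle is typically non-integral, as in this example.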
Virtually all organisms produce enzymes that alter Lk through the introduction of transient strand breaks (Gellert, 1981). The action of these molecules maintains topological domains in negatively superhelical, underlinked states (i.e., α < 0). On average, bacteria and other primitive organisms maintain approximately half their domains in a superhelical state. Moreover, the amount of superhelicity imposed on DNA in vivo is known to vary with the cell division cycle in

a carefully regulated manner (Dorman et al., 1988). The extent of superhelicity also varies in response to environmental changes (Bhriain et al., 1989; Malkhosyan et al., 1991). In multicelled organisms, superhelicity occurs primarily within domains containing actively expressing genes. The DNA within malignant cancer cells is maintained at more extreme negative linking differences than that characterizing the corresponding DNA in normal cells (Hartwig et al., 1981).

Many important regulatory events are sensitive to the degree of superhelical stress imposed on the DNA. These include the initiation of gene expression (Smith, 1981; Pruss and Drlica, 1989; Weintraub et al., 1986) and of DNA replication (Kowalski and Eddy, 1989; Mattern and Painter, 1979). Substantial evidence suggests that superhelically driven strand separations may be involved in these processes.

One well-characterized case occurs at the origin of DNA replication of the bacterium E. coli (Kowalski and Eddy, 1989). The DNA sequence at this origin site contains a triple repeat of an A+T-rich run of 13 base pairs that is required for the initiation of DNA replication. Deletion and substitution experiments have shown that the key functional attribute of this sequence is its susceptibility to superhelical strand separation. DNA sequence changes at this site that retain this attribute preserve its ability to initiate replication in vivo; DNA sequence changes that degrade this susceptibility destroy in vivo origin function. No other sequence specificity is observed. Such sequences are called duplex unwinding elements (DUEs) and are present at origins of DNA replication in many organisms (Umek et al., 1989).

Superhelicity also is known to modulate the expression of some genes. In bacteria, superhelicity regulates the expression of the so-called SOS system, a suite of genes that are activated in response to environmental stresses or DNA damage.
The bacterial response to deleterious environmental changes is to increase the superhelicity of its DNA, which activates expression of the SOS genes (Bhriain et al., 1989; Malkhosyan et al., 1991). Experimental (Kowalski et al., 1988) and theoretical (Benham, 1990) results indicate that the susceptibility of some DNA molecules to superhelical strand separation is confined to sites that bracket specific genes. This suggests that there may be at least two classes of genes, distinguishable by their sensitivities to superhelical separation, whose mechanisms of operation may be different.

Strand separation in living organisms frequently arises through interactive processes, in which local superhelical destabilization of the B-form acts in concert with other factors. Biological systems may exploit marginal decreases in the stability of the B-form that occur at discrete sites in superhelical molecules. For example, consider an enzyme that functions by recognizing a particular sequence and inducing separation there. It might be energetically able to induce the transition only if the B-form already is marginally destabilized at that site. This suggests that superhelical helix destabilization also can regulate biological processes through mechanisms that need not involve preexisting separations. For this reason it is important also to develop methods to predict sites where superhelicity marginally destabilizes the duplex.

Statement of the Problem

This chapter develops methods to predict the strand separation and helix destabilization experienced by a specified DNA sequence when superhelically stressed. We will focus specifically on predictions regarding several plasmids (that is, circular DNA molecules) that have been engineered to include the E. coli replication origin or variants thereof. This is done because experimental information is available regarding superhelical strand separation in these molecules.

In principle, the analysis of conformational equilibria is quite direct. Because every base pair can separate, there are many possible states of strand separation available to a topologically constrained DNA molecule. By basic statistical mechanics, a population of identical molecules at equilibrium will be distributed among its accessible states according to Boltzmann's law.
If these states are indexed by i, and if the free energy of state i is G_i, then the equilibrium probability p_i of a molecule being in state i, which is the fractional occupancy of that state in a population at equilibrium, equals

\[ p_i = \frac{e^{-G_i/RT}}{Z}. \tag{7.3} \]

Here Z is the so-called partition function, given by

\[ Z = \sum_i e^{-G_i/RT}, \tag{7.4} \]

where R is the gas constant and T is the absolute temperature. Thus the fractional occupancies of individual states at equilibrium decrease exponentially as their free energies increase. If a parameter z has value z_i in state i, then its population average, that is, its expected value at equilibrium, is

\[ \bar{z} = \sum_i z_i p_i = \frac{1}{Z} \sum_i z_i e^{-G_i/RT}. \tag{7.5} \]

This expression can be used to evaluate any equilibrium property of interest, once the governing partition function is known.

The application of this approach to the rigorous analysis of conformational equilibria of superhelical DNA molecules is complicated by three factors. First, the number of the states involved is extremely large. Every base pair can be separated or unseparated, so specification of a state of a molecule containing N base pairs involves making N binary decisions. This yields a total of 2^N distinct states of strand separation. This precludes the use of exact methods, in which all states are enumerated, to analyze molecules of biological interest, as these commonly have lengths exceeding 1,000 base pairs. Most DNAs have sites whose local sequences permit transitions to other conformations in addition to separation, further increasing the number of conformational states.

Second, because the free energy needed to transform a base pair to an alternative conformation depends on the identity of the base pair involved, the analysis of equilibria must examine the specific sequence of bases in the molecule. This precludes several possible strategies for performing approximate analyses, including combinatorial methods that assume transition energetics to be the same for all base pairs, or that average the base composition of blocks.

Third, and most importantly, the global and topological character of the superhelical constraint means that the conformations of all base pairs in the molecule are coupled together.
Separation of a particular base pair alters its helicity, which changes the distribution of Tw, and hence of α, throughout the domain. This in turn affects the probability of transition of every other base pair. Whether transition occurs at a particular site depends not just on its local sequence, but also on how effectively this transition competes with all other

alternatives. Thus, separations at particular sites can be analyzed only in the context of the entire molecule. Divide-and-conquer strategies, in which the sequence is partitioned into blocks that are individually analyzed, are thus not feasible. Superhelical transitions must be analyzed as global events, including simultaneous competitions among all possible transitions.

This renders the accurate analysis of superhelical transitions extremely difficult. It is not feasible to perform exact analyses of all states for the kilobase-length, topologically constrained molecules of biological importance, because the number of states grows exponentially. On the other hand, it is not enough to look only at the lowest energy states. Confining attention to the minimum-energy state provides a very poor depiction of transition behavior. Although any individual high-energy state is exponentially less populated, there are so many high-energy states that cumulatively they can dominate the minimum-energy state.

The development of accurate methods to treat superhelical strand separation requires an intermediate approach (Benham, 1990). First, enough low-energy states must be treated exactly to provide an accurate depiction of the transition. Then the cumulative influence of the neglected, high-energy states must be estimated. Wherever possible, computed parameter values must be refined by the insertion of correction terms that account for the approximate influence of the neglected states. This is the strategy we adopt below.

The Energetics of a State

A superhelical linking difference α imposed on a DNA molecule can be accommodated by three types of deformation, each of which requires free energy. First, strand separations can occur. Second, the single strands in the separated regions can twist around each other, thereby absorbing some of the linking difference.
Third, the portion of α not accommodated by these alterations imposes superhelical deformations on the balance of the molecule. Each of these deformations requires free energy that can be described by some simple formulas. Opening each new region of strand separation requires a free energy a, while separating each individual base pair within a region takes free energy b_AT or b_GC, depending on the

identity of the base pair. In practice, a >> b_GC > b_AT. Because the initiation free energy a is large, low-energy states tend to have only a small number of runs of strand separation. Because b_GC is larger than b_AT, these runs tend to be in A+T-rich regions. The free energy of interstrand twisting within separated regions is quadratic in the local helicity of the deformation, with coefficient denoted by C. The free energy of residual superhelicity has been measured experimentally to be quadratic in that deformation, with coefficient K. Combining these contributions (and allowing the interstrand twisting to equilibrate with the residual superhelicity), the free energy G of a state is found to depend on three parameters: the number n of separated base pairs, the number n_AT of these that are A·Ts, and the number r of runs of separation:

\[ G(n, n_{AT}, r) = \frac{2\pi^2 C K}{4\pi^2 C + K n} \left( \alpha + \frac{n}{A} \right)^2 + a r + b_{AT} n_{AT} + b_{GC} (n - n_{AT}). \tag{7.6} \]

Here A is the number of base pairs per helical turn of B-form DNA. The energy parameters in this expression, a, b_AT, b_GC, C, and K, all depend on environmental conditions such as salt concentration and temperature. The values of the b's are known experimentally under a wide variety of conditions (Marmur and Doty, 1962; Schildkraut and Lifson, 1968). However, values for the other parameters are not so well understood. These parameters must be evaluated before the methods can yield quantitatively accurate results. We will do this by fitting these parameters to actual experimental data.

Analysis of Superhelical Equilibria

To calculate the equilibrium strand separation behavior of superhelical DNA molecules, we proceed as follows. First, the DNA sequence is analyzed and key information needed for later stages is stored. This step need be done only once per sequence. Next, the linking difference α and environmental conditions are specified, which sets the energy parameters and determines the free energy associated with each state. The state having minimum free energy under the given conditions

is found from the free energy expression and the sequence data. Then an energy threshold θ is specified, and all states i are found that have free energy exceeding the minimum G_min by no more than this threshold amount. Three inequalities occur, one each for n, n_AT, and r. Together the satisfaction of all three inequalities provides necessary and sufficient conditions that a state satisfy the energy threshold condition. For every set of values n, n_AT, and r satisfying these inequalities, all states with these values are found from the sequence information. This is a very complex computational task. The number of states involved grows approximately exponentially with the threshold θ. In cases where r > 1, care must be taken to verify that a collection of r runs having the requisite total length and A+T-richness neither overlap nor abut, but rather are distinct.

An approximate partition function Z_cal is computed from this collection of low-energy states to be

\[ Z_{cal} = \sum_{i \,:\, G_i - G_{min} \le \theta} e^{-G_i/RT}. \tag{7.7} \]

By focusing only on the low-energy states, approximate ensemble average (that is, equilibrium) values are computed for all parameters of interest. These may include the expected torsional deformation of the strand-separated regions, expected numbers of separated base pairs, of separated A·T pairs, and of runs of separation, the ensemble average free energy Ḡ, and the residual superhelicity.

The most informative quantities regarding the behavior of the molecule are its destabilization and transition profiles. The transition profile displays the probability of separation of each base pair in the molecule. The separation probability p(x) of the base pair at position x is calculated from equation (7.5) using parameter z_x, where z_x = 1 in states where base pair x is separated and z_x = 0 in all other states. This calculation is performed for every base pair in the sequence. The transition profile displays p(x) as a function of x.
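To make the definition of the transition profile concrete, here is a minimal sketch that enumerates every one of the 2^N states of a very short circular sequence and applies Boltzmann weighting directly. This is the exact enumeration that the text explains is infeasible for kilobase molecules, so it is a toy illustration and not the thresholded method described above. The state free energy is taken in the form G = [2π²CK/(4π²C + Kn)](α + n/A)² + ar + b_AT n_AT + b_GC(n − n_AT), which is our reading of equation (7.6). The helical repeat A = 10.4, the values used for b_AT, b_GC, C, and K, the test sequence, and all function names are assumptions chosen for illustration.

```python
import math
from itertools import product

RT = 1.987e-3 * 310.0   # kcal/mol at T = 310 K
A = 10.4                # assumed helical repeat, base pairs per turn

# Illustrative energy parameters in kcal/mol (assumed toy values).
PARAMS = dict(a=10.84, b_at=0.255, b_gc=1.301, C=3.6, K=100.0)


def count_runs(state):
    """Number of runs of separated base pairs (1s) on a circular molecule."""
    if all(state):
        return 1
    # A run starts where an open pair follows a closed one (index -1 wraps).
    return sum(1 for i in range(len(state)) if state[i] and not state[i - 1])


def free_energy(state, seq, alpha, a, b_at, b_gc, C, K):
    """Free energy of one state, in the assumed form of equation (7.6)."""
    n = sum(state)
    n_at = sum(1 for s, base in zip(state, seq) if s and base in "AT")
    r = count_runs(state)
    stress = 2 * math.pi**2 * C * K / (4 * math.pi**2 * C + K * n)
    return (stress * (alpha + n / A) ** 2
            + a * r + b_at * n_at + b_gc * (n - n_at))


def transition_profile(seq, alpha, params=PARAMS):
    """Separation probability p(x) for every position, by full enumeration."""
    states = list(product((0, 1), repeat=len(seq)))
    energies = [free_energy(s, seq, alpha, **params) for s in states]
    g_min = min(energies)  # shift energies to avoid underflow
    weights = [math.exp(-(g - g_min) / RT) for g in energies]
    z = sum(weights)
    return [sum(w for s, w in zip(states, weights) if s[x]) / z
            for x in range(len(seq))]


seq = "GCGCGCGATATATA"   # hypothetical 14 bp circle with one A+T-rich half
for alpha in (0.0, -1.0):
    profile = transition_profile(seq, alpha)
    print(alpha, " ".join(f"{p:.2f}" for p in profile))
```

With these toy values, the relaxed circle shows essentially no opening, while the underlinked circle concentrates its separation probability in the A+T-rich half, which is the qualitative behavior the chapter describes.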
The destabilization profile is the incremental free energy needed to induce separation at each base pair. To calculate this quantity, let i(x) index the states in which the base pair at position x is separated. Then the average free energy of all such states is

\[ \bar{G}(x) = \frac{\sum_{i(x)} G_{i(x)} \, e^{-G_{i(x)}/RT}}{\sum_{i(x)} e^{-G_{i(x)}/RT}}. \tag{7.8} \]

To determine the destabilization free energy G(x), we normalize by subtracting the calculated equilibrium free energy Ḡ:

\[ G(x) = \bar{G}(x) - \bar{G}. \tag{7.9} \]

Base pairs that require incremental free energy to separate at equilibrium have G(x) > 0, while base pairs that are energetically favored to separate at equilibrium have G(x) ≤ 0. This calculation is performed for each base pair in the molecule, and the destabilization profile plots G(x) versus x. Examples of these profiles are given in Figure 7.1.

Although individual high-energy states are exponentially less populated than low-energy states at equilibrium, they are so numerous that their cumulative contribution to the equilibrium still may be significant. The next step in this calculation requires estimating the aggregate influence of the states that were excluded from the above analysis because their free energies exceeded the threshold. This involves estimating the contribution Z(n, n_AT, r) to the partition function from all states whose values n, n_AT, and r do not satisfy the threshold condition. Here

\[ Z(n, n_{AT}, r) = M(n, r) \, P_{n,r}(n_{AT}) \, e^{-G(n, n_{AT}, r)/RT}, \tag{7.10} \]

where M(n,r) is the number of states with n separated base pairs in r runs, which for a circular domain is

\[ M(n, r) = \frac{N}{r} \binom{n-1}{r-1} \binom{N-n-1}{r-1}. \tag{7.11} \]

The only part of Z(n, n_AT, r) not amenable to exact determination is P_{n,r}(n_AT), the fraction of (n,r)-states that have exactly n_AT separated A·T base pairs.
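The count M(n,r) can be checked on small circles by brute force. The sketch below assumes the closed form M(n,r) = (N/r) C(n−1, r−1) C(N−n−1, r−1), which is the standard count of ways to place r non-abutting runs totalling n positions on a circle of N positions. We take this to be the content of equation (7.11). The function names and the handling of the edge cases (no separation, complete separation) are illustrative assumptions.

```python
from itertools import product
from math import comb


def count_states(N, n, r):
    """M(n, r): states with n separated base pairs in r runs on an N bp circle."""
    if r == 0:
        return 1 if n == 0 else 0      # only the fully closed state has no runs
    if n == N:
        return 1 if r == 1 else 0      # fully open circle counts as one run
    if r > n or r > N - n:
        return 0                       # each run and each gap needs >= 1 bp
    # Split n into r run lengths and N - n into r gap lengths, then place the
    # pattern at any of N rotations; each state is counted r times.
    return N * comb(n - 1, r - 1) * comb(N - n - 1, r - 1) // r


def count_states_brute_force(N, n, r):
    """Same count by enumerating all 2^N open/closed patterns (small N only)."""
    total = 0
    for state in product((0, 1), repeat=N):
        if sum(state) != n:
            continue
        runs = 1 if n == N else sum(
            1 for i in range(N) if state[i] and not state[i - 1])
        total += (runs == r)
    return total


# Agreement on a small circle.
N = 8
agree = all(count_states(N, n, r) == count_states_brute_force(N, n, r)
            for n in range(N + 1) for r in range(N // 2 + 2))
print("closed form matches enumeration:", agree)

# Scale of the state space for a kilobase-length molecule (exact integer).
print(len(str(count_states(5000, 150, 8))), "digits in M(150, 8) for N = 5000")
```

The second printout gives a sense of why the neglected high-energy states cannot be enumerated and must instead be estimated in aggregate.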

[...] is monotonically decreasing with r. One can find a cutoff value of r at which T_r = ρ < 1. Then for all r at or above this cutoff we have T_r ≤ ρ, so that

[...]    (7.13)

By a similar line of reasoning one also can find a cutoff value of n, above which the aggregate contribution to the partition function again is bounded above by a convergent geometric series:

[...]    (7.14)

In practice, low values (≈ 0.1) for the series ratios ρ and σ occur at reasonably small cutoffs (≈ 8 for r and ≈ 150 for n, for a molecule of N = 5,000 base pairs under reasonable environmental conditions).

The contribution of the intermediate states having n and r at or below these cutoffs but not satisfying the threshold requires estimating P_{n,r}(n_AT), the fraction of (n,r)-states that have exactly n_AT separated A·T base pairs. Although in principle one can compute this quantity from the base sequence, for molecules of kilobase lengths it is feasible to compute only the exact distribution of A+T-richness in r = 1 run states having n ≤ n_max ≈ 200. Experience has shown that high accuracy is obtained by calculating P_{n,2}(n_AT) exactly, and using P_{n,2}(n_AT) as an estimate of P_{n,r}(n_AT) for r > 2. Once the sequence information needed in this step has been found (a calculation that need be performed only once per molecule), the performance of the rest of this refinement is computationally very fast.

These results are used to estimate the contribution Ẑ_neg to the partition function from the neglected, high-energy states:

[...]    (7.15)

(Here the caret marks denote approximate values.) Any parameter z that depends only on n, n_AT, or r also can have its previously calculated

approximate equilibrium value corrected for the estimated effects of all neglected, high-energy states:

[...]    (7.16)

Examples of correctable parameters include the population-averaged values of the total numbers of separated base pairs, runs of transition, and separated A·T pairs. The only important quantities that cannot be refined in this way are the transition and destabilization profiles, because their calculation involves positional information. However, their accuracy can be estimated by comparing the corrected ensemble average number of separated base pairs with its (uncorrected) value that is computed as the sum of the probabilities of separation for all base pairs in the sequence. In this way the accuracy of the profiles calculated with any specified threshold can be assessed. This allows the threshold to be chosen to give any required degree of accuracy. In practice accuracies exceeding 99 percent are feasible at physiological temperatures, even for highly supercoiled molecules.

Evaluation of Free-Energy Parameters

Before these techniques can yield quantitatively precise calculations, accurate values must be known for the energy parameters. Only the separation energetics b_AT and b_GC have been accurately measured under a wide range of environmental conditions (Marmur and Doty, 1962; Schildkraut and Lifson, 1968). The other parameters (the quadratic coefficient K governing residual linking, the cooperativity free energy a, and the coefficient C governing interstrand twisting of strand-separated DNA) are known only for a restricted range of molecules and environmental conditions.

The theoretical methods described above can be used to determine the best fitting values of the unknown parameters based on the analysis of experimental data on superhelical strand separation. Allowing the parameters

to vary within reasonable ranges, the analyses are repeated, and the set of values is found for which the computed transition properties best fit the experimental data (Benham, 1992). Application of this method to data on strand separation in pBR322 DNA at [Na+] = 0.01 M, T = 310 K finds a unique optimum fit when K = 2350 ± 80 RT/N, a = 10.84 ± 0.2 kcal, and C = (2.5 ± 0.3) × 10^-13 erg-nt/rad^2.

Extensive sample calculations of strand separations in superhelical DNA have been performed using these energy parameters (Benham, 1992). As described above, substantial amounts of free energy are required to drive strand separation. In consequence, this transition is favored only when the DNA is significantly supercoiled. This is shown in Figure 7.2, where the solid line depicts the probability of strand separation in pBR322 DNA (N = 4,363 base pairs) as a function of imposed negative superhelicity under low-salt conditions. The dashed line gives the ensemble average number of strand-separated base pairs as a function of −α. Separation occurs only when the linking difference satisfies α ≤ −18 turns and is confined to the terminator (3,200 to 3,300) and promoter (4,100 to 4,200) regions of one particular gene, as shown in Figure 7.1 above. These results are in precise agreement with experiment (Kowalski et al., 1988).

Figure 7.2 The onset of strand separation in pBR322 DNA.
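The fitted parameters above are quoted in mixed units, so a short conversion may help when comparing them. The sketch below converts C and K to kcal/mol and estimates the superhelical free energy of the fully base-paired state at α = −18 turns. It rests on several assumptions that are not stated in the text: that "kcal" means kcal per mole, that C is quoted per molecule, and that the unseparated state has free energy Kα²/2 (the n = 0 limit of the state free energy as we read equation (7.6)). The constant and function names are illustrative.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
JOULE_PER_ERG = 1.0e-7
JOULE_PER_KCAL = 4184.0
GAS_CONSTANT = 1.987204e-3   # kcal / (mol K)


def erg_per_molecule_to_kcal_per_mol(energy_erg):
    """Convert an energy per molecule in erg to kcal per mole."""
    return energy_erg * JOULE_PER_ERG * AVOGADRO / JOULE_PER_KCAL


T = 310.0          # K, as in the pBR322 fit
N = 4363           # base pairs in pBR322
RT = GAS_CONSTANT * T

C = erg_per_molecule_to_kcal_per_mol(2.5e-13)   # kcal/(mol rad^2), per nt
K = 2350.0 * RT / N                             # kcal/mol per turn^2
a = 10.84                                       # kcal/mol, nucleation cost


def closed_state_energy(alpha, k=K):
    """Free energy of the unseparated state, assumed to be K * alpha^2 / 2."""
    return 0.5 * k * alpha ** 2


print(f"RT = {RT:.3f} kcal/mol")
print(f"C  = {C:.2f} kcal/(mol rad^2)")
print(f"K  = {K:.3f} kcal/mol per turn^2")
print(f"G(closed, alpha = -18) = {closed_state_energy(-18.0):.1f} kcal/mol")
```

Under these assumptions the stored superhelical free energy near the reported onset is several times the nucleation cost a. This is consistent with, though it does not by itself explain, separation becoming favorable only at substantial negative superhelicity.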

Accuracy of the Calculated Results

Once the energetics governing transition under specific environmental conditions have been fit based on the transition behavior of one sequence, the accuracy of the analytical methods can be assessed by comparing their predictions with experimental results on other molecules (Benham, 1992). We did this for six DNA molecules synthesized by David Kowalski, a biochemist studying the role of strand separation in initiating replication (Kowalski and Eddy, 1989). Starting from a parent DNA molecule pORIC, Kowalski made various modifications. pDEL16 has a 16-base pair deletion from the replication origin site of pORIC. pAT105 and pGC91 were made by inserting an A+T-rich 105-base pair segment and a G+C-rich 91-base pair segment, respectively, into the deletion site of pDEL16. pAT105I and pGC91I have the same insertions, but placed in reverse orientation. The complete DNA sequences of these plasmids were provided to the author by Dr. Kowalski (private communication).

The transition profiles of these molecules were calculated using the energetics appropriate to the experimental conditions, which were the same as in the pBR322 experiments from which the energy parameters were derived. Figure 7.3 shows the computed transition profiles around the duplex unwinding element (DUE) of the origin site for the four plasmids of greatest interest. The region where strand separation was detected experimentally is shown by a double line in each case. Less separation was detected experimentally at this location in the pORIC plasmid than in the other two transforming molecules, and none was detected in pDEL16 or in the other two molecules whose profiles are not shown in the figure. These experimental results are in close agreement with the present predictions. In fact, the agreement may be even better than the figure indicates.
Because the experimental method detects separation only in the interiors of open regions, the actual separated sites are slightly larger than what the experiment detects. These results show that the present methods for analyzing superhelical strand separation are highly accurate. The extensive variations in the locations of separated regions that result from minor sequence alterations are precisely depicted. The relative amounts of transition at each site also agree closely with experiment. The superhelicity required to drive a specific amount of separation is within 7 percent of the observed value, which reflects the limit of accuracy with which extents of transition are

measured in these experiments. This demonstrates that these analytical methods provide highly precise predictions of the details of strand separation in superhelical molecules.

Figure 7.3 The transition profiles at DUE sequences. Reprinted, by permission, from Benham (1992). Copyright © 1992 by Academic Press Limited.

Applying the Method to Study Interesting Genes

Having developed a method and confirmed its accuracy on test molecules, we can now apply it to study any DNA sequence of interest. It turns out to be particularly illuminating to examine the association between sites of superhelical destabilization and sites of gene regulation

(Benham, 1993). Our calculations show some striking correlations involving sites for initiation of transcription, termination of transcription, initiation of DNA replication, and binding of repressor proteins. We find that some bacterial genes show superhelical destabilization at the sites where gene expression starts and the sites where it ends. One gene on the pBR322 DNA molecule (from which the data in Figure 7.1 came) and one on the ColE1 plasmid (from which the data in Figure 7.4a came) are bracketed by such sites, suggesting that their expression is regulated by the state of DNA supercoiling. And, indeed, experiments show that these bracketed genes are expressed at higher rates when their DNA is superhelical than when it is relaxed. The other genes on these molecules show no such destabilized regions. This result suggests that genes in bacterial DNA can be partitioned into two categories, depending on whether or not they are bracketed by superhelically destabilized regulatory regions.

In a similar vein, we have analyzed the DNA sequences of two mammalian viruses, the polyoma and papilloma viruses, each of which can cause cancer. The most destabilized locations on these molecules occur precisely at the places where gene expression terminates, the so-called polyadenylation sites. The two most destabilized sites in the polyoma genome occur at the major (M) and minor (m) poly-adenylation sites, as shown in Figure 7.4b. Of the three most destabilized sites in the papilloma virus genome, two occur at known poly-adenylation sites for transcription from the direct strand. The other occurs at a location having the sequence attributes of a poly-adenylation site for transcription from the complementary strand. (This observation raises the intriguing possibility that the complementary strand of this molecule could be transcribed, an event that has not been observed to date.)
The strong association found between destabilized sites and the beginnings and ends of genes suggests that destabilization may play roles in their functioning. Many possible scenarios can be suggested for how this could occur. Clearly, destabilization at a gene promoter could facilitate the start of transcription by assisting the formation of a complex between the single strand to be transcribed and the enzyme complex that constructs the RNA transcript. What about destabilization at the sites where gene expression is completed (terminators in bacteria and polyadenylation sites in higher organisms)? In this case, a likely but subtler role can be suggested. The moving transcription apparatus is thought to

push a wave of positive supercoils ahead and leave a wake of negative supercoils behind (Wu et al., 1988). A region of strand separation constitutes a localized concentration of negative superhelicity, due to the large decrease in twist that occurs. This could provide a sink for the positive supercoils generated by an approaching complex, preventing the accumulation of twisting and bending deformations that otherwise could impede its progress. This would facilitate efficient transcription of the gene involved. The wake of negative supercoils left behind could destabilize the promoter region in preparation for the next round of expression.

Figure 7.4 The helix destabilization profiles of the circular molecules (top) ColE1 plasmid DNA and (bottom) polyoma virus DNA. The locations of the promoter (P) and terminators (T1 and T2) of the bracketed transcription unit of ColE1 are indicated. In polyoma the control region (denoted by a bar), replication origin (OR), and the major (M) and minor (m) poly-adenylation sites are shown. Reprinted, by permission, from Benham (1993). Copyright © 1993 by the National Academy of Sciences.

This model could explain why terminal regions are the most

destabilized sites found, and why some genes are bracketed by destabilized sites.

DNA replication is another process for which it is interesting to study the correlation with superhelical destabilization. In the plasmids pBR322 and ColE1, replication is started by an RNA primer, which displaces one strand of the DNA at the replication origin by base pairing to the strand having a complementary sequence. The origin sites on these molecules are not destabilized by superhelicity, suggesting that the displacement event does not require a destabilized or separated site. By contrast, replication of the DNA of phage f1 (a virus that attacks bacteria) involves enzymatic cutting of one strand that is known to require DNA superhelicity. If one role of DNA superhelicity is to promote strand separation, one would expect to find highly destabilized sites abutting the origin of replication. In fact, the calculations show precisely this, providing strong support for the assumption.

Interesting results also emerge from the study of genes involved in the SOS response system in the bacterium E. coli. The SOS response system, as its name suggests, is a collection of genes that are turned on when the organism experiences any of a variety of serious problems, ranging from environmental stresses to DNA damage. These genes are usually turned off by the binding to their promoters of a repressor protein called LexA, which blocks transcription. Another protein, called RecA, plays a key role in initiating the SOS response by causing the removal of the LexA repressor, thereby allowing transcription. As it happens, the ColE1-encoded gene discussed above that was bracketed by superhelical destabilization sites is a member of the SOS response system. What about other SOS response genes? Do they also show superhelically destabilized regions? To address this question, we examined every known SOS response gene whose DNA sequence was available.
In every case, the LexA binding site was contained in a strongly destabilized region. We note that this binding site is 16 base pairs long. Although it is reasonably A+T-rich, this is not long enough for its presence alone to assure destabilization. It is not hard to speculate on the function of these superhelical destabilization regions in the SOS response. The RecA protein is known to bind single-stranded DNA. If the SOS response is marshaled against DNA damage, the damaged region provides single-stranded DNA for RecA binding. The environmental stresses that activate the SOS

response system cause an increase in DNA superhelicity (Bhriain et al., 1989), which can provide strand-separated regions near the LexA binding sites that allow RecA to bind. In addition to those sketched here, many other possible roles for superhelically destabilized regions can be suggested. For example, destabilization of one site in a molecule could protect other sites from separating that must remain in the duplex form to function.

Discussion and Open Problems

This chapter has described how DNA sequences can be analyzed to determine one biologically important attribute: the relative susceptibility of regions in the molecule to superhelical destabilization. The last section indicated how correlations between destabilized sites and DNA regulatory regions illuminate the mechanisms of activity of such regions. This work has many other possible uses, one of which is sketched here.

Correlations of the type noted above between superhelically destabilized sites and regulatory regions can be used in searching DNA sequences for those regions. Most commonly available strategies search DNA sequences for short subsequences (that is, strings) whose presence correlates with a particular activity. So-called TATA boxes are present at promoters, for example, while poly-adenylation occurs near AATAAA sites. Sequence signatures are known for terminators and for several other types of regulatory sites. This string-search approach is possible because the enzymes involved with particular functions usually have either specific or consensus sequence requirements for activity. In most cases these string-search methods find large numbers of candidate sites having the sequence characteristics necessary for function. Among these, commonly only a small number of sites actually are active.
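The string-search strategy described above, and the idea of using destabilization to narrow its many candidates, can be illustrated with a short Python sketch. It locates every occurrence of a signal such as AATAAA on either strand and then orders the hits by a per-base destabilization score. The score array, the convention that larger values mean more destabilized, and the function names are assumptions made for illustration; the sketch does not compute a destabilization profile and does not reproduce the chapter's analysis.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def find_motif(seq, motif):
    """Return sorted (start, strand) hits of motif on either strand.

    start is a 0-based index into seq; a "-" hit means the reverse
    complement of the motif was found at that position in seq.
    Overlapping occurrences are reported.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def rank_by_destabilization(hits, profile, motif_len):
    """Order hits from most to least destabilized.

    profile[i] is a hypothetical per-base score, assumed here to be
    larger where the duplex is more destabilized.
    """
    scored = []
    for start, strand in hits:
        window = profile[start:start + motif_len]
        scored.append((sum(window) / len(window), start, strand))
    return sorted(scored, reverse=True)


# Toy example with a made-up sequence and made-up scores.
toy_seq = "GGAATAAACCTTTATTGG"
toy_profile = [0.1] * len(toy_seq)
toy_profile[10:16] = [0.9] * 6
hits = find_motif(toy_seq, "AATAAA")
for score, start, strand in rank_by_destabilization(hits, toy_profile, 6):
    print(start, strand, round(score, 2))
```

For a circular molecule a hit could span the arbitrary start of the sequence, so a real search would need to wrap around; that detail is omitted here.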
The strong associations documented here between destabilized sites and particular types of regulatory regions suggest that this attribute also could be used to search genomic sequences for those regions. This would supplement existing string methods, providing more accurate predictions. For example, the bovine papilloma virus DNA sequence contains 9 sites having the AATAAA sequence needed for poly-adenylation, of which only 2 are known to be active. The most destabilized sites on the molecule contain 6 of these

signal sequences, several of which are very close together, including both known poly-adenylation sites. As a second example, a search of all known E. coli sequences finds more than 100 locations having the sequence associated with LexA binding. Analysis of which of these sites are destabilized could suggest whether some might be promoters for previously unrecognized SOS-regulated genes.

The transition behavior of stressed DNA molecules can be complicated by several additional factors. First, there are other types of transitions possible for specific sequences within a DNA molecule. For example, sequences in which a purine (A or G) alternates with a pyrimidine (C or T) along each strand can adopt a left-handed helical structure. Transitions to this and to other alternative conformations also can be driven by imposed superhelicity. So the equilibrium experienced by a stressed molecule actually involves competition among several types of transitions, not just strand separation. Because these other conformations usually are possible only at a small number of short sites having the correct sequence, their analysis is combinatorially simpler than the treatment of strand separation. The theoretical methods described here are currently being extended to include the possible occurrence of other types of transitions.

The second complication arises from the structural restraints on DNA in cells. There the DNA is not free to twist and writhe to minimize its energy, but instead is wound around basic proteins to form a chromatin fiber. This drastically alters the types of deformations the molecule can undergo. While it is not clear precisely how this constraint interacts with superhelicity, conformational transitions are expected to be driven by less extreme deformations in restrained molecules than in unrestrained ones (Benham, 1987).
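The alternating purine-pyrimidine tracts mentioned above as candidates for left-handed structure can be located by a simple scan of the sequence. The Python sketch below reports maximal runs in which purines and pyrimidines strictly alternate. The minimum length of 8 bases is an arbitrary illustrative threshold, not a value from the chapter, and finding such a tract says nothing by itself about whether the transition actually occurs at a given superhelicity.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def base_class(base):
    """Return 'R' for a purine, 'Y' for a pyrimidine, None otherwise."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def alternating_tracts(seq, min_len=8):
    """Return half-open (start, end) intervals of maximal alternating runs.

    A run is a stretch in which each base belongs to the opposite class
    (purine or pyrimidine) from the base before it. Only runs of at
    least min_len bases are reported; min_len is an assumed threshold.
    """
    seq = seq.upper()
    tracts = []
    run_start = 0
    for i in range(1, len(seq) + 1):
        continues = (
            i < len(seq)
            and base_class(seq[i]) is not None
            and base_class(seq[i - 1]) is not None
            and base_class(seq[i]) != base_class(seq[i - 1])
        )
        if not continues:
            # The run seq[run_start:i] has ended; keep it if long enough.
            if i - run_start >= min_len and base_class(seq[run_start]) is not None:
                tracts.append((run_start, i))
            run_start = i
    return tracts


# Toy example with a made-up sequence.
for start, end in alternating_tracts("AAACGCGCGCGTTT"):
    print(start, end)
```

Because the complement of a purine is always a pyrimidine, a tract that alternates on one strand also alternates on the other, so only one strand needs to be scanned.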
The approach outlined here has great promise for finding biologically important correlates of regulation and for illuminating specific mechanisms of function.

References

Benham, C.J., 1987, "The influence of tertiary structural restraints on conformational transitions in superhelical DNA," Nucleic Acids Res. 15, 9985-9995.
Benham, C.J., 1990, "Theoretical analysis of heteropolymeric transitions in superhelical DNA molecules of specified sequence," J. Chem. Phys. 92, 6294-6305.

Benham, C.J., 1992, "Energetics of the strand separation transition in superhelical DNA," Journal of Molecular Biology 225, 835-847.
Benham, C.J., 1993, "Sites of predicted stress-induced DNA duplex destabilization occur preferentially at regulatory loci," Proceedings of the National Academy of Sciences USA 90, 2999-3003.
Bhriain, N. Ni, C. Dorman, and C. Higgins, 1989, "An overlap between osmotic and anaerobic stress responses: A potential role for DNA supercoiling in the coordinate regulation of gene expression," Mol. Microbiol. 3, 933-942.
Dorman, C., G. Barr, N. Ni Bhriain, and C.F. Higgins, 1988, "DNA supercoiling and the anaerobic and growth phase regulation of tonB gene expression," J. Bacteriol. 170, 2816-2826.
Gellert, M., 1981, "DNA topoisomerases," Annu. Rev. Biochem. 50, 879-910.
Hartwig, M., E. Matthes, and W. Arnold, 1981, "Extremely underwound chromosomal DNA in nucleoids of mouse sarcoma cells," Cancer Letters 13, 153-158.
Kowalski, D., and M. Eddy, 1989, "The DNA unwinding element: A novel cis-acting component that facilitates opening of the Escherichia coli replication origin," EMBO J. 8, 4335-4344.
Kowalski, D., D. Natale, and M. Eddy, 1988, "Stable DNA unwinding, not breathing, accounts for single-strand specific nuclease hypersensitivity of specific A+T-rich regions," Proceedings of the National Academy of Sciences USA 85, 9464-9468.
Malkhosyan, S., Y. Panchenko, and A. Rekesh, 1991, "A physiological role for DNA supercoiling in the anaerobic regulation of colicin gene expression," Mol. Gen. Genet. 225, 342-345.
Marmur, J., and P. Doty, 1962, "Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature," Journal of Molecular Biology 5, 109-118.
Mattern, M., and R. Painter, 1979, "Dependence of mammalian DNA replication on DNA supercoiling," Biochim. Biophys. Acta 563, 293-305.
Pruss, G., and K. Drlica, 1989, "DNA supercoiling and prokaryotic transcription," Cell 56, 521-523.
Schildkraut, C., and S. Lifson, 1968, "Dependence of the melting temperature of DNA on salt concentration," Biopolymers 3, 195-208.
Smith, G., 1981, "DNA supercoiling: Another level for regulating gene expression," Cell 24, 599-600.
Umek, R., M. Linskens, D. Kowalski, and J. Huberman, 1989, "New beginnings in studies of eucaryotic DNA replication origins," Biochim. Biophys. Acta 1007, 1-14.
Weintraub, H., P. Cheng, and K. Conrad, 1986, "Expression of transfected DNA depends on DNA topology," Cell 46, 115-122.
White, J.H., 1988, "An introduction to the geometry and topology of DNA structure," pp. 225-254 in Mathematical Methods for DNA Sequences, M.S. Waterman (ed.), Boca Raton, Fla.: CRC Press.
Wu, H., S. Shyy, J.C. Wang, and L.F. Liu, 1988, "Transcription generates positively and negatively supercoiled domains in the template," Cell 53, 433-440.