

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




4 Secondary Structure of Proteins and Nucleic Acids

PROTEINS

The secondary structural features of proteins can be grouped into three broad classes: helical features, extended strands, and turns or loops. The most commonly seen helices are the so-called alpha helices, which were first described by Pauling and Corey (Pauling et al., 1951). Minor forms include the 3₁₀ and pi helices. Beta structures are formed from hydrogen bonding between the backbone amides of extended polypeptide strands. Although such features are properly considered tertiary structure, they are often discussed as secondary structure. They are named according to whether the paired strands run parallel or antiparallel to each other (based on a vector drawn from the N to the C terminus of each strand) and whether the sheets are folded or rolled into a barrel (Richardson, 1981). Turns are much less regular (Rose et al., 1985); they are characterized according to local geometric features.

Prediction of secondary structure has been of interest since the first protein structures were determined. Accurate secondary structure prediction is one direct approach to the development of a tertiary prediction algorithm. The methods used to predict secondary structure from amino acid sequence have been (1) calculation of the energies of the major conformers for a given sequence; (2) statistical analysis of known structures; and (3) modeling.

It is now computationally feasible to calculate energies for conformations of short peptides with or without solvent. However, this is not yet a definitive method of predicting secondary structure in proteins, because the energy differences among conformers are small compared with the interactions between the peptide and the rest of the protein, and because solvent entropy terms are neglected.

Statistical methods began with the efforts of Chou and Fasman (1974), who characterized the preference of each amino acid for each type of secondary structure. Other early efforts focused on turns in proteins (Kuntz, 1972; Lewis et al., 1971). Robson (Robson and Osguthorpe, 1979; Robson, 1986) followed with more sophisticated approaches. The general impression is that statistical methods are easy to use but have significant random and systematic noise that limits their accuracy. For example, they ignore long-range effects (Kabsch and Sander, 1983) and prosthetic groups.

Modeling efforts have grown from the early observation of Schiffer and Edmundson (1967) that alpha helices in globular proteins often have hydrophobic and hydrophilic faces, in agreement with the ideas of Kauzmann (1959). Many investigators followed this line of thought. Thus, helical propensity has been identified using helical nets, Fourier analysis of hydrophobicity (Eisenberg et al., 1984; Finer-Moore and Stroud, 1984), or pattern matching (Cohen et al., 1986b). Beta structures have been treated in similar ways, although less successfully. Turns are often associated with regions of hydrophilicity.

Some of these labelings are also valuable for detecting higher order structural information. One of the most common methods used to explore this information is secondary structure analysis.
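The hydrophobic-face idea behind the Fourier methods cited above can be made concrete with a hydrophobic moment: place each residue 100 degrees further around a helical wheel (the angular spacing of an ideal alpha helix) and take the length of the vector sum of residue hydrophobicities. A large moment over a window suggests one hydrophobic and one hydrophilic face. The Python below is a minimal sketch under stated assumptions: it uses the Kyte-Doolittle hydropathy scale as a stand-in (Eisenberg et al. used their own scale), and the function names and the 11-residue window are illustrative choices, not taken from the cited work.

```python
import math

# Kyte-Doolittle hydropathy scale, used here as an illustrative stand-in;
# Eisenberg and colleagues used a different (consensus) scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Length of the vector sum of residue hydrophobicities on a helical wheel.

    Residue n points in direction n * angle_deg; 100 degrees corresponds to an
    ideal alpha helix with 3.6 residues per turn. Unknown letters raise KeyError.
    """
    delta = math.radians(angle_deg)
    x = sum(scale[aa] * math.cos(n * delta) for n, aa in enumerate(seq))
    y = sum(scale[aa] * math.sin(n * delta) for n, aa in enumerate(seq))
    return math.hypot(x, y)


def best_amphipathic_window(seq, window=11):
    """Return (start, moment) for the window with the largest moment.

    A high-scoring window is only a candidate amphipathic helix; the moment
    says nothing about whether the segment actually adopts a helix.
    """
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    scored = ((i, hydrophobic_moment(seq[i:i + window]))
              for i in range(len(seq) - window + 1))
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(hydrophobic_moment("L" * 18))              # essentially zero
    print(hydrophobic_moment("LKKLLKKLLKKLLKKLLK"))  # about 34
```

For 18 identical residues the moment is essentially zero, because 18 steps of 100 degrees make exactly five turns and the vectors cancel; the alternating pattern LKKLLKKLLKKLLKKLLK places its leucines on one face and gives a moment of about 34 on this scale.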
Secondary structure analysis provides information on possible patterns of coils, sheets, and helices in a protein. Other information can be extracted from the sequence by recognizing that certain combinations of amino acids indicate turns or other structural features. Alternative representations can be used to determine patterns comprising amphiphilic beta sheets or alpha or pi helices (Kaiser and Kézdy, 1984). Often this information can be gathered by empirical, rule-based systems, described below. However it is determined, this information can be used to build up a hierarchy of patterns. This hierarchy is indicated symbolically in Figure 4-1, which is based on the general scheme presented in Chapter 3.

FIGURE 4-1 Hierarchical relationship of a protein sequence to higher order patterns of organization, culminating in the tertiary structure of the protein itself. [Figure levels, from top to bottom: domains; super-secondary structural elements; secondary structural elements; sequence.]

According to this scheme, patterns of partial sequences can be recognized as representing secondary structural elements. Combinations of these elements are recognized as higher order patterns that correspond to supersecondary structures. These, in turn, may be recognized as patterns that make up a domain. If any portion of such a hierarchy can be determined from the sequence alone, this represents a major step toward constructing an actual three-dimensional structure for a protein. More frequently, this procedure is used with other computational techniques for examining combinations of patterns that may lead to recognizable structural motifs in three dimensions.

The general method used to study these problems involves only a few basic steps. First, one or more techniques are used to make an initial assignment of structural features to patterns in a sequence, for example, hypothesized secondary structural elements associated with specific partial sequences. Second, the pattern or patterns obtained are compared with those derived from known structures or hypothesized by the investigator. Third, if a match is found, the investigator proceeds to search for patterns of the next-highest order, which may be composed of several different combinations of the smaller patterns identified in the previous step. If sufficient structural information can be derived, hypothetical three-dimensional structures can be proposed. Several recent examples have successfully applied this approach to the pattern-based elucidation of structural features of proteins of known and unknown structure (Abarbanel, 1984, 1986; Cohen et al., 1986; Taylor and Thornton, 1983).

Current Status and Future Prospects

The assignment of each amino acid in a protein sequence to a particular secondary structure class has rarely been more than 70 percent accurate and is often worse. Some of the newer approaches increase accuracy by reducing the scope of the problem. For example, Cohen et al. (1986a) describe procedures for the prediction of turns in subgroups of proteins. By tailoring algorithms to the characteristics of particular classes, such as all-alpha domains, accuracy can be raised to about 90 percent.

In the near term, developments in this area are most likely to be incremental improvements. Computer speed will surely continue to increase substantially. Data bases will continue to grow at least linearly. Experiments on the structural consequences of modifying amino acids are beginning to be reported in significant numbers (Ultsch et al., 1985; Alber et al., 1987). More powerful statistical and modeling efforts are under development. More important, these approaches can be combined in useful ways. Within five years, several laboratories should have set up unified programs that allow complex inquiries of structural data bases. Automatic learning programs that extract secondary structure features will also have been intensively studied.

Within 10 years, we may well see major improvements in our ability to correlate sequences and secondary structure. The current goal of correctly predicting every major feature in a new sequence is a plausible target.
Accurate specification of the secondary structural environment of each amino acid in a protein is probably not attainable without a major conceptual or computational breakthrough.
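The per-residue accuracy figures quoted above (about 70 percent in general, about 90 percent for restricted classes) are usually computed as the fraction of residues whose predicted class matches the observed one. The sketch below assumes the three broad classes described at the start of this chapter, coded with the conventional letters H (helix), E (extended strand), and C (coil, turns, and loops); the function names are hypothetical. It also reports accuracy per observed class, since one overall percentage can hide a much weaker result for strands or turns.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted class equals the observed one.

    Both arguments are strings over H (helix), E (strand), C (coil/turn/loop).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty assignment")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_class_recall(predicted, observed, classes="HEC"):
    """For each class present in observed, percent of its residues predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    recall = {}
    for c in classes:
        positions = [i for i, o in enumerate(observed) if o == c]
        if positions:
            hits = sum(predicted[i] == c for i in positions)
            recall[c] = 100.0 * hits / len(positions)
    return recall


observed = "HHHHEEEECC"
predicted = "HHHCEECCCC"
print(q3_accuracy(predicted, observed))       # 70.0
print(per_class_recall(predicted, observed))  # {'H': 75.0, 'E': 50.0, 'C': 100.0}
```

In this toy assignment 7 of 10 residues match (70 percent overall), yet only half of the strand residues are recovered.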

NUCLEIC ACIDS

Predicting RNA Structure

RNA molecules are crucial to all stages of protein synthesis. Messenger RNA carries the code that specifies the amino acid sequence of the protein; the transfer RNA molecules translate the code word by word into protein; and the ribosomal RNAs in the ribosome provide part of the machinery that carries out the synthesis. Many animal and plant viruses that cause tremendous damage to human health and economic well-being are RNA viruses. Human disease RNA viruses include those for colds and influenza, AIDS, some cancers, and hepatitis.

It would be very useful to be able to predict, from the sequence alone, the folded three-dimensional structure of any RNA in any environment. This structure determines how stable an RNA molecule will be in a biological cell, because the ability of the enzymes that hydrolyze RNA (exo- and endonucleases) to degrade a particular RNA is very sensitive to RNA conformation. Also, each RNA molecule requires the correct conformation in order to function biologically. The conformation, in turn, depends on the environment, as characterized by the type and concentration of ions, the presence of specific interacting molecules, and other variables.

The computer programs now used to predict RNA conformation from sequence have limited goals and limited success (see, for example, Zuker and Stiegler, 1981). They were designed to calculate only secondary structure, that is, to specify which bases are paired. The computerized procedure uses experimental thermodynamic data on double-strand formation in synthetic RNA oligonucleotides (Freier et al., 1986). A dynamic programming algorithm considers all possible base pairs in the RNA (a sequence of N nucleotides has N(N-1)/2 possible base pairs) and calculates the free energies of the corresponding structures.
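A simplified version of this dynamic program is sketched below in Python. It is an assumption-laden stand-in, not the cited programs: it follows the Nussinov base-pair-maximization recursion, scoring each Watson-Crick or G-U pair as +1 instead of using measured free energies for stacked pairs and loops, and it requires at least three unpaired bases in a hairpin loop. The optional `unpaired` argument mimics the option of forcing bases to remain unpaired on the basis of experimental data. All function names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def possible_pairs(n):
    """Number of position pairs (i, j) with i < j, i.e. N(N-1)/2."""
    return n * (n - 1) // 2


def max_pairs(seq, min_loop=3, unpaired=frozenset()):
    """Nussinov-style DP: the largest set of nested (non-crossing) base pairs.

    min_loop -- minimum number of unpaired bases enclosed by a hairpin
    unpaired -- positions forced to remain unpaired (e.g. from probing data)
    Returns (number_of_pairs, sorted list of (i, j) pairs).
    """
    n = len(seq)
    if n == 0:
        return 0, []

    def can_pair(k, j):
        return (j - k > min_loop and k not in unpaired and j not in unpaired
                and (seq[k], seq[j]) in PAIRS)

    # best[i][j] = max number of pairs within seq[i..j]; spans too short stay 0.
    best = [[0] * n for _ in range(n)]

    def paired_score(i, k, j):
        # k pairs with j: best left of k, plus best strictly inside (k, j), plus 1.
        left = best[i][k - 1] if k > i else 0
        return left + best[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: j is unpaired
            for k in range(i, j):               # case 2: j pairs with some k
                if can_pair(k, j):
                    score = max(score, paired_score(i, k, j))
            best[i][j] = score

    # Trace back one optimal structure.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j):
            if can_pair(k, j) and paired_score(i, k, j) == best[i][j]:
                pairs.append((k, j))
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return best[0][n - 1], sorted(pairs)


def dot_bracket(n, pairs):
    """Render pairs as dot-bracket text: '(' and ')' for paired bases, '.' otherwise."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


count, pairs = max_pairs("GGGAAAUCC")
print(count, dot_bracket(9, pairs))  # 3 (((...)))
```

The triple loop over i, j, and k makes this recursion O(N³) in time and O(N²) in memory. Energy-based recursions that allow interior loops of unrestricted size grow as N⁴, which is one reason long RNAs become expensive to fold.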
The free energy of a structure is assumed to be the sum of the free energies of its constituent substructures (Tinoco et al., 1971), including single-stranded regions, double-stranded regions, bulges, hairpin loops, and interior loops. The lowest free energy structure is the predicted secondary structure. The computer programs allow one to specify that any two bases are paired to each other or that any base is unpaired. Thus, other experimental data based on enzymatic digestion experiments, chemical reactivity, or phylogenetic comparisons can be introduced.

Clearly, the predicted results are only as good as the experimental thermodynamic and other data used. For example, although all transfer RNAs are thought to fold as cloverleaves, present computer programs predict a cloverleaf for only about 90 percent of them. Also, the thermodynamic experiments have all been done in one standard solvent, so knowledge about other solvents is needed. Finally, only a limited number of oligonucleotides have been studied; they provide only a very small sample of the structural elements present in natural RNA molecules. A much better understanding of the thermodynamics of possible substructures in an RNA is needed before an accurate and complete prediction of secondary structure is possible.

The existing computer methods for calculating secondary structure in RNA are useful as aids in designing experiments to determine the actual secondary structure. For example, a program can provide the calculated lowest free energy structure, as well as other significantly different low free energy structures (Williams and Tinoco, 1986). Experiments are done to test some of the predicted substructures, and their results are incorporated into the calculation of the next prediction. Successive approximations thus lead to more correctly determined secondary structures (Cech et al., 1983).

Computer methods of the future must be based on much more detailed knowledge of the thermodynamics of local regions of an RNA molecule. The extensive thermodynamic data needed for correct prediction of secondary structure will most likely come from computer interpolation and extrapolation of limited data measured on synthetic oligonucleotides in a few solvents.
Once we better understand the forces and energies involved in the interactions of nucleotides with solvent, ions, and each other, it will become easier to calculate secondary structures for large RNA molecules. Algorithms exist for calculating low free energy structures as a sum of substructure energies, so their use with RNAs of up to about 600 nucleotides requires only the appropriate data. However, for a rigorous search for structures, central processing unit (CPU) time and memory requirements increase as the cube or fourth power of the number of nucleotides. We estimate that a molecule of 3,000 nucleotides (typical of ribosomal RNAs or small viruses) would require about 40 hours of CPU time on a Cray X-MP and 2 gigabytes of memory. Parallel processing or equivalent improvements in hardware then become necessary.

Prediction of tertiary structure in RNA, that is, the three-dimensional structure of the RNA, is much more difficult. Levitt (1969) made an early attempt at predicting the structure of transfer RNA, but an algorithm to find the lowest free energy tertiary structure still does not exist. Proposed methods for folding possible secondary structures into three-dimensional structures and calculating their energies are in very early stages of development. Prediction of secondary structure in RNA is at about the same stage as it is in proteins, but prediction of tertiary structure is far behind. Novel methods are needed for RNA that take into account, either implicitly or explicitly: the long-range electrostatic repulsion of phosphates shielded by counterions; the detailed conformation of hairpin, bulge, and interior loops; the hydrogen bonding between loops and single-stranded regions (pseudoknots); non-Watson-Crick base pairs and triple base interactions; and all the usual London-van der Waals interactions, including those with solvent.

Kollman and his associates at the University of California, San Francisco, have made a beginning in this direction with the program AMBER (Bash et al., 1987a). This program performs molecular mechanics energy calculations with parameters optimized for nucleic acids. Other programs calculate differences in free energies caused by changes in conformation.

The most useful computer modeling process would provide real-time calculation of free energies as the folding of a macromolecule in a solvent was shown on a computer graphics screen. Achieving this will require great improvements in hardware and software. Close collaboration with experimentalists will be needed to ensure meaningful calculated results.

Rapid and efficient progress in this field will require:
Rapid and efficient progress in this field will require: . Effective methods for crystallizing RNA oligonucleotides and naturally occurring RNA molecules. To date, transfer RNA is the only RNA molecule whose x-ray structure has been determined.

OCR for page 32
39 Nuclear magnetic resonance methods that can provide con formations for RNA molecules that contain from 10 to 100 nucleotides. Computer programs that can reproduce experimental results. The high charge densities in nucleic acids (one per nucleotide) require special care in the correct treatment of solvent and ionic effects. Once calculations can be done that provide known structures, we can place some confidence on extras oration to new structures. We have been stressing free energies (thermodynamics), but kinetics may be just as important; equilibrium is never attained in a living system. RNA is folded as it is synthesized, so kinetic barriers may prevent it from reaching a global minimum. For some RNA molecules, such as ribosomal RNA, the dynamic movement from one conformation to another may be an important part of their function. We would like to be able to calculate and verify structures of RNA molecules and their interactions with a wide variety of molecules. In a ribosome, for example, the ribosomal RNA inter- acts with messenger RNA and transfer RNAs, as well as all the proteins involved in protein synthesis. It would be very useful to know how a change in any variable would affect the efficiency and fidelity of protein synthesis. We want to be able to design effi- cient messenger RNAs to produce any protein desired. We need to develop models of the process of protein manufacture so that we can then improve the productivity, cut the cost, and ensure high quality output of proteins. Computer modeling and calculations should provide the sequences of the ribosomal RNAs, the transfer RNAs, and the messenger RNAs that would be optimal for the production of a particular protein. We are very far from this ideal. We need mathematical and dynamic structural models of how an RNA virus replicates, how reverse transcriptase copies the RNA into DNA, and how the RNA is packaged into its protein coat. 
With this knowledge, we will be much closer to finding ways to prevent or cure diseases caused by RNA viruses, which include colds, influenza, AIDS, and hepatitis.

RNA is now known to have catalytic activity (Zaug and Cech, 1986). To date, it has been demonstrated that RNA catalysis is involved in RNA processing and in glycogen synthesis. This catalytic activity of RNA is needed for the replication cycle of some viroids and virusoids (small infective RNA particles) and for the processing of some ribosomal RNAs. Fundamental advances in understanding this catalytic activity require knowledge of the location and structure of the active site of RNA enzymes. Computer calculations in conjunction with mutation experiments may allow us to progress most rapidly.

Predicting DNA Structure

Although DNA and RNA are very similar, in practice the important problems relating to their sequences and structures are usually different. DNA stores all the genetic information that determines the organism and its characteristics. Its sequence, conformation, and interactions with proteins, small molecules, and other DNA molecules determine how and when the genetic information is expressed. The ability to understand and ultimately to control gene expression will make it possible to control genetic diseases, bacterial diseases, and DNA viral diseases. Thus, the important questions for DNA are:

• What is the detailed conformation of any sequence of double-stranded DNA, and how does it depend on the environment?
• How does the conformation interact with other molecules?

To answer these questions, we will need to understand nucleic acid structure, protein structure, and their interactions in complex environments. It will take a great deal of effort to achieve this understanding, but the rewards for society will be very high.

X-ray diffraction experiments have determined structures for double-stranded DNA, protein-DNA complexes, and DNA-small molecule compounds in crystals. Computer modeling is needed to extrapolate these results to predict what would occur in different biological environments and in other complexes. One obvious application is in the design of more specific and more effective antibiotics. In general, we would like to be able to design molecules that can start or stop the expression of any gene in any DNA.