Read "Computer Assisted Modeling: Contributions of Computational Approaches to Elucidating Macromolecular Structure and Function" at NAP.edu

Suggested Citation:"5 Tertiary Structure of Proteins and Nucleic Acids: Experimental." National Research Council. 1987. Computer Assisted Modeling: Contributions of Computational Approaches to Elucidating Macromolecular Structure and Function. Washington, DC: The National Academies Press. doi: 10.17226/1136.

5 Tertiary Structure of Proteins and Nucleic Acids: Experimental

X-RAY DIFFRACTION OF BIOLOGICAL MACROMOLECULES

In 1934 Bernal and Crowfoot demonstrated that a crystalline protein could give rise to a well-ordered x-ray diffraction pattern, thus setting the stage for modern analysis of the structure of proteins. Progress was gradual at first and was interrupted by World War II, but in 1953 Green, Ingram, and Perutz took another essential step when they accomplished the first heavy atom analysis of a hemoglobin crystal (Green et al., 1954). The culmination of these years of work came in 1959, when Kendrew and his colleagues (1960) reported the analysis of myoglobin at 2 Å resolution, revealing for the first time the underlying structure of a globular protein. They noted the complexity and lack of regularity of the molecule, features that continue to impress us today as general characteristics of protein structure. The alpha-helices and beta sheets of Pauling and Corey (Pauling et al., 1951) form striking regions of regularity, but they are joined together in very complex ways. Another major event in protein structure analysis occurred the same year, when Cullis et al. (1959) described the structure of hemoglobin at 6 Å and demonstrated that the folding of the globin chain is similar to that in myoglobin, despite relatively low sequence homology between the two. This observation of a family pattern in the three-dimensional structure of the globins has been followed by the identification of many other families.

Today, several hundred proteins have been analyzed by x-ray diffraction and their three-dimensional structures catalogued. This number continues to grow at an ever-increasing rate and, together with amino acid and gene sequence data, forms the principal basis for understanding the mechanisms of action of these proteins at the molecular level. Two-dimensional nuclear magnetic resonance (NMR) techniques are already a valuable complement to x-ray diffraction for relatively small molecules (molecular weight less than 10,000), but for the foreseeable future crystal structure analysis will be the principal experimental source of structural data for enzymes, nucleic acid binding proteins, antibodies and other proteins of the immune system, receptors, and indeed all proteins that can be effectively crystallized.

Determining the three-dimensional structure of a biological macromolecule by crystallography involves a number of clearly defined steps. First, crystals of suitable size and diffraction properties must be prepared. Next, x-ray diffraction data must be collected for these crystals and, typically, also for a number of heavy atom derivatives of the crystals. These data can then be assembled into an electron density map by a computational process (a Fourier synthesis) that plays the role of the lens in a microscope. This map must then be fitted with a polypeptide chain of the appropriate amino acid sequence. Because the map is of less-than-atomic resolution and also contains errors arising from phase determination, obtaining the best fit requires considerable skill on the part of the investigator. The resulting protein model must then be refined to remove, as far as possible, both the errors present in the map and those introduced by the fitting process.
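The "lens" step can be illustrated with a toy Fourier synthesis: each reflection h contributes a wave whose amplitude comes from the measured intensity and whose phase must be determined separately (for example, from heavy atom derivatives). The sketch below is a minimal one-dimensional illustration, not taken from the text; it assumes point atoms in a unit cell of length 1, hypothetical atom positions and scattering weights, and the sign convention F(h) = sum_j f_j exp(2 pi i h x_j). It also shows why the phases matter: the same amplitudes with every phase set to zero give a map that does not locate the atoms.

```python
import numpy as np

def structure_factors(positions, weights, h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for point atoms at fractional coordinates x_j."""
    x = np.asarray(positions, dtype=float)[None, :]
    f = np.asarray(weights, dtype=float)[None, :]
    return (f * np.exp(2j * np.pi * h[:, None] * x)).sum(axis=1)

def fourier_synthesis(coeffs, h, grid):
    """rho(x) = sum_h F(h) * exp(-2*pi*i*h*x) on a fractional grid (cell volume taken as 1)."""
    terms = coeffs[:, None] * np.exp(-2j * np.pi * h[:, None] * grid[None, :])
    # Friedel pairs F(-h) = conj(F(h)) make the sum real; drop round-off in the imaginary part.
    return terms.sum(axis=0).real

h = np.arange(-20, 21)                              # Miller indices up to a resolution cutoff
grid = np.linspace(0.0, 1.0, 200, endpoint=False)   # sampling points across the cell
atoms, f = [0.2, 0.55], [8.0, 6.0]                  # hypothetical positions and weights

F = structure_factors(atoms, f, h)
rho = fourier_synthesis(F, h, grid)                      # amplitudes with correct phases
rho_no_phases = fourier_synthesis(np.abs(F), h, grid)    # same amplitudes, phases set to zero

print("highest peak, correct phases:", grid[np.argmax(rho)])
print("highest peak, zero phases:  ", grid[np.argmax(rho_no_phases)])
```

With correct phases the highest peak falls on the heavier atom at x = 0.2; with zero phases the largest value sits at the origin, where every cosine term equals 1, regardless of where the atoms are.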
Computers play an essential role in most of these steps. Even the analysis of the first protein crystal structure, myoglobin, could not have been accomplished without the use of the EDSAC II computer in Cambridge (Kendrew, 1960). Modern crystallography now depends completely on heavy computer use, and this dependence will certainly increase steadily in the future. In each of the four mathematical procedures required to solve a structure by protein crystallography (data processing, phase determination, map fitting, and refinement), new methods are continually appearing that depend on ready access to considerable computer power.

Consider first the initial step, data collection. This is now changing significantly, as most laboratories in the United States convert from film or diffractometers to area detectors. These machines can increase the speed of data collection within the laboratory by as much as two orders of magnitude. The output from the area detectors is generally processed directly, resulting in the speedy production of finished intensity data. The ability to produce, in a few days, data that previously took weeks or months of labor-intensive work to collect is revolutionizing the field at an opportune time, when developments in genetic engineering have made it possible to use site-directed mutagenesis to answer many structural questions. These questions, however, require a separate data set for each mutant.

When x-ray intensities are measured, whether by photography or with area detectors, presenting the data on a computer graphics screen can make the diffraction pattern much easier to analyze. Area detectors coupled with graphics facilities can now be used to align a crystal almost in real time. It is not clear how much the use of computer graphics in data collection will continue to grow, since its impact will probably be reduced by the increasing power of data processing software.

A major discovery of protein crystallography is that most proteins belong to families with closely related three-dimensional structures. Examples include the hemoglobins, the serine proteases, the aspartic proteinases, and the immunoglobulin domain structure. Consequently, the known structure of one family member can be used to obtain phase information about an unknown structure, a method known as molecular replacement (Rossmann, 1972).
There have been numerous successful applications of molecular replacement to the determination of crystal structures. The structure of a bacterial serine protease inhibitor was used to analyze the structure of the protease bound with an inhibitor (James et al., 1978). Another example is the use of the known structures of lysozyme and of the two-domain modules of an immunoglobulin Fab to analyze the structure of a monoclonal antibody bound to lysozyme (Sheriff et al., in press). Molecular replacement will become even more widely used as more members of each protein family are investigated.

Molecular replacement techniques have also been applied to exploit redundancy (noncrystallographic symmetry) in obtaining phase information (Rossmann, 1972; Harrison et al., 1978). A spectacular illustration of this occurred in the recent analyses of the picornaviruses responsible for polio and the common cold (Rossmann et al., 1985; Hogle et al., 1985). These methods require heavy computation for their success. For example, Rossmann and Argos (1977) concluded that they could determine the rhinovirus structure only with extensive use of the supercomputer at Purdue University and could not have carried out the analysis and phase extension without such a facility.

COMPUTER-ASSISTED MODELING IN DNA STRUCTURE ANALYSIS

In the current method of fitting the electron density map, computer graphics are essential. Although several programs can fit an approximate model of a protein to the map without human intervention, most crystallographers do not use them, preferring instead to fit the model interactively with computer graphics. Here, the development of color and stereo graphics systems has been an important advance.

Computer graphics is such a colorful and seductive tool that it virtually compels the nonscientific observer to believe in whatever phenomenon is being displayed. This fascination with graphics displays extends to some working scientists as well. One must always ask whether (to paraphrase an old joke about statistics) a scientist is using computer graphics the way a drunk uses a lamp post: more for support than for illumination. But in macromolecular structural analysis, particularly of DNA and its complexes with antitumor drugs and with control proteins such as repressors, computer graphics will illuminate: it enables the investigator to carry out the structure analysis efficiently and to see aspects of the molecule's structure that would be perceived only with difficulty, or overlooked entirely, using more traditional methods.
In the early 1950s, before any protein structure had been determined by x-ray methods, the British crystallographer J. D. Bernal once remarked that, even if we were to obtain an electron density map of a protein, we would never be able to understand it until we could build a map big enough to walk through and point out features around and above us as we walked (personal communication, around 1955). Kendrew had this dictum in mind when he constructed the first electron density map of any protein, myoglobin, by attaching color-coded spring clips along steel rods mounted in a regular grid on a heavy plywood baseboard. (A portion of this first map is still on display in the Kensington Science Museum in London.) Richards (1968) provided the next step in the display of large and complex electron density maps of macromolecules: a half-silvered mirror arrangement that came to be known as a Richards box. The device superimposed a direct image of Mylar sheets of electron density on a reflected image of the wire model being constructed (Figure 5-1). In the 1960s and 1970s, virtually every macromolecular structure group had at least one Richards box for interpreting electron density maps and building macromolecular models into them.

This half-silvered mirror technology has since been replaced by computer graphics. Detailed chain fitting is carried out on the graphics screen, fitting stick bond skeletons into "chicken-wire" three-dimensional contoured volumes. Diamond (1966) wrote the first such display program, BILDER. This routine has largely been superseded by the easier-to-use FRODO routines of Jones (1985). More recently, many even more flexible packages have been offered, for both large mainframe computers and minicomputers. The most widely used software is FRODO, developed by Alwyn Jones of the University of Uppsala, Sweden. The use of FRODO on a computer graphics system has now almost entirely replaced the construction of mechanical models. It has greatly improved the accuracy of modeling and, in particular, the speed and precision of the more predictive kinds of modeling, such as fitting substrates to the surfaces of enzyme molecules. In these programs, atoms can be placed within the observed density with great accuracy.
Because the coordinates are in the computer as soon as the atoms are located, acceptable bond lengths and angles can be built into the trial model from the outset. Once a trial model has been built into the displayed electron density, least squares refinement against the x-ray data can be carried out using one of several available programs that compensate for a too-small ratio of data points to refined parameters at less-than-atomic resolution. The programs impose constraints on bond lengths, bond angles, and (if desired) bond torsion angles to keep them within known acceptable limits.

FIGURE 5-1 Illustration of the principle of the Richards box (caption partly illegible in the source scan). Electron density contours drawn on stacked transparent Plexiglas sheets at the upper rear are superimposed, by a half-silvered mirror set at 45 degrees, on the view of the wire model of the molecule under construction.

In refining protein and nucleic acid structures, new methods are increasing investigators' dependence on heavy computing. Current methods of refinement include the restrained least squares programs developed by Hendrickson and Konnert (1980), Sussman (1985), and Jack and Levitt (1978), among others. These programs can be used successfully on relatively slow computers, particularly when the Fast Fourier Transform algorithm is used to accelerate them. At present, restrained least squares refinement must be interrupted frequently to compare the fit of the model to the electron density map, to remodel in order to escape stereochemically unacceptable local minima, to insert solvent molecules, and for other procedures. This model-building step limits the rate of the refinement procedure. A comprehensive, residue-by-residue examination of an average-sized protein molecule, for example one of 40,000 molecular weight, takes many days or even weeks. The procedure for identifying solvent molecules on the protein surface is also tedious and time-consuming. Any method that reduced the number of interventions during refinement, or required less human intervention during modeling, would speed up the refinement process. In this respect, Brunger et al. (1987) have recently shown that by combining molecular dynamics simulation with the refinement process, one can avoid many of the minor minima that sometimes occur in the usual refinement procedures. However, the successful application of the dynamics calculations would require considerably more computer power than is currently available to most crystallographic laboratories.

At any point during refinement, computer graphics enable one to examine the trial structure alone or superimposed on the electron density in one of several forms: a simple Fo electron density map, an (Fo - Fc) difference map, or a (2Fo - Fc) map, which is in fact the superposition of the two previous functions. When refinement is complete, the resulting coordinates are immediately available within the computer for drawing figures that illustrate the final structure. This ability to work within the computer during model fitting, refinement, and display of results saves an enormous amount of time compared with the older technique of constructing physical models.
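The three kinds of map differ only in their Fourier coefficients, and all are phased with the phases calculated from the current model. The sketch below, a one-dimensional point-atom toy and not any particular refinement program, builds the Fo, (Fo - Fc), and (2Fo - Fc) coefficients together with the conventional residual R = sum| |Fo| - |Fc| | / sum |Fo|. The crystal is deliberately idealized and hypothetical: it is centrosymmetric and dominated by a heavy atom at the origin, so the model phases happen to be exactly right and the difference map recovers the missing atoms cleanly. With real data the model phases are only approximate, and missing features appear weaker and noisier.

```python
import numpy as np

H = np.arange(-20, 21)                              # Miller indices used in the syntheses
GRID = np.linspace(0.0, 1.0, 200, endpoint=False)   # fractional sampling points

def structure_factors(positions, weights):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for point atoms."""
    x = np.asarray(positions, dtype=float)[None, :]
    f = np.asarray(weights, dtype=float)[None, :]
    return (f * np.exp(2j * np.pi * H[:, None] * x)).sum(axis=1)

def synthesis(coeffs):
    """Real-space map from Fourier coefficients (cell volume taken as 1)."""
    return (coeffs[:, None] * np.exp(-2j * np.pi * H[:, None] * GRID[None, :])).sum(axis=0).real

def map_coefficients(fo_amp, fc, kind):
    """Fourier coefficients for an Fo, Fo-Fc, or 2Fo-Fc map, phased with the model phases."""
    model_phase = np.exp(1j * np.angle(fc))
    fc_amp = np.abs(fc)
    weights = {"Fo": fo_amp, "Fo-Fc": fo_amp - fc_amp, "2Fo-Fc": 2.0 * fo_amp - fc_amp}
    return weights[kind] * model_phase

def r_factor(fo_amp, fc):
    """Conventional crystallographic residual R = sum| |Fo| - |Fc| | / sum |Fo|."""
    return np.abs(fo_amp - np.abs(fc)).sum() / fo_amp.sum()

# Hypothetical centrosymmetric toy crystal: the current model is a heavy atom at the
# origin plus a pair at +/-0.2; a lighter pair at +/-0.35 (a bound "drug") is missing.
model_x, model_f = [0.0, 0.2, -0.2], [20.0, 4.0, 4.0]
drug_x, drug_f = [0.35, -0.35], [2.0, 2.0]

fo_amp = np.abs(structure_factors(model_x + drug_x, model_f + drug_f))  # stands in for data
fc = structure_factors(model_x, model_f)                                # from the model

maps = {kind: synthesis(map_coefficients(fo_amp, fc, kind)) for kind in ("Fo", "Fo-Fc", "2Fo-Fc")}
peak = GRID[np.argmax(maps["Fo-Fc"])]
print(f"R = {r_factor(fo_amp, fc):.3f}; highest Fo-Fc peak at x = {peak:.2f}")
```

Because -0.35 and 0.65 are the same point in the cell, the highest difference peak is reported at one of those two equivalent positions. Adding the missing pair to the model drives R to zero and flattens the difference map, and the (2Fo - Fc) map equals the sum of the Fo and (Fo - Fc) maps because each synthesis is linear in its coefficients.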

Difference Patterson Vector Maps to Locate Heavy Atoms

The standard way of determining the phases of the x-ray diffraction pattern of a macromolecular structure is to prepare one or more heavy atom derivatives of the macromolecule that differ from the parent compound only by the addition of a heavy atom (a metal or halogen) at defined locations in the molecule. For DNA, this can be done either by synthesizing the DNA oligomer with 5-bromocytosine in place of cytosine at one point in the sequence or by diffusing heavy atom complexes into the DNA after crystallization. Data are then collected on the parent DNA crystals and on each of the available heavy atom derivatives. Heavy atom positions are found by interpreting a difference Patterson vector map, which in principle has features that locate all of the vectors between heavy atoms in the crystal unit cell.

Difference Patterson vector maps traditionally are examined by using minimaps stacked on Plexiglas sheets placed on a light box. The computer graphics display discussed above, however, offers several advantages over this old technique. Once a trial position of a heavy atom is established, one must calculate all of the possible heavy atom-heavy atom vectors and see how many of them correspond to features in the Patterson map. If the crystal has appreciable space group symmetry, this is a tedious and painstaking job, but it is a trivial task for a computer. The operator need only type in or point to a trial heavy atom position, and all the resulting interatomic vectors can be displayed instantly on the graphic image of the Patterson map. This allows rapid interpretation of the vector map and is especially important if the space group has high symmetry or there are multiple heavy atom sites. Computer graphics per se is of relatively little assistance in the subsequent refinement of heavy atom positions and in single or multiple isomorphous replacement phase analysis.
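The vector bookkeeping that is tedious by hand takes only a few lines of code. The sketch below assumes space group P2_1 (unique axis b, equivalent positions (x, y, z) and (-x, y + 1/2, -z)); the trial site, the matching tolerance, and the "observed" peak list are hypothetical. It generates all symmetry-related copies of the trial heavy atom sites, lists every interatomic vector modulo the unit cell, and counts how many fall on observed Patterson peaks. For a single site in P2_1, the two vectors lie on the Harker section v = 1/2, at (2x, 1/2, 2z) and its centrosymmetric mate.

```python
import numpy as np

# Symmetry operators (rotation matrix, translation) for P2_1 with unique axis b.
P21_OPS = [
    (np.eye(3), np.zeros(3)),
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0])),
]

def equivalent_positions(site, ops):
    """All symmetry-equivalent copies of one fractional position, reduced into the cell."""
    xyz = np.asarray(site, dtype=float)
    return [np.mod(rot @ xyz + trans, 1.0) for rot, trans in ops]

def patterson_vectors(sites, ops, decimals=6):
    """Unique interatomic vectors r_i - r_j (i != j) between all heavy atoms in the cell."""
    atoms = [p for site in sites for p in equivalent_positions(site, ops)]
    vectors = [np.mod(np.round(np.mod(a - b, 1.0), decimals), 1.0)
               for i, a in enumerate(atoms) for j, b in enumerate(atoms) if i != j]
    return np.unique(np.array(vectors), axis=0)

def count_matches(predicted, observed_peaks, tol=0.02):
    """How many predicted vectors lie within tol (per axis, with wrap-around) of an observed peak."""
    observed = np.asarray(observed_peaks, dtype=float)
    hits = 0
    for v in predicted:
        d = np.abs(observed - v)
        d = np.minimum(d, 1.0 - d)          # periodic distance along each cell axis
        if np.any(np.all(d < tol, axis=1)):
            hits += 1
    return hits

trial_sites = [(0.10, 0.20, 0.30)]                    # hypothetical trial heavy atom
vectors = patterson_vectors(trial_sites, P21_OPS)
observed = [(0.20, 0.50, 0.60), (0.80, 0.50, 0.40)]   # hypothetical Patterson peak list
print(vectors)
print(count_matches(vectors, observed), "of", len(vectors), "predicted vectors explained")
```

Adding a second trial site to `trial_sites` adds the cross vectors between the two sites automatically, which is exactly the case of multiple heavy atom sites that the text singles out as hard to do by hand.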
Once a rough electron density map of the DNA helix has been calculated, however, graphics again proves extremely useful.

Construction of a Trial Structure into an Electron Density Map

The advantages of computer graphics in fitting a structure into a displayed electron density map have already been mentioned: greater accuracy in fitting the map, the ability to build in realistic bond lengths and angles from the outset, and the immediate availability of the coordinates in the computer.

Display of Intermediate Maps for Checking Errors in the Structure

No restrained least squares refinement process is automatic, and people who have assumed too much in this regard have made serious errors. The investigator must monitor progress frequently during least squares refinement by examining difference and (2Fo - Fc) maps to look for wrongly positioned groups or missing features. This is tedious with contoured minimaps but much less so on a computer graphics display.

The value of computer display of difference maps is well illustrated by the structure of a DNA-drug complex. In solving the structure of the DNA dodecamer of sequence C-G-C-G-A-A-T-T-C-G-C-G with the antitumor antibiotic netropsin, Kopka first positioned the DNA in the crystal unit cell, using the results from the DNA alone, and refined the DNA until no further improvement was possible (Kopka et al., 1985a, 1985b, 1985c). She then calculated a difference map with coefficients (Fo - Fc), where Fo represented the observed structure amplitudes from crystals of the DNA-drug complex and Fc those calculated from the trial structure of the DNA alone. The result was a "chicken-wire" contoured image of the drug molecule (Figure 5-2). The known chemical structure of netropsin could be fitted easily and accurately into the graphics display, and refinement then continued to completion with the DNA and drug together. As a control, Kopka also drew a conventional minimap at the same point in refinement, but this was nearly uninterpretable because of the awkward orientation of the map sections and the difficulty of building an idealized drug molecule into a map of stacked Plexiglas sheets. In this particular application, the conventional minimap was tedious to use, whereas the graphics display was very simple to interpret.
FIGURE 5-2 Stereo pair drawing, photographed off the face of an Evans and Sutherland Multi Picture System graphics station, of the difference electron density of the antitumor drug netropsin in its complex with B-DNA. The screen image was photographed onto Ektachrome slide film, and this film was then used as a negative in making a positive print. The framework is a three-dimensional representation of one contour level in the electron density of the drug, and the graphics operator has built a skeleton of the netropsin molecule within this contour cage. This was the first point at which information about the drug was built into the analysis, and it was the point of departure for further least squares refinement of the DNA-drug complex. Source: Kopka et al., 1985c.

Location of Solvent Molecules and Ions Around a Macromolecule

The images of solvent molecules around a macromolecule cannot all be found in unrefined electron density maps; the quality of detail in the entire map improves as the fit of the DNA is sharpened. Locating solvent molecules is a repetitive process that involves adding a restricted number of solvent peaks in the immediate neighborhood of the DNA, refining, and examining the improved map for new images of solvent. This process is shown in Figure 5-3 for the B-DNA dodecamer C-G-C-G-A-A-T-T-C-G-C-G. This particular analysis was carried out with minimaps because the computer graphics capability did not exist at the time, but recent DNA structure analyses use the more efficient graphics display.

FIGURE 5-3 Illustration of the iterative process of locating solvent molecules around a DNA double helix. The quality of the solvent images improves with refinement and improvement of the phases used in the electron density map calculation. One must avoid adding "solvent" molecules too hastily, for fear of introducing erroneous peaks that will then persist and cause confusion during later refinement. Such a search for solvent requires the inspection of many successive electron density maps as refinement proceeds, and computer graphics are of enormous help in speeding up this process. (a) Vicinity of thymines No. 7 and 19 of the B-DNA dodecamer C-G-C-G-A-A-T-T-C-G-C-G, prior to the addition of any solvent molecules to the phasing; residual error R = 27 percent. (b) Later stage of refinement, R = 21 percent; three solvent peaks visible in these sections. (c) Still later stage, R = 20 percent; five solvent peaks visible. (d) Final map, R = 18 percent; ten solvent peaks indicated. Source: Drew and Dickerson, 1981.

Display of Completed Macromolecular Structure

The most familiar application of computer graphics to macromolecular structure is the display of the final results. But even here, the flexibility of computer graphics enables one to go beyond simply drawing views of the molecule and to see features that ordinarily would be overlooked. A case in point is provided by the stereo pair in Figure 5-4, which shows the drug molecule netropsin complexed with a 12-base-pair B-DNA double helix. The image was oriented to sight directly down the minor groove, with the object of demonstrating that netropsin sits in the middle of the groove rather than to one side. But an unintended secondary dividend emerged. In this view, the bases of each individual strand of the double helix are seen to be stacked atop one another, almost as though the other strand of the helix did not exist. This efficiency of intrachain base stacking means that when the two strands are wound around one another to build a double helix, the bases of each base pair are not coplanar; they are given what is defined as a positive propeller twist about the long axis connecting them. AT base pairs have two connecting hydrogen bonds, compared with three for GC pairs, and so resist propeller twisting less. As a result, the minor groove of B-DNA is closed down more in AT regions than in GC regions. This makes a flat multiring drug molecule such as netropsin fit more snugly into the narrow AT region and is part of the explanation for the previously established binding of netropsin only to AT base pairs. Hence, in this example, an unusual view of the DNA that was intended to illustrate one point revealed an unexpected new association. Examples of this serendipity with computer graphics are found over and over again in macromolecular structure analysis.

FIGURE 5-4 Oblique computer graphics stereo diagram of the complex of netropsin with C-G-C-G-A-A-T-T-C-G-C-G, in a view sighting directly down the minor groove, with the drug molecule slotted into it in a crescent curving away from the viewer. This view also illustrates the strong stacking of bases down each strand of a B-DNA double helix, in a more striking representation than is obtained from a more conventional view of the B helix. Positive print from an Ektachrome photograph taken directly from the graphics terminal. Source: Kopka et al., 1985a.

Studies of Docking and Macromolecular Interactions

Once the structure of a macromolecule is known, it can be compared with those of related macromolecules or with other molecules with which it forms complexes. Computer graphics permits one to move molecules relative to one another and to introduce minor changes in a manner that would be tedious or impossible with physical models. For example, Figure 5-5 shows the fitting of a netropsin molecule against the floor of the minor groove, seen in profile. Without computer graphics it would be virtually impossible even to represent the contour of the floor of the minor groove, yet this is the surface to which the drug binds. The figure illustrates the structurally significant finding that the ends of the molecule are more closely associated with the DNA than is the central amide. One can change base pairs in the DNA from AT to 2-aminoadenine-thymine and show the steric clash that then ensues with the drug. One can also modify the drug itself and see what effect this is likely to have on its binding to DNA. This ability to examine intermolecular "docking" is equally important when the structures of the two molecules are known but that of their complex is not. Trials of various docking geometries, with calculations of relevant energies, can suggest new modes of interaction, as well as the experiments needed to test them.

In summary, computer graphics is far more than an attractive way of drawing pictures of macromolecules. It is a very powerful tool that greatly aids in understanding the interactions between DNA and other macromolecules and so leads to insights that otherwise would be overlooked. Even without computer graphics, energy minimization considerations alone have led to a predicted structure of an enzyme-substrate complex (Pincus and Scheraga, 1979) that was subsequently verified by experiment (Smith-Gill et al., 1984).
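The steric clash check mentioned under docking can be sketched as a comparison of interatomic distances between two molecules with the sums of their van der Waals radii. This is a hypothetical illustration, not the procedure of any program named in the text: the coordinates, the 1.7 Å radius (roughly that of carbon), and the 0.4 Å allowance are assumptions, and a real docking study would also compute interaction energies rather than only flag overlaps.

```python
import numpy as np

def steric_clashes(coords_a, radii_a, coords_b, radii_b, allowance=0.4):
    """(i, j, distance) for atom pairs closer than r_i + r_j - allowance (all in Å)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    limit = (np.asarray(radii_a, dtype=float)[:, None]
             + np.asarray(radii_b, dtype=float)[None, :] - allowance)
    rows, cols = np.nonzero(dist < limit)
    return [(int(i), int(j), float(dist[i, j])) for i, j in zip(rows, cols)]

def translate(coords, shift):
    """Rigid-body translation of one molecule relative to the other."""
    return np.asarray(coords, dtype=float) + np.asarray(shift, dtype=float)

# Hypothetical fragments: two "receptor" atoms and one "ligand" atom, radii 1.7 Å each.
receptor = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand = [(1.5, 2.0, 0.0)]
radii_r, radii_l = [1.7, 1.7], [1.7]

before = steric_clashes(receptor, radii_r, ligand, radii_l)
after = steric_clashes(receptor, radii_r, translate(ligand, (0.0, 1.5, 0.0)), radii_l)
print("clashes before move:", before)
print("clashes after move: ", after)
```

Here the ligand atom starts 2.5 Å from both receptor atoms, inside the 3.0 Å limit, and moving it 1.5 Å away clears both contacts. Substituting a base, as in the AT to 2-aminoadenine-thymine example, would amount to adding the extra amino group atom to the receptor coordinates and re-running the same check.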

FIGURE 5-5 [Caption illegible in the source scan. The figure shows netropsin fitted against the floor of the minor groove of B-DNA, seen in profile.]

USING NUCLEAR MAGNETIC RESONANCE TO DETERMINE TERTIARY STRUCTURES

Nuclear magnetic resonance (NMR) is an important source of structural data, particularly in view of its potential for meshing with theoretical modeling and with other sources of structural data, such as x-ray crystallography. Its use has been growing explosively. NMR has been applied in aqueous media and, to a limited extent, in environments that approximate those of biological membranes (Braun et al., 1981). It is most useful for exploring changes in preferred structure in response to environmental or structural perturbations. In this sense, NMR can play an important role in extrapolating structural data obtained by other techniques to other environments. It is also well suited to exploring changes in structure across homologous series of macromolecules, such as a series produced by site-specific mutagenesis (Markley, 1987). More recently, technological advances have extended its range of applicability to solids, oriented phases, and even complete structure determination of molecules in solution. The latter extension has attracted the most attention and has the greatest potential impact on computer-assisted modeling efforts. Examples of complete structure determination by NMR are most numerous among relatively small molecules: peptides, oligosaccharides, and nucleotide oligomers. However, several groups have determined peptide or protein structures for molecules in the 5 to 10 kDa molecular weight range (Arseniev et al., 1984; Braun et al., 1986; Havel and Wuthrich, 1985; Kaptein et al., 1985; Kline et al., 1986; Williamson et al., 1985). A brief discussion of the protein examples provides insight into the contribution NMR may make to the prediction of macromolecular structure and function in years to come.
Methodology

In most cases, the basis of structure determination by NMR is the dependence of the nuclear Overhauser effect (NOE) on interatomic distance. The NOE is a nuclear spin relaxation phenomenon that depends on the through-space dipolar interaction between magnetic moments centered on different nuclei (usually protons), and it shows an inverse sixth-power dependence on distance. Havel et al. (1979) showed that, in principle, converting enough NOE measurements into distance constraints between pairs of protons in a macromolecule would be completely equivalent to specifying its structure through a set of Cartesian coordinates. The ubiquity of protons (hydrogen nuclei) in chemical structures ensures an abundance of NOE data. These data have long been gathered and used in studies of small molecules, but until recently the sheer abundance of data for macromolecular systems prevented unequivocal interpretation in terms of macromolecular structure.

A protein of 10 kDa will have approximately 1,000 protons, each giving rise to one or more resonances in a proton NMR spectrum. Resolution in conventional spectral acquisitions is inadequate to identify each resonance, let alone assign it to a primary structure site or acquire NOE data on a significant fraction of the roughly 5 × 10^5 possible proton pairs. The situation with other types of macromolecules, such as nucleic acids or oligosaccharides, is less formidable in terms of the number of protons. It is complicated, however, because the residues in these structures are less chemically diverse; their resonances are therefore less dispersed in NMR spectra, which makes it difficult to assign a particular peak to a particular proton.

The resolution and assignment problem has been solved by the advent of higher field magnets (currently 14 tesla), which provide increased chemical shift resolution, and by the development of two-dimensional acquisition techniques, which provide multidimensional resolution and greatly improve the efficiency of data acquisition (Ernst et al., 1987). Wuthrich (1986) summarizes the methods for proteins and nucleic acids.
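Two pieces of arithmetic in this passage can be made explicit. The number of distinct proton pairs is n(n - 1)/2, which for about 1,000 protons is close to the 5 × 10^5 quoted. The r^-6 law is also commonly used, under the isolated spin-pair approximation (which ignores spin diffusion), to convert a crosspeak intensity into a distance by calibrating against a crosspeak of known distance; the same law shows why a 2.8 Å interproton distance gives a far stronger NOE than a 4.6 Å one, the alpha-helix and beta-sheet values discussed below. The function names and the reference intensity and distance in this sketch are hypothetical.

```python
def proton_pairs(n_protons):
    """Number of distinct proton pairs, n(n - 1)/2."""
    return n_protons * (n_protons - 1) // 2

def noe_distance(intensity, ref_intensity, ref_distance):
    """Isolated spin-pair approximation: NOE ~ r**-6, so r = r_ref * (I_ref / I)**(1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

def noe_intensity_ratio(r_near, r_far):
    """How many times stronger the NOE is at distance r_near than at r_far."""
    return (r_far / r_near) ** 6

print(proton_pairs(1000))                        # 499500, i.e. about 5 x 10^5
print(round(noe_distance(1.0, 64.0, 2.5), 3))    # 5.0: a 64-fold weaker peak is twice as far
print(round(noe_intensity_ratio(2.8, 4.6), 1))   # 19.7: helix versus sheet amide-amide NOE
```

The steepness of the sixth power is what makes the method both sensitive and limited: a 64-fold drop in intensity only doubles the inferred distance, so NOEs effectively report on protons within a few ångströms of each other.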
Assigning a resonance to a proton at a particular site in the primary structure relies on a combination of experiments that display through-bond scalar connectivities of resonances (for example, COSY, or correlation spectroscopy) and through-space dipolar connectivities of resonances (NOESY, or nuclear Overhauser effect spectroscopy). COSY is important in finding scalar coupling patterns that correspond to sets of spins characteristic of a particular type of amino acid. For example, only alanine has three equivalent methyl protons coupled to an alpha-proton. The NOESY experiment is important in linking resonances assigned to a given amino acid to resonances of a neighboring amino acid. When the amino acid sequence is known, linking two to three amino acids is often enough to create a segment that occurs only once in the sequence. In principle, this procedure provides a complete sequence-specific assignment of NMR resonances.

After the resonances are assigned, the secondary and tertiary structures of proteins are determined largely on the basis of NOE information. Qualitative analysis of the intensities of the crosspeaks connecting proton resonances involved in an NOE is often enough to characterize secondary structure. For example, the amide protons of adjacent residues are 2.8 Å apart in an alpha-helix but 4.6 Å apart in a beta-sheet (Wüthrich, 1986). Because the NOE has a steep inverse distance dependence (proportional to 1/r^6), this difference produces strong amide-amide crosspeaks in an alpha-helix but crosspeaks about 20 times weaker, and usually unobservable, in a beta-sheet. Residue conformations can be assigned to known types of turns as well as to more extended secondary structural elements. Because this assignment is sequence specific, it provides valuable information for verifying structure predictions, and possibly for assessing folding preferences, even without a tertiary structure determination.

Determining tertiary structure is more difficult: a limited amount of longer-range distance information must be systematically integrated with other constraints. The most direct approach is probably a distance geometry search for structures whose interproton distances lie between the experimentally determined upper and lower bounds (Braun and Go, 1985; Havel et al., 1979). Altman and Jardetzky (1986) recently developed an expert system that has also succeeded in producing a space-filling representation of protein structure. It is, however, proving desirable to increase the role that theoretical constraints play in determining structures.
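The sequence-specific assignment step can be pictured as a small search problem: spin systems identified by COSY are linked in order by NOESY, and the linked run is assigned once it matches exactly one position in the known sequence. The sketch below is an illustration only, not a published assignment algorithm. The sequence is hypothetical, and each position of the observed segment is given as a set of candidate residue types, since a spin system often narrows a residue only to a group of possibilities.

```python
from __future__ import annotations


def matching_positions(sequence: str, segment: list[set[str]]) -> list[int]:
    """Return 0-based start indices where each residue of the sequence window
    belongs to the set of candidate residue types observed at that position."""
    k = len(segment)
    return [
        start
        for start in range(len(sequence) - k + 1)
        if all(sequence[start + i] in candidates for i, candidates in enumerate(segment))
    ]


# Hypothetical sequence in one-letter codes (for illustration only).
seq = "MKTAYIAKQRQISFVKSHFS"

# A single alanine spin system is ambiguous: alanine occurs twice.
print(matching_positions(seq, [{"A"}]))                    # [3, 6]

# Linking it by NOESY to a following K/R/Q-type spin system makes it unique.
print(matching_positions(seq, [{"A"}, {"K", "R", "Q"}]))   # [6]
```

Longer linked segments, or more specific spin-system types, shrink the candidate list further; the segment is assigned when the list has exactly one entry.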
Greater use of theoretical constraints has been achieved with both molecular dynamics and molecular mechanics programs (Brünger et al., 1986b; Kaptein et al., 1985). Integrating NMR data with theoretical predictions is desirable for two reasons. First, NMR data often leave certain regions of a macromolecule poorly defined; for example, there may be few NOE data on certain sidechain conformations in proteins, and theoretical predictions can help specify the conformation of these regions. Second, experimental data offer a viable way to select among the multiple minima of the complex energy surfaces that theoretical modeling programs must search.
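Restraint-based refinement of this kind typically adds a pseudo-energy term that is zero when an interproton distance lies between its lower and upper bounds and grows outside them. The sketch below uses a simple flat-bottomed harmonic form with an adjustable force constant `k`; the functional form, the constant, and the atom names and coordinates are assumptions for illustration, not the specific potentials of the programs cited.

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]
Restraint = Tuple[str, str, float, float]  # (atom_i, atom_j, lower, upper) in angstroms


def restraint_energy(coords: Dict[str, Coord], restraints: Iterable[Restraint],
                     k: float = 1.0) -> float:
    """Flat-bottomed harmonic pseudo-energy summed over distance restraints.

    Each term is zero when the distance lies within [lower, upper] and
    k * (violation)**2 outside that range.
    """
    energy = 0.0
    for atom_i, atom_j, lower, upper in restraints:
        d = math.dist(coords[atom_i], coords[atom_j])
        if d < lower:
            energy += k * (lower - d) ** 2
        elif d > upper:
            energy += k * (d - upper) ** 2
    return energy


# Hypothetical proton coordinates (angstroms) and NOE-derived bounds.
coords = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 6.0, 0.0)}
restraints = [("H1", "H2", 1.8, 4.0),   # satisfied: d = 3.0
              ("H1", "H3", 1.8, 5.0)]   # violated by 1.0: d = 6.0
print(restraint_energy(coords, restraints))  # 1.0
```

In a refinement program a term like this would be added to the molecular mechanics energy and then minimized or sampled by molecular dynamics, so that the experimental bounds and the force field act together; `k` sets how strongly the data pull against the force field.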

Viability of Approach

As with any new approach to structure determination, NMR-based methods are being tested for viability. The most direct method of evaluation is to compare structures determined independently by the new methodology with those from an established technique such as x-ray crystallography. Unfortunately, when Braun and coworkers (1986) first determined a structure without the aid of an existing crystal structure (metallothionein, a 7,000 Da protein), their structure differed dramatically from the x-ray structure when the latter appeared. The reasons for the differences, although still unresolved, are more likely to be actual structural differences between the samples examined than a flaw in the methodology. More recent comparisons show excellent agreement between structures determined from real or simulated NMR data and structures determined by x-ray crystallography; examples include an alpha-amylase inhibitor, an 8,000 Da protein (Kline et al., 1986), and crambin, a 5,000 Da protein (Laue et al., 1985).

Resolution is difficult to define for NMR structures because some aspects, such as the conformation of the backbone, are determined very precisely, while others, such as sidechain conformations, are poorly defined. A lack of adequate numbers of distance constraints may even leave some regions completely undetermined. In principle, distances can be determined to within 0.01 Å, but only when relaxation processes are very well defined and interproton distances are short. Recent estimates of overall resolution, based on root-mean-square (rms) deviations of heavy atoms among multiple structure solutions of proteins obtained from NMR data, suggest a resolution of approximately 3 Å (Williamson et al., 1985). Although these average deviations are larger than those usually seen in x-ray data, NMR methods can be used in a variety of media and yield very precise values for selected distances.
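The rms deviation used as a resolution measure can be computed by comparing each pair of structures in a family of NMR solutions. The sketch assumes the structures have already been superimposed onto a common frame (the least-squares superposition step is omitted) and that atoms are listed in the same order; the coordinates are made-up values.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    coordinates. Assumes the structures are already superimposed."""
    if len(a) != len(b):
        raise ValueError("structures must contain the same atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def mean_pairwise_rmsd(structures):
    """Average rmsd over all pairs in a family of NMR solutions."""
    pairs = list(combinations(structures, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Three made-up, pre-superimposed two-atom "structures".
s1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0)]
s3 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
print(rmsd(s1, s2))                      # sqrt(2), about 1.414
print(mean_pairwise_rmsd([s1, s2, s3]))  # about 0.943
```

Restricting the atom lists to backbone atoms or to sidechain atoms would expose the contrast between precisely and poorly defined regions described above.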
The ability to work in a variety of media and the precision of selected distances compensate for this lower overall precision. The effort required to produce a structure is difficult to assess at this early stage of development. The first few tertiary structures probably required several person-years of effort, but this figure is dropping rapidly, and secondary structures have always been produced with far less effort. For a protein in the 10 kDa range, one month of spectrometer time and six months of analysis is a reasonable estimate of the effort needed to produce a tertiary structure. Sample requirements are modest: 50 mg of a soluble 10 kDa protein. Smaller molecules require a proportionately smaller sample or a quadratically smaller investment in time. This is an emerging technology compared with better-established structural methods, and there are large opportunities to improve its efficiency and viability.

Limitations and Prospects for the Future

The major limitation of the above methods appears to be the size of the macromolecules that can be studied. Current applications in which near-complete assignments are made and structures are determined appear to be restricted to proteins of 10 kDa or less. This is due in part to inadequate resolution when thousands of connecting peaks are involved. It is also due to loss of sensitivity in one of the key data sets for spectral assignment, the COSY spectra. COSY crosspeak intensity depends strongly on the ratio of scalar coupling constants to linewidth, and linewidths increase rapidly with molecular weight, leading to loss of signal. Significantly, the principal source of distance information, the NOESY experiment, does not suffer the same degree of sensitivity loss as macromolecule size increases. If other assignment strategies emerge and chemical shift resolution can be improved, it will be possible to apply NMR methods to larger macromolecules. We believe that improved assignment strategies will emerge. Significant work has already been done on replacing normal amino acids with isotopically substituted amino acids in order to assign peaks arising from particular amino acid types (Kainosho and Tsuji, 1982; LeMaster and Richards, 1985). This strategy bypasses some of the dependence of assignment on NOESY spectra and can be applied to proteins of more than 12 kDa (Kainosho and Tsuji, 1982; LeMaster and Richards, 1985).
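The COSY sensitivity loss described above comes from cancellation within each crosspeak, whose components have opposite signs and are separated by the scalar coupling J. A rough one-dimensional sketch, assuming Lorentzian lines of unit height and ignoring the second frequency dimension, passive couplings and digital resolution; the 7 Hz coupling and the linewidths are illustrative values only.

```python
def lorentzian(x: float, center: float, fwhm: float) -> float:
    """Absorption Lorentzian of unit peak height."""
    half = fwhm / 2.0
    return half ** 2 / ((x - center) ** 2 + half ** 2)


def antiphase_peak_height(j_hz: float, linewidth_hz: float, points: int = 4001) -> float:
    """Largest absolute net amplitude of two opposite-sign Lorentzians split by J.

    A one-dimensional caricature of a COSY crosspeak: as the linewidth grows
    relative to J, the positive and negative components cancel.
    """
    span = 5.0 * (j_hz + linewidth_hz)
    best = 0.0
    for i in range(points):
        x = -span + 2.0 * span * i / (points - 1)
        y = lorentzian(x, -j_hz / 2, linewidth_hz) - lorentzian(x, j_hz / 2, linewidth_hz)
        best = max(best, abs(y))
    return best


# Illustrative values: a 7 Hz coupling observed with narrow and broad lines.
for width in (1.0, 5.0, 20.0):
    print(width, round(antiphase_peak_height(7.0, width), 3))
```

The net amplitude stays close to full height while the linewidth is small compared with J and falls well below it once the lines are several times broader than the coupling, which is the trend that limits COSY for larger molecules.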
Replacing normal amino acids with amino acids that contain 15N and 13C, together with indirect detection methods that improve sensitivity, also makes it possible to exploit the greater chemical shift dispersion of other nuclei. Proteins of 19 kDa (McIntosh et al., 1987) and 23 kDa (Kainosho et al., 1987) are under study. In these cases isotopic labeling was easy because the proteins were obtained from microorganisms. These developments, along with the probable advance in resolution from higher field magnets, make it likely that general structure determination methods will be applied to proteins in the 20 kDa class within the next five years. Application to proteins as large as 60 kDa is already possible where questions focus on specific sites. Even a rather limited quantity of residue-specific information may dramatically improve the quality of theoretical predictions.

A second limitation of current methodology, beyond accessible macromolecule size, stems from the restricted range of measurable interproton distances (less than about 4 Å). Although this range is adequate to determine the structures of short segments, tertiary structures of larger systems frequently result from successive application of many short-range constraints, which introduces the possibility that significant errors will be propagated. Here again, optimism is justified: paramagnetic labels perturb spectra in ways that can be interpreted in terms of distances over separations of more than 10 Å (Kosen et al., 1986).

A third limitation arises because converting NOE measurements to distances requires assumptions about macromolecule rigidity. Since some portions of macromolecules are not rigid, the distances extracted for those portions are imprecise. The 1/r^6 dependence of the NOE reduces this imprecision to some extent: an error of ±50 percent in an NOE ratio for a 3 Å contact converts to error limits of -0.21 and +0.33 Å. It is nevertheless an important limitation. At present, these potential errors are handled by assigning generous distance constraint limits rather than by trying to specify distances precisely. In the future it may be possible to include dynamic information explicitly, eliminating this necessity.
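Both the factor of about 20 between helix and sheet amide crosspeaks and the compression of NOE errors into small distance errors follow from the inverse sixth-power dependence. The sketch assumes an isolated proton pair whose NOE intensity is proportional to r^-6, with distances calibrated against a reference pair of known separation. Treating the ±50 percent error as a multiplicative factor of 1.5 or 0.5 on the measured intensity is an assumption; other conventions give somewhat different limits.

```python
def noe_ratio(r_short: float, r_long: float) -> float:
    """Intensity ratio of two NOEs, assuming intensity proportional to r**-6."""
    return (r_long / r_short) ** 6


def distance_from_noe(noe: float, ref_noe: float, ref_distance: float) -> float:
    """Convert an NOE intensity to a distance using a reference pair of known
    separation: r = r_ref * (NOE_ref / NOE) ** (1/6)."""
    return ref_distance * (ref_noe / noe) ** (1.0 / 6.0)


# Sequential amide-amide distances from the text: 2.8 A (helix) vs 4.6 A (sheet).
print(round(noe_ratio(2.8, 4.6), 1))  # 19.7, i.e. about a factor of 20

# A 3 A contact whose intensity is overestimated or underestimated by 50 percent.
for factor in (1.5, 0.5):
    print(factor, round(distance_from_noe(factor, 1.0, 3.0), 2))
```

Under this convention the 3 Å contact shifts to about 2.80 Å or 3.37 Å, limits of the same order as those quoted: the sixth root turns a 50 percent intensity error into a distance error of only about 6 to 12 percent.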
AREAS OF POTENTIAL IMPACT OF NMR

Proteins and Peptides

It is clear from the above discussion that much of the research into the structure and dynamics of macromolecules, particularly proteins, may be restricted to relatively small members of the class. This limitation is imposed both by the magnitude of the task and by the loss of sensitivity as macromolecule size increases. We should consider how these restrictions may affect biological science and what special problems or opportunities may arise in studying the smaller members of macromolecule families.

Limiting research over the next five years to proteins of molecular weight 10 kDa or less may seem highly restrictive. However, a survey of the current protein sequence data base maintained by the Protein Identification Resource shows a surprisingly large fraction (20 percent) of sequences to be under this size (Barker et al., 1986). This fraction may be inflated because shorter sequences are easier to handle, but even 10 percent of the roughly 4,000 proteins sequenced so far is a very large number to study by NMR spectroscopy. Physical characterization currently proceeds at 20 proteins per year by x-ray crystallography and 10 or more by NMR. Given that rate, and the likelihood that cloning techniques will produce sufficient quantities of many of the sequenced proteins, human resources and characterization facilities are more likely to be limiting factors than the number of proteins about which we want information.

Beyond the issue of numbers, small proteins and even smaller peptides constitute a physiologically important class. Polypeptide hormones such as atrial natriuretic factor, oxytocin, vasopressin, and insulin (subunits) fall into these classes (Walks et al., 1985). Various neurotoxins are small polypeptides, and a number of endogenous opioid peptides exist, such as the enkephalins and endorphins.

One may also ask whether studies of small protein or peptide structure are relevant to understanding larger structures. Although some behavior, such as allosteric interaction, is certain to be poorly represented in small molecules, the basic structural considerations are reasonably likely to carry over, and fundamental processes such as protein folding can probably be studied. A large number of proteins are composed of smaller subunits that largely maintain their structure when isolated.
In some single-chain proteins, one can identify structural domains that can be cleaved and functionally reconstituted without reforming a covalent linkage (Rose, 1985).

Small peptides present some potentially unique problems for conformational studies. It is not clear that the dominant conformations observed in solution, in crystals, or in simulations are those that are important for biological function. These molecules interact with receptors, which may select a minor conformer or dictate a unique conformation that depends on the properties of both molecules. Because most receptors are present in very small amounts, it has only recently become possible to produce enough of them for physical study. For the coming years, a major challenge is the modeling of systems in which both activator and receptor can change conformation. This challenge relates not only to understanding physiologically important molecules but also to designing pharmacologically important ones. Studies of relatively flexible polypeptides may contribute to the development of the methods needed to meet this challenge.

Experimental measurements designed to explore minor conformers should contribute in important ways to the understanding and development of modeling methods. One NMR experiment in which this type of study appears possible offers an example. Small molecules (1-2 kDa) have relatively inefficient pathways leading to nuclear Overhauser enhancements; when they bind to macromolecular receptors, more efficient pathways become available. Distance constraints derived from the NOEs measured on molecules that exchange rapidly between bound and free states therefore pertain more to the conformation of the bound molecule than to that of the free molecule (Clore and Gronenborn, 1983). Such experiments are limited by molecular size, binding constants, and rates of exchange, but they still promise to make some inroads into a very difficult set of problems in molecular biology.

Nucleic Acids

As a class, intact nucleic acids are far larger than the proteins typically studied by NMR. However, smaller segments appear to display structural characteristics important in biological function. For example, double helices formed from oligomers 8 to 12 bases long exhibit variations in backbone torsion angles, base twist, and base tilt that characterize helix forms suspected to modulate transcriptional activity. A dodecamer helix has an effective molecular weight comparable to that of small proteins.
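As a rough check on the comparison with small proteins, the sketch below uses the common rule of thumb of about 650 Da per DNA base pair; that figure is an approximation assumed here, and the exact value depends on sequence and counterions.

```python
AVG_BP_MASS_DA = 650.0  # rule-of-thumb average mass of one DNA base pair (assumption)


def duplex_mass_kda(n_base_pairs: int) -> float:
    """Approximate molecular weight of a DNA double helix, in kDa."""
    return n_base_pairs * AVG_BP_MASS_DA / 1000.0


for n in (8, 12):
    print(n, duplex_mass_kda(n))  # 8 -> 5.2, 12 -> 7.8
```

Duplexes of 8 to 12 base pairs therefore fall in the same weight range as the proteins of 10 kDa or less for which NMR structure determination is currently feasible.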
The same two-dimensional NMR methods applied to proteins allow us to characterize nucleic acid structures and to explore in detail the factors that lead to interconversion of structural forms. The imino protons involved in the hydrogen bonds connecting base pairs exchange slowly with solvent protons and are easily resolved in the low-field region of a proton NMR spectrum. Sequential assignments are made possible by the proximity of imino protons on adjacent base pairs and by the strong cross-relaxation peaks that connect these resonances in NOESY spectra. Variation in the distances between base protons and sugar protons (for example, H6 protons on pyrimidine bases and H2' protons on the attached deoxyribose ring) makes it possible to distinguish the type of helix by the presence or absence of cross-relaxation peaks. More quantitative treatments of structure employ the same molecular mechanics, molecular dynamics, and distance geometry methods used with proteins (Hare and Reid, 1986; Nilsson et al., 1986; Suzuki et al., 1986).

Beyond simple helical structures lies a vast, much less explored region of nucleic acid structural biology. Nucleic acids are important elements of ribosome structure and function, and some ribonucleic acids have even been shown to exhibit enzyme-like activity. Hybrid systems involving proteins and nucleic acids are important in the synthesis, repair, and regulation of protein structure and function. In the near future, it should be possible to characterize at least parts of these systems by NMR. In building a more theoretical basis for extrapolating from primary structure to three-dimensional structure and function, one key step is to verify experimentally the existence of fundamental structural elements and to understand the factors that dictate their occurrence.

Carbohydrates

The carbohydrate moieties of glycolipids and glycoproteins are a third class of biologically important molecules. They are important modulators of communication between and across cell membranes, and knowledge of their three-dimensional structure and flexibility is essential to understanding this modulating function. Because the function of these molecules has been less widely appreciated, they are discussed in more detail later in this report. Most oligosaccharides, examined in isolation, are smaller than the protein and nucleic acid systems discussed so far.
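The structural diversity of oligosaccharides can be made concrete by counting linear oligomers under a deliberately simple model that is assumed here, not taken from the text: n residues drawn from m monosaccharide types, with each glycosidic bond attaching to one of s positions on the neighboring residue in one of two anomeric configurations, and no branching (which would add many more structures). The example takes s = 4, the free hydroxyls at positions 2, 3, 4 and 6 of a hexopyranose, and a hypothetical m = 8.

```python
def linear_oligosaccharides(n_residues: int, n_types: int, n_sites: int,
                            n_anomers: int = 2) -> int:
    """Count unbranched oligosaccharide sequences under a simple model:
    every residue can be any of n_types, and each of the n_residues - 1
    glycosidic bonds can use any of n_sites linkage positions with any of
    n_anomers anomeric configurations."""
    if n_residues < 1:
        raise ValueError("need at least one residue")
    return n_types ** n_residues * (n_sites * n_anomers) ** (n_residues - 1)


def linear_peptides(n_residues: int, n_types: int = 20) -> int:
    """Peptides of the same length: residue identity is the only variable."""
    return n_types ** n_residues


# Trisaccharides from a hypothetical set of 8 monosaccharides vs. tripeptides.
print(linear_oligosaccharides(3, n_types=8, n_sites=4))  # 32768
print(linear_peptides(3))                                # 8000
```

Even with fewer building blocks than proteins, the linkage choices let the carbohydrate count overtake the peptide count, and the model still ignores branching and alternative ring forms.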
Structure determination is, however, no less challenging, because of the structural diversity of this class of molecules. The number of monosaccharide building blocks is intermediate between the numbers of building blocks for proteins and for nucleic acids, but each monosaccharide can be linked through any one of several sites, in either of two anomeric configurations. Multiple linkage sites also open the possibility of branching. The number of structurally distinct oligosaccharides in an organism is difficult to predict, but it is certainly large. Even within a single class of oligosaccharide-containing molecules, such as the glycosphingolipids, more than 150 primary structures have been determined (Hakomori, 1986). Few of these have experimentally determined tertiary structures, and all evidence indicates that these 150 represent a small fraction of the total that exist. Also, unlike proteins and nucleic acids, carbohydrates present a primary as well as a tertiary structure problem. NMR methodology has contributed to solving both and, with some advances, should contribute further (Yu et al., 1986).

Analogies with protein structure determination exist. COSY spectra are important in assigning resonances to particular residue types, and NOESY spectra are important in identifying linkage sites, linkage configuration, and sequence. The bottlenecks of manual assignment and conversion of cross-relaxation data to structures are very similar to those encountered in protein studies (Bush et al., 1986; Yu et al., 1986; Homans et al., 1987; Dabrowski et al., 1986). Progress has been impeded slightly more because computer modeling programs for oligosaccharides are somewhat less refined than those for proteins. Oligosaccharides are also likely to be less rigidly structured and so require more attention to the proper treatment of motional averaging. Nevertheless, significant advances are possible, at least for the smaller and more sterically constrained molecules of this class. Once a larger experimental data base has been established with methods such as two-dimensional NMR, theoretical prediction of structure from sequence should become possible.

DEMAND ON COMPUTATIONAL FACILITIES

Improved computational and molecular modeling facilities could advance structure determination by NMR in several ways.
Processing and analyzing NMR data for a 10 kDa macromolecule is far more time-consuming than acquiring them, and each phase of this operation could be improved. Data are normally collected as a two-dimensional time-domain set, and processing involves Fourier transformation to a frequency-domain set. These steps are now handled by array processors attached to instrument computers, with a moderate investment in time (tens of minutes). However, alternative processing methods, including linear decomposition and maximum entropy methods, may offer better signal-to-noise and may be more compatible with automated analysis (Lane et al., 1985; Schussheim and Cowburn, 1987). These methods require far more computer time and may become practical only on supercomputers.

Automating analysis can be difficult when working directly with a frequency-domain data set. The sets are 4 to 16 megawords in size, and the connectivity peaks used in assignment are complex shapes carrying both frequency and phase information. Several approaches are being explored to reduce these complex peaks to a few pieces of connectivity information, but methods such as linear decomposition, which reduce frequency-domain sets to lists of peak frequencies and intensities at an early stage, may provide the breakthrough needed in this area. Once resonances have been cataloged, the next step is assignment and extraction of the spectral characteristics important for determining secondary and tertiary structures. This process is now mostly done by hand. Some efforts are under way to use semiautomated pattern recognition and expert system strategies, but these will require substantial investments in programming and computer hardware (Pfändler and Bodenhausen, 1986).

At present, the data are converted to a three-dimensional structure by distance geometry or by one of the pseudo-energy approaches. Run on current supercomputers, most such programs require one to several hours of computer time per trial structure, and at least five trial structures should be generated for each data set to explore the constraint boundary conditions. As interest in, and capacity for, producing data sets increase, the demand for computer time will become staggering. It is difficult to predict whether, and by how much, improved algorithms will offset these demands. An investment in programming is clearly warranted.
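Reducing a frequency-domain set to a list of peak positions and intensities can be illustrated, in its crudest form, by local-maximum peak picking on a two-dimensional intensity matrix. This is a stand-in chosen for illustration, not linear decomposition or any of the cited methods: it ignores phase, lineshape and overlapping peaks, and the threshold and matrix below are made up.

```python
def pick_peaks(spectrum, threshold):
    """Return (row, col, intensity) for every point that exceeds `threshold`
    and is strictly greater than all of its existing neighbours (up to 8)."""
    rows, cols = len(spectrum), len(spectrum[0])
    peaks = []
    for i in range(rows):
        for j in range(cols):
            value = spectrum[i][j]
            if value <= threshold:
                continue
            neighbours = [
                spectrum[a][b]
                for a in range(max(0, i - 1), min(rows, i + 2))
                for b in range(max(0, j - 1), min(cols, j + 2))
                if (a, b) != (i, j)
            ]
            if all(value > n for n in neighbours):
                peaks.append((i, j, value))
    return peaks


# Made-up 5 x 5 intensity matrix with two peaks and some low-level noise.
spec = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5],
]
print(pick_peaks(spec, threshold=2.0))  # [(1, 1, 5.0), (3, 3, 3.0)]
```

Data sets of 4 to 16 megawords would call for vectorized or compiled code, but the output, a short list of (row, column, intensity) entries, is the compact form that downstream assignment programs need.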
An investment in programming could be put to best use by recognizing that it may become desirable to accommodate types of data not used today. Some data may come from other NMR methods applicable to solids and oriented specimens; others may come from entirely different methodologies, such as fluorescence spectroscopy or tunneling microscopy. Although modeling and energy refinement programs have existed for years, attempts to accommodate NMR data have clearly met with obstacles. Most modeling programs presume the existence of a Cartesian coordinate set similar to that obtained from an x-ray structure, whereas NMR data are most compatible with interatomic distances or with interactive manipulation of segments of known secondary structure. Except in a few preliminary calculations, present methods also assume that the structure of a biomolecule can be described by a single rigid conformer. This is certainly not true, and relaxing these assumptions will drastically increase computational demands. Some of the related problems and opportunities are discussed more fully in the section on molecular dynamics.

In summary, current NMR methods for determining structure seem applicable to a variety of biologically important molecules of less than 10 kDa. Data production in this class will be made much easier by improved computational facilities, availability of high-field spectrometers, and attention to the compatibility of modeling programs with the experimental constraints provided by NMR. It is important that those working in NMR collaborate with investigators determining structures by other methods, because such links will enlarge the structural data base and enhance testing and theoretical modeling. The range of molecules accessible by NMR methods is likely to increase by a factor of two over the next five years. The rate of data production is likely to increase even more as structure determination protocols improve and high-field spectrometers become more widely available. Advances in high-temperature superconductors may accelerate this process.
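The single-rigid-conformer assumption matters especially for NMR because an NOE measured on a flexible molecule reflects an average of r^-6 over the conformations sampled, so short contacts in minor conformers dominate. A sketch, assuming simple population-weighted <r^-6> averaging (appropriate only in certain motional regimes; other regimes average differently) and hypothetical distances and populations:

```python
def effective_noe_distance(distances, populations=None):
    """Distance implied by an ensemble-averaged NOE, <r^-6>^(-1/6).

    distances: interproton distance (angstroms) in each conformer.
    populations: relative weights of the conformers (default: equal).
    """
    if populations is None:
        populations = [1.0] * len(distances)
    total = sum(populations)
    mean_inv6 = sum(p * r ** -6 for p, r in zip(populations, distances)) / total
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical: two equally populated conformers with 2.5 and 5.0 angstrom contacts.
print(round(effective_noe_distance([2.5, 5.0]), 2))  # 2.8, far below the 3.75 arithmetic mean

# Even a 10 percent population of the short-contact conformer pulls the
# apparent distance well below 5.0 angstroms.
print(round(effective_noe_distance([2.5, 5.0], [0.1, 0.9]), 2))
```

A single rigid structure fitted to such an averaged constraint can therefore place atoms at a distance that no real conformer adopts, which is why treating the molecule as an ensemble raises computational demands.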
