


Suggested Citation: "BASIC INSIGHTS ABOUT PROTEIN STRUCTURE." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.



FOLDING THE SHEETS: USING COMPUTATIONAL METHODS TO PREDICT THE STRUCTURE OF PROTEINS

individual amino acids in the protein, one can determine which amino acids lie near each other. Based on these constraints, one can use the mathematical technique of distance geometry (Crippen and Havel, 1988) or restrained molecular dynamics with simulated annealing to build a partially constrained structure. (The isotopes 13C and 15N can also provide additional information.) Currently, this approach requires a noncrystalline but highly concentrated protein solution and works only for relatively small proteins (the resonances broaden as the molecule's size increases and its tumbling slows).

BASIC INSIGHTS ABOUT PROTEIN STRUCTURE

If a protein sequence contains sufficient information to code for a folded structure, it should be possible to construct a potential energy function that reflects the energetics of an assembling polypeptide chain. In principle, one would "only" need to find the minimum of this potential function to know the protein's folded state. In practice, this goal has proved elusive. Some early workers defined molecular force fields compatible with the experimentally measured conformational preferences of small molecules (Lifson and Warshel, 1969). Unfortunately, attempts to fold a denatured chain using this approach were unsuccessful (Levitt, 1976; Hagler and Honig, 1978) because multiple local minima along the potential energy surface trapped the folding chain in unproductive conformations (see Figure 9.3). Even with improved search strategies including molecular dynamics and Monte Carlo methods, it has not been possible to find the native structure from a random starting point (Howard and Kollman, 1988; Wilson and Doniach, 1989). This has been called the "multiple minima problem." It remains a critical problem for the conformational analysis of complex molecules.
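The multiple minima problem can be illustrated with a minimal sketch. The one-dimensional double-well function below is a toy stand-in chosen purely for illustration (it is not a molecular force field, and the function names, step size, and starting points are assumptions). Plain gradient descent started on either side of the barrier settles into a different well, and only one of them is the global minimum.

```python
def energy(x):
    """Toy 1D 'energy surface' with two wells (illustrative assumption, not a force field)."""
    return x**4 - 4.0 * x**2 + x


def gradient(x):
    """Analytic derivative of the toy energy function."""
    return 4.0 * x**3 - 8.0 * x + 1.0


def descend(x0, step=0.01, iterations=5000):
    """Follow the negative gradient downhill from x0 with a fixed step size."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


if __name__ == "__main__":
    # Two hypothetical starting conformations on opposite sides of the barrier.
    for start in (2.0, -2.0):
        x_min = descend(start)
        print(f"start {start:+.1f} -> x = {x_min:+.3f}, E = {energy(x_min):.3f}")
```

In this sketch the walker started at positive x is trapped in the shallower well, while the one started at negative x reaches the deeper well. A real polypeptide has many more degrees of freedom and many more such traps, which is why a local search of this kind tends to fail.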
Despite the inability to fold proteins de novo, this approach has proved valuable for studying the behavior of proteins through small perturbations around a known structure. Because direct computation is difficult, an alternative approach is to look for patterns and regularities in protein structures that might simplify the task of prediction. In fact, considerable insight can be gained by simply looking at experimentally determined protein structures. First of all, one

Figure 9.3 The multiple minima problem: a two-dimensional schematic of the energy surface of a folding protein. Different starting points lead to different metastable states. Only S2 finds the global minimum.

observes that proteins tend to employ certain stereotypical local conformations called secondary structures. The most important are called α-helices and β-sheet structures and were suggested by Pauling (Pauling et al., 1951) based on first principles. In an α-helix, the chain follows a right-handed spiral with hydrogen bonds between the amino group (NH) of one amino acid and the carbonyl group (C=O) of an amino acid a few steps further along the chain. The result is a stable structure with a sequentially local network of hydrogen bonds (see Figure 9.4A). β-sheets offer a different solution to the hydrogen bonding problem. These sheets involve segments of the chain that are sequentially distant but conformationally similar, forming an alternating pattern of hydrogen bonds (see Figure 9.4B). The β-strands may lie parallel or antiparallel to one another. In fibrous proteins, repeated amino acid sequences yield elongated α-helices like α-keratin (or hair) and β-sheets like β-fibroin (or silk). Globular proteins must contain amino acid sequences that break α-helix and β-sheet

Figure 9.4 (A) An α-helix. (B) A β-sheet: four parallel β-strands are shown. Hydrogen bonds exist between oxygen atoms on one strand and nitrogen atoms on the neighboring strand.

structure and cause the chain to turn back toward the center of the molecule. Secondary structure provides a useful building block for constructing more complex protein structure (Crick, 1953; Levitt and Chothia, 1976). Proteins are usefully classified by their use of secondary structures: α/α proteins are structures dominated by α-helices (for example, myoglobin); β/β proteins are predominantly β-sheet structures (for example, plastocyanin); α/β proteins are characterized by the regular alternation of α-helices and β-strands (for example, flavodoxin); and α + β proteins are characterized by the irregular alternation of α-helices and β-strands (for example, lysozyme) (see Figure 9.5).

Although the building blocks are common, the connectivity of the chain varies within these folding classes. Molecular biologists have borrowed the term "topology" (inappropriately) to describe the path that the chain takes in joining consecutive secondary structure elements. For example, many proteins contain four α-helices packed one against another to form a square four-helix bundle. With one helix taken as the reference point, the other three helices can be visited in six distinct orders. Moreover, each of these three helices can lie parallel or antiparallel to the reference helix. Thus, 48 motifs are possible. Is there any preference in the arrangements found in nature? By their general structure, α-helices have a dipole moment with partial positive charges near their N-terminus (start) and partial negative charges near their C-terminus (end). If electrostatic considerations are significant, one might expect to see antiparallel arrangements predominate (since opposite charges attract). In fact, a review of available protein structures reveals that 17 of 18 four-helix bundle structures conform to this expectation (Presnell and Cohen, 1989).
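The count of 48 follows from 3! = 6 visiting orders multiplied by 2^3 = 8 orientation choices. The short enumeration below is a sketch of that arithmetic only. The corner labels and function names are hypothetical, and the filter for alternating arrangements assumes that "antiparallel" means each helix runs opposite to the one before it in sequence, so that relative to the reference helix the pattern is antiparallel, parallel, antiparallel.

```python
from itertools import permutations, product

# Hypothetical labels for the three corners of the square not occupied
# by the reference helix.
CORNERS = ("corner_1", "corner_2", "corner_3")
DIRECTIONS = ("parallel", "antiparallel")


def enumerate_motifs():
    """List every (visiting order, orientation) combination for the three other helices."""
    motifs = []
    for order in permutations(CORNERS):                        # 3! = 6 visiting orders
        for dirs in product(DIRECTIONS, repeat=len(CORNERS)):  # 2^3 = 8 orientation sets
            motifs.append(tuple(zip(order, dirs)))
    return motifs


def is_sequentially_alternating(motif):
    """Assumed reading of 'antiparallel': each helix opposes its predecessor in sequence."""
    expected = ("antiparallel", "parallel", "antiparallel")
    return tuple(direction for _, direction in motif) == expected


if __name__ == "__main__":
    motifs = enumerate_motifs()
    alternating = [m for m in motifs if is_sequentially_alternating(m)]
    print(len(motifs), "motifs in total;", len(alternating), "with alternating directions")
```

Under that assumed reading, exactly one orientation pattern survives for each of the six visiting orders.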
Of the six possible motifs involving antiparallel arrangements, five have been observed in nature so far, and the sixth is expected to crop up as the database of protein structures grows (see Table 9.1 and Figure 9.6). An important corollary of the study of four-helix bundles is that quite distinct sequences can adopt similar structures: the code for folding is degenerate. Further insight into protein structure is gained by considering the physicochemical properties of the different amino-acid side chains. Some side chains (those called hydrophilic) interact favorably with water, while others (called hydrophobic) do not. For globular proteins, one would expect (Kauzmann, 1959) that the hydrophilic side chains would tend to

Figure 9.5 Tertiary structure classes.

dominate the exterior of the protein (where it interacts with the aqueous environment) while hydrophobic side chains would occupy the molecule's interior. Richards devised a simple method for defining the "solvent-accessible" portion of a protein by rolling a sphere with a radius comparable to that of a water molecule along the molecular surface (Lee and Richards, 1971). When amino acid residues are categorized in this way, it is indeed found that hydrophobic residues tend to occur on the inside and hydrophilic residues tend to occur on the outside, although the correlation is far from perfect. Solvent-accessible surface area calculations have shed light on the importance of the "hydrophobic effect" in driving protein folding and have proved valuable in dissecting the stabilization of protein–protein and secondary structure–secondary structure interactions.

Table 9.1 Topologies of Currently Known Four-α-Helix Bundles
Columns: Number of Overhand Connection(s) | All Antiparallel, Left-handed | All Antiparallel, Right-handed | Others (right-handed)
0 overhand connections: Complement C3a; Cytochrome b-562; Complement C5a; Cytochrome c'; Cytochrome b5; Methemerythrin; Interleukin 2; TMV coat protein; T4 lysozyme
1 overhand connection: Ferritin; Phospholipase C (b); Cytochrome P-450cam
2 overhand connections: Human growth hormone
NOTE: There are no left-handed topologies for "other" four-α-helix bundles. TMV is the tobacco mosaic virus.

In summary, the analysis of protein structures has produced some unassailable conclusions: packing is an important element of protein stability; secondary structure is a common component of protein structure;

Figure 9.6 (Top) Two left-handed bundles (side view). Three specific attributes fully describe the topology of a four-α-helix bundle: (1) the polypeptide backbone connectivity between helices, (2) the unit direction vectors of the individual helices, and (3) the bundle handedness. In the first bundle there are no overhand connections, and in the second bundle there is one overhand connection. The handedness of a particular bundle is determined using the "right-hand rule" of physics. To determine the handedness of a helix bundle, orient the thumb of one hand parallel to the first helix, helix A, whose positive unit vector points from the N-terminus to the C-terminus (helices A, B, C, and D are the first, second, third, and fourth helices on the path from the N-terminus to the C-terminus). Helix B lies to the left in a left-handed bundle and to the right in a right-handed bundle. In the case where helix B is diagonally opposed to helix A, the handedness is based on the position of helix C relative to helices A and B. (Bottom) Schematic representation of the possible antiparallel four-α-helix bundles (top view). Bold lines represent connections in front of the page; thin lines represent connections behind the page. Left-handed and right-handed forms of four-α-helix bundles have an equal probability of occurrence. Reprinted, by permission, from Presnell and Cohen (1989). Copyright 1989 by S.R. Presnell and F.E. Cohen.
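One way to turn the right-hand rule in the caption into a calculation is a signed triple product, sketched below. This is an illustrative assumption rather than the procedure of Presnell and Cohen: helices are reduced to hypothetical center points, and the sign convention (a path A to B to C that runs counterclockwise when helix A's N-to-C vector points toward the viewer is called right-handed) is assumed and may need to be flipped to match their definition.

```python
def subtract(u, v):
    """Component-wise difference of two 3D points."""
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    """Dot product of two 3D vectors."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def bundle_handedness(axis_a, center_a, center_b, center_c):
    """Classify a bundle from helix A's N-to-C axis and the centers of helices A, B, C.

    Assumed convention: a positive triple product (counterclockwise path seen
    with axis_a pointing at the viewer) is reported as right-handed.
    """
    triple = dot(axis_a, cross(subtract(center_b, center_a),
                               subtract(center_c, center_a)))
    if triple > 0:
        return "right-handed"
    if triple < 0:
        return "left-handed"
    return "undetermined"


if __name__ == "__main__":
    # Hypothetical idealized square bundle with helix A along +z at the origin.
    print(bundle_handedness((0, 0, 1), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
```

Because the test uses helices A, B, and C together, it also gives an answer when helix B sits diagonally across from helix A, where the position of helix C decides the sign.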


Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology (1995)

Chapter: BASIC INSIGHTS ABOUT PROTEIN STRUCTURE
