FOLDING THE SHEETS: USING COMPUTATIONAL METHODS TO PREDICT THE STRUCTURE OF PROTEINS

A PRIMER ON PROTEIN STRUCTURE

Proteins are constructed by the head-to-tail joining of amino acids, chosen from a 20-letter alphabet. The 20 natural amino acids have a common backbone, but a variable side chain or R-group. The R-groups may be large or small, charged or neutral, hydrophobic or hydrophilic, and conformationally restricted or flexible (see Figure 9.1). It is the physical properties of these R-groups that determine the diverse structures into which a given amino acid chain will fold. Broadly speaking, proteins can adopt fibrous or globular shapes. Repetitive amino acid sequences adopt elongated periodic fibrous structures, with common examples including elastin (skin), collagen (cartilage), keratin (hair), and β-fibroin (silk). This chapter focuses on globular proteins.

Figure 9.1 Twenty amino acids: R-groups are shown clustered by functional types: aliphatic hydrophobic, aromatic hydrophobic, hydrophilic, negatively charged, positively charged, and conformationally special.
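The functional clustering of R-groups can be made concrete with a small lookup table. The sketch below uses the six class names from the Figure 9.1 caption, but the assignment of individual residues is one common textbook grouping and is an assumption here; the figure itself may place borderline residues (for example cysteine, histidine, or methionine) differently. The names `R_GROUP_CLASSES`, `classify`, and `class_composition` are hypothetical.

```python
from collections import Counter

# One common grouping of the 20 one-letter amino acid codes.
# ASSUMPTION: the per-residue assignments are illustrative and may not
# match Figure 9.1 exactly (C, H, and M are often classified differently).
R_GROUP_CLASSES = {
    "aliphatic hydrophobic": "AVLIM",
    "aromatic hydrophobic": "FWY",
    "hydrophilic": "STNQC",
    "negatively charged": "DE",
    "positively charged": "KRH",
    "conformationally special": "GP",
}

# Invert the table: residue letter -> class name.
CLASS_OF = {
    aa: name for name, letters in R_GROUP_CLASSES.items() for aa in letters
}


def classify(residue: str) -> str:
    """Return the R-group class of a single one-letter residue code."""
    return CLASS_OF[residue.upper()]


def class_composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each class."""
    return Counter(classify(aa) for aa in sequence)
```

For instance, `class_composition("GAVDK")` counts two aliphatic hydrophobic residues (A, V) and one each of the conformationally special (G), negatively charged (D), and positively charged (K) classes.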
The enzyme ribonuclease, which catalyzes the breakdown of ribonucleic acid (RNA), provides a useful example. Its sequence contains 124 amino acids. Under appropriate conditions, the amino acid chain is covalently cross-linked in four locations through disulfide bridges between cysteines in the protein chain. (The amino acid cysteine has a reactive sulfur atom that forms such bridges, which provide the only covalent bonds joining nonneighboring amino acids in the chain.) In a classic series of experiments, Anfinsen et al. (1961) demonstrated that the amino acid sequence of ribonuclease contains enough information to code for the folded structure. Specifically, they showed that ribonuclease lost its enzymatic activity in the presence of a chemical denaturant (which disrupted the protein's structure) but spontaneously regained its activity when the denaturant was removed. Even when the disulfide pairings were scrambled after denaturation, renaturation could occur. Thus, without any outside assistance, the protein could refold: independent of the starting conformation, the amino acid sequence contains sufficient information to direct the chain to the correct folded structure. Similar experiments have been repeated with many other proteins. This work suggests that proteins follow an energy gradient from the denatured state to the native state. The free energy difference between these two states favors the folded state, and the height of the activation barrier along the folding pathway governs the rate of chain assembly (see Figure 9.2). Recently, molecular biologists have discovered that some proteins can assist the folding process.
These proteins, dubbed foldases, include the chaperonins (Kumamoto, 1991) that prevent proteins from assembling inside an undesirable cellular compartment, prolyl isomerases that increase the rate of the cis-trans isomerization of the amino acid proline (Fischer and Schmid, 1991), and protein disulfide isomerases (Freedman, 1989), which shuffle disulfide bridges. While it is conceivable that these foldases might take a protein to a kinetically trapped final state different from the state of lowest free energy, this seems unlikely. Instead, I imagine that these foldases simply lower the activation barrier to folding into the lowest energy state. In the absence of an appropriate foldase, the height of the activation barrier might be such that in some cases, protein folding will not occur on a biologically sensible time scale.
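The roles of the free energy difference and the activation barrier can be illustrated numerically. The sketch below rests on two simplifying assumptions that are not spelled out in the text: a two-state equilibrium between denatured and native chains, and an Arrhenius-type rate proportional to exp(-barrier/RT) with a constant prefactor. The function names and the example energies are hypothetical.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_folded(stability_kj: float, temp_k: float = 298.15) -> float:
    """Equilibrium fraction of native chains in a two-state model.

    stability_kj is the free energy by which the native state is more
    stable than the denatured state (positive favors folding).
    ASSUMPTION: only two states, K = [native]/[denatured] = exp(dG/RT).
    """
    k_eq = math.exp(stability_kj / (R_GAS * temp_k))
    return k_eq / (1.0 + k_eq)


def rate_speedup(barrier_drop_kj: float, temp_k: float = 298.15) -> float:
    """Factor by which folding speeds up when the barrier is lowered.

    ASSUMPTION: rate ~ exp(-barrier/RT) with an unchanged prefactor,
    so lowering the barrier by x multiplies the rate by exp(x/RT).
    """
    return math.exp(barrier_drop_kj / (R_GAS * temp_k))


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    for dg in (0.0, 10.0, 40.0):
        print(f"stability {dg:5.1f} kJ/mol -> folded fraction "
              f"{fraction_folded(dg):.4f}")
    for drop in (0.0, 10.0, 20.0):
        print(f"barrier lowered by {drop:4.1f} kJ/mol -> rate x "
              f"{rate_speedup(drop):.1f}")
```

In this picture, a foldase that lowers the barrier changes only the rate factor; the equilibrium fraction of folded chains depends on the stability alone, which is consistent with the view that foldases accelerate folding into the lowest energy state rather than selecting a different final state.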
Figure 9.2 Thermodynamics of protein folding: the folding chain must surmount a free energy barrier to move from the denatured to the native state. The native state is more stable than the denatured state by free energy ΔG.

One reason for the tremendous interest in the protein folding problem is that it has become simple to determine the amino acid sequence of large numbers of proteins, while it remains difficult to determine the structure of even a single protein. The first protein sequences were laboriously determined by classical biochemical methods (Konigsberg and Steinman, 1977). The proteins in question were isolated, purified to homogeneity, and enzymatically digested into smaller fragments. The amino acids in each fragment were then chemically cleaved one residue at a time, beginning at one end of the fragment and proceeding through each successive amino acid. Automated methodologies and improved chemistry accelerated this process, but protein sequencing remained a tedious task until molecular biology supplied a different approach (Maxam and Gilbert, 1980). By determining the deoxyribonucleic acid (DNA) sequence of the gene encoding the protein (using methods that were quite rapid), one could infer the amino acid sequence of the protein by simply translating the DNA codons according to the genetic code. This approach is much faster and more reliable than direct protein sequencing. It has produced a flood of tens of thousands of protein sequences.
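Translating DNA codons according to the genetic code is mechanical enough to sketch in a few lines. The example below builds the standard genetic code table from its conventional compact form and translates a coding-strand sequence read in frame from its first base. It is a minimal illustration, not a production tool: it assumes the standard code, does no start-codon search, and ignores introns and ambiguous bases. The function name `translate` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional compact form: codons are
# enumerated with bases in the order T, C, A, G (third base varying
# fastest), and '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence into one-letter amino acids.

    Reads in frame from the first base, stops at the first stop codon,
    and ignores any incomplete trailing codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical 18-base example: ATG AAA GAA ACC GCG TAA
    print(translate("ATGAAAGAAACCGCGTAA"))  # MKETA
```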
By comparison, the rate at which new protein structures are determined remains a trickle because structure determination remains a formidable experimental task. X-ray crystallography was the first technique used to determine the structure of proteins (Kendrew, 1963). One must first coax a protein to crystallize with sufficient regularity to diffract X-rays. Then the crystal must be bombarded with X-rays and the X-ray diffraction pattern collected, either on film or with an electronic detector system. In principle, the X-ray diffraction pattern corresponds to a Fourier transform of the electron density D of the crystal, with the amplitude and phase of the signal at each point corresponding to the amplitude and phase of the corresponding complex Fourier coefficient. Unfortunately, detectors can record only the amplitude, not the phase. Solving an X-ray crystal structure thus involves determining the density D from the amplitudes of its Fourier transform alone, which can be a formidable task. In general, the problem is underdetermined. A mathematical approach is to add constraints (for example, D must be everywhere positive, since it represents a density). An experimental approach is to use additional information from the X-ray diffraction pattern obtained when the protein is crystallized in the presence of a heavy atom (for example, mercury, uranium, or platinum) or anomalous scatterers (for example, selenium) bound to the protein in a covalent or non-covalent fashion. The difference between the original and modified patterns, or between the patterns as a function of X-ray wavelength, provides the missing phase information. Although the approach is very powerful, it requires that the protein architecture not be significantly changed by this molecular perturbation, and it is more successful when several derivatives are available for study (Blundell and Johnson, 1976). Finally, one can start with a good guess at the protein structure.
The Fourier transform of this structure yields a set of intensities and phases. The hypothetical structure is rotated and translated until the intensities match the experimental data. If the correlation between the hypothetical and actual structure is strong, then the structure determination can succeed without the need for heavy atom derivatives. More recently, nuclear magnetic resonance (NMR) spectroscopy has been used to determine protein structure (Wuthrich, 1986). Pairs of hydrogen atoms (protons) produce resonances when they lie in neighboring positions in the protein chain or when they lie very close together in space. By determining the correspondence of resonances with