
The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.


OCR for page 18
2 Introduction

Life depends on the orderly flow of information among biological macromolecules. Information is primarily stored as a linear code in nucleic acids. In general, there is a one-to-one correspondence between the amino acid sequence specified by the DNA code and that actually found in the protein. Processing, however, of the DNA itself, the transcribed RNA, or the translated protein frequently obscures this correspondence and makes it mandatory to verify the amino acid sequence directly. The amino acid sequence establishes what is called the "primary structure" of a protein. To be biologically active, the chain of amino acids must be folded into a convoluted three-dimensional structure. There is some debate over the extent to which the final tertiary structure of a protein is governed solely by its primary sequence. The most widespread view follows from experiments by Anfinsen et al. (1961), who showed that bovine pancreatic ribonuclease regained activity from a denatured state without the involvement of any other macromolecule. Today, it is generally accepted that all proteins of low molecular weight can refold spontaneously. The question remains, then, of whether large proteins use ribosomes or other cellular materials to fold by a kinetically driven mechanism. Thus, two important questions must be considered. First, how is the protein tertiary structure encoded in the linear sequence of amino acids? Second,

is there a set of signals that controls the kinetic process of protein folding? Also, it is becoming increasingly obvious that the cell employs additional codes to signal for protein transport, protein destruction, carbohydrate structure, cell-cell recognition, hormone function, and the like. One place to begin deciphering all these messages that are so directly tied to the function of living cells is to gather information about the molecular structures of the proteins and nucleic acids. Pauling's elegant insights into the importance of hydrogen bonding were the earliest quantitative ideas about the structure of proteins (Pauling et al., 1951). The models that emerged of helices, sheets, and the structure of collagen still form the basis of our understanding of fibrous proteins. Fibrous proteins are crucial elements in cellular architecture, but the chemistry of cells (the molecular synthesis and metabolism that are hallmarks of life) is controlled by elaborate enzyme systems whose detailed structural principles remain poorly understood. These proteins are often grouped together under the name of "globular proteins" because of their roughly spherical shape. In spite of the hundreds of x-ray studies of crystals of globular proteins (Perutz, 1965; Perutz et al., 1965) and important efforts at classification (Levitt and Chothia, 1976; Rossmann and Argos, 1977; Richardson, 1981), the most useful model of globular proteins remains the fundamental analysis put forward by Kauzmann in 1959. As Kauzmann (1959) noted on thermodynamic grounds, the water-soluble globular proteins are formed from a hydrophobic core and a hydrophilic exterior. In the ensuing years, this idea has been generalized in two ways. First, it is now apparent that globular proteins of more than 100 to 300 residues are constructed from independent domains, each of which is built according to the Kauzmann hypothesis (Wetlaufer, 1973).
Second, membrane-spanning proteins appear to reflect similar thermodynamic principles. The hydrophobic environment of the core of the membrane causes the protein architecture to invert, producing a hydrophobic surface and a charged or polar interior, often in the form of a membrane-spanning channel (Henderson, 1979; Stroud and Finer-Moore, 1985). Despite the success of the hydrophobic core model, it has not led to a theory detailed enough to offer an atomic description of protein structure. The basic difficulty is the amazing complexity

of the atomic arrangements in these macromolecules. The three-dimensional structure of globular proteins is governed by a balance of the often contradictory requirements of optimum hydrogen bonding, burial of hydrophobic side chains, and overall close packing. Thus, their geometry is governed to a significant degree by tertiary interactions, rather than by the simple hydrogen bonding that dominates the fibrous proteins and nucleic acid helices. This complexity leaves us with what has frequently been called the "protein folding problem": how to calculate the tertiary structure of a protein to a useful degree of accuracy from the amino acid sequence. This report will address protein structure calculations at some length. But we recognize that there are much broader concerns dealing with the structures of nucleic acids and carbohydrate species. For example, for some years after the modeling efforts of Watson and Crick (1953), investigators took for granted the structure of nucleic acids. Recently, there has been renewed interest in the range of structures presented by double helical DNA (e.g., A, B, Z) (Drew and Dickerson, 1981); by RNA secondary structures; by higher-order packing of these molecules into nucleosomes, chromosomes, and ribosomes; and by the specific interactions of nucleic acids and proteins. As another example, the importance of postbiosynthetic modifications of all biological macromolecules is gaining increasing attention. Although the exact roles of phosphorylation, acetylation, methylation, and glycosylation are still being worked out, there is no doubt that these modifications frequently constitute the biologically active form of proteins and nucleic acids in living cells. Further, one must recognize the critical influence of environment on three-dimensional structure. The protein surroundings may be as simple as an aqueous electrolyte solution or as complex as macromolecular assemblies.
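
As a rough illustration of the hydrophobic core picture discussed above, the sketch below computes a sliding-window hydropathy profile along a sequence. It is a minimal sketch and not a method taken from this report: the Kyte-Doolittle (1982) hydropathy scale, the window length of 9, the function name hydropathy_profile, and the toy sequence are all assumptions introduced here for illustration.

```python
# Kyte-Doolittle (1982) hydropathy values per one-letter amino acid code.
# Assumed scale: this report does not name a particular hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard one-letter codes
    and ValueError if the window does not fit inside the sequence.
    """
    seq = sequence.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: polar ends flanking an apolar stretch.
    toy = "DEKRSGLLIVALLIVAGLLKRDESN"
    profile = hydropathy_profile(toy, window=9)
    # Index of the window with the highest mean hydropathy.
    peak = max(range(len(profile)), key=profile.__getitem__)
    print(f"most hydrophobic window starts at residue {peak + 1}: "
          f"{toy[peak:peak + 9]}")
```

Stretches with a high mean hydropathy are candidates for buried or membrane-spanning segments, but a profile of this kind is only a heuristic and says nothing about the atomic arrangement of the folded chain.
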
Another matter of concern is the actual pathway by which the protein folding takes place. In the cell, this pathway may be influenced or regulated by the proximity of the ribosome itself, by chain cleavage, or by the timing of the addition of carbohydrate or other chemical species. Even in the test tube, refolding appears to be a complicated process. Beyond these questions, the interplay of structure and function suggests that we should try to understand the properties of proteins and how to manipulate them through the amino acid sequence. We should discourage from the start the view that a single structure exists for each protein. The conformational choices are

numerous, even in the crystal state (Smith et al., 1986a). Further, most globular proteins are designed to provide organized internal motions (allosteric effects) as part of their functioning. Thus, one expects major conformational flexibility in many of these molecules. Some of this flexibility can be seen as thermally induced fluctuations that can be easily accounted for with normal mode analysis. Other aspects of the conformational freedom are much more complex and can be examined with the tools of molecular dynamics and statistical mechanics. In this report we will try to assess our understanding of all these issues. It is almost impossible to overstate the importance of the protein folding problem to the elucidation of structure-function relations. As a purely intellectual exercise, it clearly displays the fascinating complexity of a first-class scientific puzzle. But the spotlight of attention is directed at protein structural predictions for other reasons. The most impelling is the incredible growth of knowledge in molecular genetics and protein engineering. New sequences are being reported worldwide at a rate of 1 every 10 minutes, while protein crystal structures are determined at a rate of 1 per month. Thus, sequences are being generated between two and three orders of magnitude faster than structures can be determined. Even with dramatic improvements in the technology of crystallography and magnetic resonance, the backlog of sequence data will grow rapidly. Further, the need to make rational plans for modification of protein properties is a major issue for all aspects of the biotechnology industry. A thorough understanding of the growth and development of living organisms depends crucially on a molecular structural and functional description. This understanding would certainly lead to a revolution in health care and a much firmer grip on the problems of chemical toxicity. How valuable is it to know the structure of a protein?
Clearly, models of structures have led to the design of pharmaceutical agents (Goodford, 1984; Hol, 1986) and the engineering of specific properties such as improved stability (Ultsch et al., 1985). We are at the very beginning of such activities. They will surely become very important with time. A more difficult question is, what can be done with incomplete or lower resolution structures? Their present value is in organizing experimental data and planning new experiments (Cohen et al., 1986b). They can also be refined against new x-ray or nuclear magnetic resonance (NMR) data (e.g., Fitzwater and Scheraga, 1982; Brunger et al., 1986a).

[Figure 2-1 labels: Primary Structure, Secondary Structure, Super-Secondary Structure, Tertiary Structure, Quaternary Structure]
FIGURE 2-1 Hierarchy of structural descriptions of biological macromolecules.

It is not yet clear if these approximate structures, by themselves, can be refined sufficiently to compete with either crystallographic or NMR experiments for all uses. However, for the design of new pharmaceutical agents they are invaluable aids if an experimental structure is not available (e.g., Plattner et al., 1986). In the broadest terms, the convergence of results from experiment and theory yields useful models of protein and nucleic acid structure and function. Given the very large number of sequences expected in the next two decades, we estimate that thousands of these will yield useful structural calculations. The body of this report is a review of computer-assisted modeling of macromolecular structure and function. We have interpreted modeling in broad terms as the use of computers in molecular calculations of all kinds. The organization of the report is illustrated in Figure 2-1, which shows how information flows from primary sequence to secondary and tertiary structure models. The role of computers in experimental methods is summarized in the section on tertiary structure. Issues of modifying function and designing new materials are considered next, followed by a discussion of trends in computer hardware.