

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 106
Functional Aspects of Proteins and Nucleic Acids

We turn our attention from structural considerations to the more complex questions surrounding biological function. This section contains discussions of enzyme catalysis, protein design, and ligand/substrate design.

CATALYSIS

The Theory of Enzyme Catalysis

Enzyme catalysis is one of the most crucial and certainly the most intriguing aspects of the kinetic behaviors of proteins. The central question about catalysis is, which aspects of protein structure and dynamics cause the often enormous enhancements of rates of reaction over the rates in water? This question is, at best, incompletely answered and at worst very much open.

One cannot expect that molecular dynamics, with its use of a classical mechanical force field, will, by itself, provide definitive answers. Empirical potentials in molecular dynamics or molecular mechanics calculations are approximations chosen to model thermodynamically stable minima in energy surfaces. Catalytic reactions are by their very nature dependent on barriers or maxima in these surfaces. Not even the form of the potential used in most classical calculations would be correct. However, it is reasonable to think that this problem will eventually be solved by an approach that combines molecular dynamics and quantum mechanical methods. Molecular dynamics techniques can handle the motion of many atoms, and quantum mechanics can be used to represent events along the reaction pathway at the catalytic site and in the reactants (substrates), that is, wherever chemical bonds are broken and/or formed. Recent studies of simple organic reactions in solution by Jorgensen (in press) and studies of enzyme mechanisms by Warshel (1986) exemplify this combined approach.

In work on simple organic reactions in solution, the reaction path has been investigated by a series of ab initio quantum mechanical calculations of the reactants in vacuo in different states of reaction, and molecular mechanical (Monte Carlo) simulation of the solvation of each state. When combined, the results of these two calculations yielded estimates of the free energy profile along the reaction coordinate, from which reaction kinetics can be estimated. Although the quantum mechanical calculations did not take into account the response of the reactants to the solvent environment, Jorgensen (in press) nevertheless obtained very promising results for three different reaction types: SN1, SN2, and addition reactions.

Parallel studies of enzyme mechanisms pose additional problems, simply because the systems, including the reacting species, contain many more atoms. Except in a few simple reactions such as catalysis by carbonic anhydrase, the substrates are much larger than the reactants in the models chosen by Jorgensen. Also, in many enzyme-catalyzed reactions, a chemical bond forms between enzyme and substrate in an intermediate product of the reaction. In these cases the ab initio quantum mechanics calculation, which by its nature is currently restricted to systems of a few atoms, will have to be performed on a fragment or fragments of the chemically reacting species.
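The combination of in vacuo energies and solvation free energies described above for reactions in solution can be illustrated with a small sketch. It adds the two contributions at each point along a reaction coordinate, reads off the barrier, and converts it to a rate constant. The use of the Eyring (transition-state theory) expression is an assumption made here for illustration; the text does not say how kinetics were derived from the profile. All energy values are hypothetical and are not taken from any of the cited studies.

```python
import math

# Physical constants (SI units).
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def free_energy_profile(gas_phase, solvation):
    """Add in vacuo energies and solvation free energies point by point (kJ/mol)."""
    if len(gas_phase) != len(solvation):
        raise ValueError("profiles must have the same number of points")
    return [g + s for g, s in zip(gas_phase, solvation)]


def barrier(profile):
    """Barrier height relative to the reactant state (first point), in kJ/mol."""
    return max(profile) - profile[0]


def tst_rate(barrier_kj_mol, temperature=298.15):
    """Eyring rate constant (1/s) for a unimolecular step with the given barrier."""
    exponent = -barrier_kj_mol * 1000.0 / (R * temperature)
    return (K_B * temperature / H) * math.exp(exponent)


# Hypothetical values along a five-point reaction coordinate (kJ/mol).
gas = [0.0, 10.0, 25.0, 5.0, -20.0]
solv = [0.0, 20.0, 45.0, 30.0, 0.0]

profile = free_energy_profile(gas, solv)
dg = barrier(profile)
print(f"barrier = {dg:.1f} kJ/mol, k = {tst_rate(dg):.3e} 1/s")
```

Because the rate depends exponentially on the barrier, an error of a few kJ/mol in either component changes the predicted rate by about an order of magnitude, which is one reason the systematic errors discussed in this section matter.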
It is not yet clear how such fragment calculations can be done without introducing large errors. Warshel has introduced the use of a more approximate quantum mechanics method, the empirical valence bond (EVB) method, into this problem to replace the ab initio quantum mechanical component. Although it can handle more atoms, the EVB method initially had the drawback of having to be calibrated by simulations designed to predict known molecular properties. In this instance, acidities of ionizing groups were used. However, this was done successfully, and Warshel and Sussman (1986) and Hwang and Warshel (1987) found that the method now can rationalize observations of changes in catalytic efficiency of mutant enzymes. These results emphasize the critical role of stabilization of the reaction's transition state by electrostatic interactions. Because of the empirical character of the EVB method and a lack of general experience with it, these conclusions await confirmation through further study with the EVB method and ab initio methods.

Given the great interest in the theory of enzyme catalysis, investigators have already begun to apply a combination of ab initio quantum mechanics and molecular dynamics (Rao et al., 1987). This will generate new problems to be solved. As noted, a major problem will be encountered for those enzyme reactions in which a chemical bond is formed between enzyme and substrate in an intermediate step. It will also be necessary to establish the magnitude of the systematic error caused by transfer of a quantum mechanical result obtained without solvation, not only to a solvated situation, but also to a protein active site environment, where the rates are much enhanced. Some very careful work will be required before this approach can be applied reliably to enzyme mechanisms. However, caveats notwithstanding, these studies are worth doing.

DESIGNING NEW PROTEIN STRUCTURES

The following section discusses the future of protein design, which is one of the key areas of growth in macromolecular modeling.

In the same way as Levinthal's (1966) pioneering studies initiated computer-assisted modeling, Richardson's (1981) work on the anatomy and taxonomy of proteins signaled the transition from molecular archeology to molecular design for protein chemists. We call our drug design project computer-assisted molecular design because there are more types of molecules than proteins. Richardson presented a framework for understanding the organization of protein architecture.
This framework condenses the observations of protein structure and architecture from individual structure solutions into a form that allows us to think of protein architecture as manipulable. Site-directed mutagenesis of protein structure is a simple way of altering proteins without really altering protein architecture. The technique of producing chimeric proteins by gross manipulation of gene structure is another early approach to the manipulation of protein architecture.

At this point, the problems involved in designing new proteins are formidable. Consider the process of designing an ordinary protease. Proteases typically have 250 amino acids. Since there are 20 amino acids, there are 20^250 (roughly 10^325) possible proteins of length 250. The only sensible way to reduce the number of possible proteins to one design is to use several levels of decomposition that specify how portions of the protein are to be organized architecturally, spatially, and functionally. Our understanding of the rules of thumb triggered by the Richardson paper is now evolving rapidly.

Protein design cannot be divorced from the issues of protein expression and folding. The complete cycle for the design, expression, and folding of proteins can be represented as in Figure 7-1. To be able to design a protein effectively, one must be able to traverse this design cycle rapidly and often. At present, several important conceptual problems prevent us from completing this cycle at all.

FIGURE 7-1 The cycle of protein design and expression. (Stages labeled in the graphic: DNA gene design and construction; RNA transcription and editing; ribosomal translation of RNA into protein; protein folding; property analysis of the mature folded protein; CAMD, computer-aided molecular design.)

Suppose that we could design a hypothetical protein using the architectural concepts. The output would be a three-dimensional model of the protein embodying a particular amino acid sequence. Given this hypothetical design, the next task would be to construct a gene. The problem here is that the amino acid sequence of the hypothetical protein, in general, specifies only two of the three bases in each codon of the gene. One way to resolve this issue is to choose a random third base (see Figure 7-2).
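A short sketch can make both points in the preceding paragraphs concrete: the size of the sequence space for a 250-residue protein, and the construction of a gene by an arbitrary choice among the codons that encode each residue. The function names and the example peptide are hypothetical. Picking a random synonymous codon is a slight generalization of "a random third base", since leucine, serine, and arginine each have six codons that also differ in the first two positions.

```python
import math
import random

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: amino acid -> list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def log10_sequence_count(length, alphabet=20):
    """log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


def back_translate(peptide, rng=None):
    """Build one possible DNA coding sequence by random synonymous codon choice."""
    rng = rng or random.Random()
    return "".join(rng.choice(SYNONYMS[aa]) for aa in peptide)


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) to a peptide."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(f"20^250 is about 10^{log10_sequence_count(250):.0f}")
gene = back_translate("MKTAYIAK", random.Random(0))  # hypothetical peptide
print(gene, "->", translate(gene))
```

Every gene produced this way translates back to the same peptide, which is exactly why the amino acid sequence alone underdetermines the DNA sequence.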

FIGURE 7-2 (graphic not recoverable from the scanned text). As described in the text, the upper portion of the figure marks surpluses of information with pluses and deficiencies of information with minuses around the design cycle, and the lower portion abstracts these into a cycle pattern.

Once the DNA of the constructed gene has been transcribed to messenger RNA and edited, there is a linear sequence of three-base codes that, as Nirenberg (1965) has described, exist in 64 combinations of the bases. However, the 64 codons code for only 20 amino acids, so there is, at this point in the cycle, a surplus of information. The ribosome translates the messenger RNA codons into nascent polypeptide.

The Anfinsen (1975) experiment involving the denaturation and renaturation of ribonuclease has been used to convince us that proteins fold into their active three-dimensional structure solely on the basis of the information contained in their sequence. Many scientists have been trying for the last two decades to predict the secondary and tertiary structures of proteins from the amino acid sequences alone. Their efforts have met with partial success at best. We lack information about the protein folding portion of the design cycle.

The surpluses of information (denoted by pluses in the upper portion of Figure 7-2) and the deficiencies of information (denoted by the minuses) can be abstracted to form the cycle pattern in the lower portion of the same figure. Clearly, our current perception of the design cycle is flawed. Recent experiments show, for example, that a gene that is moved from its native host to another expression vector does not necessarily produce well-folded proteins. Even when the codon utilization statistics of the new expression vector are mimicked, complete protein folding does not necessarily occur. This suggests that third base redundancy may be partially used to control protein folding, especially for complex proteins.

Experiments should be designed to explore how third base redundancy influences protein folding. With such data, the information from a hypothetical protein design could be used to properly construct the DNA of a gene. A proper gene would then transcribe and translate properly to yield a polypeptide that folds properly. If these conditions were met, the design cycle abstraction would be as shown in Figure 7-3.

If the information flow around the protein design and implementation cycle is preserved, then it should be possible for protein engineers to rapidly traverse this cycle in the design and perfection of novel proteins.
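The "surplus of information" in the codon step can be quantified with a rough sketch. It counts how many codons encode each amino acid in the standard genetic code and compares the information capacity of a codon (64 possibilities) with that needed to name one of 20 amino acids. Treating all codons and amino acids as equally likely is a simplifying assumption made here; real codon usage is not uniform.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of synonymous codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# First-two-base prefixes for which the third base does not change the residue.
fourfold = [
    a + b
    for a in BASES
    for b in BASES
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
]

bits_per_codon = math.log2(64)     # capacity of a three-base codon
bits_per_residue = math.log2(20)   # needed to specify one of 20 amino acids
surplus = bits_per_codon - bits_per_residue

print("codons per amino acid:", dict(sorted(degeneracy.items())))
print("prefixes where the third base is free:", fourfold)
print(f"surplus per codon: {surplus:.2f} bits")
```

Under these assumptions each codon carries more than a bit and a half beyond what is needed to specify the residue, which is the capacity that the text suggests might be used to influence folding.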

FIGURE 7-3 The ideal pattern of flow of information in the cycle of protein design and expression. (Stages labeled in the graphic: DNA gene design; RNA translation; protein folding; CAMD, computer-aided molecular design.)

Computer Representation

Computer-assisted modeling of molecules has been evolving since the original work of Levinthal (1966). His work was the first time that a computer, a PDP-1 from the then-infant Digital Equipment Corporation (DEC), was used to draw the three-dimensional structure of a small organic molecule. It used one line segment to represent each chemical bond. With simple software controls, the molecule could be rotated in space and redisplayed. Similarly, the conformation of the molecule could be changed by rotating one portion of the molecule around a bond that formed an isthmus between it and the remainder of the molecule. The display of the molecule was done in pairs of images where the image of one molecule was rotated 5 degrees around the vertical axis. This produced a stereoscopic effect that permitted the three-dimensional structure of the molecule to be perceived without having to rotate it continually. All of molecular graphics has simply been an extension and refinement of these powerful ideas. The number of line segments drawn per second has risen dramatically. Color has been added. Hardware stereo devices have been developed, and very recently powerful array processors have been added to permit the rapid calculation of molecular energetics during modeling.

These techniques for display and modeling were developed and refined in a few academic research laboratories. They began to diffuse to biochemical and genetics laboratories in academic and industrial institutions worldwide. Over the past 20 years, the manufacturers of computer and graphics hardware have begun to recognize that molecular graphics and modeling is a substantial market. We are now at the critical point in this respect.
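The stereo-pair display described above (one line segment per bond, with the second image rotated 5 degrees about the vertical axis) can be sketched in a few lines. This is an illustration of the idea only, not a reconstruction of Levinthal's software; the coordinates, bond list, and function names are hypothetical, and a simple orthographic projection is assumed.

```python
import math


def rotate_about_vertical(points, degrees):
    """Rotate (x, y, z) points about the vertical (y) axis by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def project(points):
    """Orthographic projection onto the screen plane: drop the depth (z)."""
    return [(x, y) for x, y, _ in points]


def stereo_pair(points, angle=5.0):
    """Return (left, right) 2D views; the right view is rotated by `angle`."""
    return project(points), project(rotate_about_vertical(points, angle))


def bond_segments(view, bonds):
    """One line segment per bond, as pairs of 2D end points."""
    return [(view[i], view[j]) for i, j in bonds]


# Hypothetical three-atom fragment (coordinates in angstroms) and its bonds.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 0.0, 1.0)]
bonds = [(0, 1), (0, 2)]

left, right = stereo_pair(atoms)
for name, view in (("left", left), ("right", right)):
    print(name, bond_segments(view, bonds))
```

Atoms at different depths shift by different horizontal amounts between the two views, and that disparity is what the eye interprets as depth.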
Hardware manufacturers are now willing to design workstations (i.e., integrated computational and graphics machines for individual use) for the molecular modeling and design market. A new class of workstations is expected in the next year. Two members of the class of personal supercomputers (PSCs) have been identified, and collaborations are in place to ensure that these machines, when they enter the commercial market, will be fully conditioned "chemistry engines". The four functions, molecular energy computation, molecular configuration control, molecular graphics, and reasoning about molecular structure, will be integrated in one computer system. The PSCs will provide a nearly ideal package for mass distribution of CAMD capabilities. Market forces can be expected to expand the number of different machines and the features that each machine offers. Standards at various levels, as defined by the International Standards Organization (ISO), will permit existing and new program systems to be transported rapidly onto the PSC class members. The standardization efforts will permit a decoupling of the computational support systems (i.e., hardware, graphics, operating systems, and the molecular modeling and design programs) from the intellectual uses of such systems.

The existence of standards, however, does not guarantee portable program code. Scientists who write new programs must know about these standards and write programs that conform to them. Commercial organizations that take existing scientific programs should shape them towards the standard style because, in the end, the size of the commercial market will depend on the ability of end users to piece together working systems from components made out of various standard programs. If these standardization efforts succeed, then in the future, the molecular modeling community will be able to routinely make smooth transitions to more powerful computer support systems.

Computer graphics representations offer alternative ways of understanding molecular structure and function.
They started as the simplest white line drawings on black screens, then progressed to color images, to solid surfaces, to dot surfaces, and to electrostatic surfaces. Intergraph three-dimensional representations and white light hologram representations have been developed and used for molecular structure problems. An Intergraph is composed of approximately 20 individual photographs where vertical strips are selected from each photograph and composed into one image. The composite image is viewed through a linear Fresnel lens. The next generation of workstation, the PSC, will offer ray-traced images as part of the operating system. In a ray-traced image the reflections on a surface are computed by calculating the trajectory of light beams from all possible light sources. This produces in the extreme the reflections of one object on another. A truly three-dimensional representation where the molecule would actually occupy three-dimensional space is needed. A breakthrough in a field such as plasma physics is necessary to make this a reality.

Impediments to Progress

The central bottleneck to progress in protein design is our inability to predict protein tertiary structure from amino acid sequence. The notion put forth by Anfinsen 25 years ago was that the amino acid sequence alone determines tertiary structure. This notion may be too simplistic, and there may indeed be a higher level code than the Nirenberg nucleic acid to amino acid conversion by the ribosome. Since the Anfinsen conjecture and the experimental detail surrounding it are largely prohibitory in nature, they had the effect of discouraging experimentation in expression and folding of proteins. Scientists who are concerned with protein expression are content with the Nirenberg code and explain away anomalous results because they see no need for any other effect. Scientists concerned with protein folding cannot explain how proteins fold, but then are discouraged by the Anfinsen conjecture from asking for more information from the geneticists. A theory and experiment linking codon utilization in gene structure with the folding of protein structure would be a major step toward reconciling these views.

PREDICTING FUNCTION FROM A PREDICTED THREE-DIMENSIONAL STRUCTURE

In principle, the information needed to predict the function of a biological macromolecule is encoded in its three-dimensional structure.
We assume that we must know the three-dimensional structure of a macromolecule before we can fully understand its function. The problem is, how do we decode the rules that govern the relationship between structure and function? A subset of this problem will be discussed below: the prediction of the change in the functioning of a protein that results from the binding of a ligand.

Recently, the possibility of computer-assisted drug design based on the three-dimensional structure of target biomolecules has received much attention in the scientific literature (Beddell, 1984; Goodford, 1984; Hol, 1986). Those in the field believe that medicinal chemistry is poised to undergo a revolution as dramatic as the events in the 1950s and 1960s that transformed organic chemistry from a descriptive to a predictive science. Since we are at the beginning of a new age, the many challenges ahead do not diminish the excitement of knowing that the solutions are also on the horizon. We have a sense that, at last, we know what it is that we have to learn and have at least the rudiments of the necessary tools at hand.

In anticipating this revolution, we are presupposing that we can or soon will be able to predict the functions of proteins from their structures. In particular, we would need to be able to predict the ability of a protein to recognize and bind a ligand and to predict the structure of the optimum ligand. Beyond that, however, we would need to be able to predict how the protein carries out its function and how it recognizes and interacts with other macromolecules to alter its own functions and theirs. Although we have learned much about these topics, there are unanswered questions that we must be able to answer before we will be able to make accurate predictions.

Experience in Ligand Design from Experimental Protein Structures

One illustration of our current state of achievement is given by work on hemoglobin. Ligands affect the function and properties of hemoglobin in complex ways. Investigators began to attempt to design ligands based on the three-dimensional structure of a protein as soon as such structures were available.
In the early 1970s, the group headed by Goodford at Wellcome Laboratories in England began to explore the possibilities of ligand design by receptor fit (Beddell, 1984; Goodford, 1984). They used the structure of hemoglobin as determined by protein crystallography and constructed a wire model that was hinged so that they could examine both the oxy- and deoxy- states.

The first studies involved the design of ligands (using mechanical models) to fit the diphosphoglycerate (Figure 7-4, compound 1) binding site and then to mimic its function. The investigators used simple concepts of complementary shapes, electrostatic interactions, and possible covalent bonds. The designed compounds (Figure 7-4, designated compounds 2-4) do indeed mimic the effect of diphosphoglycerate on the dissociation of oxygen from hemoglobin. Subsequent crystallographic work supported the proposed binding mode. In addition, the relative binding energy of various analogues to a number of different hemoglobins was measured for 29 protein-inhibitor combinations. Statistical analysis revealed a highly significant correlation between the strength of binding and the number of covalent and ionic interactions. The use of computer graphics for the design would have accelerated this process since it took three months to construct the physical wire model of the protein.

This work was then expanded in an attempt to design a compound for the treatment of sickle cell anemia. The goal was to develop a compound that would affect the oxygen-dissociation curve in a way opposite to that of diphosphoglycerate. An intensive biochemical, physiological, and structural examination of the problem suggested that a ligand that binds between the alpha subunits of oxyhemoglobin might have the desired effect. Since no natural ligand for this site was known, the ligands were designed from the protein structure alone and designated compounds 5 and 6 (see Figure 7-4). Although the proposed binding mode has not been experimentally verified, the designed compounds did produce the expected change in function of hemoglobin. One of the compounds is now in clinical trials for the treatment of sickle cell disease. Thus, using rather primitive tools, the Wellcome group was able to predict the effect of a small molecule on the function of a protein.
The recent experience of Perutz et al. (1986) emphasizes both the important accomplishment made by these workers and the limits of our molecular understanding. Perutz and coworkers experimentally demonstrated several of the potential binding sites that a molecule might recognize in hemoglobin. Specifically, they solved the crystal structure of eight ligand-hemoglobin complexes and showed that there are at least six different positions on the protein at which a ligand might form a tight complex. Since the ligands were selected on the basis of their perceived structural

proteinase class (Anonymous, 1986; Boger, 1986). Such models suggested the structures of new inhibitors. The compounds were shown to be potent inhibitors in vivo as well as in vitro.

Approximate target macromolecule structures have also been used to design new agents. The classic example is the design of captopril (Figure 7-4, compound 9), an inhibitor of angiotensin-converting enzyme and a clinically successful antihypertensive agent (Petrillo, 1982). Captopril was designed from a proposed structure of the substrate when bound to the enzyme. The structure of the enzyme was assumed to be similar to that of carboxypeptidase A because of mechanistic similarities between the two enzymes.

Inferring Binding Sites

Much of this section has addressed issues related to determining and analyzing the structures of proteins of known sequence but unknown three-dimensional structure. Once these structures are known, detailed studies can be carried out of the relationships between those structures and the corresponding functions of a protein. Proteins express their functions through binding of other molecules, often termed effectors, with or without concomitant transformation of the effector, e.g., degradation or chemical reactions at functional groups. We have discussed structure/activity studies of the binding of effector molecules to putative sites in a protein of known structure. A separate body of research has focused on a complementary problem: relating the structures of several effector molecules to one another in order to determine information about binding sites, often termed active sites, in proteins of unknown structure. When such studies are successful, they can obviously provide structural information about a protein that can be used in conjunction with some of the techniques for structure determination discussed earlier in this report.

The general problem of inferring binding sites can be stated simply.
Given a set of molecules that are presumed to bind to the same site in a given protein of unknown structure, we must infer the size, shape, and binding characteristics of the active site. Several problems are subsidiary to this general problem. We mention them here, but a detailed analysis is beyond the scope of this report. For example, one must consider the process of recognition of the effector molecules prior to the actual binding in the active site. One must consider the possibility of conformational changes of both an effector and an active site during recognition and binding. One must perform very careful studies to ensure that the measured biological or chemical responses for several effectors are in fact due to binding in the same active site. A useful introduction to this research area, with leading references, has been presented (Olson and Kristoffersen, 1979).

At least three approaches have been used to infer the structure of receptor sites: the receptor mapping approach of Humber et al. (1979); the active analog approach of Marshall et al. (1979); and the DYLOMMS program of Wise et al. (1983). These approaches are closely related. All follow the same basic principles, but in different ways. All begin with the assumption that similar effectors possess related pharmacophoric patterns, i.e., similar dispositions in three-dimensional space of similar structural features important for binding. Independent studies are used to postulate pharmacophoric patterns of active molecules, generally using the most conformationally rigid molecules to form hypotheses. Once such a pattern is assigned, all possible conformations of each effector are examined to determine if there are low energy conformations that present the pattern. This both tests the hypothetical pattern and begins building a set of molecules that can be superimposed based on the pattern. Once superpositions are established, the volume occupied by the molecules can be used to define the cavity of the active site. Molecules of related structure that can yield the pharmacophoric pattern, but that display no activity, can be used to define the walls of the cavity, further elaborating its shape.

Recently, practicing medicinal chemists have become enthusiastic about these uses of computational and computer graphics techniques to compare the three-dimensional structures of ligands that bind to a receptor.
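The conformational search step described above, examining the conformations of each effector for low-energy ones that present the postulated pharmacophoric pattern, can be sketched as follows. This is a simplified illustration and not the algorithm of any of the cited programs: the pattern is reduced to the distances between three feature points, features are assumed to be listed in the same order in every conformer, and the energy cutoff, tolerance, and all coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Distances between all pairs of feature points, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def presents_pattern(features, pattern, tolerance=0.5):
    """True if every inter-feature distance matches the pattern within tolerance."""
    return all(
        abs(d - p) <= tolerance
        for d, p in zip(pairwise_distances(features), pairwise_distances(pattern))
    )


def matching_conformers(conformers, pattern, max_energy, tolerance=0.5):
    """Low-energy conformers whose features present the pharmacophoric pattern."""
    return [
        c for c in conformers
        if c["energy"] <= max_energy
        and presents_pattern(c["features"], pattern, tolerance)
    ]


# Hypothetical pattern: three features with mutual distances 3, 4, and 5.
pattern = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

# Hypothetical conformers; energies are relative values in arbitrary units.
conformers = [
    {"name": "A", "energy": 1.0,
     "features": [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (1.0, 5.0, 1.0)]},
    {"name": "B", "energy": 2.0,
     "features": [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 4.0, 0.0)]},
    {"name": "C", "energy": 50.0,
     "features": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]},
]

hits = matching_conformers(conformers, pattern, max_energy=10.0)
print([c["name"] for c in hits])
```

Comparing distances rather than coordinates makes the test independent of how each conformer is positioned in space; the conformers that pass would then be superimposed on the pattern to build up the occupied volume.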
Medicinal chemists use the common features of the aligned structures to propose tentative maps of the receptor topography. These maps are then used to design new compounds (Ghose and Crippen, 1985; Hopfinger, 1985; Humblet and Marshall, 1981). These techniques have benefited from the knowledge gained through protein crystallography. In particular, current applications of receptor-mapping methods usually compare the location of the projection of ligand atoms to possible binding sites, rather than identifying the location of the ligand atoms themselves as had been done previously.

The final category of computer-assisted prediction of the biological properties of a small molecule is also the oldest. This type of methodology, Quantitative Structure/Activity Relationships (QSAR), uses statistical or pattern recognition methods to explore the possible relationship between the biological and physical or substructural (presence or absence of certain functional groups) properties of molecules. Given the known utility of QSAR methodology to predict the potency of untested analogues (Hopfinger, 1985; Martin, 1981), it is important that the developers of this methodology are actively pursuing the challenge of evaluating the reliability of linear free-energy equations for cases in which the protein structure is known. In the case of dihydrofolate reductase, several investigators have compared the conclusions from QSAR and molecular graphics modeling of the inhibitors (Blaney et al., 1984). The conclusions derived from the two methods agree closely, confirming the proposal that the QSAR equations contain information about the types of noncovalent interactions between the inhibitors and the enzyme. However, a major advantage of QSAR over other computer-based methodologies is that one can attempt to develop equations for any biological response. For example, equations have been developed for the enzyme inhibition, antibacterial, and whole-animal antitumor activity of dihydrofolate reductase inhibitors (Blaney et al., 1984). Thus, QSAR is a logical complement to the more structure-based computer methodologies. It could be used to model the potential whole-animal activity of new ligands and perhaps to search for unanticipated interactions with other macromolecules.
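A linear free-energy equation of the kind used in QSAR can be illustrated with a one-descriptor least-squares fit. The sketch below relates a potency measure, log(1/C), to a single physical property and then predicts the potency of an untested analogue. The descriptor, the data, and the function names are all hypothetical; published QSAR equations typically use several descriptors and report statistical measures of fit that are omitted here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical analogue series: a hydrophobicity-like descriptor and log(1/C).
descriptor = [0.0, 0.5, 1.0, 1.5, 2.0]
log_inv_c = [4.1, 4.3, 4.9, 5.2, 5.5]

slope, intercept = fit_line(descriptor, log_inv_c)
print(f"log(1/C) = {slope:.2f} * descriptor + {intercept:.2f}")
print(f"r^2 = {r_squared(descriptor, log_inv_c, slope, intercept):.3f}")

# Predicted potency of an untested analogue with descriptor value 1.2.
print(f"predicted log(1/C) = {slope * 1.2 + intercept:.2f}")
```

Nothing in the fit refers to a protein structure, which is why the same procedure can be applied to any measured biological response, including whole-animal activity.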
Computer Tools for Ligand Design from Three-Dimensional Protein Structure

The recent excitement in computer-assisted drug design has arisen because scientists now have available the elements of each of the important tools for such an activity. Two types of computer hardware are necessary: high-speed color graphics and affordable but powerful computers dedicated to modeling. In addition, a growing body of data on the three-dimensional structure of proteins is becoming available, as our understanding increases of some of the relationships between structure and function of proteins.

Finally, software is also available for the graphics display of the molecules and for modeling the energetics and thermodynamics of the binding.

Specialized graphics tools for molecular design have also been developed. Some of these arose from the related activity of docking a known ligand into a protein. The display of the surface of the binding site is more useful for ligand design if it is color-coded to suggest the preferred type of noncovalent interaction at that point in space. For example, through such displays we can distinguish between surfaces near positively charged, negatively charged, hydrogen-bond accepting, hydrogen-bond donating, and hydrophobic regions of the protein. Another helpful tool used with the graphics display is the immediate read-out of energy values as the ligand is docked into a putative binding site and as bonds in the ligand and/or the protein are rotated to facilitate the docking. Design of ligands at the computer screen is aided by stereoscopic viewing devices and implements that allow one to move an object being displayed (such as a ligand) in three dimensions while keeping the rest of the display as it was. Experience has shown that molecular mechanics energy minimizations are necessary to evaluate the geometry and energy of the proposed complexes (Pincus and Scheraga, 1979).

It was noted previously that one persistent but often hidden problem in ligand design is that a ligand may bind to a protein in a different orientation or at a totally different site than the investigator anticipated. Kuntz et al. (1982) have devised a computerized means of evaluating such possibilities based on shape alone.

The design of a new ligand molecule is aided by the graphics display of the energetically preferred sites on the protein for interaction with various types of possible ligand atoms (Goodford, 1985).
Such sites are identified by calculating the energy of interaction of a probe atom at each point on a three-dimensional grid surrounding the protein. The ligand would be designed to interact at as many of these sites as feasible.

Once a proposed ligand is designed, its thermodynamics of binding can be predicted with the free-energy perturbation method if it is a reasonably close analogue of a known compound. If there are data on the relative energy of binding of other ligands to the protein, a QSAR or receptor mapping analysis

discussed above may suggest regions on the target that are conformationally more flexible than the experimental structure may suggest. QSAR (or at least consideration of physical properties) is also expected to be useful in the design of ligands that will have the appropriate whole-animal properties.

Impediments to Ligand Design from Protein Structures

Proteins are conformationally mobile. They are not the static structures that the graphics display of the crystal structure suggests. For example, molecular dynamics calculations on myoglobin have shown that within 300 picoseconds, 2,000 different conformational minima are sampled (Elber and Karplus, 1987). The root-mean-square difference in the location of the atoms in the most different structures is 2 Å; this means that many atoms move substantially more than that.

Proteins also change conformation when ligands are bound to them; hence, ligand design methodologies must be able to predict such movements accurately. For example, when the antiviral compound WIN 52084 is bound to the human rhinovirus, 13 residues of the protein undergo measurable conformational change (Smith et al., 1986b). The main chain moves as much as 3 Å, the channel to the binding pocket opens to the solution, the isoelectric point of the system changes from 6.9 to 7.1, and the occupancy of Ca++ at a distant point on the virus increases.

Conformational responses to ligand binding may be part of the function of the protein. For example, in response to Ca++, the channel-forming proteins of the gap junction between cells show small cooperative rearrangements of the relative orientation of the subunits. This rearrangement results in the narrowing of the diameter of the Ca++ channel within the cell by 18 Å and thus closes the channel to Ca++ passage (Unwin and Ennis, 1984). During this rearrangement, the conformation of each subunit does not change appreciably; only the orientation of each subunit changes with respect to the others.
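The remark made earlier in this section about the myoglobin simulation, that a root-mean-square difference of 2 Å implies that many atoms move substantially more than 2 Å, follows from the RMS being an average over all atoms. The sketch below makes this concrete with invented numbers. The coordinates and displacements are hypothetical, the function name is an assumption, and the two structures are taken to be already superimposed (no fitting step is performed).

```python
import math


def displacement_stats(coords_a, coords_b):
    """Return (RMS difference, largest single-atom displacement).

    Both inputs are equal-length lists of (x, y, z) tuples for the same
    atoms in two conformations, assumed to be already superimposed.
    """
    distances = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    rms = math.sqrt(sum(d * d for d in distances) / len(distances))
    return rms, max(distances)


# Hypothetical example: ten atoms shifted along x by these amounts (angstroms).
shifts = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 2.5, 3.0, 4.2]
structure_a = [(0.0, 0.0, float(i)) for i in range(len(shifts))]
structure_b = [(s, 0.0, float(i)) for i, s in enumerate(shifts)]

rms, largest = displacement_stats(structure_a, structure_b)
print(f"RMS difference = {rms:.2f} A, largest displacement = {largest:.2f} A")
```

With these made-up shifts the RMS difference is about 1.9 Å even though one atom moves 4.2 Å and most move only 0.5 Å, so a single RMS figure can hide large local movements in a binding site.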
Conformational responses to ligand binding may form the basis of the selectivity of ligands for very similar proteins. Evidence from crystallography, QSAR, and molecular graphics suggests that conformational changes in the enzyme in response to the binding of ligands are responsible for the selectivity of trimethoprim for bacterial dihydrofolate reductases in contrast to vertebrate enzymes

(Blaney et al., 1984). In the chicken liver enzyme, a tyrosine residue moves 5.4 Å in response to the binding of trimethoprim.

Since there is no experimentally established three-dimensional structure of a membrane-bound receptor, for this type of protein we depend on indirect observation and inference for our notions about conformation and conformational changes in response to ligand binding. Current concepts of receptor function usually invoke a conformational change as part of the transduction of the signal of a binding event into the ultimate biochemical and physiological response. Thus, it is possible that the regulatory and second messenger binding sites on receptor proteins might become available only in the presence of the ligand. Furthermore, the fact that certain compounds only partially activate a receptor suggests the possibility that a whole family of receptor conformations is available.

Thus, to use protein structure to design a ligand that influences the action of a protein whose function requires more than one conformation, or in which the putative binding site is very flexible, we would like to know the relevant three-dimensional structures of that protein and be able to predict the conditions under which each is stable. In other words, we would find it difficult to predict the function of a new ligand unless we had available structures of these protein conformations. We see this as a problem that will require at least as much study as the problem of finding the global minimum energy structure.

The ligand binds to a protein that is part of a system. In solution, a protein is part of a complex with water, ions, and cofactors. Alternatively, it may function while interacting with a membrane. These other species affect the strength of binding of the ligands of interest. For example, trimethoprim binds to dihydrofolate reductase with a 10,000-fold increase of affinity in the presence of its cofactor compared to its absence.
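To give a sense of scale, a fold change in binding affinity can be converted into a change in binding free energy with the standard relation ΔΔG = -RT ln(ratio of association constants). The short sketch below applies it to the 10,000-fold figure. The temperature of 298.15 K is an assumption (the text does not give one), and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def free_energy_change_from_affinity_ratio(fold_increase, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a given fold increase
    in affinity, using ddG = -R * T * ln(fold_increase).

    The default temperature is an assumed value, not taken from the text.
    """
    return -GAS_CONSTANT * temperature_k * math.log(fold_increase)


ddg = free_energy_change_from_affinity_ratio(10_000)
print(f"ddG = {ddg:.2f} kcal/mol")  # about -5.46 kcal/mol at 298.15 K
```

Under that assumption the cofactor contributes roughly 5.5 kcal/mol of additional binding free energy, which is comparable to the contribution of a few hydrogen bonds and so is large enough that a model omitting the cofactor could rank ligands incorrectly.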
The covalent structures of some proteins are modified during the course of their function. The large family of receptor kinases is responsible for phosphorylation of receptors as a means of regulating receptor function. Thus, the addition of a single phosphate group to a protein can dramatically alter its function. Other proteins are not functional until they are structurally modified after their synthesis. For example, sperm binds to its receptors on the egg only if these receptors are glycosylated. Additionally, posttranslational processing can impart subtle variations

in properties to a protein. For example, it is thought that the benzodiazepine receptor is the same protein throughout the brain, but that it is glycosylated to a different extent in different regions of the brain. These differences in glycosylation are reflected in different relative affinities of the receptor for various ligands.

Thus, for accurate and realistic models on which to base theoretical ligand design, we need to be able to include such other species in the calculation. Unfortunately, there are often no molecular mechanics parameters for such cofactors and transition metals. Including these additional ions and molecules increases the complexity and time of the calculation enormously, partly because the number of atoms is increased but more dramatically because the search for the stable arrangement of atoms is much more complicated. This is the multiple-minimum problem, but with even fewer experimental constraints on the solution of the problem. Furthermore, we cannot use traditional molecular mechanics concepts for transition metals because they undergo changes in oxidation and spin states that dramatically affect the optimum geometric arrangement of ligands. To include such ions, we need a combined quantum and molecular mechanics calculation. Although progress has been made in such calculations (Warshel, 1981; Singh and Kollman, 1986), they still need refinement and testing and tend to be calculations that strain available computers. Thus, we see promise that the tools required will be available, but they are not yet in routine use.

There may be more than one binding mode for the ligand. The experience with the binding of ligands to hemoglobin and the different binding orientations of methotrexate (Figure 7-4, structure 10) and dihydrofolate (Figure 7-4, structure 11) to dihydrofolate reductase highlight this problem (Blaney et al., 1984).
The method of matching ligand shapes to protein cavities is helpful in predicting such alternate binding modes. However, it is currently limited because it considers only the correspondence of the shape of the ligand and the binding site and not their possible flexibility or electrostatic and hydrophobic contributions to binding energy. In principle, this problem could be solved by examining the relative energy of all potential conformations of the protein and the ligand and all potential relative orientations of the two. As noted above, for such calculations water and cofactor molecules and associated ions should also be included. Even if there are only two conformations of the protein each with two binding sites and two

conformations or enantiomers of the ligand, the problem increases eight-fold! The challenge escalates when we consider that, in drug design, we would like to consider many possible analogues for synthesis. Thus, much more sophisticated techniques for pruning conformational and orientation hyperspace need to be developed before detailed calculations of this magnitude will be possible.

Even if we could predict the mode and strength of binding of a ligand to a protein, the effect of such binding on the function of the protein in the cell might not be obvious. The simplest case would seem to be the design of an enzyme inhibitor. If an enzyme is inhibited, we would expect that fewer substrate molecules would be transformed in a given unit of time. However, this is not necessarily true. For example, current evidence is that receptor kinases are present in the cell in high concentrations: the rate of phosphorylation of the receptor is apparently governed by the concentration of the cyclic nucleotide and the conformational state of the receptor and not the level of the enzyme. Inhibition of such an enzyme by even 90 percent might have no observable physiological effect. In other cases, the level of a particular enzymatic activity is regulated by feedback control. Inhibition of such an enzyme would be overcome by production of more enzyme. Alternatively, inhibition of an enzyme might simply lead to the presence of higher levels of substrate but the same rate of turnover of substrate through the biochemical system. The physiological effects of such agents may be impossible to predict. The situation is even more complex in proteins that have multiple domains that control multiple functions. A compound that prevented sickling of hemoglobin S would be useless as a drug if it also prevented oxygen binding or release or if, when bound, it promoted the crystallization of hemoglobin in a different crystal form.
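Returning briefly to the counting argument for the conformation and orientation search: the eight-fold figure is simply the product of the number of choices at each level, and it grows multiplicatively once analogues are added. The sketch below enumerates the cases explicitly. The labels, the function name, and the figure of 50 analogues are hypothetical and chosen only for illustration.

```python
import itertools

# Hypothetical labels for the smallest case described in the text.
protein_conformations = ["protein conformation 1", "protein conformation 2"]
binding_sites = ["site A", "site B"]
ligand_forms = ["ligand form 1", "ligand form 2"]

# Every combination is a separate complex whose energy would have to be evaluated.
complexes = list(itertools.product(protein_conformations, binding_sites, ligand_forms))
print(len(complexes))  # 2 * 2 * 2 = 8


def count_calculations(n_protein, n_sites, n_ligand_forms, n_analogues=1):
    """Number of separate complexes to evaluate; each factor multiplies the total."""
    return n_protein * n_sites * n_ligand_forms * n_analogues


# With an assumed 50 candidate analogues the same small case needs 400 evaluations.
print(count_calculations(2, 2, 2, n_analogues=50))
```

This count still ignores the continuous relative orientations of ligand and protein and the placement of water, cofactors, and ions, which is why pruning methods are needed before exhaustive evaluation becomes practical.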
A further complication in trying to understand function from structure is that a single protein may interact with several small molecules and other proteins in a complex regulatory scheme. Different subunits or domains of a protein may have different but interrelated functions. For example, all four subunits of Torpedo californica acetylcholine receptor are necessary to elicit a nicotinic response to acetylcholine, whereas only the alpha subunit is required for binding the antagonist alpha-bungarotoxin (Mishina et al., 1984). Thus, the structure of the alpha subunit might help in the design of a ligand, but the structure and function of all four

subunits might be needed to predict whether the compound would be an agonist or antagonist.

Other factors might make a ligand useless as a therapeutic agent. When a ligand is administered to an animal, it must survive the metabolic and structural defenses of the animal in order to reach its proposed site of action at the required concentration. The ligand may be a substrate for any one of many enzymes, some of which appear to have evolved broad specificity in order to metabolize foreign substances and thereby protect the organism from its unpredictable environment. Ultimately, we expect to be able to predict the biotransformations of small molecules from the structures of the enzymes involved, but we cannot do so today. The ligand may also fortuitously bind to other macromolecules in the body and, as a result, may not be available to the target protein. The ligand may have the correct physical and chemical properties to be rapidly excreted into the urine or bile before it has a chance to move to its target. Finally, the ligand may be so slightly soluble that it cannot achieve high enough concentrations in the blood or gastrointestinal tract for it to be distributed to its site of action. Again, we have some informal rules that allow us to attack these problems, but lack the basic knowledge we need to make true predictions. A ligand might also be useless in curing disease because it or one of its metabolites produces toxicity in the animal.

For a ligand to be used as a drug, its use must be technically feasible. This means that it must be possible to produce the compound in the required quantities and purity; it must be stable enough to ship to the patient; and an acceptable pharmaceutical form of the compound must be devised. A major advance has been made in the computer-assisted design of pathways for the synthesis of compounds. However, further enhancements would make this tool even more useful.
Economic factors also figure into feasibility; if the compound is to be sold, the patentability of the compound, the cost of its manufacture, the cost and effectiveness of competing therapy, and the expected incidence of the disease for which it is effective will also be issues in the decision to market the compound.

Other complications may also emerge when one is predicting function or designing ligands from predicted three-dimensional protein structures. First, the confidence in the exact coordinates of the protein structures will be lower. This greater uncertainty

will complicate the investigation of proposed function or the design of ligands because the exact dimensions of the possible binding sites will be uncertain, as will the conformation of residues on the surface of the protein. In principle, these questions can be answered using extensive molecular dynamics and minimization calculations. The prediction of function might be straightforward if the unknown protein shows a strong sequence homology with a protein of known structure and function.

Another complication with the use of predicted structures is that we may be unaware of posttranslational modifications of the structure. Ultimately, we expect to be able to predict such modifications from the substrate specificities of the enzymes that perform them. However, we cannot do so today.

Consideration of protein structures based on DNA sequence may obscure the fact that the protein may function as part of a multisubunit assembly. Multiple subunit proteins are common. To predict the function of such a protein, we must realize that it binds to the other subunits. It is not enough to consider other proteins coded on the same chromosome; the genes that code for the two different protein chains that form the subunits of hemoglobin are located on different chromosomes. Hemoglobin illustrates a further complication in using DNA sequences: there are at least four different variants of the beta subunit. Only one of these is produced in quantity by the organism. Thus, to predict the function of the alpha chain of hemoglobin, we would need to recognize that it functions in a tetrameric structure with two subunits of a different type, and that, of those with which the alpha subunits could bind, only the beta subunit is produced in appreciable quantity.

The transcribed protein may have one activity and be transformed into a product that has a different activity. Peptide hormones usually arise by the limited hydrolysis of a larger protein that circulates in serum.
Sometimes the same carrier protein can be cleaved at different sites to produce different peptide hormones. At present, we cannot predict such events. Only when we know the sequence of every peptide hormone would we be able to recognize the potential for a particular protein to be a carrier of a hormone.

In summary, we do not adequately understand the relationship between the details of the three-dimensional structure of a protein and its function. Without such an understanding, we cannot predict the effect that a bound ligand will have on the function

of the protein. We lack this understanding partly because three-dimensional structures of proteins have been determined only recently, and molecular graphics hardware and software are also newly available to experimental scientists. But in many cases, we do not know the three-dimensional structure of the protein of interest, nor do we have a good idea of all of its functions. We know even less about the relationship between structure and function of carbohydrates, because we have so little structural information on them. This is a problem that will not be solved in the short term.

While there are methods to predict the potency of molecules once a structure is suggested, we need better tools for molecular design to help the chemist suggest molecules to examine experimentally or theoretically. The tools described above are primitive. Although some methods are available to match candidate molecules against proposed shape requirements for binding, it is not possible to also specify the chemical properties of the designed compound with existing software. The current methods process a file of three-dimensional coordinates of candidate molecules; this file is generated from experimental or theoretical studies and so is incomplete. Additionally, we cannot automatically compare a compound proposed by a computer program with those already in the world literature as tested for that activity, nor can we automatically detect if the proposed compound is identical or similar to compounds known to have some biological activity deleterious to that desired. It is expected that many of these tools will be developed rather soon.