The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



1 Executive Summary

In much of biology, the search for understanding the relation between structure and function is now taking place at the macromolecular level. Proteins, nucleic acids, and polysaccharides are macromolecules, polymers formed from families of simpler subunits. Because of their size and complexity, the polymers are capable of both inter- and intramolecular interactions. These interactions confer upon the polymers distinctive three-dimensional shapes. These tertiary configurations, in turn, determine the function of the macromolecule.

A molecular view of biological function has already led to significant advances. The conceptual breakthrough that led to our present mastery of genetic control of protein synthesis was the discovery that the nucleotide sequence in nucleic acids codes for the amino acid sequence of the protein being synthesized. The Rosetta Stone of molecular genetics was the elucidation of the actual code, whereby the sequence of trimers (codons) in the nucleic acid can be translated to the sequence of amino acids in the protein.

Amino acid sequences in proteins can be determined directly through chemical analysis or indirectly by determining the base sequence in the parent DNA. These two methods have been used to describe the primary structures (amino acid sequences) of substantial numbers of proteins. But to understand the function of a

protein, we need to know more than its primary structure. Except, perhaps, for some structural elements, proteins are not found in nature as simple chains of amino acids. In their biologically active form, they are folded upon themselves, forming complex three-dimensional structures. Their shape determines their biological activity.

Currently, determining the three-dimensional structure of macromolecules depends on the interplay of various experimental and theoretical approaches, particularly x-ray diffraction and two-dimensional nuclear magnetic resonance (NMR), all of which involve the use of computers. At their most fundamental level, computers simply function to solve mathematical equations at very high speeds. Their speed makes it possible to accomplish in fractions of a second tasks that would take a human orders of magnitude longer, even with the assistance of mechanical calculators. Computers are also used to generate graphic representations of the three-dimensional structures of molecules, thereby aiding comprehension and largely eliminating the need to construct physical models, another laborious and time-consuming process. These graphic representations provide an important stimulus to the development of new ideas about the way macromolecules function.

Computers have become so inextricably involved in empirical studies of three-dimensional macromolecular structure that mathematical modeling, or theory, and experimental approaches are interrelated aspects of a single enterprise. The experimental methods, such as x-ray crystallography, NMR spectroscopy, and mass spectrometry, provide the data with which to construct a mathematical model that can account for the electron density distributions, bond angles, bond energies, and other observed structural properties of the molecule. Conversely, the mathematical model must generate a structure that agrees with the experimental data.
The interplay between the two is continual; theoretical models are modified repeatedly to improve their fit with experimental data, and theoretical results help in the interpretation and planning of experiments.

The potential practical applications of these techniques are myriad. When can the payoff be expected? What are the needs in terms of hardware, software, and human resources? To what extent should the effort be centralized? Is present funding adequate or should it be increased? The task of the committee was to examine these questions

as they relate to two major realms of theoretical molecular biology. The first is the prediction of tertiary structure from primary structure and other physical/chemical data. The second is the prediction of biological activity from tertiary structure and related data and theory.

PROTEINS

Primary Structure

More than 5,000 protein amino acid sequences have been reported, most of which were inferred from the DNA sequences that encode them. Although the collection is redundant (same protein from different species) and definitely biased (many human and few plant sequences, for example), several patterns stand out. The most prominent is that the number of different types of protein is not endless. It is clear that most proteins belong to identifiable families, easily recognized by their amino acid sequences alone. But surprisingly, the same families of protein primary structures are showing up in proteins in quite different settings.

At this point it is not possible to determine, with accuracy, a three-dimensional structure of a protein using only the amino acid sequence. However, recognition of patterns in structure-sequence correlations holds great promise in this area.

Predicting Secondary Structure from Amino Acid Sequences

The methods used to predict secondary structure from amino acid sequences have been (1) calculation of the energies of the major conformers for a given sequence, (2) statistical analysis of known structures, and (3) modeling. Although these methods have had some success and should continue to improve with an increasing data base, they have limited accuracy.

It is now computationally feasible to calculate energies for conformations of short peptides with or without solvent. The near-term developments in this area are most likely to be incremental improvements. Computer speed will continue to increase substantially. Data bases will continue to grow at least linearly.
Experiments on the structural consequences of modifying amino acids are beginning to be reported in significant numbers. More powerful statistical and modeling efforts are under development.
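
The statistical approach listed as method (2) above can be illustrated with a minimal sketch. It estimates, for each amino acid, how much more often it occurs in a helix than an average residue does. The function name, the one-letter state labels, and the toy sequences are all hypothetical and chosen only for illustration; they are not data from this report, and real methods use far larger data bases and more structural states.

```python
from collections import Counter


def helix_propensities(sequences, labels):
    """Return P(helix | residue) / P(helix) for every residue type seen.

    `sequences` are amino acid strings; `labels` are same-length strings in
    which 'H' marks a helical position and any other character marks non-helix.
    """
    total = Counter()      # occurrences of each residue type
    in_helix = Counter()   # occurrences of each residue type inside a helix
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("sequence and label strings must have equal length")
        for residue, state in zip(seq, lab):
            total[residue] += 1
            if state == "H":
                in_helix[residue] += 1

    n_all = sum(total.values())
    n_helix = sum(in_helix.values())
    if n_all == 0 or n_helix == 0:
        raise ValueError("need at least one helical position to define propensities")

    overall = n_helix / n_all  # background helix frequency
    return {res: (in_helix[res] / total[res]) / overall for res in total}


# Hypothetical toy data, invented for this example only.
toy_sequences = ["AEAKG", "GPAEG"]
toy_labels = ["HHHH-", "--HH-"]
props = helix_propensities(toy_sequences, toy_labels)
```

A value above 1 means the residue is found in helices more often than average in the training set, and a value below 1 means less often. With so few observations the numbers are meaningless, which is one reason accuracy depends on the size of the data base.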

The situation is less positive for larger peptides and proteins. After 10 years, however, we might well see major improvements in our ability to correlate sequences and secondary structure. Today's goal of correctly predicting every major feature in a new sequence is a plausible target for that time. However, we will need a major conceptual or computational breakthrough before it will be possible to specify accurately the secondary structural configuration of each amino acid in a protein.

Deriving Three-dimensional Structure and Function from Homology

Methods for identifying homologies and for determining sequence alignments and homologies are powerful tools to relate the structures and, potentially, the functions of two or more biopolymers. The analysis of patterns in sequences is a problem in symbolic, not numeric, computation. Languages that support symbol manipulation and pattern matching primitives, such as C and LISP, are often the languages of choice. Many of the techniques used to infer higher order structural information in patterns, such as secondary structural analysis, are empirical. Sequence-based methods alone do not take full advantage of the information available in the primary structures of biopolymers. For example, different nucleotide patterns may code for the same protein sequence; different protein sequences may share very similar function.

Predicting Nucleic Acid Structures from Sequence

Computer programs used to predict RNA conformation from sequence have a limited goal and limited success. The goal is to calculate secondary structure only, that is, to specify which bases are paired. The procedure uses experimental thermodynamic data on double-strand formation in synthetic RNA oligonucleotides. A dynamic programming algorithm considers all possible base pairs in the RNA and calculates the free energies of the corresponding structures.
The free energy of a structure is assumed to be the sum of free energies of its constituent substructures (single-stranded regions, double-stranded regions, bulges, hairpin loops, and interior loops). The lowest free-energy structure is the predicted secondary structure.
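
The dynamic programming idea described above can be sketched in a deliberately simplified form. The example below is a Nussinov-style recursion, not the procedure used by the programs this report discusses: it assigns an energy to each base pair alone and ignores the loop, bulge, and stacking contributions that the real thermodynamic model sums. The energy values in `PAIR_ENERGY`, the `MIN_LOOP` constraint, and the function name `fold` are assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical per-pair energies (arbitrary units), chosen only so that
# G-C pairs are favored over A-U and G-U pairs. Not measured values.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def fold(seq):
    """Return (lowest energy, tuple of paired index pairs) for an RNA string."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Lowest-energy nested pairing of the subsequence seq[i..j], inclusive.
        if j - i <= MIN_LOOP:
            return (0.0, ())           # too short to hold any pair
        result = best(i, j - 1)        # case 1: base j stays unpaired
        for k in range(i, j - MIN_LOOP):
            e = PAIR_ENERGY.get((seq[k], seq[j]))
            if e is None:
                continue               # k and j cannot pair
            left = best(i, k - 1)      # region before the pair
            inner = best(k + 1, j - 1) # region enclosed by the pair
            candidate = (left[0] + inner[0] + e,
                         left[1] + inner[1] + ((k, j),))
            if candidate[0] < result[0]:
                result = candidate     # case 2: base j pairs with base k
        return result

    return best(0, len(seq) - 1)


energy, pairs = fold("GGGAAAUCCC")
```

Because the energy of each subsequence is computed once and reused, the recursion considers every admissible nested base pair without enumerating every structure explicitly. The additivity assumption is what makes this decomposition possible.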

Computer programs of the future will require much more detailed knowledge of thermodynamics of local regions of an RNA molecule. The extensive thermodynamic data needed to predict secondary structure correctly will most likely come from computer interpolation and extrapolation of limited data measured on synthetic oligonucleotides in a few solvents. Proposals of methods to fold possible secondary structures into three-dimensional structures and calculate their energies are in very early stages.

Prediction of secondary structures in RNA is at about the same stage as it is in proteins. However, prediction of tertiary structure in RNA is far behind similar prediction in proteins. Rapid and efficient progress in this area will require effective methods for crystallizing RNA oligonucleotides and naturally occurring RNA molecules other than transfer RNA; NMR methods that can provide conformations for RNA molecules that contain from 10 to 100 nucleotides; and computer programs that can reproduce and extrapolate the experimental results. The higher charge densities in nucleic acids (one per nucleotide) require special care in the correct treatment of solvent and ionic effects in the computer programs.

Tertiary Structure from X-ray Crystallography

Today, several hundred proteins have been analyzed by x-ray diffraction and their three-dimensional structures catalogued, and their number is growing substantially. This knowledge of molecular structure, together with the amino acid and gene sequence data, enables us to study the mechanisms of action of these proteins at the molecular level.
Two-dimensional NMR techniques are a valuable complement to x-ray diffraction for relatively small molecules (molecular weight less than 10,000), but for the foreseeable future, crystal structure analysis will be the principal experimental source of structural data for enzymes, nucleic acid binding proteins, antibodies, and other proteins involved in the immune response or intercellular communication.

Determining the three-dimensional structure of a biological

macromolecule involves several clearly defined steps. First, crystals of suitable size and diffraction properties must be prepared. Next, x-ray diffraction data must be collected for these crystals and also, typically, for some heavy atom derivatives of the crystals. These data can then be assembled by a computational process that yields an electron density map. This map must now be fitted with a polypeptide chain of the appropriate amino acid sequence. Because the map is of less-than-atomic resolution and because it also contains errors in the phase determination, considerable skill is required to obtain the best fit. The resulting protein model must then be refined to remove as many as possible of the errors present in the map as well as those introduced by the fitting process. Computers play an essential role in most of these steps.

The availability of new instrumentation that allows crystallographers to produce in a few days data that previously took weeks or months of labor-intensive work is revolutionizing protein crystallography at an opportune time. In recent years, developments in genetic engineering have made it possible to produce large quantities of rare proteins and to use site-directed mutagenesis to answer structural questions.

X-ray diffraction experiments have provided structures for double-stranded DNA, protein-DNA complexes, and DNA-small molecule compounds in crystals. Computer modeling is needed to extrapolate these results to more biological environments and to other complexes. An obvious application of such insight is to design more specific and more effective antibiotics. In general, we would like to be able to design molecules that can start or stop the expression of any gene in any DNA. The key to achieving this is computer-assisted calculation used in close collaboration with experimental observation.

Nuclear Magnetic Resonance

NMR is another important source for structural data, and its use is developing very rapidly.
NMR results can be compared directly with theoretical (mathematical) modeling and with structures derived from x-ray crystallography. NMR has been applied to macromolecules in aqueous media and to a limited extent in other environments, such as those that approximate biological membranes. It is at its best when used to explore changes in preferred structure in response to environmental or structural perturbations.

In this sense, NMR can be important in extrapolating to other environments structural data obtained by other techniques. It also is well suited to exploring changes in structure when comparing homologous series of macromolecules, such as a series produced by site-specific mutagenesis.

More recently, technological advances have extended the applicability of NMR to solids, oriented phases, and even to total structure determination of molecules in solution. The latter use has attracted the most attention and currently has the greatest potential to affect computer-assisted modeling efforts.

The major limitation on the use of NMR methods to determine structure is the restriction on molecular weight. Current applications require proteins of MW 10,000 and less. A second limitation stems from the restricted range of measurable distances, less than 4 Å. A third limitation arises because of underlying assumptions about the rigidity of macromolecules. All these limitations are likely to be overcome in time, but doing so will require advances in NMR methodology and computational capacity.

Improved computational and molecular modeling facilities could promote the use of NMR to determine structure in several ways. Processing and analyzing structural data for macromolecules of MW 10,000 is far more time-consuming than is acquiring the data. Each phase of this operation could be improved. Data are normally collected as a two-dimensional time domain set, and processing involves Fourier transformation to a frequency domain set. These processes are now handled by array processors associated with instrument computers, with a moderate investment in time (one hour per process). However, alternative methods of processing, including linear decomposition and maximum entropy methods, may be better in terms of the signal-to-noise ratio and may be more compatible with automating the analysis.
Such methods take far more computer time and may become practical only on supercomputers.

Some efforts are underway to use semiautomated pattern recognition and expert system strategies to determine three-dimensional structure, but these will require substantial investments in programming and computer hardware.

An investment in programming is obviously warranted. This investment could be used best if we acknowledge that, in the future, we may need to accommodate types of data not used today. Some data may come from other structural methods applicable to

solids and oriented specimens. Other data may come from entirely different methods, such as tunneling microscopy. Thus, programs need not be directed specifically for use with NMR data, but should, if possible, accommodate structural data from a variety of sources.

In summary, current NMR methods of determining structure are applicable to a variety of biologically important molecules that are less than MW 10,000. Data production in this size range will be greatly enhanced by better computational facilities, high field spectrometers, and modeling programs that aim for compatibility with experimental constraints of the form provided by NMR. NMR data should be meshed with data gathered through other methods of determining structure. The range of molecules accessible by these methods is likely to increase by a factor of two over the next five years. The rate of data production is likely to increase even more quickly as we improve structure determination protocols and as high-field spectrometers become more generally available.

TERTIARY STRUCTURE FROM THEORY

Energy Optimization

According to the thermodynamic hypothesis, the amino acid sequence of a protein determines its three-dimensional structure in a given medium as the thermodynamically most stable structure. To identify this structure requires some kind of optimization strategy, which, in turn, requires procedures to generate arbitrary three-dimensional conformations of a polypeptide chain, compute the free energy of the system for each conformation, and then alter the conformation so that it ultimately corresponds to the global minimum of the free energy.

Although algorithms are available for minimizing an energy function of many variables, there are no efficient ones to use for passing from one local minimum, over a potential energy barrier, to the next local minimum and ultimately to the global minimum in a many-dimensional surface.
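
The difficulty can be seen even in one dimension. The sketch below runs plain gradient descent on a hypothetical double-well function from two different starting points. The function, the step size, and the names `energy`, `gradient`, and `minimize` are assumptions made for illustration and bear no relation to a real protein energy surface, which has thousands of variables.

```python
def energy(x):
    # Hypothetical one-dimensional "energy surface" with two wells:
    # a deeper one near x = -1.30 and a shallower one near x = 1.13.
    return x**4 - 3.0 * x**2 + x


def gradient(x):
    # Analytical derivative of energy(x).
    return 4.0 * x**3 - 6.0 * x + 1.0


def minimize(x, step=0.01, n_steps=5000):
    """Plain gradient descent: always move downhill from the current point."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


from_left = minimize(-2.0)   # ends in the deeper (global) well
from_right = minimize(2.0)   # ends in the shallower (local) well
```

Both runs converge, but only the one that happens to start on the correct side of the barrier reaches the lower-energy well. A downhill-only procedure has no way to discover that a better minimum exists elsewhere.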
Thus, minimization leads to the nearest local minimum, where the procedure is trapped. This trapping in a local, rather than the global, minimum is referred to as the "multiple-minima problem." A variety of procedures are

being developed to overcome this problem, including approximations that initially place the system in a broad potential energy well in which the more sharply defined global minimum lies.

Although supercomputers will more adequately cover conformational space, workers in this field will need more time on these machines to achieve greater efficiency. Parallel processing offers a breakthrough, but will require that more software be developed to take advantage of the new hardware. With new hardware and software, it should be possible to surmount the major hurdle created by the multiple-minima problem. However, bottlenecks may develop as attempts are made to apply procedures that work on 20-residue segments to proteins containing 100 to 200 residues.

Homology

Proteins can be categorized by families. Evidence for this comes from protein sequence homology and from the architectural similarity in tertiary structures of homologous proteins as established by x-ray and NMR methods. A family of proteins can be modeled if several conditions are fulfilled. First and most important, the structure of at least one member of the family must be known. Second, the protein to be modeled must be sufficiently homologous to the known protein. Many proteins have been modeled over the past five years, and the general consensus is that if two proteins share at least 30 percent similarity, it is reasonable to use computer graphics and energy modeling to propose the unknown structure from the known.

Molecular Dynamics

Molecular dynamics simulations apply Newton's equations of motion to the atoms of one or several molecules. Newton's equations relate three independent quantities: time, conformation (three-dimensional atomic coordinates), and potential energy.
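
As a minimal sketch of how Newton's equations are stepped forward in time, the example below integrates a single particle on a one-dimensional spring with the velocity Verlet scheme, a common choice in molecular dynamics codes. The harmonic force, unit mass, time step, and function names are assumptions made for illustration; a real simulation would use thousands of atoms and an empirical forcefield.

```python
def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = force(x); return the list of (x, v) states."""
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity with averaged force
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Hypothetical test system: unit mass on a spring with unit force constant.
SPRING_K = 1.0
MASS = 1.0


def spring_force(x):
    return -SPRING_K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_K * x * x


# 628 steps of 0.01 time units is close to one full oscillation period (2*pi).
traj = velocity_verlet(spring_force, x=1.0, v=0.0, mass=MASS, dt=0.01, n_steps=628)
x_final, v_final = traj[-1]
```

The stored trajectory is the raw material for the averages discussed in this section: mean positions, fluctuations about the mean, and energies can all be computed from the sequence of states. Checking that `total_energy` stays nearly constant is a standard sanity test of the time step.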
Molecular dynamics simulation allows us to estimate theoretical mean atomic positions and deviations from the mean; rates of motion and conformation change; and ensemble averages, including thermodynamic functions such as energy, enthalpy, specific heat, and free energy. Although simple in concept, molecular dynamics simulations were not practical until the advent of high-speed computers. The time is approaching when molecular

dynamics calculations will produce useful predictions of the structure, dynamics, and thermodynamics of proteins, nucleic acids, and complexes of these macromolecules with one another and other molecules.

The simulation requires two initial pieces of information: a starting conformation and a potential energy function or forcefield. For a protein, the starting conformation must be firmly based on experimental observation. The forcefield is often identical to that used in molecular mechanics. The forcefield is a very simple empirical approximation of the underlying physics, which properly should be expressed in terms of quantum mechanics, but is totally unmanageable in that form. Parameters of the forcefields currently in use have been proposed on the basis of various experimental data and, to some extent, on theoretical considerations.

Recently developed forcefields for water-water and water-protein interactions permit the simulation of the dynamics of proteins in solution. This capacity is a prerequisite for modeling events at the protein surface, including most interactions of proteins with other molecules.

Simulations of molecular dynamics of proteins require careful adjustment of starting configurations and simulation parameters. The limiting factor is always the available computing power. For example, calculation of molecular dynamics simulations of the motion of a protein over a 10^-9-second time interval takes roughly a month of computer time on a CRAY. Making additional computer time available to those working in the field will help in the development/application of more detailed forcefields, produce longer simulations, encourage the simulation of larger systems that pose new physical and biological questions, and promote the application of new, more time-consuming dynamics methods to be used to ask different questions about the system.
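
The CRAY estimate above can be put in per-step terms with a short calculation. The report does not state the integration time step, so the 1-femtosecond (10^-15 second) value used here is an assumption, as is the 30-day month; the function name `md_cost` is likewise hypothetical.

```python
def md_cost(simulated_seconds, timestep_seconds, wallclock_seconds):
    """Return (number of integration steps, wall-clock seconds per step)."""
    steps = round(simulated_seconds / timestep_seconds)
    return steps, wallclock_seconds / steps


ASSUMED_TIMESTEP = 1e-15          # 1 fs; an assumption, not stated in the report
ASSUMED_MONTH = 30 * 24 * 3600    # 30-day month, in seconds

steps, seconds_per_step = md_cost(1e-9, ASSUMED_TIMESTEP, ASSUMED_MONTH)
```

Under these assumptions a nanosecond of simulated motion corresponds to about a million integration steps, each costing a few seconds of machine time, which suggests why longer simulations and larger systems depend so directly on available computing power.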
Molecular dynamics simulations show considerable promise of being able to more accurately depict the structures that are proposed on the basis of incomplete information, particularly from two-dimensional NMR. Such reiterations are thought to be the best method of investigating the atomic details of macromolecular motion.

Thermodynamics

Physicists have known, in principle, how to calculate equilibrium thermodynamic properties from molecular dynamics calculations for a considerable time. Only very recently have these techniques been applied to proteins, but their use has already shifted the emphasis of the molecular dynamics simulation field to calculations of free-energy differences. Several factors explain the great current interest in this application. The most important are the magnitude and precision of available experimental data for a variety of equilibrium-involving biological macromolecules and the unexpectedly excellent theoretical estimates that were and still are produced by the simulations.

Progress in free-energy simulations, although potentially very rapid, is severely limited by available computer time. To realize the possibilities already identified will require a radical increase of computer access for molecular dynamics studies. An immediate 10-fold increase does not seem an extravagant objective if we duplicate existing hardware that is already programmed and inexpensive.

In contrast to the folding problem, the problem of computer modeling of the dynamics of protein interactions can be tackled in a series of small, increasingly complex steps, each of which solves a discrete problem of immediate biochemical interest, yet also adds to our insight and experience with the broader picture.

Beyond the need for adequate computer time, two other needs must be met. One is the need for better forcefields, particularly for nucleic acids and carbohydrates; the second is the need for improved molecular dynamic techniques designed to overcome some of the intrinsic imperfections of existing forcefields. One possible impediment to this is the apparent trend toward the development and commercialization of proprietary forcefields, much like the trend toward proprietary software.
This trend seems counter to the best interests of science because it limits access and precludes rigorous testing of results.

Solvent Effects

Biomolecular systems function in vivo in environments of aqueous solutions or in a membrane. The aqueous environment

includes solvent as well as a substantial component that consists of various ions. Because of the potentially relatively strong interactions of these components with each other and with the macromolecular species, this environment can contribute substantially to the observed state of a macromolecule in solution. The membrane environment differs in that the charged groups and electrically neutral regions are spatially separated. A quantitative treatment of biopolymer structure and function cannot be expected to succeed unless we pay attention to the molecular role of the environment.

The ability to adequately test predictions made from theoretical calculations is an element of overriding importance in the future of this aspect of modeling. This can occur at two levels: first and most important, in comparing theory and experiment, and second, in comparing results obtained through convenient but approximate theory with those that follow from accurate theoretical treatment. The first level is essential for accuracy and the second for the future development of viable theoretical methods for increasingly complex systems. Therefore, we should continue to encourage both experiment and theory for both macromolecular and smaller model compounds.

The evident rapid progress in the ability to describe the environmental aspects of biopolymer systems justifies our optimism that this element of biomolecular modeling will not impede development of useful predictive methods. For the most challenging aspects, however, we are at least several years away from being able to accurately mimic environmental effects of solution.
ANALYSIS AND DESIGN OF DRUGS

Central questions regarding function include (1) which aspects of protein structure and dynamics are responsible for the often enormous enhancements of rates of reaction of enzymes over the rates in water and (2) how the signal of the binding of a ligand or quantum of light is transduced into a physiological response such as an increase in blood pressure or a change in mental functioning. The questions are at best incompletely answered.

In principle, the information needed to predict the function of a biological macromolecule is encoded in its three-dimensional structure. The problem is how to decode the rules that govern the relationship between structure and function. The tools of

molecular dynamics promise to offer insight into this problem, but resources must be made available.

The prospect of computer-assisted drug design based on the three-dimensional structure of the target biomolecule has recently received much attention in the scientific literature. Many medicinal chemists believe that their field is poised to undergo a revolution as dramatic as that of the 1950s and 1960s that transformed organic chemistry from a descriptive to a predictive science.

This revolution presupposes that we can (or soon will be able to) predict the functions of macromolecules from their structure. In particular, this would require predicting whether a protein can recognize and bind a ligand and predicting the structure of the optimum ligand. Beyond that, however, we would need to be able to predict how a protein recognizes and interacts with other macromolecules to alter its own and their functions. These insights will be a direct consequence of the theoretical studies discussed in this report.

The design of a new drug from theoretical principles must somehow incorporate the possible interaction of the proposed ligand with all other macromolecules of the body. Quantitative Structure/Activity Relations (QSAR) is a logical complement to the more structure-based computer methodologies. QSAR is based on computing the activity of molecules from the properties and activities of substituents. This tool can be used to model the potential whole-animal activity of new ligands and perhaps to search for unanticipated interactions with other macromolecules.

The recent interest in computer-assisted drug design arose because the scientist has available the elements of each of the important tools needed for such an activity. Two types of computer hardware are necessary: high speed color graphics and affordable but powerful minicomputers dedicated to modeling.
Data on the three-dimensional structure of proteins are becoming available at an increasing rate as we improve our understanding of some of the relationships between structure and function of proteins. Finally, software is also available for displaying the molecules and modeling the energetics and thermodynamics of the binding. Equally important, specialized graphics tools for molecular design have been developed. Some of these arose from the related activity of "docking" a known ligand into a protein.

We must understand the relation between structure and function if we are to design agents that alter function by changing

structure. Ultimately, we expect to be able to predict the biotransformations of small molecules from the structures of the enzymes involved, but we cannot do so now.

COMPLEX CARBOHYDRATES

Complex carbohydrates occur everywhere in animals, plants, and bacteria. The enzymes involved in glycoconjugate biosynthesis are glycosyltransferases that catalyze the transfer of sugar residues from the sugar nucleotides to the nonreducing end of a growing carbohydrate chain. The distinction between this process and protein synthesis is key; the latter occurs on a template of messenger RNA and is therefore determined by the genetic code for a single structural gene. In sharp contrast, glycoconjugate synthesis is accomplished by adding sugar units in a stepwise manner, with a different enzyme used for each step. The current state of knowledge does not indicate that a single DNA sequence determines the primary structure of the complex carbohydrate.

It is not yet possible to predict the primary structures of complex carbohydrates from DNA sequences, and the three-dimensional structures of glycoproteins, glycosphingolipids, and other complex carbohydrate-containing molecules can never be completely predicted without analyzing the two-dimensional structures of the carbohydrates. A complete understanding of the interactions between carbohydrates and proteins (enzymes, lectins, antibodies, and cell surface receptors) will depend on the generation of accurate three-dimensional structures of both kinds of molecules.

Of the three major classes of complex biological molecules, we have the least information at the atomic level about the three-dimensional structure of carbohydrates. Because no large carbohydrates have been crystallized, we have no data on relevant crystal structure, other than data on simple monomers to trimers, upon which to model classical or semiempirical quantum mechanical calculations.
Adequate computer time, including access to appropriate parallel processors, is an important consideration in support of this research.

Configurations that consist of more than one macromolecule may interact as a whole in biological phenomena such as catalysis by many enzymes, binding at a cell surface, and signal transduction across cell membranes. Hybrid systems involving complex carbohydrates, proteins, and nucleic acids are important in protein

OCR for page 1
15 and nucleic acid synthesis, repair, and regulation. We believe it will be possible in the near future to use structural methods to characterize at least parts of these systems. Computer modeling of such supramolecular structures will be necessary if we are to gain a deeper understanding of how biological materials are organized to carry out complex functional tasks. ROL1: OF COMPUTERS Computers clearly play an essential role at virtually every stage and in virtually every process, theoretical and experimental, in determining the three-dimensional structures and biological ac- tivity of macromolecules. Both personal and laboratory computers and large, mainframe computers are used. Computer-assisted mathematical (theoretical) modeling is augmented by computer-generated three-dimensional graphic rep- resentations of molecules. Modeling from theory is generally tightly coupled to experimental approaches such as x-ray crys- tallography, NMR, and monomer sequencing. The interplay of theory and experiment results in the increasing refinement of the theory-based models, which in turn can be used to predict the behavior and properties of the actual molecules. Availability of computer software and access to computer time and computer-based data banks are often the factors that limit the rate of progress in structural biology. The conclusion is in- evitable that progress toward fuller understanding of macromolec- ular structure and function will be accelerated if computer-based activities receive greater support. The computer facilities of the national laboratories have the attributes required to lead in pro- viding computer-based support for both theoretical and empirical approaches to understanding macromolecular structure and func- tion. Progress in structural biology is likely to produce rapid ad- vances in biotechnology, drug design, toxicology, and medicine, along with advances in understanding of basic biological processes, including heredity and development. 

CONCLUSIONS

The conclusions we draw from our examination of the facts related to the questions contained in our charge may be summarized as follows: Important advances in the understanding of macromolecular structure and function have been and will continue to be gained from the application of techniques using computers. To maximize the speed of progress toward a better understanding of protein folding and of macromolecular interactions and functions, impediments and limitations to the effective study of these matters must be identified and dealt with. The areas requiring attention include the need for readily available data banks of protein and nucleic acid sequences as well as model-derived structures; improved capability of and access to supercomputers; provision of educational opportunities in the area; and use of the most appropriate physical and intellectual resources for the performance of research.

Our recommendations for dealing with these issues are detailed in Chapter 10 and summarized in the following section.

RECOMMENDATIONS

A radical new policy on data banking of protein and nucleic acid sequences is required. A permanent National Sequence Data Bank should be put in place as soon as possible. A standing advisory committee of users should be appointed by a consortium drawn from the National Institutes of Health (NIH), National Science Foundation (NSF), and Department of Energy (DOE). Whether the new facility should be allied with a national laboratory, or with the National Library of Medicine, or should be a completely new academic or commercial enterprise remains to be determined.

Support for the archiving of coordinate and model-derived structures should continue. Inclusion of data from new methods of structural analysis should be encouraged.
We recommend in the strongest terms expanding the supercomputer initiative, funding computer networks, improving access by the scientific community to the existing supercomputer centers at the national laboratories, upgrading those centers, and providing individual research grants for purchasing new computers. DOE should work closely with the NSF and NIH to provide the broadest and most versatile computer network system on a national level.

Educational opportunities in structural biology and molecular modeling should be improved. Several mechanisms are available, such as expanding graduate programs through new training grants; increasing graduate fellowship and postdoctoral fellowship programs; workshops, including formal hands-on training programs in molecular dynamics and molecular graphics; and working meetings of independent investigators to address critical limiting aspects of a particular problem.

Innovative and interdisciplinary research proposals in both theoretical and experimental aspects of structural biology should be directly encouraged through the use of existing funding mechanisms.

We see a special role for the national laboratories. The national laboratories should compete for the National Sequence Data Bank. The national laboratories and DOE have leadership status in the national computer network. They should increase efforts to make supercomputers available to the scientific community. Research efforts are going forward in molecular calculations and structural biology, with major programs at a few locations. Strengthening these efforts will help the department's Office of Health and Environmental Research assess the potential health and environmental effects of chemicals involved in energy processes.