Session I
What Are the Needs of Crystallographers?
Session Chairman: Philip Coppens

Computing Needs of the Structural Chemist

James A. Ibers

Dr. Hamilton has asked me to outline our research group's computational needs and our thoughts on how such needs may best be met. Although I am a structural chemist with considerable computational needs, I must emphasize that structural chemists are a heterogeneous group and that our group's needs and attitudes may not be typical. In outlining computational needs, I will also comment on the operation of the Vogelback Computing Center at Northwestern University. In my position as Chairman of the University Computing Committee over the past four or so years, I have gained some insight into the support problems of a large computer in a private educational institution, and I hope information on this subject will be of interest to others.

Mainly for the benefit of noncrystallographers, I present in Figure 1 some features of our data-collection operation. I must emphasize that we are indeed data processors, and this fact has implications with respect to remote computing, a subject I will take up presently. Figure 1 indicates that we accumulate roughly two crystallographic data sets per month.

Picker FACS-1 with monochromator and magnetic tape output
Down time: less than 3%
Idle time (mostly set-up): 40%
Actual data collection: about 60%
Typical data rate: 350 reflections/24-hour day
Data sets October, 1971 through March, 1972: 11

Figure 1 Data Collection

Figure 2 provides information on our university computer, on the average cost per structure, and on the breakdown of our crystallographic computations by type. Most of the money is spent on refinement, generally computations involving non-linear least-squares analysis. Again with

relevance to remote computing, I should add that the UPDATE system of CDC is very convenient for program management, and that we ceased some years ago to maintain card images of our programs. The fact that our program library is on a permanent file is also important; it assures that all allowed users of the file have access to the same versions of the programs.

1. CDC 6400 with expanded core storage, 64 000-word core, 4 tape units, plotter
2. Program library on permanent file
3. Source library in UPDATE form
4. Average cost per structure: $2500 at $500/hr
5. Breakdown of crystallographic uses:
   Program development: 2%
   Processing of data: 5%
   Structure solution: 5%
   Refinement: 80%
   Error analysis, drawings, etc.: 8%

Figure 2 Computations

Figure 3 indicates the types of problems we have handled recently. Those familiar with our work know that without triphenylphosphine as a ligand to stabilize a transition metal in a low-valence state, we would feel very lost indeed. In this respect our computational needs may not be typical of structural chemists; even with the theoretical dodges we use, such as treating the phenyl rings as rigid groups, our problems tend to be large by ordinary standards. The size of the problem is really defined by the number of variables to be determined, and this number also defines the cost, because of the predominance of the refinement procedure over all other computations in terms of time.

Two of the eleven problems summarized in Figure 3 exceeded the capacity of our CDC 6400. By this I mean the following. In carrying out the non-linear least-squares refinement, one sets up the matrix of the normal equations, which will be of order N x N if there are N variables to be refined. Because the matrix is symmetric, only N x (N+1)/2 cells are needed. On our CDC 6400 we can obtain a maximum of about 140 000 cells for program and data. This proves to be a limitation at about N = 240.
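As a minimal sketch of the storage arithmetic behind this limit (Python here is purely illustrative; the original computations would have run in Fortran on the CDC 6400):

```python
def packed_size(n):
    """Cells for the unique half of a symmetric n x n normal matrix."""
    return n * (n + 1) // 2

def max_parameters(free_cells):
    """Largest n whose packed matrix fits in the given number of cells."""
    n = 0
    while packed_size(n + 1) <= free_cells:
        n += 1
    return n

# 240 parameters need 28 920 cells for the matrix alone; the remainder
# of the roughly 140 000 available cells goes to program code,
# reflection data, and working arrays, which is why the practical
# ceiling quoted in the text falls near N = 240.
print(packed_size(240))       # 28920
print(max_parameters(28920))  # 240
```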

When we have more than 240 variables we must resort to dubious mathematical tricks to carry out the refinement. These tricks cost much more than the hypothetical solution of carrying out the normal calculations on a machine of similar speed but greater storage capacity.

Average number of independent atoms: 15
Average number of phenyl groups: 5
Typical number of variables: 200
Number of variables exceeded 6400 capacity in 2 cases out of 11

Figure 3 Problem Type

Figure 4 gives an estimate of our sources of computing money for crystallographic calculations. I have divided the money into "hard" and "soft", and it is necessary to define these terms. Hard money comes from the outside. It is the kind of money that computer center directors and university controllers are very fond of. Such money comes from the various funding agencies, from private sources, and so on. Soft money is a short name for internal regulation of computing. The process may work something like this: in one manner or another a Dean obtains a certain amount of soft money to be handed out to his department heads. He puts this in his bottom drawer and doles it out to the department heads, who in turn may dole it out to various potential users. This process provides both regulation and power. In a typical university, control of money means power, and hence the Dean maintains power. But more important, a regulatory system is built in. If Professor X goes through his soft computing money like a shot, he must ask his department head for additional funds. The department head may take funds away from others in the department or he may go back to his Dean for more money. In either case, someone at the appropriate level in the university is going to question Professor X's proclivity to spend computing money. This is as it should be, for the University Computing Committee and similar committees are not in the regulatory business. They simply have no way of assessing the value of Professor X's calculations.

Over the years we at Northwestern University have devised various schemes for the distribution of soft money; one in particular is this: if Professor X brings in large amounts of hard money, he is rewarded with similar amounts of soft money, and the reward is on a graduated scale. This procedure has worked very well. It has enabled us to make realistic requests to the granting agencies and has assured Northwestern that money allocated to computing in such grants is indeed spent on computing, for the incentive is there to

spend it to obtain soft money. If one transfers hard computing money at the end of the year to cover debts in the supplies account, then one clearly does not gain additional computing capacity through soft money. Such book transfers were far more common before the incentive system was devised.

Figure 4 also indicates roughly the amount of hard dollars that Northwestern might lose if I switched the bulk of my computations to a CDC 7600 outside. I return to this point later.

Hard money: $15 000
Soft money: $30 000
Loss to university computing center if CDC 7600 is used off campus: $10 000

Figure 4 Payment for Crystallographic Computations

Figure 5 presents a rough breakdown of the sources of support for our computing center. The University puts in the bulk of the money to run the center. Sponsored research, covered by grants, provides about 25%, and about 10% comes from other outside sources, for example, other academic institutions using our computer. As a non-profit institution we cannot, of course, accept computing from profit-making organizations. The budget for the computing center is based on staff, maintenance, and amortization of the computer. The hourly cost is derived basically by dividing the budget figure by the number of hours available for computing.

University: 65%
Sponsored research: 25%
Outside sources: 10%
Annual budget: $1M (including $200 000 amortization)

Figure 5 Sources of Support for Northwestern University's CDC 6400

Figure 6 indicates the major types of operations that take place on our computer. These various operations represent diverse needs of diverse individuals, and it is probably impossible to provide for all of them efficiently on a given computer. Moreover, the needs are ever changing: computer-aided instruction was not even with us five years ago. It is clearly necessary to consider these operations when projecting computer needs for the next decade on a given campus, and such a projection is difficult. For example, the medical, dental, and law applications are in their infancy,

but with the big push toward providing better medical and dental care, not only will there be vastly increased uses of computers in these fields, but funding will be available to obtain giant computers.

1. Administrative data processing
2. Library processing
3. Batch processing
4. Number crunching (crystallography)
5. Interactive processing
6. Interactive programming and time-sharing
7. Information retrieval and large-scale data bases
8. Laboratory processing
9. Medical, dental, and law applications
10. Computer-aided instruction
11. Computer graphics

Figure 6 Major Needs of Northwestern University Users

Some time ago I sat on a Long-Range Computer Needs Committee which fearlessly tackled the question of the projection of computing needs at Northwestern University. The conclusions reached are shown in Figure 7. It is not appropriate here to go through our reasoning. The salient points are that we could not justify a larger machine, and that the number-crunchers and some other users will be expected to go elsewhere for their computing in the future. Our CDC 6400 is currently used about 140 hours per week, and we anticipate that the load it can handle can be extended considerably by the hardware additions indicated in Figure 7. The decision that some users will go elsewhere for their computing is a painful one to a university because, generally speaking, those with big computer demands also bring in hard dollars. Nevertheless, there seems to be no choice.

1. Projected usage does not justify a bigger machine
2. Administrative processing should remain separate
3. Additions to hardware are necessary to handle many of the tasks (expanded core storage, expanded discs, additional printer and plotter, at a total cost of about $500 000)
4. Number crunchers will have to go elsewhere, with consequent loss in income to Northwestern University

Figure 7 Conclusions of Long-Range Computer Needs Committee

What crystallographers need are faster machines with increased high-speed memory. Figure 8 indicates two of the great advantages of the CDC 7600 over the CDC 6400 for our purposes. On the CDC 7600 one can carry out in 5 to 10 minutes of central processor time what one might have been able to do on the CDC 6400 in 3 hours. Not only is the resultant calculation much cheaper, but it has more chance for success, because the system is not given as long a time for a possible failure. Furthermore, since the CDC 7600 has effectively infinite core in its large core memory, an entire new set of crystallographic problems becomes open to successful attack. In this discussion I have limited myself to the CDC 7600 as an example of an extant, giant computer. I presume similar machines are available from other manufacturers, but I am personally unfamiliar with the problems of using such machines.

1. "Infinite" core, and hence computations can be made that are not possible on the 6400.
2. The 7600 is 30 times faster than the 6400, with consequent savings in dollars.

Figure 8 Advantages of Remote Computation (CDC 7600)

The following remarks, appended since the Albuquerque meeting, relate to some of our experiences with remote computing that have occurred since then. We have recently been performing remote calculations on the CDC 7600 at Lawrence Berkeley Laboratory. Transmission to and from the 7600 is done over telephone lines through the 200 User's Terminal at our Computer Center. We have not carried out large data-processing calculations, but rather have restricted ourselves during this trial period to potential-function calculations that require only modest amounts of data transmission but do require the speed and capacity of the 7600. Figure 9 is one I showed at Albuquerque. It was my guess concerning disadvantages of remote computing. I am now able to comment on some of these points. We have had essentially no compatibility problems. A 6400 UPDATE tape compiled directly on the 7600, and the resultant program produced answers that were identical with those obtained on the 6400. We have had some difficulties with 7600 control-card instructions, mainly because of the difficulty of finding the right source of information at LBL. We have also had some difficulty obtaining up-to-date information on system changes, although the most important ones are put on a common file which we can obtain daily. We have not tried to transmit large data files, nor do I believe that this is feasible. I think it is inevitable in the types of operations we do that there be sympathetic cooperation at the other end. We are going to have to mail data tapes and answer tapes back and forth; we are going to have to create permanent files that don't get wiped out on

Monday mornings, etc. Thus far we have not experienced any difficulty with tape storage at LBL.

1. Machine incompatibilities
2. Remote and unexpected software changes
3. Difficulty of transmitting large data files
4. Difficulty of tape and disk storage
5. Government bureaucracy

Figure 9 Problems in Remote Computation

On the basis of this limited test I am pleased with the results and am optimistic that remote computing on problems too large or too expensive to handle locally will become commonplace among crystallographers. Moreover, as I have tried to indicate, we really have no choice. It is hoped that a symposium such as the present one will not only make our needs known but will facilitate the establishment and efficient use of national or regional computing facilities.

DISCUSSION

Jeffrey: How big is Northwestern University? That is a parameter relating to usage. How many faculty and how many students?

Ibers: Northwestern has 6500 undergraduates and approximately 2000 graduates on the Evanston campus. The expectation for growth in either of these is zero. We are a private institution and are not planning to expand. We have a large graduate school on the Chicago campus, in Medicine, Dentistry and Law, and these programs will expand under government pressure. But on the Evanston campus, we can estimate computing need on the basis of a fixed population.

Jeffrey: How many faculty?

Ibers: There are about 400 faculty members in the College of Arts and Sciences, representing some 75 per cent of the total faculty.

Caughlan: What fraction of your usage is administrative and what fraction is educational, including instruction and research of graduate students?

Ibers: The administrative computing is done on a separate computer, and so does not affect us. The keeping track of where books are from the library is also done on that computer. The 6400 is used only for research and instructional purposes, and the usage is approximately 35 per cent undergraduate, 35 per cent graduate, and 30 per cent faculty. The distinction in the last two areas clearly is not easy, because some professors assign computer numbers to individual graduate students, while others use a blanket number for themselves and all their graduate students. But the instructional uses are about one third.

Coppens: What is done with faculty members who have no funds for research; do they get time?

Ibers: The faculty members who have no funds get soft money up to a point, and as far as I know this point, with rare exceptions, provides them with enough to do what they need to do. A man who has no money from the outside and wants to become a major user of the computer is clearly destined for some talks with his dean.

Dewar: I would like to pursue the point you made that the 6400 in its present state will meet needs other than in the one category of number crunching. You mentioned ten years as a possibility. I wonder whether that is likely, because I think the potential for explosion in usage is even greater in some of those other areas, computer-assisted instruction and data-base retrieval, for example. You may be talking about demands for a thousandfold increase in on-line storage, which is conceivable with projected hardware. What are some of the parameters that go into that decision?

Ibers: It is difficult to feel confident about projections of computer usage. Nevertheless the Long-Range Computer Needs Committee at Northwestern did talk extensively with diverse groups of people, many of whom had grandiose schemes for computing in the next decade. Obviously one of the major parameters that went into our thinking was hard money, or rather the lack of it, for extensive computer changes. Perhaps I left the wrong impression in my talk. We did not conclude that only the number crunchers would go off campus to compute; they were simply the most obvious group at the present time. But should we get heavily involved in computer-aided instruction, then money will not be spent at the computer center but will possibly buy remote terminals for a tie-in with the Plato system at Illinois. In any event, I too have been around the computer game for many years, and know your worries when you imply that estimates on computer usage always fall short.

Nevertheless we think our estimates are based on reasonably hard facts.

Dewar: There are other areas where going off campus probably looks very attractive, for example, large-scale manipulations with the census files.

Ibers: Another example at this time is that we run our Chemical Abstracts searches at Argonne National Laboratory, not because we want to but, as I understand it, because their format is difficult to change away from IBM. So we pay for the Chemical Abstracts searching at Argonne.

Williams: I have a question related to funding, but in an indirect way. As you know, the government is about to buy 10-12 large machines, and I am wondering about the problems of conversion. I've heard, being an IBM 360 user, that the conversion between the CDC 6000 and 7000 series involves quite a change. I wonder if you could comment on the conversion problem, particularly with respect to interconversion between IBM and CDC 7000 series machines.

Ibers: The incompatibilities between us and the LBL at Berkeley are no more serious than incompatibilities between one 6000 center and another, and these incompatibilities will lessen as the 7600 at Berkeley institutes more of the Scope system.

Jeffrey: At the University of Pittsburgh we have disposed of the soft-money problem by putting the computer on overhead. It raised the overhead by about three per cent, but saved real money by reducing the bookkeeping and bureaucracy of the soft-money operation.

Coppens: Does that mean free access for everybody?

Jeffrey: Yes. The resources are allocated on a departmental basis, based on what is considered to be the department's justifiable use. The priorities and departmental use are programmed into the computer. Responsibility for proper use of the computing resource is then placed at the departmental level.

Sayre: You have indicated that crystallographers may often want to turn to special off-campus machines for their large number-crunching computations. You have pointed out two associated problems, that of data transmission and that of administrative complexity. What about cost? Do you have a cost figure for the 7600 you are trying to get on to, that can be compared with the $500 per hour you pay on your own campus?

Ibers: Yes, and this is part of the problem. The rate of the 7600 at LBL for contract users, people with AEC money, is $600 per hour for a machine that is 30 times faster than one for which we pay $500 per hour. The reasons for this of course have already been pointed out. The AEC

considers only operating costs in arriving at an hourly charge. They do not consider amortization or the initial expense of buying the machine. So it is a very attractive computer: an hour of 7600 time at $600 does the work of roughly 30 hours, or $15 000 worth, of 6400 time.

to install two of them, one at each end, error-correcting modems. There are several of these on the market and I'm sure they would eliminate most of your problems.

Berman: If we had done that, it would have dedicated us to one computer center, and we haven't yet found the computer center we are willing to make that dedication to.

Meyer: There are several commercial firms that plan within the next few years to blanket the country with data transmission networks. DATRAN for one.

Computing Needs of Protein Crystallographers

Keith D. Watenpaugh

The growth of crystallography has been closely linked to the growth of computer science and technology in general. As computers became faster and more sophisticated, the rate of growth of crystal-structure determinations and their degree of refinement and accuracy increased. In no field of crystallography is this more evident than in protein crystallography. Protein molecules are at least an order of magnitude larger than those studied in normal crystallography, and with this large size come very real problems and experimental limitations associated with the collection and treatment of data. In fact, many analogies may be drawn between the state of the art of protein crystallography now and that of ordinary crystallography 15 to 20 years ago, when smaller and slower computers were just coming into use.

Computers are important not only in the processing of crystallographic data but also in the collecting of the data. The mass of data necessary to solve protein structures even in moderate detail, as well as proteins' almost universal instability (especially under X-ray bombardment), requires automated high-speed data collecting and processing. Presently, either computer-controlled diffractometers or computer-controlled film-scanning densitometers are used for this purpose. This aspect of crystallography is discussed later in this symposium, and I mention it here only because it is an indispensable part of protein crystallography. Also, new, extremely high-speed data-acquisition systems are now being developed (Xuong and Vernon, 1972).

Computer application to solving protein structures through phasing by multiple isomorphous-replacement techniques (Blow and Crick, 1959) has become quite routine. Also, improving heavy-atom parameters by alternating cycles of least-squares refinement with cycles of phasing by multiple isomorphous replacement has become standard (Dickerson et al., 1968). However, this method of determining phases cannot usually be extended beyond 2.0 Å resolution (resolution is usually defined as the minimum interplanar spacing to which data were collected). A Fourier synthesis (electron-density map) calculated using these phases and fitted by some model giving approximate atomic positions usually is the final step in the structure determination. This is so for a number of reasons, including the difficulty of obtaining good higher-resolution data and the limitations of present-day computers.
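As a minimal illustration of the Fourier-synthesis step (not taken from the original text; Python, the tiny grid, and the sample coefficients are assumptions), the electron density is the Fourier transform of the phased structure factors, rho(xyz) = (1/V) Σ F(hkl) exp[-2πi(hx + ky + lz)]:

```python
import numpy as np

# Hypothetical 8 x 8 x 8 grid of structure factors F(hkl), with phases
# assumed to come from multiple isomorphous replacement. np.fft.fftn
# applies the exp(-2*pi*i*...) kernel, matching the synthesis above.
F = np.zeros((8, 8, 8), dtype=complex)
F[0, 0, 0] = 100.0                          # F(000): electrons per cell
F[1, 0, 0] = 20.0 * np.exp(1j * np.pi / 3)  # one phased reflection
V = 1000.0                                  # unit-cell volume (arbitrary)

# A real map needs Friedel mates F(-h) = conj(F(h)); with only one
# positive-index reflection filled, we keep the real part to illustrate.
rho = np.fft.fftn(F).real / V
print(rho.shape, rho[0, 0, 0])
```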

Further computer applications at this point may include fitting a model to the electron-density map while maintaining some constraints on the model (Diamond, 1971). This allows approximate atomic positions to be calculated, but their accuracy is quite uncertain. Important use of computers has also been made in studying protein conformations with computer-controlled display systems (Barry and North, 1972). Tollin and Rossman (1966) have described various rotation-function programs. Programs of this type may be used to fit known protein models to the crystallographic data of similar unknown proteins in order to solve related protein structures without using isomorphous-replacement techniques. However, the most demanding use of computers in the near future is going to be in the refinement of protein structures to produce much more accurate models.

Since the phased data in even a "high resolution" protein structure do not approach the precision required to resolve individual atomic positions, the current protein models are poor by regular crystallographic standards. The need to improve these models is indisputable. As more and more structures are being determined to 3.0 Å, 2.5 Å or 2.0 Å resolution, it is becoming painfully obvious that the models simply are not accurate enough to explain the unusual and unique physical and chemical properties of many proteins. Practically nothing can be said about bond lengths or angles in proteins, and even atomic positions have uncertainties of the order of half an angstrom or more.

Following are just a few of the questions that may be answered if more accurate protein structures are obtained, with computer refinement of protein structures playing a primary role:

1. Vallee and Williams (1968) have proposed an entatic state, or region of abnormal conditions in the protein, as giving rise to internal activation by geometric and/or electronic strain. Stretching a bond by 0.2 Å or twisting it through 20° can produce very large changes in energies, yet be entirely missed by present protein X-ray crystallographic analysis.

2. High orientation dependence has also been proposed as contributing to the unusual catalytic properties of enzymatic proteins. Storm and Koshland (1970) have suggested that large rate increases may be realized by proper orientation of reacting molecules, and that enzymatic reactivity may be due to this "orbital steering" ability. They propose that changes in angles of as little as 10° may produce rate changes of several orders of magnitude, again outside the range of present protein crystallographic accuracy.

3. Nonplanarity of peptide groups as well as the close proximity of atoms appear to be implicated in the activity of lysozyme (Barry and North, 1972). An accurate structure is required to assess the degree of nonplanarity of its peptides.

4. Chromatium high-potential iron protein (HIPIP) and bacterial ferredoxins

have similar iron-sulfur clusters in 2.2 Å resolution electron-density maps, yet their physical properties are very different (Carter et al., 1972; Adman et al., 1972). Ferredoxin has an oxidation-reduction potential of approximately -400 mV, whereas that of HIPIP is +350 mV. An accurate description of the cluster and its environment is needed to explain the difference.

Refinement of Protein Structures

Procedures for refining protein structures fall into two classes. In the first are those that attempt to produce the best fit of the model to the electron-density map generated by determining phases through multiple isomorphous-replacement methods. The second includes procedures that improve the phases and/or extend the phases to higher-resolution structure factors to produce a more accurate model than can be derived from the experimentally determined phases.

R. Diamond (1971) has written a sophisticated computer program that optimizes the fitting of a model to an electron-density map while maintaining some constraints on the model. Bond lengths are kept constant while overlapping densities of neighboring atoms are accounted for. Some interbond angles may either be constrained or allowed to vary. This procedure appears to lead to an improvement of the model with respect to experimental data to 3.0 Å or 2.0 Å resolution if the electron-density map is reasonably good. However, it must be noted that refinement of a model by electron-density maps has several disadvantages when errors exist in the data or when atoms are not resolved. Computer requirements for this type of refinement are not particularly large, but the model produced would be considered only a reasonable starting model for refinement according to regular crystallographic standards.

A second procedure, which involves refining phases and extending them to higher resolution, is the so-called "direct methods." Use of direct methods is discussed by D. Sayre in this symposium. Application of direct methods to protein crystallography has not proved very successful yet, but some promising results have been obtained. However, computing times can be enormous.

In the course of the refinement of small structures, it was found that ΔF syntheses (difference maps) provided advantages over Fourier syntheses in the refinement of structures when atomic positions were not resolved, as is the case when working with two-dimensional data or when there are series-termination errors due to lack of higher-order data. Shifts in atomic positions are proportional to the slope of the difference map at the assumed atomic positions and can be determined more easily and reliably. Apparent thermal motion is also more easily determined. Moreover, gross errors in the model, such as misplaced or missing atoms, can be detected.
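A minimal sketch of the shift rule just described (not from the original text; the function name, the finite-difference scheme, and the treatment of the proportionality constant as a supplied curvature are assumptions):

```python
import numpy as np

def delta_f_shift(diff_map, idx, grid_spacing, curvature):
    """Estimate a positional shift from a (Fo - Fc) difference map.

    diff_map: 3-D difference density on a grid; idx: the atom's grid
    point; grid_spacing: angstroms per grid division; curvature:
    assumed magnitude of the central curvature of the model density
    at the atomic peak, which sets the proportionality constant.
    """
    i, j, k = idx
    grad = np.array([
        diff_map[i + 1, j, k] - diff_map[i - 1, j, k],
        diff_map[i, j + 1, k] - diff_map[i, j - 1, k],
        diff_map[i, j, k + 1] - diff_map[i, j, k - 1],
    ]) / (2.0 * grid_spacing)   # central finite differences
    return grad / curvature     # shift, in the same units as the spacing
```

In the rubredoxin work described below, shifts of exactly this kind were still computed by hand from the maps.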

A reasonable analogy may be drawn between refinement of small structures with two-dimensional data and refinement of proteins with three-dimensional data when atomic positions are not quite resolved. It is questionable, however, whether unrestricted use of ΔF refinement is justifiable without data near atomic resolution. Computer times required for ΔF refinements in general are not large.

With the advent of large high-speed computers, the most powerful method of refining small structures was by means of full-matrix least-squares. Even on small structures, least-squares can tax the largest and fastest computers. True full-matrix least-squares refinement on a protein structure cannot be reasonably accomplished on today's computers. Reducing the magnitude of the task by neglecting off-diagonal terms, as is sometimes done on small structures, proves disastrous with proteins, since at low resolution the correlation between neighboring atoms is high and cannot be ignored. Further problems may arise because the number of structure factors does not greatly exceed the number of parameters to be refined and because an appreciable fraction of the crystal may be composed of solvent.

The numerous difficulties and limitations associated with the refinement methods have prevented people from refining protein structures in spite of the great need to do so. However, it is important to know whether it is possible to refine proteins and to what extent the model may be improved.

Refinement of a Protein Structure

A brief summary of the refinement of rubredoxin provides a convenient way of describing the magnitude, merits, and limitations of various refinement procedures (Watenpaugh et al., 1972). An accurate description of the iron-sulfur complex as well as the chain conformation is essential in understanding the unusual physical properties of this protein. Also, it is a good structure on which to test protein refinement because of its relatively small size and because significantly observable data exist to a resolution of 1.5 Å (approximately atomic resolution).

Atomic positions with which to begin the refinement were picked off a 2.0 Å electron-density map phased by multiple isomorphous replacement. Structure factors based on these parameters had an R-index of 0.37. The R-index, defined by

R = Σ | |Fo| - |Fc| | / Σ |Fo|,

is used to measure the discrepancy between the observed and calculated structure factors.
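A minimal sketch of this calculation (Python and the sample amplitudes are illustrative only):

```python
import numpy as np

def r_index(fo, fc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over observed reflections."""
    fo, fc = np.abs(np.asarray(fo)), np.abs(np.asarray(fc))
    return np.sum(np.abs(fo - fc)) / np.sum(fo)

# Three made-up reflections: R = (20 + 5 + 5) / 265, about 0.11.
print(r_index([120.0, 85.0, 60.0], [100.0, 90.0, 55.0]))
```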

It is convenient to quote computer time and costs on the CDC 6400 computer at the University of Washington to compare the magnitudes of the various steps in the refinement of rubredoxin. The present charges on this computer are approximately $5.00 per minute for the central processing and a variable amount for input-output and peripheral processing. The X-ray 70 crystallographic computing system supplied by Dr. Stewart was used for all major computing (Stewart, 1967). The structure-factor calculation (Fc calculation) over the approximately 500 atoms in the asymmetric unit, or 1500 atoms in the cell, on more than 5000 observed reflections requires approximately 65 minutes of central processor time and costs $350. A ΔF synthesis following the Fc calculation, with 2 grid points per angstrom, takes 40 minutes and costs $250.

Initial refinement was with ΔF syntheses for several reasons. The maps provide a constant check on whether sensible corrections are being applied, allow modifications in assignment of peptide residues since the sequence was not known, and keep computing time and costs within reasonable limits. Calculations of the shifts in atomic positions from the ΔF syntheses were done by hand. Approximately 30 man-hours were required at this task of calculating shifts on about 500 atoms per cycle. The ΔF refinements were sufficiently well behaved to improve the structure significantly and allow identification of additional residues the assignments of which had been questionable at the outset. Four cycles of ΔF refinement decreased the R-index from an initial value of 0.37 to 0.22.

Since the ΔF refinements were fairly well behaved, it seemed feasible to attempt least-squares refinement. It is impossible to take full advantage of the superior characteristics of full-matrix least-squares refinement because of the magnitude of the problem. Even this small protein, with approximately 600 atoms including water molecules in more or less discrete locations, involves 2400 parameters to vary (x, y, z and an isotropic thermal parameter for each atom). Even to store the unique part of the symmetric matrix would require 3 million words of storage. The computer time to build such a matrix in the course of a refinement is not feasible on currently available computers. The CDC 6400 computer at the University of Washington, with a core size of 64 000 words, is capable of refining 240 parameters. Therefore the matrix was partitioned into blocks of about 240 parameters associated with neighboring atoms, requiring 10 passes through the computer to complete one full cycle of refinement. One cycle of refinement requires approximately 17 hours of central processor time and costs about $5400 to $6000, as compared with about $600 and a week of hard work for ΔF refinement. Bond lengths and angles were calculated after each least-squares cycle, as were ΔF syntheses at various stages to check the behavior of the refinement.
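A minimal sketch of that block partitioning (the helper that builds each block's normal equations is hypothetical; a real program would accumulate it from derivatives of Fc over all observed reflections):

```python
import numpy as np

def block_refine(params, blocks, build_normal_equations):
    """One cycle of block least-squares refinement.

    params: all refinable parameters (about 2400 for rubredoxin);
    blocks: lists of parameter indices, about 240 each, grouping
    neighboring atoms; build_normal_equations(sel) -> (A, b), the
    normal matrix and gradient restricted to one block.
    """
    for sel in blocks:
        A, b = build_normal_equations(sel)
        params[sel] += np.linalg.solve(A, b)  # shifts for this block
    return params
```

Off-diagonal terms between blocks are dropped, which is tolerable only because each block collects the strongly correlated neighboring atoms.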

The least-squares refinement behaved surprisingly well, with no general tendency for atom positions to oscillate or diverge, in spite of the lack of complete atomic resolution and the not highly overdetermined constraints on the parameters. After four cycles of least-squares refinement, the R-index is 0.126, but some regions of the molecule are still poorly defined because of either high thermal motion or disorder.

It is now evident from the refinement that there is very significant distortion in the tetrahedral configuration of the iron-sulfur complex. Three iron-sulfur bond lengths may be not significantly different from each other (2.34 Å, 2.32 Å, 2.24 Å, with standard deviations of 0.03 Å) and agree well with those observed in small crystal structures, while the fourth is short (2.05 Å), suggesting an entatic nature for this protein. The more accurate iron-sulfur cluster, as well as the more accurate description of the surrounding polypeptide, should allow more quantitative theoretical calculations to be performed to explain both electron-transport mechanisms and the energetics of protein folding.

Perhaps the most important outcome of this successful refinement of rubredoxin has been to stimulate refinement of other proteins in which better accuracy is required to explain their mechanisms. ΔF refinements are currently under way on both bacterial ferredoxin and high-potential iron protein at 2.0 Å resolution in hopes of explaining their very different physical properties. Subtilisin and pancreatic trypsin inhibitor are being refined to better understand the behavior of proteases. Refinement is beginning on the triclinic form of egg-white lysozyme, which holds promise of being capable of refinement beyond any other protein currently under study.

Suddenly, refinement of protein structures is no longer in the future but in the present. The limiting factor is not whether proteins can be refined but the computational aspects of refinement. New computer programs designed for the refinement of protein structures, not small crystallographic structures, must be forthcoming. For example, ΔF refinement techniques, which disappeared from use on small structures with the advent of high-speed computers, must be reexamined keeping in mind current computer technology and speed. New methods of least-squares refinement are required that take into account the overlap of electron densities of neighboring atoms and allow more nearly diagonalized matrices, so as to increase the speed and efficiency of refinement. However, in the final analysis, the dynamic growth of protein crystallography is dependent on the increasing availability of larger and higher-speed computers.

Acknowledgments

I am indebted to Dr. L. H. Jensen for many helpful discussions and to the USPHS for support under Grant GM-13366 from the National Institutes of Health.

References

Adman, E., L. C. Sieker and L. H. Jensen. 1972. The structure of a bacterial ferredoxin. Amer. Cryst. Assn. Abstr., p. 66. Albuquerque.

Barry, C. D. and A. C. T. North. 1972. The use of a computer-controlled display system in the study of molecular conformations. Cold Spring Harbor Symp. on Quant. Biol. 36: 577.

Blow, D. M. and F. H. C. Crick. 1959. The treatment of errors in the isomorphous replacement method. Acta Cryst. 12: 794.

Carter, C. W., Jr., S. T. Freer, N. H. Xuong, R. A. Alden and J. Kraut. 1972. Structure of the iron-sulfur cluster in Chromatium iron protein at 2.25 Å resolution. Cold Spring Harbor Symp. on Quant. Biol. 36: 381.

Diamond, R. 1971. A real-space refinement procedure for proteins. Acta Cryst. A27: 436.

Dickerson, R. E., J. E. Weinzierl and R. A. Palmer. 1968. A least-squares refinement method for isomorphous replacement. Acta Cryst. B24: 997.

Stewart, J. M. 1967. X-ray 67: program system for X-ray crystallography. TR-67-58 (NSG-398), Computer Science Center, University of Maryland.

Storm, D. R. and D. E. Koshland, Jr. 1970. A source for the special catalytic power of enzymes: orbital steering. Proc. Nat. Acad. Sci. 66: 445.

Tollin, P. and M. G. Rossman. 1966. A description of various rotation function programs. Acta Cryst. 21: 872.

Vallee, B. L. and R. J. P. Williams. 1968. Metalloenzymes: the entatic nature of their active sites. Proc. Nat. Acad. Sci. 59: 498.

Watenpaugh, K. D., L. C. Sieker, J. R. Herriott and L. H. Jensen. 1972. The structure of a non-heme iron protein: rubredoxin at 1.5 Å resolution. Cold Spring Harbor Symp. on Quant. Biol. 36: 359.

Xuong, N. H. and W. Vernon. 1972. A rapid data-acquisition system for protein crystallography. Amer. Cryst. Assn. Abstr., p. 59. Albuquerque.

DISCUSSION

Freer: I wish to comment on the refinement of HIPIP, the high-potential iron protein from Chromatium D. We were so encouraged by the progress of the Seattle group that last December we wrote a numerical differential-synthesis program which we hoped would automate protein refinement. This procedure has worked amazingly well. A complete refinement cycle consists of an Fc, a ΔF map, and then automatic calculation of slopes and parameter shifts. For HIPIP, where we're talking about 800 atoms, the Fc runs 7 minutes, the Fourier 4 minutes, and the differential synthesis about 30 seconds. For a total cost of approximately $600 (for about 3 hours of CDC 3600 time), we reduced the R factor for HIPIP from 34 to 16%.

Coppens: That's even lower than doing it by hand, but much faster.

Freer: Yes. All I want to emphasize is that since this program has come into being, such refinement is becoming practical.

Watenpaugh: In starting out, for example, we've used the X-ray system designed for people who are solving lots of different structures in lots of different space groups, but when we come to protein structures we're going to spend a significant amount of time on refinement, so it will be very important that we optimize the system for a particular space group and a particular protein.

Stewart: People tend to refer to this X-ray system as being mine. The authorship extends over a great many people. In fact, Steve Freer, who just spoke, is himself one of the original authors. And it was not our intent that the least-squares or this Fourier program be used for protein-structure analysis. It was written with the idea of space-group flexibility and convenience, and therefore you're paying for this in overhead in a real way. If it's used for these large structures it does cost more, and I think Steve's remarks are especially important. In the old days we really pushed for efficiency rather than convenience. There could be many short cuts made. I'm sorry to say also that I completely obliterated, about two years ago, our differential synthesis, destroyed all vestiges of its existence, and threw it away, believing it had been an exercise in futility. So that has saddened me a little bit.

Dewar: Bearing in mind the whole purpose of this symposium, particularly in terms of looking forward to what could be done with large computing facilities, one conclusion I seem to hear is that one really isn't that far away. The costs you quoted are high, but entirely reasonable for the size of structure that's being tackled. Does it seem fair to conclude that even in connection with protein structures, we're talking not about some miraculous new hardware two orders

of magnitude faster, but about very large existing computers, perhaps 7600's, at the top of the scale? This would be a significant conclusion from the point of view of building a gigantic computer for crystallographers.

Coppens: We have heard discussion of the unit cost per man-year in computing time; would it be higher for this kind of work?

Watenpaugh: Well, just in the refinement I've spent over $30 000 in one year. The amount required for the solution of protein structures has been small, and this is about where protein people have stopped. However, as more proteins are going to be refined, you're going to see an astronomical increase in the amount of computing time on the part of protein crystallographers.

Johnson: How much money goes into computer graphics on any protein structure?

Watenpaugh: We don't do any, but there are some groups that do, and I imagine quite a bit of time is spent in some laboratories.

Freer: We actually have been spending as much on computer graphics as on refinement.

Schomaker: Keith, is it true, though, that you are substantially inhibited in your progress by lack of money?

Watenpaugh: Yes.

Schomaker: You could have spent $100 000 or $200 000, or at least at that rate?

Watenpaugh: Well, we're not even at convergence yet. It's just that we actually have a good start, and we still have fairly low-resolution data. We want to collect data to high resolution.

Sayre: I think it should also be noted that rubredoxin is a small protein and that the cost of a least-squares refinement rises approximately at a rate between n² and n³, where n is the number of atoms.

Coppens: So perhaps there is a need for larger computers.

Sayre: Yes. That's the point.

Koetzle: There must be some resolution limit beyond which this sort of refinement that you're talking about is not possible. To what resolution would you say the data on the protein ought to go before you can initiate this process?

Watenpaugh: On the Chromatium HIPIP, they're working with 2 Å data, but I think this is the bare minimum of the resolution at which we can work. It's becoming increasingly obvious that the positions of the atoms are fairly well behaved in proteins, much as they are in small structures, and therefore, with high-speed data-collection techniques to collect the data before our crystals go to pot, we should be able to take lots of proteins to atomic resolution in the future.

Jeffrey: I have an idea that differential syntheses are like block-diagonal refinement in their convergence. So you may not gain as much by differential synthesis, because the refinement would be less rapid than full-matrix.

Freer: The fact is that I can do the differential synthesis on a CDC 3600 with 32 000-word core, and that's what I have now. We're going to try to get on the 7600 at LBL. Being members of the University of California, perhaps we might have a better chance.

Jeffrey: I seem to remember a paper where someone related differential synthesis to a diagonal-matrix refinement.

Freer: Yes. I think it's equivalent to least-squares weighted by the reciprocal of the atomic scattering factors.

Coppens: This was discussed in a paper by Cruickshank (Acta Cryst. 5, 511, 1952).

Seeman: Can you give the accuracy of your starting model? What's been your actual shift from the model to your current coordinates?

Watenpaugh: We've had shifts of over half an angstrom, but the average shifts were probably about two-tenths of an angstrom or so per cycle in the initial stages of refinement.

J. D. H. Donnay: I should like to address my comments to the funding agencies. About twenty years ago, at the Paris International Congress of Crystallography, the late Professor Mauguin referred to "those crystallographers who had the courage and the audacity to tackle the protein structures." That was in 1954. What was almost foolhardy at that time is still a job that requires considerable boldness and courage today. It seems to me that funding agencies should know that we as a profession (1) have high respect for the people who work on protein problems and (2) feel that, if a nation is lucky enough to have research people willing and able to solve such problems, it should make sure that they receive full support from their government.