Suggested Citation: "What are the Needs of Crystallographers?." National Research Council. 1973. Computational Needs and Resources in Crystallography: Proceedings of a Symposium, Albuquerque, New Mexico, April 8, 1972. Washington, DC: The National Academies Press. doi: 10.17226/18587.
Session I: What Are the Needs of Crystallographers?
Session Chairman: Philip Coppens

Computing Needs of the Structural Chemist

James A. Ibers

Dr. Hamilton has asked me to outline our research group's computational needs and our thoughts on how such needs may best be met. Although I am a structural chemist with considerable computational needs, I must emphasize that structural chemists are a heterogeneous group and that our group's needs and attitudes may not be typical. In outlining computational needs, I will also comment on the operation of the Vogelback Computing Center at Northwestern University. In my position as Chairman of the University Computing Committee over the past four or so years, I have gained some insight into the support problems of a large computer in a private educational institution, and I hope information on this subject will be of interest to others.

Mainly for the benefit of noncrystallographers, I present in Figure 1 some features of our data-collection operation. I must emphasize that we are indeed data processors, and this fact has implications with respect to remote computing, a subject I will take up presently. Figure 1 indicates that we accumulate roughly two crystallographic data sets per month.

Picker FACS-1 with monochromator and magnetic tape output
Down time: less than 3%
Idle time (mostly set-up): 40%
Actual data collection: about 60%
Typical data rate: 350 reflections per 24-hour day
Data sets, October 1971 through March 1972: 11

Figure 1  Data Collection

Figure 2 provides information on our university computer, on the average cost per structure, and on the breakdown of our crystallographic computations by type. Most of the money is spent on refinement, generally computations involving non-linear least-squares analysis. Again with relevance to remote computing, I should add that the UPDATE system of CDC is very convenient for program management, and that some years ago we ceased to maintain card images of our programs. The fact that our program library is on a permanent file is also important; it assures that all allowed users of the file have access to the same versions of the programs.

1. CDC 6400 with expanded core storage, 64 000-word core, 4 tape units, plotter
2. Program library on permanent file
3. Source library in UPDATE form
4. Average cost per structure: $2500 at $500/hr
5. Breakdown of crystallographic uses:
   Program development: 2%
   Processing of data: 5%
   Structure solution: 5%
   Refinement: 80%
   Error analysis, drawings, etc.: 8%

Figure 2  Computations

Figure 3 indicates the types of problems we have handled recently. Those familiar with our work know that without triphenylphosphine as a ligand to stabilize a transition metal in a low-valence state, we would feel very lost indeed. In this respect our computational needs may not be typical of structural chemists; even with the theoretical dodges we use, such as treating the phenyl rings as rigid groups, our problems tend to be large by ordinary standards. The size of a problem is really defined by the number of variables to be determined, and this number also defines the cost, because refinement predominates over all other computations in terms of time.

Two of the eleven problems summarized in Figure 3 exceeded the capacity of our CDC 6400. By this I mean the following. In carrying out the non-linear least-squares refinement, one sets up the matrix of the normal equations, which will be of order N x N if there are N variables to be refined. Because the matrix is symmetric, only N x (N+1)/2 cells are needed. On our CDC 6400 we can obtain a maximum of about 140 000 cells for program and data. This proves to be a limitation at about N = 240.
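The packed-storage bookkeeping behind that limit can be sketched as follows. This is an illustrative Python fragment, not anything from the original talk; note that the quoted 140 000 cells hold program and data as well, so the matrix itself is only part of the budget.

```python
def packed_cells(n):
    # Cells needed to hold a symmetric n x n normal-equations
    # matrix in packed (upper-triangular) form: n(n+1)/2.
    return n * (n + 1) // 2

def packed_index(i, j, n):
    # Map a 0-based (i, j) pair into the row-major packed upper
    # triangle; symmetry lets us swap indices so that i <= j.
    if i > j:
        i, j = j, i
    return i * n - i * (i - 1) // 2 + (j - i)

# For N = 240 variables the packed matrix needs
# 240 * 241 / 2 = 28 920 cells, half the 57 600 of the full
# square matrix; the remainder of the ~140 000-cell allotment
# goes to program, data, and working arrays.
```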

When we have more than 240 variables we must resort to dubious mathematical tricks to carry out the refinement. These tricks cost much more than the hypothetical solution of carrying out the normal calculations on a machine of similar speed but greater storage capacity.

Average number of independent atoms: 15
Average number of phenyl groups: 5
Typical number of variables: 200
Number of variables exceeded 6400 capacity in 2 cases out of 11

Figure 3  Problem Type

Figure 4 gives an estimate of our sources of computing money for crystallographic calculations. I have divided the money into "hard" and "soft", and it is necessary to define these terms. Hard money comes from the outside. It is the kind of money that computer center directors and university controllers are very fond of. Such money comes from the various funding agencies, from private sources, etc. Soft money is a short name for internal regulation of computing. The process may work something like this: in one manner or another a Dean obtains a certain amount of soft money to be handed out to his department heads. He puts this in his bottom drawer and doles it out to the department heads, who in turn may dole it out to various potential users. This process provides both regulation and power. In a typical university, control of money means power, and hence the Dean maintains power. But more important, a regulatory system is built in. If Professor X goes through his soft computing money like a shot, he must ask his department head for additional funds. The department head may take funds away from others in the department, or he may go back to his Dean for more money. In either case, someone at the appropriate level in the university is going to question Professor X's proclivity to spend computing money. This is as it should be, for the University Computing Committee and similar committees are not in the regulatory business. They simply have no way of assessing the value of Professor X's calculations.

Over the years we at Northwestern University have devised various schemes for the distribution of soft money; one in particular is this: if Professor X brings in large amounts of hard money, he is rewarded with similar amounts of soft money, and the reward is on a graduated scale. This procedure has worked very well. It has enabled us to make realistic requests to the granting agencies and has assured Northwestern that money allocated to computing in such grants is indeed spent on computing, for the incentive is there to

spend it to obtain soft money. If one transfers hard computing money at the end of the year to cover debts in the supplies account, then one clearly does not gain additional computing capacity through soft money. Such book transfers were far more common before the incentive system was devised. Figure 4 also indicates roughly the amount of hard dollars that Northwestern might lose if I switched the bulk of my computations to a CDC 7600 outside. I return to this point later.

Hard money: $15 000
Soft money: $30 000
Loss to university computing center if CDC 7600 is used off campus: $10 000

Figure 4  Payment for Crystallographic Computations

Figure 5 presents a rough breakdown of the sources of support for our computing center. The University puts in the bulk of the money to run the center. Sponsored research, covered by grants, provides about 25%, and about 10% comes from other outside sources, for example, other academic institutions using our computer. As a non-profit institution we cannot, of course, accept computing from profit-making organizations. The budget for the computing center is based on staff, maintenance, and amortization of the computer. The hourly cost is derived basically by dividing the budget figure by the number of hours available for computing.

University: 65%
Sponsored research: 25%
Outside sources: 10%
Annual budget: $1M (including $200 000 amortization)

Figure 5  Sources of Support for Northwestern University's CDC 6400

Figure 6 indicates the major types of operations that take place on our computer. These various operations represent diverse needs of diverse individuals, and it is probably impossible to provide for all of them efficiently on a given computer. Moreover, the needs are ever changing: computer-aided instruction was not even with us five years ago. It is clearly necessary to consider these operations when projecting computer needs for the next decade on a given campus, and such a projection is difficult. For example, the medical, dental, and law applications are in their infancy,

but with the big push toward providing better medical and dental care, not only will there be vastly increased uses of computers in these fields, but funding will be available to obtain giant computers.

1. Administrative data processing
2. Library processing
3. Batch processing
4. Number crunching (crystallography)
5. Interactive processing
6. Interactive programming and time-sharing
7. Information retrieval and large-scale data bases
8. Laboratory processing
9. Medical, dental, and law applications
10. Computer-aided instruction
11. Computer graphics

Figure 6  Major Needs of Northwestern University Users

Some time ago I sat on a Long-Range Computer Needs Committee which fearlessly tackled the question of projecting computing needs at Northwestern University. The conclusions reached are shown in Figure 7. It is not appropriate here to go through our reasoning. The salient points are that we could not justify a larger machine, and that the number crunchers and some other users will be expected to go elsewhere for their computing in the future. Our CDC 6400 is currently used about 140 hours per week, and we anticipate that the load it can handle can be extended considerably by the hardware additions indicated in Figure 7. The decision that some users will go elsewhere for their computing is a painful one for a university because, generally speaking, those with big computer demands also bring in hard dollars. Nevertheless, there seems to be no choice.

1. Projected usage does not justify a bigger machine
2. Administrative processing should remain separate
3. Additions to hardware are necessary to handle many of the tasks (expanded core storage, expanded discs, additional printer and plotter, at a total cost of about $500 000)
4. Number crunchers will have to go elsewhere, with consequent loss in income to Northwestern University

Figure 7  Conclusions of Long-Range Computer Needs Committee

What crystallographers need are faster machines with increased high-speed memory. Figure 8 indicates two of the great advantages of the CDC 7600 over the CDC 6400 for our purposes. On the CDC 7600 one can carry out in 5 to 10 minutes of central processor time what one might have been able to do on the CDC 6400 in 3 hours. Not only is the resultant calculation much cheaper, but it has more chance of success, because the system is not given as long a time for a possible failure. Furthermore, since the CDC 7600 has effectively infinite core in its large core memory, an entire new set of crystallographic problems becomes open to successful attack. In this discussion I have limited myself to the CDC 7600 as an example of an extant giant computer. I presume similar machines are available from other manufacturers, but I am personally unfamiliar with the problems of using such machines.

1. "Infinite" core, and hence computations can be made that are not possible on the 6400
2. The 7600 is 30 times faster than the 6400, with consequent savings in dollars

Figure 8  Advantages of Remote Computation (CDC 7600)

The following remarks, appended since the Albuquerque meeting, relate to some of our experiences with remote computing that have occurred since then. We have recently been performing remote calculations on the CDC 7600 at Lawrence Berkeley Laboratory. Transmission to and from the 7600 is done over telephone lines through the 200 User's Terminal at our Computer Center. We have not carried out large data-processing calculations, but rather have restricted ourselves during this trial period to potential-function calculations that require only modest amounts of data transmission but do require the speed and capacity of the 7600. Figure 9 is one I showed at Albuquerque. It was my guess concerning the disadvantages of remote computing. I am now able to comment on some of these points. We have had essentially no compatibility problems. A 6400 UPDATE tape compiled directly on the 7600, and the resultant program produced answers that were identical with those obtained on the 6400. We have had some difficulties with 7600 control-card instructions, mainly because of the difficulty of finding the right source of information at LBL. We have also had some difficulty obtaining up-to-date information on system changes, although the most important ones are put on a common file which we can obtain daily. We have not tried to transmit large data files, nor do I believe that this is feasible. I think it is inevitable in the types of operations we do that there be sympathetic cooperation at the other end. We are going to have to mail data tapes and answer tapes back and forth; we are going to have to create permanent files that don't get wiped out on

Monday mornings, etc. Thus far we have not experienced any difficulty with tape storage at LBL.

1. Machine incompatibilities
2. Remote and unexpected software changes
3. Difficulty of transmitting large data files
4. Difficulty of tape and disk storage
5. Government bureaucracy

Figure 9  Problems in Remote Computation

On the basis of this limited test I am pleased with the results and am optimistic that remote computing on problems too large or too expensive to handle locally will become commonplace among crystallographers. Moreover, as I have tried to indicate, we really have no choice. It is hoped that a symposium such as the present one will not only make our needs known but will facilitate the establishment and efficient use of national or regional computing facilities.

DISCUSSION

Jeffrey: How big is Northwestern University? That is a parameter relating to usage. How many faculty and how many students?

Ibers: Northwestern has 6500 undergraduates and approximately 2000 graduate students on the Evanston campus. The expectation for growth in either of these is zero. We are a private institution and are not planning to expand. We have a large graduate school on the Chicago campus, in Medicine, Dentistry, and Law, and those programs will expand under government pressure. But on the Evanston campus we can estimate computing need on the basis of a fixed population.

Jeffrey: How many faculty?

Ibers: There are about 400 faculty members in the College of Arts and Sciences, representing some 75 per cent of the total faculty.

Caughlan: What fraction of your usage is administrative and what fraction is educational, including instruction and research of graduate students?

Ibers: The administrative computing is done on a separate computer, and so does not affect us. Keeping track of where the library's books are is also done on that computer. The 6400 is used only for research and instructional purposes, and the usage is approximately 35 per cent undergraduate, 35 per cent graduate, and 30 per cent faculty. The distinction between the last two areas is clearly not easy, because some professors assign computer numbers to individual graduate students, while others use a blanket number for themselves and all their graduate students. But the instructional uses are about one third.

Coppens: What is done with faculty members who have no funds for research? Do they get time?

Ibers: The faculty members who have no funds get soft money up to a point, and as far as I know this point, with rare exceptions, provides them with enough to do what they need to do. A man who has no money from the outside and wants to become a major user of the computer is clearly destined for some talks with his dean.

Dewar: I would like to pursue the point you made that the 6400 in its present state will meet needs other than in the one category of number crunching. You mentioned ten years as a possibility. I wonder whether that is likely, because I think the potential for explosion in usage is even greater in some of those other areas, computer-assisted instruction and data-base retrieval, for example. You may be talking about demands for a thousandfold increase in on-line storage, which is conceivable with projected hardware. What are some of the parameters that go into that decision?

Ibers: It is difficult to feel confident about projections of computer usage. Nevertheless, the Long-Range Computer Needs Committee at Northwestern did talk extensively with diverse groups of people, many of whom had grandiose schemes for computing in the next decade. Obviously one of the major parameters that went into our thinking was hard money, or rather the lack of it, for extensive computer changes. Perhaps I left the wrong impression in my talk. We did not conclude that only the number crunchers would go off campus to compute; they were simply the most obvious group at the present time. But should we get heavily involved in computer-aided instruction, then money will not be spent at the computer center but will possibly buy remote terminals for a tie-in with the Plato system at Illinois. In any event, I too have been around the computer game for many years, and I know your worries when you imply that estimates of computer usage always fall short.

Nevertheless, we think our estimates are based on reasonably hard facts.

Dewar: There are other areas where going off campus probably looks very attractive, for example, large-scale manipulations with the census files.

Ibers: Another example at this time is that we run our Chemical Abstracts searches at Argonne National Laboratory, not because we want to but, as I understand it, because their format is difficult to change away from IBM. So we pay for the Chemical Abstracts searching at Argonne.

Williams: I have a question related to funding, but in an indirect way. As you know, the government is about to buy 10-12 large machines, and I am wondering about the problems of conversion. I've heard, being an IBM 360 user, that conversion between the CDC 6000 and 7000 series involves quite a change. I wonder if you could comment on the conversion problem, particularly with respect to interconversion between IBM and CDC 7000 series machines.

Ibers: The incompatibilities between us and LBL at Berkeley are no more serious than the incompatibilities between one 6000 center and another, and these incompatibilities will lessen as the 7600 at Berkeley institutes more of the Scope system.

Jeffrey: At the University of Pittsburgh we have disposed of the soft-money problem by putting the computer on overhead. It raised the overhead by about three per cent, but it saved real money by reducing the bookkeeping and bureaucracy of the soft-money operation.

Coppens: Does that mean free access for everybody?

Jeffrey: Yes. The resources are allocated on a departmental basis, based on what is considered to be the department's justifiable use. The priorities and departmental use are programmed into the computer. Responsibility for proper use of the computing resource is then placed at the departmental level.

Sayre: You have indicated that crystallographers may often want to turn to special off-campus machines for their large number-crunching computations. You have pointed out two associated problems, that of data transmission and that of administrative complexity. What about cost? Do you have a cost figure for the 7600 you are trying to get on to that can be compared with the $500 per hour you pay on your own campus?

Ibers: Yes, and this is part of the problem. The rate for the 7600 at LBL for contract users, people with AEC money, is $600 per hour for a machine that is 30 times faster than one for which we pay $500 per hour. The reasons for this, of course, have already been pointed out. The AEC considers only operating costs in arriving at an hourly charge. They do not consider amortization or the initial expense of buying the machine. So it is a very attractive computer.
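The arithmetic behind "a very attractive computer" can be made explicit. The sketch below uses only the rates and the 30x speed factor quoted in the discussion; the 3-hour job is an illustrative workload, not a figure from the talk.

```python
# Rates quoted in the discussion above.
rate_6400 = 500.0    # $/hr, Northwestern CDC 6400
rate_7600 = 600.0    # $/hr, LBL CDC 7600, AEC contract rate
speedup = 30.0       # the 7600 runs about 30 times faster

# Cost to do one hour's worth of 6400 work on each machine.
cost_local = rate_6400             # $500
cost_remote = rate_7600 / speedup  # $20, a 25-fold saving

# An illustrative 3-hour 6400 refinement job, redone on the 7600.
job_local = 3 * rate_6400              # $1500
job_remote = 3 / speedup * rate_7600   # $60
```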

Crystallographic Computing in a Small Institution Without Large In-House Computers

Helen M. Berman

Introduction

In an institution the size of ours, with about four hundred personnel, of whom perhaps twenty have any use for computer facilities at all and only four or five need extensive computing capabilities, it is clearly not practical or desirable either to own or to rent and maintain a large computer installation. Among the options open to us were to (1) rent or buy a small computer or (2) rent or buy a remote terminal that can access the large computers at computer centers. We chose to rent two types of remote terminals, a key punch, and a card sorter. I will describe our facilities and try to evaluate them in terms of the needs of a small crystallography laboratory.

A Remote Batch/Intelligent Terminal: Univac DCT 2000

The DCT 2000 operates as a remote arm of a central computer and also has some processing capability. Table 1 lists the specifications of our present system. It has a line printer, card reader, and card punch, which can function both on and off line. We have a Bell Telephone Data Set which modulates and demodulates digital data to go over the analog common carrier. It operates over a public dial-switched network and allows a maximum transmission rate of 2000 bits per second. The connections between the data set, the terminal, and the telephone lines are electrical. The data are transmitted in eighty-character blocks. When errors are detected in transmission, there is automatic retransmission. We also have a modem that transmits the data over a private four-wire line to one particular computer. Its maximum transmission rate is 4800 bits per second. The principal reason for the increased rate is that one pair of wires allows continuous transmission of data while the other pair is used for checking. With public lines we can use only two wires for both processes. The private line is also considerably less noisy, and there are fewer automatic retransmissions.

Requirements for the Central Computer

The computer must have the hardware to accept the fast transmission rate and the software to decode ASCII and the specially blocked character strings. In practice this limits us to UNIVAC 1108 facilities and some CDC 6600 facilities. IBM uses the EBCDIC code.

TABLE 1

PRINTER CHARACTERISTICS
Printing Speed: Maximum rate of 250 lines of alphanumeric characters per minute; 60 lines per minute with voice-grade telephone line
Printing Positions: 128
Paper Speed: 25 lines per second
Special Features: Transmit/Receive Monitor, Offline Listing, Form Control

READER/PUNCH CHARACTERISTICS
Cards: Standard 80-column cards
Reading Speed: Maximum rate of 210 cards per minute; 75 cards per minute on voice-grade telephone
Reading Method: Photoelectric read station
Punching Speed: Maximum rate of 75 cards per minute for 80-column punching
Punching Method: Two columns at a time
Input Hopper Capacity: 1200 cards
Output Stacker Capacity: 850 cards

CONTROL UNIT CHARACTERISTICS
Transmission Method: Block by block
Transmission Mode: Half duplex; 2- or 4-wire (nonsimultaneous two-way transmission)
Transmission Facilities: Voice-grade telephone toll exchange or private line
Transmission Rate: 4800 bits per second (private line); 2000 bits per second (switched telephone network)
Transmission Code: ASCII, XS-3 (DLT compatible)
Buffer Storage: 256-character capacity; two 128-character core memory buffers
Translation Capabilities: Card Code/Transmission Code, Hollerith/ASCII, Hollerith/XS-3 (DLT compatible)
Special Features: Error detection and retransmission, telephone alert, select character capability, short block capability, peripheral input/output channel, unattended operation
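The card rates in Table 1 translate directly into wall-clock time for a job. The sketch below uses the quoted reader rates; the 3000-card deck is an illustrative size, not a figure from the paper.

```python
def card_minutes(n_cards, cards_per_minute):
    # Minutes to move a deck through the reader
    # at a given sustained card rate.
    return n_cards / cards_per_minute

# Reading an illustrative 3000-card data deck:
voice_grade = card_minutes(3000, 75)    # 40 minutes on the public line
full_rate = card_minutes(3000, 210)     # about 14 minutes at the reader's maximum
```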

Description of the Actual Operation

At present we have contacts with two commercial data processing centers in the Philadelphia area and one university center in New York City. We transmit our data over a voice-grade public telephone line via the card reader. One company gave us a private line to encourage us to dedicate all our computing to them. Our intensity data are collected on magnetic tape and sent via mail or messenger service to the computer center. As much as possible, we try to store all large data sets and programs on disk files or magnetic tapes so that we do not have large input decks for everyday work. We use the X-ray 70 system for most of our standard crystallographic calculations and use stand-alone programs for special applications.

In practice, we do about ten calculations a day via the DCT and divide our work about equally between the two commercial centers. If we wish, we can submit several jobs at once. Almost all output comes over the line printer and card punch. Occasionally, for very large jobs, we have a messenger deliver the output. All our CALCOMP plots are delivered. The computers have software such that we can submit a job, terminate, and retrieve the output later in the day. The turnaround time at the commercial center is usually about one hour. If we wish, we can leave the terminal on line all day and allow the output to come back when it is ready. While it is theoretically possible to leave the terminal completely unattended, we almost never do because of printer jams.

We run our full-matrix least squares at the university center because of the very low rates for central processor time. However, the transmission rate is extraordinarily slow due to the bad phone connections between Philadelphia and New York City. Furthermore, the long-distance telephone rates add to our costs. Our costs are listed in Table 2.
It is clear from these numbers that the charges for central processor time vary widely and, at the commercial centers, are very high. However, the other aspects of the service are good and the telephone connections are satisfactory, so that at present we are not inclined to use less expensive but more inconvenient services.

TABLE 2

DCT 2000 terminal (rental and maintenance)    $917/month
Telephone service                             $200/month
UNIVAC 1108 CPU time, commercial rate         $800/hour
CDC 6600 CPU time, AEC rate                   $200/hour
Magnetic tape storage                         $10/month
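The fixed charges in Table 2 can be totaled to see how much of the laboratory's budget they consume before any processor time is bought. The sketch below is an illustrative calculation added here, not from the original text; it assumes the $35 000 annual computing budget that Berman quotes in the discussion following this paper.

```python
# Annual fixed costs implied by Table 2 (terminal, telephone, tape storage).
MONTHLY_FIXED = {
    "DCT 2000 terminal (rental and maintenance)": 917,
    "Telephone service": 200,
    "Magnetic tape storage": 10,
}

annual_fixed = 12 * sum(MONTHLY_FIXED.values())
print(f"Annual fixed cost: ${annual_fixed}")   # about $13 500 per year

# The remainder of the budget buys central processor time, e.g. at the
# commercial UNIVAC 1108 rate of $800/hour:
hours_per_year = (35_000 - annual_fixed) / 800
print(f"CPU hours affordable at the commercial rate: {hours_per_year:.0f}")
```

On these assumptions, well over a third of the budget goes to fixed charges, and the rest buys on the order of only a few dozen hours a year at the commercial 1108 rate, which helps explain the attraction of the $200/hour university rate despite the poor telephone connections.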

The Simple Input/Output Terminal

We have recently experimented with the use of an ordinary input/output terminal with nothing more than a keyboard and acoustic coupler. Our transmission speed is 300 bits per second in ASCII code via an ordinary telephone line. The purchase cost of such a terminal is about $3000, or $150 per month for rental. The asynchronous eleven-bit character transmission is compatible with a large number of computers. At present we access the CDC 6600 at Brookhaven National Laboratory with this terminal.

To test this computing arrangement we refined a structure by full-matrix least squares. The data were sent via mail since we do not have a card or tape reader. The least-squares program was modified so as to trim the output considerably. We use the FOCUS system, which is a multiple-access file-handling system. All the appropriate input parameters were stored on files which were edited at various stages of the refinement. The editing features were also used to output selectively the results of the refinement.

The turnaround time varied from 10 minutes to about three hours. With the present system at Brookhaven we can submit only one job at a time from a particular terminal; we cannot terminate and retrieve the output at some later time. When a job is complete we can make the output files into permanent files and print the full contents later. An Fo, Fc listing for about 1200 reflections takes about half an hour. In practice we can run about two big calculations a day with one terminal. The unreliability of the telephone lines causes us to be disconnected about twice in an eight-hour period.

Appraisal of Remote Terminals

In general we are satisfied with our computing system, with some reservations which I shall outline. Certainly in our situation it would be impossible to maintain a large computer, and the nearest large computer center is fifteen miles away.
Our choice then is between utilizing a minicomputer to its utmost or using a remote terminal system. At the time the decision was made to change from the IBM 1620 to a more modern system, the only small computer available to the laboratory that was within the budget was the IBM 1130. Programming the 1130 or a minicomputer to do crystallographic calculations was and still is a formidable task, and the laboratory did not have the personnel or inclination to approach the task. It seemed more prudent to use computers that were already programmed for crystallographic calculations, which at the time meant having access to an IBM, Univac, or CDC computer. I outline below the advantages and disadvantages of our system.

Advantages

1. The turnaround time is quite short, and one bypasses the usual delay incurred in waiting for an operator to remove output from the line printer.

2. We have tremendous versatility and can access a variety of large computers. We are not committed to any one computer, so that if service deteriorates at a particular center we can easily switch to another.

3. On the I/O terminal using the FOCUS system at BNL, one can take advantage of the editing features to change data cards easily and shorten the output. For example, if all you want to see is how one atom has refined, you can search for that atom alone and bypass the rest of the output. Debugging of programs also is simpler. One learns very quickly how not to be "card" bound.

Disadvantages

1. The public telephone lines are not reliable, and as a result the transmission rates are slowed. The telephone rates are high for long-distance calls. Leased lines are more reliable, but they force us to dedicate all computing to one center.

2. The commercial computer centers are not stable, and we must be on our guard against business failures.

3. The cost for operating time is high and, since we must pay for our computing in real cash, this considerably limits any experimental computing we might contemplate.

4. We do not have a cathode-ray tube or a magnetic-tape reader.

Our Concept of an Ideal Computer System

For us the ideal system would provide a remote terminal with display capability and enough memory so that we could do small calculations on site. The ideal computer center with which we would communicate should have the hardware and software to handle a variety of remote terminals. It should be fast and have a large core. The center itself should have some programming experts to aid us in some of our computing problems, and a well-documented library of essential crystallographic programs that can be easily accessed. Finally, the rates should be low. If all these criteria were met by the center we should probably be able to invest in a leased line and thereby bypass our problems with the public telephone lines.

DISCUSSION

Baur: It would be interesting to get from you the cost per structure, because you really know the amount.

Berman: When I saw the graph I was surprised, because our cost comes out much higher. Our computing budget is about $35 000 a year in real money. It's hard to say how many structures we've solved per year because we got our diffractometer only last year. Our total cost of computing is high, and I think it could be lower if we didn't have to deal with a commercial company.

Baur: There are several universities in the Philadelphia area. Can't you deal with them through your remote terminal?

Berman: No. The University of Pennsylvania has a 360 which people on site seem to have trouble using. We can't communicate with the 360 at all. Really our way is the only way we can do it.

Coppens: I think we should be careful about talking too much about the cost per structure, because your structures may be larger.

Berman: Yes. One structure may take much longer to solve and therefore we may have fewer structures. Also we are dealing with very rigid systems. The commercial centers have two priorities. We do overnight computing but it really doesn't save us much.

Dewar: In my experience these terrible troubles about telephone lines are confined to the east coast in general, or non-Bell Telephone areas. Certainly we have had no problems whatsoever in the midwest, which I think is worth mentioning for those who don't have the experience. Secondly, the DCT 2000 is rather a curious choice given your aims. What were the parameters of that decision? There are at least a dozen devices on the market that are fully programmable and can communicate with any computer in sight.

Berman: Not at the time we made the decision, which was in 1967. I agree that there are many good terminals now, better than the DCT.

Dewar: It's much easier to make your end flexible than it is to march around the country trying to make an arbitrary number of computers flexible.
Berman: Yes, but to get it taken out and install something now would be quite difficult. I agree with you.

Hamilton: Your computing costs are about $10 000 per man-year and my averages seem to be about $5000.

Berman: Yes, as I said, I think our computing costs are too high and I would like to see something done to lower them, mainly a lower CPU rate.

Ibers: Personally I want to run my programs at someone else's institution. If I must run their programs, then easy access to modification of those programs is essential. Let me hasten to add that even our own programs must be modified on occasion for a particular problem.

Coppens: I understand you can do modifications of programs.

Berman: Yes, it's done all the time, but it turns out that for routine structure analysis we tend to use X-ray 70 and are happy with it. Yes, we can do any kind of computing we want.

Young: There may be another side to the point that Ibers raises, which is that in most of the computing we do we would be happy to use his program wherever it is.

Okaya: Since you have an automatic diffractometer, that means you have a small computer in your laboratory.

Berman: Yes.

Okaya: Is it possible for you to use this in a kind of time-shared mode?

Berman: No.

Okaya: It must be much cheaper in the long run than spending so much money. Could you perhaps do all the refinement on that small computer?

Berman: We have so few people in the laboratory that if one person devoted all his time to doing the programming for that, there would be so much less that we could do of other types of work.

Okaya: Perhaps Syntex could make that possible for you.

Berman: Yes, that's what I'm hoping.

Caughlan: About ten years ago, we used a remote terminal to connect with the UCLA 7094. We had a lot of trouble with telephone lines there too, and this was card input-output. It was very difficult, and this indicates something about distance transmission.

Zalkin: I have a question regarding cost. The NYU computer costs $200 per hour and the other commercial ones are much higher. I don't quite

understand why you just don't use that one all the time.

Berman: Because it's very frustrating trying to get through the telephone lines.

Zalkin: To New York?

Berman: To New York. It would be great to do all our computing at Brookhaven or at NYU.

Zalkin: Are most of the commercial outfits close by?

Berman: They're in the Philadelphia area.

Bernstein: I'd like to comment on the telephone lines. We've also had quite a few problems. We investigated the situation. I spoke to the people who design some of these lines. It seems the ones you are using are voice-transmission lines and were not designed for data transmission. Also, there has been a strike against New York Telephone, and we are assured that things are going to improve. Even though they are voice-grade lines they should transmit data. The company has been working on ways of improving it.

Ibers: I have heard that the FCC is considering the problem of broadband microwave transmission. If such a means of transmission is allowed, then it should materially improve the possibilities of handling large data sets. Does anyone have any knowledge of the situation with respect to microwave transmission?

Suddath: Our 370 is tied to Harvard by microwave, and my understanding is that it's working out quite well, with very high transmission rates.

Berman: Just using the leased telephone line makes all the difference in the world.

Murphy: We're on the ARPA net. It's a fifty-kilobit line and extremely reliable. If more networks like this could be developed so that smaller institutions might get on, it might really relieve the problem.

Wolten: A little over 12 years ago I worked for North American Aviation. They had computers at several of the various installations spread out over the Los Angeles area. They were all linked by microwave. Any program could go to any computer that was available at the time, and the answer would come back the same way.
Dewar: In case people do have trouble with transmission, there exist modems that will solve these problems completely if you can afford to install two of them, one at each end: error-correcting modems. There are several of these on the market and I'm sure they would eliminate most of your problems.

Berman: If we had done that, it would have dedicated us to one computer center, and we haven't yet found the computer center we are willing to make that dedication to.

Meyer: There are several commercial firms that plan within the next few years to blanket the country with data-transmission networks, DATRAN for one.

Computing Needs of Protein Crystallographers

Keith D. Watenpaugh

The growth of crystallography has been closely linked to the growth of computer science and technology in general. As computers became faster and more sophisticated, the rate of growth of crystal-structure determinations and their degree of refinement and accuracy increased. In no field of crystallography is this more evident than in protein crystallography. Protein molecules are at least an order of magnitude larger than those studied in normal crystallography, and with this large size come very real problems and experimental limitations associated with the collection and treatment of data. In fact, many analogies may be drawn between the state of the art of protein crystallography now and that of ordinary crystallography 15 to 20 years ago, when smaller and slower computers were just coming into use.

Computers are important not only in the processing of crystallographic data but also in the collecting of the data. The mass of data necessary to solve protein structures even in moderate detail, as well as proteins' almost universal instability (especially under X-ray bombardment), requires automated high-speed data collecting and processing. Presently, either computer-controlled diffractometers or computer-controlled film-scanning densitometers are used for this purpose. This aspect of crystallography is discussed later in this symposium, and I mention it here only because it is an indispensable part of protein crystallography. Also, new, extremely high-speed data-acquisition systems are now being developed (Xuong and Vernon, 1972).

Computer application to solving protein structures through phasing by multiple isomorphous-replacement techniques (Blow and Crick, 1959) has become quite routine. Also, improving heavy-atom parameters by alternating cycles of least-squares refinement with cycles of phasing by multiple isomorphous replacement has become standard (Dickerson et al., 1968).
However, this method of determining phases cannot usually be extended beyond 2.0 Å resolution (resolution is usually defined as the minimum interplanar spacing to which data were collected). A Fourier synthesis (electron-density map) calculated using these phases and fitted by some model giving approximate atomic positions usually is the final step in the structure determination. This is so for a number of reasons, including the difficulty of obtaining good higher-resolution data and the limitations of present-day computers.

Further computer applications at this point may include fitting a model to the electron-density map while maintaining some constraints

on the model (Diamond, 1971). This allows approximate atomic positions to be calculated, but their accuracy is quite uncertain. Important use of computers has also been made in studying protein conformations with computer-controlled display systems (Barry and North, 1972).

Tollin and Rossman (1966) have described various rotation-function programs. Programs of this type may be used to fit known protein models to the crystallographic data of similar unknown proteins in order to solve related protein structures without using isomorphous-replacement techniques. However, the most demanding use of computers in the near future is going to be in the refinement of protein structures to produce much more accurate models.

Since the phased data in even a "high resolution" protein structure do not approach the precision required to resolve individual atomic positions, the current protein models are poor by regular crystallographic standards. The need to improve these models is indisputable. As more and more structures are being determined to 3.0 Å, 2.5 Å, or 2.0 Å resolution, it is becoming painfully obvious that the models simply are not accurate enough to explain the unusual and unique physical and chemical properties of many proteins. Practically nothing can be said about bond lengths or angles in proteins, and even atomic positions have uncertainties of the order of half an angstrom or more.

Following are just a few of the questions that may be answered if more accurate protein structures are obtained, with computer refinement of protein structures playing a primary role:

1. Vallee and Williams (1968) have proposed an entatic state, or region of abnormal conditions in the protein, as giving rise to internal activation by geometric and/or electronic strain. Stretching a bond by 0.2 Å or twisting it through 20° can produce very large changes in energies, yet be entirely missed by present protein X-ray crystallographic analysis.

2. High orientation dependence has also been proposed as contributing to the unusual catalytic properties of enzymatic proteins. Storm and Koshland (1970) have suggested that large rate increases may be realized by proper orientation of reacting molecules, and that enzymatic reactivity may be due to this "orbital steering" ability. They propose that changes in angles of as little as 10° may produce rate changes of several orders of magnitude, again outside the range of present protein crystallographic accuracy.

3. Nonplanarity of peptide groups as well as the close proximity of atoms appear to be implicated in the activity of lysozyme (Barry and North, 1972). An accurate structure is required to assess the degree of nonplanarity of its peptides.

4. Chromatium high-potential iron protein (HIPIP) and bacterial ferredoxins have similar iron-sulfur clusters in 2.2 Å resolution electron-density maps, yet their physical properties are very different (Carter et al., 1972; Adman et al., 1972). Ferredoxin has an oxidation-reduction potential of approximately -400 mV whereas that of HIPIP is +350 mV. An accurate description of the cluster and its environment is needed to explain the difference.

Refinement of Protein Structures

Procedures for refining protein structures fall into two classes. In the first are those that attempt to produce the best fit of the model to the electron-density map generated by determining phases through multiple isomorphous-replacement methods. The second includes procedures that improve the phases and/or extend the phases to higher-resolution structure factors to produce a more accurate model than can be derived from the experimentally determined phases.

R. Diamond (1971) has written a sophisticated computer program that optimizes the fitting of a model to an electron-density map while maintaining some constraints on the model. Bond lengths are kept constant while overlapping densities of neighboring atoms are accounted for. Some interbond angles may either be constrained or allowed to vary. This procedure appears to lead to an improvement of the model with respect to experimental data to 3.0 Å or 2.0 Å resolution if the electron-density map is reasonably good. However, it must be noted that refinement of a model by electron-density maps has several disadvantages when errors exist in the data or when atoms are not resolved. Computer requirements for this type of refinement are not particularly large, but the model produced would be considered only a reasonable starting model for refinement according to regular crystallographic standards.

A second procedure, which involves refining phases and extending them to higher resolution, is the so-called "direct methods." Use of direct methods is discussed by D. Sayre in this symposium.
Application of direct methods to protein crystallography has not proved very successful yet, but some promising results have been obtained. However, computing times can be enormous.

In the course of the refinement of small structures, it was found that ΔF syntheses (difference maps) provided advantages over Fourier syntheses in the refinement of structures when atomic positions were not resolved, as is the case when working with two-dimensional data or when there are series-termination errors due to lack of higher-order data. Shifts in atomic positions are proportional to the slope of the difference map at the assumed atomic positions and can be determined more easily and reliably. Apparent thermal motion is also more easily determined. Moreover, gross errors in the model, such as misplaced or missing atoms, can be detected. A reasonable analogy may be drawn between refinement of small structures with two-dimensional data and refinement of proteins with three-dimensional data when atomic positions are not quite resolved. It is questionable, however, whether unrestricted use of ΔF refinement is justifiable without data near atomic resolution. Computer times required for ΔF refinements in general are not large.

With the advent of large high-speed computers, the most powerful method of refining small structures was by means of full-matrix least squares. Even on small structures, least squares can tax the largest and fastest computers. True full-matrix least-squares refinement of a protein structure cannot be reasonably accomplished on today's computers. Reducing the magnitude of the task by neglecting off-diagonal terms, as is sometimes done on small structures, proves disastrous with proteins, since at low resolution the correlation between neighboring atoms is high and cannot be ignored. Further problems may arise because the number of structure factors does not greatly exceed the number of parameters to be refined and because an appreciable fraction of the crystal may be composed of solvent.

The numerous difficulties and limitations associated with the refinement methods have prevented people from refining protein structures in spite of the great need to do so. However, it is important to know whether it is possible to refine proteins and to what extent the model may be improved.

Refinement of a Protein Structure

A brief summary of the refinement of rubredoxin provides a convenient way of describing the magnitude, merits, and limitations of various refinement procedures (Watenpaugh et al., 1972). An accurate description of the iron-sulfur complex as well as the chain conformation is essential in understanding the unusual physical properties of this protein.
Also, it is a good structure on which to test protein refinement because of its relatively small size and because significantly observable data exist to a resolution of 1.5 Å (approximately atomic resolution).

Atomic positions with which to begin the refinement were picked off a 2.0 Å electron-density map phased by multiple isomorphous replacement. Structure factors based on these parameters had an R-index of 0.37. The R-index, defined by

    R = Σ | |Fo| - |Fc| | / Σ |Fo|,

is used to measure the discrepancy between the observed and calculated structure factors. Models of small structures with R-index in the vicinity of 0.4 can usually be refined.

It is convenient to quote computer time and costs of the CDC 6400 computer at the University of Washington to compare the magnitudes of the various steps in the refinement of rubredoxin. The present charges on this computer are approximately $5.00 per minute for the central processing and a variable amount for input-output and peripheral processing. The X-ray 70 crystallographic computing system supplied by Dr. Stewart was used for all major computing (Stewart, 1967). The structure-factor calculation (Fc calculation) over the approximately 500 atoms in the asymmetric unit, or 1500 atoms in the cell, on more than 5000 observed reflections requires approximately 65 minutes of central processor time and costs $350. A ΔF synthesis following the Fc calculation, with 2 grid points per angstrom, takes 40 minutes and costs $250.

Initial refinement was with ΔF syntheses for several reasons. The maps provide a constant check on whether sensible corrections are being applied, allow modifications in assignment of peptide residues since the sequence was not known, and keep computing time and costs within reasonable limits. Calculations of the shifts in atomic positions from the ΔF syntheses were done by hand. Approximately 30 man-hours were required at this task of calculating shifts on about 500 atoms per cycle. The ΔF refinements were sufficiently well behaved to improve the structure significantly and allow identification of additional residues whose assignments had been questionable at the outset. Four cycles of ΔF refinement decreased the R-index from an initial value of 0.37 to 0.22. Since the ΔF refinements were fairly well behaved, it seemed feasible to attempt least-squares refinement.
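The R-index quoted at each stage of this refinement (0.37 falling to 0.22) follows directly from the definition given above. The sketch below is illustrative only; the sample structure-factor magnitudes are invented, not taken from the rubredoxin data:

```python
def r_index(f_obs, f_calc):
    """Crystallographic R-index: sum of ||Fo| - |Fc|| over sum of |Fo|."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

# Invented sample magnitudes for a handful of reflections.
f_obs = [120.0, 85.0, 40.0, 220.0]
f_calc = [110.0, 90.0, 55.0, 200.0]

print(f"R = {r_index(f_obs, f_calc):.3f}")   # prints: R = 0.108
```

For a real protein the sums simply run over all observed reflections (more than 5000 in the rubredoxin case); the calculation itself is trivial compared with computing the Fc values that feed it.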
It is impossible to take full advantage of the superior characteristics of full-matrix least-squares refinement because of the magnitude of the problem. Even this small protein, with approximately 600 atoms including water molecules in more or less discrete locations, involves 2400 parameters to vary (x, y, z, and an isotropic thermal parameter for each atom). Even to store the unique part of the symmetric matrix would require 3 million words of storage (the unique part of a symmetric 2400 x 2400 matrix contains 2400 x 2401/2, or about 2.9 million, elements). The computer time to build such a matrix in the course of a refinement is not feasible on currently available computers. The CDC 6400 computer at the University of Washington, with a core size of 64000 words, is capable of refining 240 parameters. Therefore the matrix was partitioned into blocks of about 240 parameters associated with neighboring atoms, requiring 10 passes through the computer to complete one full cycle of refinement. One cycle of refinement requires approximately 17 hours of central processor time and costs about $5400 to $6000, as compared with about $600 and a week of hard work for ΔF refinement. Bond lengths and angles were calculated after each least-squares cycle, as were ΔF syntheses at various stages to check the behavior of the refinement.

The least-squares refinement behaved surprisingly well, with no general tendency for atom positions to oscillate or diverge, in spite of the lack of complete atomic resolution and the not highly overdetermined constraints on the parameters. After four cycles of least-squares refinement, the R-index is 0.126, but some regions of the molecule are still poorly defined because of either high thermal motion or disorder.

It is now evident from the refinement that there is very significant distortion in the tetrahedral configuration of the iron-sulfur complex. Three iron-sulfur bond lengths may be not significantly different from each other (2.34 Å, 2.32 Å, 2.24 Å, with standard deviations of 0.03 Å) and agree well with those observed in small crystal structures, while the fourth is short (2.05 Å), suggesting an entatic nature for this protein. The more accurate iron-sulfur cluster as well as the more accurate description of the surrounding polypeptide should allow more quantitative theoretical calculations to be performed to explain both electron-transport mechanisms and the energetics of protein folding.

Perhaps the most important outcome of this successful refinement of rubredoxin has been to stimulate refinement of other proteins in which better accuracy is required to explain their mechanisms. ΔF refinements are currently under way on both bacterial ferredoxin and high-potential iron protein at 2.0 Å resolution in hopes of explaining their very different physical properties. Subtilisin and pancreatic trypsin inhibitor are being refined to better understand the behavior of proteases. Refinement is beginning on the triclinic form of egg-white lysozyme, which holds promise of being capable of refinement beyond any other protein currently under study.

Suddenly, refinement of protein structures is no longer in the future but in the present.
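As an aside, the significance of the short fourth iron-sulfur bond reported above can be illustrated with a quick estimate. The sketch below is an added illustrative calculation, not from the original text; it uses only the four bond lengths and the 0.03 Å standard deviation quoted above, and it assumes independent errors:

```python
import math

# Bond lengths (Å) and quoted standard deviation for the rubredoxin
# iron-sulfur complex; the significance estimate itself is illustrative.
bonds = [2.34, 2.32, 2.24, 2.05]
sigma = 0.03

mean_long = sum(bonds[:3]) / 3        # mean of the three "normal" bonds
diff = mean_long - bonds[3]           # shortening of the fourth bond
# Standard deviation of the difference, assuming independent errors:
sigma_diff = math.sqrt((sigma / math.sqrt(3))**2 + sigma**2)

print(f"difference = {diff:.2f} Å, about {diff / sigma_diff:.1f} sigma")
# prints: difference = 0.25 Å, about 7.2 sigma
```

Even allowing for correlated errors, a shortening of this size is far outside the quoted uncertainties, which is why it is described as very significant distortion.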
The limiting factor is not whether proteins can be refined but the computational aspects of refinement. New computer programs designed for the refinement of protein structures, not small crystallographic structures, must be forthcoming. For example, ΔF refinement techniques, which disappeared from use on small structures with the advent of high-speed computers, must be reexamined keeping in mind current computer technology and speed. New methods of least-squares refinement are required that take into account the overlap of electron densities of neighboring atoms and allow more nearly diagonalized matrices, so as to increase the speed and efficiency of refinement. However, in the final analysis, the dynamic growth of protein crystallography is dependent on the increasing availability of larger and higher-speed computers.

Acknowledgments

I am indebted to Dr. L. H. Jensen for many helpful discussions and to the USPHS for support under Grant GM-13366 from the National Institutes of Health.

References

Adman, E., L. C. Sieker and L. H. Jensen. 1972. The structure of a bacterial ferredoxin. Amer. Cryst. Assn. Abstr., p. 66. Albuquerque.

Barry, C. D. and A. C. T. North. 1972. The use of a computer-controlled display system in the study of molecular conformations. Cold Spring Harbor Symp. on Quant. Biol. 36: 577.

Blow, D. M. and F. H. C. Crick. 1959. The treatment of errors in the isomorphous replacement method. Acta Cryst. 12: 794.

Carter, C. W., Jr., S. T. Freer, N. H. Xuong, R. A. Alden and J. Kraut. 1971. Structure of the iron-sulfur cluster in Chromatium iron protein at 2.25 Å resolution. Cold Spring Harbor Symp. on Quant. Biol. 36: 381.

Diamond, R. 1971. Real-space refinement procedure for proteins. Acta Cryst. A27: 436.

Dickerson, R. E., J. E. Weinzierl and R. A. Palmer. 1968. A least-squares refinement method for isomorphous replacement. Acta Cryst. B24: 997.

Stewart, J. M. 1967. X-ray 67: program system for X-ray crystallography. TR-67-58 (NSG-398), Computer Science Center, University of Maryland.

Storm, D. R. and D. E. Koshland, Jr. 1970. A source for the special catalytic power of enzymes: orbital steering. Proc. Nat. Acad. Sci. 66: 445.

Tollin, P. and M. G. Rossman. 1966. A description of various rotation function programs. Acta Cryst. 21: 872.

Vallee, B. L. and R. J. P. Williams. 1968. Metalloenzymes: the entatic nature of their active sites. Proc. Nat. Acad. Sci. 59: 498.

Watenpaugh, K. D., L. C. Sieker, J. R. Herriott and L. H. Jensen. 1972. The structure of a non-heme iron protein: rubredoxin at 1.5 Å resolution. Cold Spring Harbor Symp. on Quant. Biol. 36: 359.

Xuong, N. H. and W. Vernon. 1972. A rapid data-acquisition system for protein crystallography. Amer. Cryst. Assn. Abstr., p. 59. Albuquerque.

DISCUSSION

Freer: I wish to comment on the refinement of HIPIP, the high-potential iron protein from Chromatium D. We were so encouraged by the progress of the Seattle group that last December we wrote a numerical differential-synthesis program which we hoped would automate protein refinement. This procedure has worked amazingly well. A complete refinement cycle consists of an Fc calculation, a ΔF map, and then automatic calculation of slopes and parameter shifts. For HIPIP, where we're talking about 800 atoms, the Fc runs 7 minutes, the Fourier 4 minutes, and the differential synthesis about 30 seconds. For a total cost of approximately $600 (for about 3 hours of CDC 3600 time), we reduced the R factor for HIPIP from 34 to 16%.

Coppens: That's even lower than doing it by hand, but much faster.

Freer: Yes. All I want to emphasize is that since this program has come into being, such refinement is becoming practical.

Watenpaugh: In starting out, for example, we've used the X-ray system designed for people who are solving lots of different structures in lots of different space groups, but when we come to protein structures we're going to spend a significant amount of time on refinement, so it will be very important that we optimize the system for a particular space group and a particular protein.

Stewart: People tend to refer to this X-ray system as being mine. The authorship extends over a great many people. In fact, Steve Freer, who just spoke, is himself one of the original authors. And it was not our intent that the least-squares or the Fourier program be used for protein-structure analysis. It was written with the idea of space-group flexibility and convenience, and therefore you're paying for this in overhead in a real way. If it's used for these large structures it does cost more, and I think Steve's remarks are especially important. In the old days we really pushed for efficiency rather than convenience. There could be many short cuts made.
I'm sorry to say also that about two years ago I completely obliterated our differential synthesis, destroyed all vestiges of its existence, and threw it away, believing it had been an exercise in futility. So that has saddened me a little bit.

Dewar: Bearing in mind the whole purpose of this symposium, particularly in terms of looking forward to what could be done with large computing facilities, one conclusion I seem to hear is that one really isn't that far away. The costs you quoted are high, but entirely reasonable for the size of structure that's being tackled. Does it seem fair to conclude that even in connection with protein structures, we're talking not about some miraculous new hardware two orders

of magnitude faster, but about very large existing computers, perhaps 7600's, at the top of the scale? This would be a significant conclusion from the point of view of building a gigantic computer for crystallographers.

Coppens: We have heard discussion of the unit cost per man-year in computing time; would it be higher for this kind of work?

Watenpaugh: Well, just in the refinement I've spent over $30 000 in one year. The amount required for the solution of protein structures has been small, and this is about where protein people have stopped. However, as more proteins are refined, you're going to see an astronomical increase in the amount of computing time on the part of protein crystallographers.

Johnson: How much money goes into computer graphics on any protein structure?

Watenpaugh: We don't do any, but there are some groups that do, and I imagine quite a bit of time is spent in some laboratories.

Freer: We actually have been spending as much on computer graphics as on refinement.

Schomaker: Keith, is it true, though, that you are substantially inhibited in your progress by lack of money?

Watenpaugh: Yes.

Schomaker: You could have spent $100 000 or $200 000, or at least at that rate?

Watenpaugh: Well, we're not even at convergence yet. It's just that we actually have a good start, and we still have fairly low-resolution data. We want to collect data to high resolution.

Sayre: I think it should also be noted that rubredoxin is a small protein and that the cost of a least-squares refinement rises approximately at a rate between n and n², where n is the number of atoms.

Coppens: So perhaps there is a need for larger computers.

Sayre: Yes. That's the point.

Koetzle: There must be some resolution limit beyond which this sort of refinement that you're talking about is not possible. To what resolution would you say the data on the protein ought to go before you can initiate this process?

Watenpaugh: On the Chromatium HIPIP, they're working with 2 Å data, but I think this is the bare minimum of the resolution at which we can work. It's becoming increasingly obvious that the positions of the atoms are fairly well behaved in proteins, much as they are in small structures, and therefore, with high-speed data-collection techniques to collect the data before our crystals go to pot, we should be able to take lots of proteins to atomic resolution in the future.

Jeffrey: I have an idea that differential syntheses are like block-diagonal refinement in their convergence. So you may not gain as much by differential synthesis because the refinement would be less rapid than full-matrix.

Freer: The fact is that I can do the differential synthesis on a CDC 3600 with a 32 000-word core, and that's what I have now. We're going to try to get on the 7600 at LBL. Being members of the University of California, perhaps we might have a better chance.

Jeffrey: I seem to remember a paper where someone related differential synthesis to a diagonal-matrix refinement.

Freer: Yes. I think it's equivalent to least-squares weighted by the reciprocal of the atomic scattering factors.

Coppens: This was discussed in a paper by Cruickshank (Acta Cryst. 5: 511, 1952).

Seeman: Can you give the accuracy of your starting model? What's been your actual shift from the model to your current coordinates?

Watenpaugh: We've had shifts of over half an angstrom, but the average shifts were probably about two-tenths of an angstrom or so per cycle in the initial stages of refinement.

J. D. H. Donnay: I should like to address my comments to the funding agencies. About twenty years ago, at the Paris International Congress of Crystallography, the late Professor Mauguin referred to "those crystallographers who had the courage and the audacity to tackle the protein structures." That was in 1954.
What was almost foolhardy at that time is still a job that requires considerable boldness and courage today. It seems to me that funding agencies should know that we as a profession (1) have high respect for the people who work on protein problems and (2) feel that, if a nation is lucky enough to have research people willing and able to solve such problems, it should make sure that they receive full support from their government.
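The R factor quoted repeatedly in the discussion above (Freer's drop from 34 to 16% for HIPIP) is the conventional crystallographic residual, R = Σ| |Fo| − |Fc| | / Σ|Fo|. The sketch below illustrates the idea on a toy point-atom model with unit scattering factors; all function names and the synthetic data are illustrative assumptions, not taken from any program mentioned in the text.

```python
import numpy as np

def calc_amplitudes(xyz, hkl):
    """|Fc| for a point-atom model with unit scattering factors --
    a deliberate simplification of a real structure-factor calculation."""
    phases = 2j * np.pi * (hkl @ xyz.T)        # shape (n_refl, n_atoms)
    return np.abs(np.exp(phases).sum(axis=1))  # sum over atoms, take modulus

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R = sum||Fo| - |Fc|| / sum|Fo|."""
    return np.abs(f_obs - f_calc).sum() / f_obs.sum()

# Toy data: "observed" amplitudes from a model, "calculated" amplitudes
# from a slightly perturbed copy of the same model.
rng = np.random.default_rng(0)
xyz = rng.random((8, 3))                           # 8 atoms, fractional coords
hkl = rng.integers(-4, 5, size=(50, 3)).astype(float)
f_obs = calc_amplitudes(xyz, hkl)
f_calc = calc_amplitudes(xyz + 0.02 * rng.standard_normal(xyz.shape), hkl)
print(f"R = {r_factor(f_obs, f_calc):.3f}")
```

A perfect model gives R = 0, and the perturbed model gives a small nonzero R; a real calculation would of course use proper atomic scattering factors and thermal parameters.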

Computational Needs and Resources in Crystallography: Proceedings of a Symposium, Albuquerque, New Mexico, April 8, 1972.