National Academies Press: OpenBook
Suggested Citation:"Computing Centers and Networks." National Research Council. 1973. Computational Needs and Resources in Crystallography: Proceedings of a Symposium, Albuquerque, New Mexico, April 8, 1972.. Washington, DC: The National Academies Press. doi: 10.17226/18587.


Session IV
Computing Centers and Networks
Session Chairman: S.C. Abrahams

Some Experiences with Crystallographic Systems

James M. Stewart

The word "system," especially as used in the context of computers, is a very broad one. The factors to be "systematized" must be defined in each instance. Since the introduction of computers, most crystallographers have made an effort to systematize their computing usage, and from this effort a great number of useful programs have been written. Some of these programs are used as a collection with common mass-storage definition, common core-storage rules, common data-input conventions, common output conventions, and common documentation formalism. These are the characteristics of an "operating system" as usually supplied with any computer from a PDP-8 to an ILLIAC IV. The complexity of any given system is dictated by the complexity of the computer and the user demands to be allowed on this computer.

In 1961 at the University of Washington, D. High, L. H. Jensen, E. C. Lingafelter, B. W. Brown, and others, including myself, reviewed the advent of the IBM 709 and the loss of the IBM 650 and considered the possibility of producing a collection of programs that would have the following features in common:

1. Programs that produce accurate and rapid diffraction analysis.
2. Stylized input formats for characteristic crystallographic data and operations.
3. Careful and detailed documentation.
4. Mass storage files defined.
5. Independence of the local computer operating system and compiler.
6. Space group and setting universality.
7. Provision for a large range of data set size, independent of high-speed memory (20 000-word minimum for data).
8. Highest efficiency possible consistent with the preceding criteria.

The current "X-ray" system, then, is a result of these design criteria. The authors and implementors of the system are now given as an appendix in the documentation for the system (see the Appendix at the end of this paper).
Over the years many authors have contributed codes to the system, and in each case we have endeavored to bring these codes into conformity with our particular criteria. Also during this time many others have written other systems with different or similar criteria and features. Most of these efforts have been rewarded with the production of many interesting crystal structure analyses.

For purposes of this discussion, I will confine my attention to the criteria that have been most frustrated by the immaturity of the computer field, and to the advantages and disadvantages of a "system" approach.

The greatest difficulties have been and are being encountered in our effort to be computer-, operating system-, and compiler-independent. We have tried to introduce the concept of using a subset of FORTRAN, which we call Pidgin FORTRAN, in a partially successful attempt to achieve this objective. I must say, however, that one tends to become more and more paranoid with each successive FORTRAN compiler that is released. I will not dwell here on all the incredible examples I have seen but will simply give three, in order to give the flavor of the problem.

1. One manufacturer makes the presence of more than "N" comments in succession a fatal compiler error (where N is a small number chosen at the pleasure of the compiler writers).
2. One manufacturer has one compiler in which a RETURN statement is required and another in which its presence is a fatal error.
3. One manufacturer has made the statement cause the movement of only one half a word, while A = B always causes "normalization" of B before it is stored in A. Consider the movement of alphabetical or "packed" information.

The list is, of course, much longer. There are also frustrations concerning word size, actual hardware configuration, etc. But these have turned out to be minor compared to the problems of operating-system software. It must be emphasized that this problem for us has been reduced to the non-equivalence of the meaning of identical-looking FORTRAN statements. We believe we have succeeded in achieving reasonable interchangeability in spite of these problems, so that we are now able to make blocked magnetic tapes on the UNIVAC 1108 capable of being read, compiled, loaded, libraried, and executed to give the same crystallographic results on a variety of other computers. We also believe that given one of the others we could carry out the same operation.
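The third example above, a move that transfers only half a word, can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any particular manufacturer's compiler: the 48-bit word is inferred from the "24-bit" half word mentioned in the discussion, while the 6-bit character code, the code table `ALPHABET`, and the function names are hypothetical.

```python
WORD_BITS = 48                      # assumption: two 24-bit halves per word
CHAR_BITS = 6                       # assumption: 6-bit character codes
CHARS_PER_WORD = WORD_BITS // CHAR_BITS
HALF_MASK = (1 << (WORD_BITS // 2)) - 1

# Hypothetical code table; code 0 is the blank.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def pack(text):
    """Pack up to eight characters into one integer 'word'."""
    chars = text.ljust(CHARS_PER_WORD)[:CHARS_PER_WORD]
    word = 0
    for ch in chars:
        word = (word << CHAR_BITS) | ALPHABET.index(ch)
    return word


def unpack(word):
    """Recover the eight characters stored in a packed word."""
    out = []
    for shift in range(WORD_BITS - CHAR_BITS, -1, -CHAR_BITS):
        out.append(ALPHABET[(word >> shift) & ((1 << CHAR_BITS) - 1)])
    return "".join(out)


def full_word_move(source):
    """What the programmer expects an assignment to do: copy every bit."""
    return source


def half_word_move(source):
    """The troublesome variant: only the low half of the word is copied."""
    return source & HALF_MASK


label = pack("SPACEGRP")
print(repr(unpack(full_word_move(label))))
print(repr(unpack(half_word_move(label))))
```

Under these assumptions the full-word move preserves the label, while the half-word move keeps only the last four characters and blanks the rest, which is why identical-looking statements cannot be trusted to move packed alphabetical data.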
When one speaks of computing costs, the development costs should probably be quoted separately from the structure-solution costs with checked-out programs. The development costs for the X-ray system have been very high, and much of the cost has been in trying to beat the problem of the differences in the various FORTRANs. This problem may become worse because of the rapidly changing computer technology. With terminal operation to remote terminals this feature of mutual compatibility may lead to more and more frustrations for crystallographers.

Now to the advantages and disadvantages of a "system" approach to crystallographic computing:

The advantages are mainly those of convenience of use, interchangeability from computer to computer, slow but steady improvement in reliability and function, and the many active users aiding in the checkout of the codes.

The disadvantages are mainly those of inflexibility of modification, "black box" effect on users, and some sacrifice in speed for generality (although for many potential users it will usually take a long time to recover the development time due to this sacrifice).

Another problem area is that, despite our best efforts to understand the operating systems of the various manufacturers, there is often an initial delay in implementation due to the size of the system as it is presently (1971) constituted.

In summary, I believe that the system approach has something to recommend it for crystallographers who do not have a large enough group to develop and maintain their own set of programs. They are well advised to consider one of the essentially checked-out systems in use (1) and adapt it to their computer or use it by a remote terminal. This is especially true for those groups whose main interest is in routine structure analysis.

On the other hand, groups with good access to funding and computers and a few in-house programming crystallographers may wish to begin to collect their own libraries. If they are not keen to distribute their "system," they may be greatly aided by the in-house operating-system software. This is, in my opinion, an extravagant way to do crystallographic computing. I believe that groups like this will find, as we have, that when the computer system changes they will have to invest a large amount of time and effort in the conversion of their libraries.
I would prefer to see the efforts of this group of talented men directed to the production of better standard codes or new methods of crystallographic computing in an interchangeable form. The funding agencies might notice that much of the programming effort of crystallographers in the past has gone into the development and checkout of manufacturers' software. This item in most cases is charged to "crystallographic" computing rather than to computer software development.

(1) On this continent, Busing, Hamilton, Johnson, Larson, Ahmed, Ibers, Marsh, Sparks, and many others have collections of programs or systems in operation or potential operation.

In the development of the X-ray system, we must recognize direct and indirect support by many different sources, including NSF, NASA, ARPA, the Army, the Air Force, NIH, AEC, the Computer Science Center of the University of Maryland, the Research Computer Center of the University of Washington, the U.S. Geological Survey, the National Bureau of Standards, and the Science Research Council of the United Kingdom. These sources of support have now become less accessible in a direct way because of the prevailing economic conditions. It is still my hope to be able to maintain, improve, and distribute the X-ray system by whatever means we can find.

Appendix
Contributors to the X-Ray System

The X-ray system has been developed over a number of years with contributions from a large number of people. This effort has fallen into three main categories:

1. System editing, i.e., the writing of the nucleus, maintenance of the programs, the write-up, general organization, and system philosophy decisions.
2. Program writing, without which there would be no need for a system.
3. System implementation, i.e., the responsibility for providing information for making the system run on specific machines and for checkout of new system releases.

Obviously, some program authors have actively contributed in other respects, and due acknowledgement of their authorship is given within the program descriptions in Section 1 of this write-up. The affiliation given for each contributor is that appropriate at the time the contribution was made and should not necessarily be considered as current.

System Editors

Baldwin      Dr. J.C.     Atlas Computer Lab., U.K.
Chastain     Dr. R.V.     Univ. of Washington, Seattle
High         Dr. D.F.     Univ. of Washington, Seattle
Kruger       Dr. G.J.     CSIR, Pretoria, S. Afr.
Kundell      Dr. F.A.     Univ. of Maryland
Stewart      Prof. J.M.   Univ. of Maryland

Program Authors

Ammon        Prof. H.        Univ. of Washington, Seattle
Alden        Dr. R.A.        Univ. of Washington, Seattle
Boonstra     Dr. E.G.        Univ. of Orange Free State
Brown        Dr. B.W.        Portland State College
Braun        Dr. R.L.        Univ. of Washington, Seattle
Busing       Dr. W.R.        Oak Ridge National Laboratory
De Camp      Dr. W.H.        Univ. of Maryland
Dickinson    Mr. C.W.        U.S. Naval Ordnance Lab.
Dayhoff      Dr. Margaret    Natl. Biomedical Res. Foundation Inc.
Freer        Dr. S.T.        Univ. of Washington, Seattle
Hall         Dr. S.          Mineral Sci. Div., E.M.R., Ottawa
Holden       Dr. J.R.        U.S. Naval Ordnance Lab.
Jarski       Mrs. Mary A.    Univ. of Washington, Seattle
Jensen       Prof. L.        Univ. of Washington, Seattle
Keefe        Dr. W.          Medical Coll. of Virginia
Kerr         Dr. Ann         Cambridge Univ., England
Kraut        Prof. J.        Univ. of California, La Jolla
Lingafelter  Prof. E.        Univ. of Washington, Seattle
Levy         Dr. H.A.        Oak Ridge National Laboratory
Mauer        Mr. F.A.        National Bureau of Standards
Mighell      Mr. A.          National Bureau of Standards
Martin       Dr. K.O.        Oak Ridge National Laboratory
Plastas      Mrs. Linda      Univ. of Maryland
Santoro      Dr. A.          National Bureau of Standards
Schneider    Dr. M.L.        Univ. of Maryland
Takeda       Dr. H.          Johns Hopkins Univ.
Zocchi       Dr. M.          National Bureau of Standards

System Implementors

Appleman     Dr. D.            U.S. Geological Survey          IBM /360 series
Kirchner     Dr. R.            Univ. of Washington, Seattle    CDC 6600
Lenhert      Prof. P.G.        Vanderbilt Univ.                XDS SIGMA 7
Morosin      Dr. B.            Sandia Corporation              CDC 6600
Protherough  Mr. M.            I.C.L./Univ. of Surrey          ICL 1900 series
Snyder       Dr. R.            M.I.T.                          IBM /360 series
Thomas       Mrs. Judith M.    Atlas Computer Lab., U.K.       Atlas
Watenpaugh   Dr. K.            Univ. of Washington, Seattle    CDC 6600
Wolten       Dr. G.            Aerospace Corp.                 CDC 6600

Valuable technical assistance has been given by Miss Jean Willis and Miss Stefanie Nucci, both of the University of Maryland.

DISCUSSION

Dewar: The three examples of difficulties with FORTRAN that you gave should all have been covered by the ANSI standard. Is it in fact the case that if the ANSI standard were properly adhered to by manufacturers, almost all of your problems would go away?

Stewart: This is the problem that I alluded to, namely, when you write a set of specifications for a code, and a programmer takes this up without any real care for what actually happens, he writes exactly to the specification so that the contract is fulfilled. He delivers the contract on time and now you've got this "code." I have seen this many times around the Washington area and in the university where they send out a contract for a program. The programmer meets the letter of the ANSI specifications in every instance. But because there is nothing in the ANSI standards about the word structure, or the idea of what the meaning of an i or a j is, trouble develops. They have defined it to be a 24-bit binary "thing" for that particular machine and so therefore they meet the spirit of the ANSI specification. When challenged, the manufacturer stands on the fact that he has completely met the ANSI specifications. We just happen to be so stupid that our Pidgin FORTRAN wasn't pidgin enough to recognize that anyone would ever make a non-equivalence between "words" in a machine.

Ibers: Some of us, particularly the younger members of the crystallographic community, forget the disproportionate contribution that AEC-sponsored people have made to our various program libraries. We have the various Oak Ridge programs of Busing, Levy, Johnson, and others; we have the Fourier program of Zalkin; we have a variety of programs of Hamilton. These tend to be parts of many program libraries, albeit in highly modified form. The AEC has already made invaluable contributions to crystallography.
Thomas: This may be a little facetious, but I think it should be said that one of the problems is that some manufacturers are crooks.

Stewart: Yes, I did. I don't care to repeat it.

Lykos: Regarding computer program packages, their standardization, certification, and dissemination, three NSF-supported activities are in progress that may not be generally known and may be of interest here.

First, the Quantum Chemistry Program Exchange has flourished at Indiana University, with support from AFOSR, for many years, serving chemists on an international scale. Its range has expanded so that a more descriptive name might be the Computational Chemistry Program Exchange. Now, with NSF support, it is in transition toward becoming self-supporting and has been approached to extend its services to the world of crystallography.

Second, a software certification thrust is being led by Professor L. Fosdick, Chairman, Department of Computer Science, University of Colorado, Boulder. He organized and conducted a small conference at Boulder in order to obtain better coordination of several NSF-supported projects in software development and certification. The need for coordinated efforts along those lines became even more evident as a consequence of that Spring 1972 Boulder Conference, and a structured approach is evolving. He could be approached regarding criteria and procedures for software certification.

Third, Professor Frank Harris, who is well established as a quantum chemist, an applied mathematician, and a sophisticated user of large-scale scientific computers, is developing a set of computer programs for users of quantum chemical techniques. He is capitalizing on the fact that the University of Utah has a node in the brilliantly conceived ARPA Network by testing the machine independence, reliability, and accuracy of the programs on physically and logically different computer systems via the ARPA Network. Additionally, he will recruit a small number of users to access the University of Utah UNIVAC 1108 via terminal in order to test the viability of remote-terminal access augmented by a local "hot phone" as well.

Thus there are specific models and techniques available for computer-software resource sharing should the crystallographers wish to explore them.

Interactive Graphics and Remote Computing

Edgar F. Meyer

At a time when computer technology has been advanced to the point where selected computers can communicate at a rate of a million bits per second (the ARPA Network), we who are concerned with crystallographic computing could well consider the conditions under which remote computing would be advantageous or necessary.

The case of a laboratory without access to a local, large computer is an evident candidate for remote computing. An equally good case can be made for operations such as those of Dr. Ibers at Northwestern University, where he needs the large memory available in one of the latest generation computers to refine 400-500 variables simultaneously. Perhaps Professor Jensen at the University of Washington would care to comment on the advantages of having a computer currently 10 to 30 times faster than his own to reduce the 10 hours per cycle needed for one refinement operation on his protein structure.

The above three cases may generate some discussion, but I would like to turn attention especially to the broad, intermediate area of routine crystallographic computing in a laboratory with access to several local computers, from the 4000-word minicomputer running the diffractometer to the local computing center. Campus politics aside, what arguments can be raised for remote computing on a latest generation computer with a special support facility for crystallographic computing? And what pitfalls can be foreseen?

First, let me clarify the usage of "remote" by indicating that some type of terminal to the distant computer is implied, rather than mailing off your deck to a good friend (with a large computing budget) at X University. The quality and cost of service of the telephone system in this country is a topic of much discussion in the popular computer magazines these days, with Ma Bell on the defense. You can rent an ASR 33 Teletype with an acoustic coupler for $65/month (maintenance included). This produces printed text at a rate of 10 characters per second, which is suitable for printing R factors, slow for coordinates, and unthinkable for structure factor tables. Telephone rates at night are about 20-25¢ a minute. Several types of terminals are available for roughly similar prices that will operate at 30 characters per second. These include both hard copy and modified television raster displays for alphanumeric text.

I propose that a case can be made for a considerably more efficient crystallographic computing system from dial-up terminals than is currently provided by the average campus computing center. People who buy time-sharing services may buy better service, but my first supporting argument is that instead of all the effort required for each group to develop and maintain its own computer library, this library could be maintained on a regional or national basis, provided I: Specialized software support.

The trend to larger computers with multi-user capabilities may make it hard for the computing center to replace their 360 or 6600 with a 7600, ASC, or 370, both for reasons of funding and limited demand. Yet these larger computers can handle Professor Ibers' 500-variable matrix and reduce Professor Jensen's time per refinement cycle by a factor of 30 (10 hours to 20 minutes), provided II: Greatly increased capacity.

One of the useful results to come out of a conversational time-sharing system like Project MAC has been the ability of users to borrow routines and, in general, to interact with each other through the computer. Thus, III: the "Critical Mass" required for smaller laboratories to become viable could be reduced through increased interaction of routines and crystallographers.

Finally, a subject of some interest to me involves IV: Storage, retrieval, and three-dimensional display of structural information. Mrs. Kennard in Cambridge is continually adding to a library of over 4000 structures taken from the scientific literature. The "Protein Data Bank" at Brookhaven National Laboratory is gathering coordinates of macromolecules at various levels of resolution as they are submitted. The first set of protein structure factors has been submitted. A method of referencing protein Fourier maps needs to be devised. I feel strongly that a low cost, three-dimensional graphics display with interactive capability would be an ideal component to a remote terminal, especially.

Now I shall raise some counter-questions:

I. How can one adequately gesticulate over the phone when the software has been changed and one's program no longer works?

II. a. Will disc storage be available for each user?
    b. How will large listings be handled?
    c. How can listings be obtained rapidly?
    d. What will the response be at peak times?
    e. Is an interactive, conversational system practical?
    f. What is a workable upper limit to the number of crystallographers and groups using a given system?
    g. What reasonable usage limits can be assigned to groups (1) solving and refining structures, and (2) developing new techniques?

III. a. Will sufficient safeguards be provided to protect privileged files?
     b. Since card decks will not be the usual form of data and programs, will a long-term file retrieval system be available?
     c. Currently, many groups doing crystallography on a low-key level have been able to find support locally. How accessible will support be for the "gentleman" crystallographer?
     d. What provision will be made for marginal cases; that is, who qualifies as a crystallographic user?
     e. Having recently experienced a funding upheaval, wouldn't it be safer to hammer along locally than to face a potential discontinuity in the support curve?

IV. a. Where do I sign up for my own display terminal?
    b. Who will fix it for me when it goes down?
    c. What terminal configuration has the optimum capability/cost ratio?
    d. How flexible can the configuration be for the requirements of individual laboratories?
    e. How will the transfer of huge (megaword) files be handled?

The purpose of these remarks is to point out some of the technical possibilities available today. Beyond this, some of the pitfalls mentioned can serve as a starter for the creative pessimist.

I conclude with a comment on a criticism I have heard in Europe that American crystallographers are too many and too dispersed: better a few good groups than one everywhere. I suggest firstly that even synthetic organic chemists are doing crystallographic analyses; its use is not elitist, but its advancement may be. Secondly, of the million-plus known organic compounds, the 4000-plus in Mrs. Kennard's library leave some significant work to be done. And finally, with a link to an available computing service (plus provision for diffractometer data), practically every chemistry department could reference and contribute to the growing library of structural data. Then national meetings could be reduced to the afternoon outing and banquet; we could all keep in "touch" over our terminals.

DISCUSSION

Xuong (referring to a stereoscopic display of structures demonstrated by Prof. Meyer on a cathode-ray terminal): What is the display in color for, and how much would it cost to buy a duplicate?

Meyer: The display is colored for the reason that it gives you the three-dimensional effect in two colors and you get the stereo separation by viewing through a colored screen for each eye. You could also use the color for getting color tonality in the molecule. The second question of cost might better be answered by Dr. Sparks.

Sparks: $37 500.

Anonymous: One could use the Tektronix or some other storage scope. There are a lot of new displays coming on the market.

Meyer: That point is well taken. The technology is moving along quite rapidly. The devices we're using now might well be out of date in a few years, but the point is that in my laboratory I have to make do with what I have right now, and what I have shown you is a usable device. There is, for example, under consideration quite a reduction in cost, and we hope to take advantage of this. The price will ultimately drop. One does not have to use the disc to drive the display, for example. Among questions that of course have to be held uppermost are the quality, the utility, and even the eye strain. You get fatigue if you sit there in front of the tube all day, but the fact is that the system works.
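The terminal figures quoted in Professor Meyer's paper (10 or 30 characters per second, night telephone rates of 20-25¢ a minute) can be turned into a rough estimate of why a structure factor table is "unthinkable" on a Teletype. The sketch below is only an illustration: the listing size of 3000 reflections at 40 characters per line is a hypothetical assumption, not a figure from the paper, and the function names are invented for this example.

```python
def print_minutes(n_chars, chars_per_second):
    """Minutes needed to print n_chars on a terminal of the given speed."""
    return n_chars / chars_per_second / 60.0


def phone_cost_dollars(minutes, cents_per_minute):
    """Line charge in dollars for a call of the given length."""
    return minutes * cents_per_minute / 100.0


# Hypothetical listing: 3000 reflections, one 40-character line each.
TABLE_CHARS = 3000 * 40

for speed in (10, 30):          # terminal speeds quoted in the paper
    minutes = print_minutes(TABLE_CHARS, speed)
    low = phone_cost_dollars(minutes, 20)    # 20 cents a minute
    high = phone_cost_dollars(minutes, 25)   # 25 cents a minute
    print(f"{speed} char/s: {minutes:.0f} min, ${low:.2f} to ${high:.2f}")
```

With these assumed sizes the 10-character-per-second terminal needs 200 minutes of connect time for the one listing, and the 30-character-per-second terminals cut that by a factor of three.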

Some Thoughts on the Role of Hierarchical Computing and National Networks in Protein Crystallography

Steven T. Freer and Nguyen Huu Xuong

Protein crystallographers are caught in the bind between reduced or stagnant computing budgets and ever-increasing computational needs. Indeed, many of us find our research slowed and restricted by lack of computing funds. This situation is somewhat paradoxical because many computers throughout the nation are now utilized at only a fraction of their capacity. It is apparent that our computational needs cannot be fully satisfied without a new approach that will enable us to utilize more effectively existing computational resources. We believe that hierarchical computing is such an approach.

A hierarchical computing system is a network of special-purpose computers linked together so that several ascending levels of interconnected hardware and software can be effectively shared among many users. At the lowest level of a typical hierarchy are several minimal minicomputers, each of which is dedicated to controlling a specific experiment. The minicomputers are linked to a traffic control computer that provides sophisticated input/output and large bulk storage. A large amount of money is saved by sharing I/O and bulk storage equipment, which can be extremely expensive and is usually much under-used. At the next level of the hierarchy, the traffic control computer is linked to a more powerful computer that can satisfy the computational requirements of the users. Ultimately, the traffic controller would be connected to a national network that could provide instant access to any desired computer anywhere in the United States, thereby allowing the user to select the computer best suited for each facet of his research project.

At UCSD, hierarchical computing is playing an increasingly vital role in protein-structure determination.
The reason for this is that all facets of protein crystallography, from data collection through display of the solved structure, require extensive use of different types of computers: dedicated minicomputers for control of data-acquisition systems, large and powerful number crunchers for the calculations associated with structure determination and refinement, and special computers for dynamic display and manipulation of molecular models.

The protein crystallographic computing system that we are trying to develop is shown in Figure 1. The lowest level of the hierarchy will consist of three data-collecting devices: an automatic diffractometer, precession cameras in conjunction with an automatic film scanner, and a multireflection diffractometer (Xuong and Vernon, 1972), plus an interactive model-building and coordinate-measuring device. Each of these instruments will be controlled by a minimal minicomputer interfaced to an IBM 1800 computer, using standard CAMAC modules (EURATOM, 1969). Peripheral equipment associated with the IBM 1800 traffic controller includes high-speed disc drives with thirty megabytes of storage, a line printer, a card reader/punch, a magnetic-tape drive, two keyboard/typewriters, a storage tube with interactive device, and a CALCOMP plotter. A Meta-4 computer, also linked to the IBM 1800, will be used as a local data-reduction processor for jobs that contain too much data or require too fast a response time to be economically transmitted to a remote computer. At a higher level of the hierarchy, the IBM 1800 will be linked to a regional or national computing network.

The capital investment, as well as operational and system programming expenses, for all equipment except that dedicated to protein crystallography will be shared among seven research groups within the department of chemistry; this makes our local operation very cost-effective. A block diagram of the entire proposed system is shown in Figure 2. A significant portion of this system is already in existence.

To illustrate the practicality and advantages of hierarchical computing, we shall describe the portion of our system dedicated to collecting protein intensity data with an automatic diffractometer. The diffractometer is controlled by a dedicated PDP-8 computer which, through a fast communication link that can transmit data at a maximum rate of 100 000 words per second, is connected to the IBM 1800 situated 1000 feet away in another building. All of the PDP-8 programs are stored on the 2311 disc drives associated with the IBM 1800. The PDP-8 monitor can request the loading of various programs as they are needed during data collection, and these requests are usually satisfied within a fraction of a second since the IBM 1800 is operating under the MPX time-share system. Raw intensity data are screened by the PDP-8 in order to detect any slippage of the crystal and, after some preliminary data reduction (done in parallel with data collection of the next sequential reflection), the intensity measurements are passed in blocks to the IBM 1800 to be stored on the 2311 disc.

This simple example of hierarchical computing allows a small PDP-8 with but 4000 words of core and no magnetic-tape or disc drive to operate the diffractometer as if it were a considerably more sophisticated computer with a large memory and disc storage. Once a crystal is mounted and aligned, the system is capable of measuring data for days at a time without operator intervention, and the output of intensity data is handled easily by the IBM 1800 peripheral I/O equipment. We were happy to find that the IBM 1800 time required is very small: less than 15 minutes per day. Our successful construction and use of the PDP-8 to IBM 1800 hierarchy has convinced us of the power, convenience, and economy of hierarchical systems.

In the proposed scheme, our local hierarchical system will provide reliable rapid data collection and preliminary data processing, while access to a national computer network would be the best way to get the necessary computing power for protein structure determination and refinement. There is little doubt that within this decade a viable national computing network will revolutionize computing activity in the U.S. At the present time, the ARPA Network is the most highly developed network.


The NET itself consists of small message store-and-forward computers called IMPs (Interface Message Processors), and wide-band telephone lines leased from AT&T. HOST computers are connected to the NET through their local IMP. Each IMP is connected to at least two other IMPs, which ensures the existence of alternate transmission routes between any two HOST computers. Communication between two HOST computers is handled automatically by these message processors, which also select the optimum transmission route, depending on existing traffic.

The ARPA Network now contains about 24 sites with 37 HOST computers. These computers include representatives from all the major U.S. companies and range in size from a small PDP-11 to the huge CDC 7600 with the addition, in the near future, of the ILLIAC IV. In short, the ARPA Network links a group of heterogeneous computers distributed nationwide, in such a way that every local resource is available to any computer in the network. A geographical representation of the network is shown in Figure 3, which is taken from an article by Roberts (1971). The reader is referred to both this article and an article by LeGates (1971) for a comprehensive description of the philosophy and operational details of the ARPA Network.

The five characteristics of the ARPA Network that are meaningful to users are: (1) easy access to a wide variety of computers (there are now more than 15 different types of computers on the net); (2) negligible communication error rates (less than the error rates within a local computer); (3) rapid end-to-end response time (within 0.1 second); (4) fast data-transmission rate (about 80 000 bits per second); and (5) low cost of data transmission (less than $1 per megabit).
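As a rough illustration of characteristics (4) and (5), the quoted transmission rate and cost ceiling can be turned into an estimate for sending a block of data across the network. The Python sketch below is an illustration only and not part of the original paper: the function name and the 8-megabit example size are hypothetical, a megabit is assumed to be 1 000 000 bits, and the rate and price are simply the approximate figures quoted above.

```python
# Approximate figures quoted in the text for the ARPA Network.
RATE_BITS_PER_SECOND = 80_000   # "about 80 000 bits per second"
MAX_COST_PER_MEGABIT = 1.0      # "less than $1 per megabit" (upper bound, USD)
BITS_PER_MEGABIT = 1_000_000    # assumption: decimal megabit


def transfer_estimate(n_bits):
    """Return (seconds, cost ceiling in USD) for sending n_bits.

    Protocol overhead and host-computer charges are ignored.
    """
    seconds = n_bits / RATE_BITS_PER_SECOND
    cost_ceiling = (n_bits / BITS_PER_MEGABIT) * MAX_COST_PER_MEGABIT
    return seconds, cost_ceiling


if __name__ == "__main__":
    # Hypothetical data set of 8 megabits (illustrative size only).
    seconds, cost = transfer_estimate(8_000_000)
    print(f"{seconds:.0f} s, at most ${cost:.2f}")
```

Under these assumptions, an 8-megabit data set would cross the network in about 100 seconds at a transmission cost of no more than $8.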
A national computing network will help the protein crystallographer in three ways: (1) it will enable him to handle new and exciting research problems by providing him access to the most advanced hardware, software, and data bases available; (2) it will decrease his computing costs by providing the optimum computer for each job and also by giving him access to computers subsidized by agencies sponsoring his research; and (3) it will make his life easier by eliminating the traumatic upheavals that occur with the periodic change of computers at his institution. As a matter of fact, since a network would bring about competition between computer centers, service in general should be upgraded. In addition, the redundancy of hardware within the network should considerably reduce research delays caused by extended computer down time.

In conclusion, we would emphasize that hierarchical computing, both at the local level, through sharing resources among many research groups, and at the national level, through connection to a continental computer network, is a practical way for protein crystallographers to satisfy their ever-increasing computational needs while at the same time maintaining a realistic computing budget. The necessary technology is already developed; what remains is the psychological and political acceptance of such interdependent resource sharing by the funding agencies, the research institutions, and the scientists themselves.


We gratefully acknowledge the help of John Cornelius, Richard Alden, Kent Wilson, Wayne Vernon, and Joe Kraut in the formulation of the ideas presented here.

References

EURATOM, "CAMAC, a Modular Instrumentation System for Data Handling", European Atomic Energy Community Report No. EUR 4100e, (1969).

LeGates, J.C. "The ARPA Network Technical Aspects in Nontechnical Language", EDUCOM Interuniversity Communications Council (1971).

Roberts, L.G. "A Forward Look", Conference on Computers in Chemical Education and Research, DeKalb, Illinois, p. 7-5, (1971).

Xuong, Ng. H. and W. Vernon, "A Rapid Data Acquisition System for Protein Crystallography", Abstract J10, ACA Albuquerque Winter Meeting, (1972).

DISCUSSION

Berman: What does one have to do to get on the ARPA Network?

Freer: It depends on whether the ARPA Network will let you on. You first have to get the ARPA Network to sell you an IMP and they sell for fifty thousand dollars. Then you have to arrange for the cost of the computers with the various host computers as well. I don't think your question is a realistic one quite yet. The question is, "What has to be done now to make the net truly operational on a national level?"

Anonymous: In your local computing hierarchy wouldn't it have been just as convenient and a lot cheaper to connect the PDP-8 to a magnetic-tape drive rather than go to the expense of interfacing it to the IBM 1800?

Freer: No. We gave a lot of thought to this problem. We debated for a long time whether to buy a magnetic-tape drive or a disc drive, or a reader/punch for the PDP-8. We were also considering the possibility of buying more core storage. It turned out to cost about $2000 for the PDP-8-to-IBM 1800 interface, whereas a magnetic-tape drive and interface would have cost us about $8000. So, for about a quarter of the price of a magnetic-tape drive alone, we gained the use of all these I/O devices when we hooked into the hierarchy. Of course we also gained an effective increase in core size and computing power as well.

Xuong: I hear rumors that the ARPA Network is looking for another agency to take it over. Is NSF or NIH going to take it?

Dewar: I've heard talks on the ARPA Network many times now and I've always asked the question, what is actually going on in the net, and the answer as far as I can gather is, so far, nothing. Is that still the situation or is there an appreciable amount of real traffic?

Lykos: I thought I had spoken to that point earlier. Computer networking technology has far outstripped the realization of its potential for resource sharing supportive to research. As far back as the 1970 ACM Meeting, Sidney Fernbach (in charge of the massive Lawrence Livermore Laboratory computer complex interfaced via an intra-site network) remarked in response to a direct question about the ARPA Network that one does not create a network just to have a network. The expression "The ARPA Network is a solution looking for a problem" has become a cliche amongst its detractors. As a matter of fact the brilliantly conceived ARPA Network, which was designed to be an experiment and a demonstration in computer networking, constitutes a major challenge to researchers to discover how computer networking can enhance the conduct of research.

Three researchers present at this meeting have just embarked on a highly relevant project.
Walter Hamilton at Brookhaven is generating a protein-structure data bank, to be accessed from remote terminals which generate 3-D images. The terminals are being designed by Edgar Meyer at Texas A&M, with the cooperation of Helen Berman at Philadelphia, who will play the devil's advocate through critical use of the evolving system from one of the prototype terminals.

The ARPA Network was discussed briefly earlier today. Some cost/performance figures and related considerations may be of interest. The ARPA Network is:

1. Telephone 'lines' leased from the telephone company 24 hours a day, 7 days a week, spanning the country with a bandwidth of 50 kilobits per second, currently linking 29 nodes, each with at least two lines connected, with an annual telephone bill of $800 000. To place that in perspective, one voice-grade (2400 bits per second) line leased around the clock with cross-country dial-up access costs about $25 000 per year. Thus the cost of 32 single WATS (Wide Area Telephone Service) dial-up lines (from one point to anywhere in the continental USA, or from any point to a given point) is about the same as the telephone bill for the much larger data-carrying capacity of the ARPA Network.

2. Interface Message Processors, or IMPs, which are small computers constituting the network nodes. Depending on how many host computers and/or local terminals need to be connected, IMPs cost $53 000 to $117 000. It is essential to the integrity of the ARPA Network that the IMP, hardware and software, be modifiable by the Network Manager only. (IMPs can be rented for about $1500 per month.)

3. The on-line Network Information Center, which compiles a listing of resources and facilities available on the network and maintains a journal facility enabling transfer of messages among various user terminals. Currently it is based at the Stanford Research Institute.

4. Network management and maintenance, currently being handled by Bolt, Beranek and Newman.

5. Any computer intended to serve as a host interfaced to an IMP needs interface hardware, about $10 000 purchase, and software modification to conform to host protocol.

6. Users of the net establish accounts with the host computers of interest. The net facilitates remote access to a variety of hosts.

The Office of Science and Technology has taken the attitude that the ARPA Network is no longer an experiment but must be considered operational and therefore no longer appropriate as an ARPA project. Although the ARPA contractors who use the ARPA Network disagree with that position, bid specifications are being prepared such that some outside agency can take over and operate it. It would seem that a consortium of universities similar to consortia operating National Laboratories would be an appropriate agency but, so far, none has come forward.

On a pay-as-you-go basis (rental of IMP, pro rata share of telephone bill), participation in the ARPA Network costs about $30 000 (1) per year exclusive of host computer use costs.

(1) "Networks for Higher Education", EDUCOM, 1972, pp. 7-12 and pp. 63-64.

At the moment a threefold load increase could be accommodated by the ARPA Network without a noticeable degradation in service.

Calvert: What's the capital tied up in this network right now?

Lykos: I have heard as an estimate that $10 000 000 has been spent for design, development, and implementation to create the experiment called "ARPANET". Capital in the IMPs is about $3 000 000.
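The figures Lykos quotes can be cross-checked with a few lines of arithmetic. The Python sketch below is an illustration only and not part of the discussion: it uses only numbers stated above, the variable names are hypothetical, and it assumes that the $3 000 000 of IMP capital is spread over the 29 nodes mentioned earlier, which the speakers do not state explicitly.

```python
# Figures quoted in the discussion (USD and bits per second).
ANNUAL_PHONE_BILL = 800_000        # ARPA Network telephone bill per year
VOICE_LINE_ANNUAL_COST = 25_000    # one leased voice-grade line per year
VOICE_LINE_BPS = 2_400             # voice-grade line bandwidth
ARPA_LINE_BPS = 50_000             # bandwidth of one ARPA Network line
NODES = 29                         # nodes currently linked
IMP_CAPITAL = 3_000_000            # total capital in the IMPs
IMP_PRICE_RANGE = (53_000, 117_000)

# How many voice-grade lines the telephone bill would buy.
equivalent_voice_lines = ANNUAL_PHONE_BILL // VOICE_LINE_ANNUAL_COST

# Combined bandwidth of that many voice-grade lines.
aggregate_voice_bps = equivalent_voice_lines * VOICE_LINE_BPS

# Assumption: IMP capital divided evenly over the quoted node count.
average_imp_capital = IMP_CAPITAL / NODES

if __name__ == "__main__":
    print("equivalent voice-grade lines:", equivalent_voice_lines)
    print("their combined bits per second:", aggregate_voice_bps)
    print("average IMP capital per node:", round(average_imp_capital))
```

On these numbers the telephone bill corresponds to 32 voice-grade lines, whose combined capacity of 76 800 bits per second is less than that of two of the network's 50-kilobit lines. The average IMP capital per node, roughly $103 000, falls inside the quoted purchase range of $53 000 to $117 000.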
