
9 Hardware

Four functions are essential to computer modeling of molecules:

1. molecular energy computation
2. configurational control
3. graphics
4. reasoning

Until recently, the standard hardware configuration of a VAX and an Evans and Sutherland display terminal could achieve only the second and third items. Molecular energy calculation on a VAX is very slow, although these computers were used to develop the programs. The advent of Cray-type supercomputers connected by national communications networks has given scientists access to more computer power for molecular energy calculations. More recently, the development of special purpose array processors has made it possible to have in the laboratory computational power roughly comparable to that of the supercomputers. Reasoning about molecular structure until recently could be done only with special purpose machines that run the programming language LISP. As the power of computers available to individual scientists increases, we expect that these four functions will be brought together. The early VAX computers (for example, the 11/780) typically provide 0.5 megaflops (million floating-point operations per second) and 1.0 MIPS (million instructions per second).

Typical array processors provide 100 megaflops, while typical LISP machines provide 2.0 MIPS. In the last few years it was necessary to have one of each of these types of machines in order to have reasonable amounts of computational power for the four molecular modeling functions. The next generation of computer, described as a personal supercomputer (PSC), will have between 40 and 60 megaflops of number-crunching power and between 15 and 20 MIPS of general (i.e., logical) computational power. With this level of numeric and logical computational power available in the next year at a scientific workstation, there will be little need for separate machines to perform special functions.

The national supercomputers, however, already in place and operational, constitute a very real scientific resource. As scientists learn that the supercomputers can effectively carry out molecular energy calculations, these machines will be used to their fullest capacity. However, the technology of the supercomputers is advancing rapidly, and the manufacturers promise that systems with three orders of magnitude more computational power will be available in the next few years.

While the supercomputers grow more powerful, the power of workstations and the PSCs is also increasing. Current workstations have the power of VAXs but lack the capacity to run all four functions simultaneously. As the PSCs emerge, they will offer a combination of capabilities that will make it possible to run all four functions at once. The PSCs should create the possibility of a new computational and graphic plateau:

1988-1995: personal supercomputer
1977-1987: E&S display coupled to a MicroVAX II
1970-1976: Tektronix display coupled to a DECsystem-10

The Tektronix display and a scientific mainframe gave us the first plateau seventeen years ago. On this plateau it was possible for many scientists to view and manipulate molecules. The VAX computers, and more recently the even less expensive MicroVAX II computers coupled to an Evans and Sutherland display, have established over the last ten years a plateau of graphic capability that has enabled scientists to move from the physical modeling of macromolecules to completely electronic modeling. The PSCs expected to emerge in the next few years will permit scientists to compute and to visualize molecules in much more powerful ways.

Using the PSCs, it should be possible to shape molecular models easily using joystick controls, creating stereo color graphics in multiple modes of representation, while doing energy calculations and molecular reasoning. The only foreseeable problem is that scientists' appetites for energy calculations may exceed the computational capacities of the PSCs. Configurational control should make it possible to sketch protein models. Using collections of rules, we should be able to use molecular reasoning to generate and evaluate large numbers of possible model states.

Because of the rapidly changing technology of computers, displays, workstations, and PSCs, national effort should be directed to guaranteeing that these devices conform to the various levels of standards of the International Standards Organization (ISO). Standardization in the United States is achieved by interested parties working together in committees under the auspices of agencies and organizations such as the National Bureau of Standards, the American Society for Testing and Materials (ASTM), the Institute of Electrical and Electronics Engineers (IEEE), or the ISO. Considerable standardization at the level of the computer operating system must be done to make the ISO model work. Hardware vendors must choose between product uniqueness for sales and market development, and intervendor product compatibility. Compatibility has many benefits. Adherence to the standards will make it possible to move programs quickly and easily from one device to another, as well as making it possible to construct a complete system from components supplied by many vendors. The ISO model has several levels, represented below:

1. Ethernet
2. TCP/IP communications protocol
3. NFS (Network File System)
4. UNIX operating system
5. VAX/VMS and Cray FORTRAN compatibility
6. X-windows
7. DIALOG-like application program window and functionality specification

The Ethernet originated at the XEROX Palo Alto Research Center. The TCP/IP protocol was developed for the DARPAnet, operated for the Department of Defense, and so is in the public domain. The NFS was developed by Sun Microsystems and placed in the public domain.

Bell Laboratories developed UNIX. VAX/VMS FORTRAN was originated by Digital Equipment Corporation (DEC). X-windows originated at the Massachusetts Institute of Technology, where it was developed to specify a machine-independent windowing system. DIALOG is an Apollo product that is a first attempt to answer the question of how to write high-level, mouse-driven application programs in a high-level specification language.

Standards are really the key to future progress in molecular modeling. If all investigators adhere to the ISO standards, then it will be possible to mix various workstations and special purpose computers on a laboratory network. Adherence to standards should lower the price of equipment to end users by enlarging the market. Similarly, with adherence to the standards, it will be possible to send and receive molecular structure data sets all over the world using global communications networks such as BITNET, CSnet, DARPAnet, the Japan Universities network (JUnet), and the Commonwealth Scientific and Industrial Research Organization network in Australia (CSIROnet).

Special purpose computers offer many possibilities for molecular modeling. Over the years, the National Institutes of Health (NIH) has funded facilities that developed molecular graphics, computation, and control devices. The control systems laboratory at Washington University Medical School developed the MMSX molecular display. The molecular graphics laboratory at the University of North Carolina at Chapel Hill has been instrumental in exploring the development of a variety of stereo, configurational control, and display devices. The molecular graphics laboratory at Columbia University is in the process of developing FASTRUN, a special purpose computer attached to an ST-100 array processor that boosts its molecular dynamics power by a factor of 10. The molecular graphics laboratory at the University of California at San Francisco Medical School has developed stereo and color representation techniques.

Special and general purpose graphics devices are increasingly easy to produce. General Electric in Research Triangle Park, North Carolina, has produced a very fast surface graphics processor that can be used to display different types of objects, including molecules. At least one of the PSCs will have a sphere graphics primitive embedded in a silicon chip. Every effort should be made to encourage the development of special purpose processors. However, these processors should be required to adhere to the emerging computer standards, so that they can be easily integrated into existing laboratory networks.

The last few years have seen the emergence of array processors for laboratory use. The ST-100 array processor from Star Technologies, Inc. has been programmed by microcoding to produce molecular dynamics calculations at a rate comparable to a Cray XMP. The ST-100 is rated at a peak of 100 megaflops, while the sustained calculation rate is about 30 megaflops. The ST-100 costs about one-thirtieth of the Cray XMP-48. The FASTRUN device currently under development in the laboratory of Cyrus Levinthal at Columbia University will increase the power of the ST-100 by a factor of 10, from 30 average megaflops to 300 average megaflops. Floating Point Systems, Inc. is discussing the delivery of a 10-processor FPS-264 system with a peak in the gigaflop range. Multiple-processor machines could be added to this list, including the hypercube machines from Intel and NCUBE. All are laboratory machines. The power of supercomputers will obviously be increasing at the same approximate rates.

A very strong relationship exists between the architecture of a special purpose computer and the structure of the scientific problem to be solved. The question is, how much computational power does molecular modeling really need? The protein folding problem seems to be the gauge of this question, since molecular dynamics programs calculate atom position changes in 10^-15-second time steps. If proteins really take minutes to fold, then computation will have to go from 10^-15 to 10^2 seconds. The most powerful array processors available today make it possible to calculate and examine molecular trajectories three orders of magnitude longer than hitherto possible. Extending these trajectories an additional three orders of magnitude might bring us to the range where appropriate protein-folding actions can take place. There is some indication that if amino acids were synthesized at the rate of one per microsecond, then folding would be possible. Then, computing would only have to range from 10^-15 to 10^-5 seconds. This would be seven orders of magnitude less computing. If this estimate is close to correct and computing power increases at a rate of 50 percent per year, then current computer processor development will give us the necessary amount of power in 5 to 10 years.
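The arithmetic behind these estimates is simple order-of-magnitude bookkeeping. The following sketch, written in Python for convenience, reproduces it; the time figures are the ones quoted above, while the compounding growth model is our own assumption, not a measured benchmark.

```python
import math

# Order-of-magnitude bookkeeping for the protein folding estimate above.
# The time figures come from the text; the growth model is an assumption.

TIMESTEP = 1e-15   # molecular dynamics time step, in seconds
SLOW_FOLD = 1e2    # folding time if proteins take minutes to fold, in seconds
FAST_FOLD = 1e-5   # folding time at one residue per microsecond, in seconds

steps_slow = SLOW_FOLD / TIMESTEP   # 1e17 time steps per trajectory
steps_fast = FAST_FOLD / TIMESTEP   # 1e10 time steps per trajectory

# The "seven orders of magnitude less computing" figure:
savings = math.log10(steps_slow / steps_fast)
print(f"savings: {savings:.0f} orders of magnitude")

def years_to_gain(orders, annual_growth=1.5):
    """Years to gain `orders` orders of magnitude in computing power
    at a compounding growth rate of 50 percent per year."""
    return orders * math.log(10) / math.log(annual_growth)

for orders in (1, 2, 3):
    print(f"{orders} order(s) of magnitude takes {years_to_gain(orders):.1f} years")
```

At 50 percent per year, one order of magnitude takes roughly six years, which is broadly in line with the 5-to-10-year horizon quoted above.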

CENTRAL VERSUS DISTRIBUTED COMPUTING

The National Science Foundation (NSF) supercomputer initiative again brings to the forefront the relationship between central computational services and distributed or personal services. Proponents of centralization argue that certain types of very large calculations are available only on centralized machines. The personal computer revolution showed how profoundly scientists respond to decentralized computation. The capabilities of personal machines increase at the same pace as the supercomputers, but the baseline machines are a market of 10^5 to 10^6 machines, whereas the supercomputers are a market of 10^2 to 10^3. Special purpose boards added to the baseline machine can raise its capabilities for specific functions (e.g., energy calculation, sequence comparison, or graphics) to levels approaching those of supercomputers.

The distribution of personal computation is driven totally by market forces and is not subject to centralized planning. Scientists buy laboratory computers with funds previously allocated for glassware. Postdoctoral students returning to their country of origin bring their personal computers. Floppy disks containing data files and even whole books form a new type of currency in countries operating centrally planned economies. These modes of behavior illustrate a valuable dichotomy. We need a balance between centralizing and decentralizing efforts. Individual scientists can participate in the planning and use of national supercomputers, while simultaneously helping to specify and buy smaller machines for their personal and laboratory use.

COMPUTER UTILIZATION IN THE NEXT 5 TO 10 YEARS

In the next 10 years, workstations will become ordinary scientific tools, like pocket calculators and balances. The workstations will become more popular with scientists as they acquire larger, faster, and more complex working programs; better graphics; more storage and access to other computers; and new data sources.

A few years ago, only specialists searched DNA sequence data bases; now, because many workers have PCs in their laboratories, almost all molecular biologists search these data bases. Workstation use is likely to follow the same pattern. Now, molecular graphics techniques are used only by departmental or laboratory specialists. In years to come, as all workstations begin to acquire adequate graphics capabilities, all scientists will routinely do molecular graphics, modeling, and energy calculations.

One of the strongest effects in the computer marketplace is the trade-off between constant dollars and constant performance. Because computer power is doubling every two to three years, the manufacturers tend to supply their customers with new models that cost the same but have increasing computational power. A customer, then, can expect to purchase a given level of computational power for a decreasing amount of money.

Twenty years ago, one needed a DEC PDP-10 to search protein or DNA data bases, while 10 years ago one used DEC PDP-11s or DEC VAXs. Now, one can use an IBM PC or one of its many clones to do the same job. In several years, one should be able to do DNA sequence searches on a pocket machine.

The brevity of the computer design and manufacture cycles has begun to overtake our ability to use these machines adequately. Twenty years ago, both manufacturers and consumers could reasonably expect a computer to sell and be worth buying for about 10 years; today, a given level of computational power has a life cycle of 3 years. The cycle length appears to be shortening even further, in the sense that special purpose boards can be added to a small general purpose machine to make it functionally equivalent to a machine that costs up to 100 times as much. Why buy a Cray when a PC with a special purpose board will do the same thing? The cure for this problem will probably be a balance of market forces favoring the small mass-distribution computers. PCs will rise in power to become general purpose workstations.

THE NATIONAL SUPERCOMPUTER NETWORK

The national supercomputer initiative sponsored by the NSF allocates available computer time by a peer-review process. Individual scientists' requests for time must meet granting requirements for the quality of the proposed work and the size of the allocation. From the scientist's viewpoint, the supercomputer network must perform tasks that cannot be done either in the laboratory or at local institutions. Since the network communication rates are 9,600 baud, only a limited amount of data can be passed between the scientist and the supercomputer. Essentially, this means that only batch computing can be run on the supercomputers. Large jobs run in the batch mode of computing are only one form of computing. The highly interactive forms of computing and graphics available on workstations will be even more competitive with the supercomputer network when the next generation of high-performance workstation, the PSC, becomes available.
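To make the bandwidth constraint concrete, the sketch below estimates the transfer time for a single structure data set over a 9,600-baud link. The file size is an assumed, illustrative figure, and ten bits per byte is the usual rule of thumb for asynchronous serial lines.

```python
# Transfer time over the 9,600-baud network links quoted above.
# The file size is assumed for illustration; 10 bits per byte is the
# usual approximation for asynchronous serial links (8 data bits
# plus start and stop bits).

BAUD = 9_600
BYTES_PER_SECOND = BAUD / 10            # about 960 bytes per second

structure_file_bytes = 500_000          # assumed size of one structure data set

minutes = structure_file_bytes / BYTES_PER_SECOND / 60
print(f"{structure_file_bytes:,} bytes at {BAUD} baud: about {minutes:.0f} minutes")
# About 9 minutes for a single file: tolerable for batch retrieval,
# far too slow for interactive graphics.
```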

The use of national supercomputers can be left to the discretion of individual scientists, as it is in this country, or the use of these resources can be mandated. The ability to mandate use depends on the type of economy or the pattern of interaction between scientists and the government. The Australian scientists are also in the midst of this type of central planning (personal communication, 1987, trip to Australia). The government wants scientists throughout Australia to use the centralized supercomputer, paying for the use with funds from the scientists' grants; the scientists see this as a form of taxation. The market forces in Australia will probably dominate when the scientists realize that superior computing and graphics performance can be obtained by purchasing a machine. Once a machine is in a department or laboratory, the problem of centralized national supercomputer access and allocation is essentially ended.

LOCAL AREA NETWORKS

Molecular modeling in the future will probably be done on local networks of computers and displays. For the past 5 to 10 years, advanced scientific laboratories have had one or more minicomputers. Five years ago, laboratory officials, for the most part, took the first hesitant steps to link these computers in a network. In the last two or three years, networking of laboratory computers has become much more common. Laboratory networks contain computers acting as terminal hosts and computational servers for other workstations. The workstations range in power from the smallest PC to powerful PSCs. As computers age and are retired because they no longer work or are too expensive to maintain, they will be replaced by networks of a variety of computers and displays.

DATA BASE USE

Access to molecular structure and sequence data bases through global communications networks is an opportunity that will be available in the near future. Currently, most data bases are updated by magnetic tape every three to six months, including the DNA sequence data bases at the Los Alamos National Laboratory and at the European Molecular Biology Laboratory (EMBL) in Heidelberg, the protein sequence data base at the National Biomedical Research Foundation (NBRF) in Washington, D.C., the protein structure data base at the Brookhaven National Laboratory, and the small organic molecule crystal structure data base at Cambridge University.

Generating tapes for institutional and individual scientific users is becoming an increasing burden for the data base operators. The global scientific networks are organized in such a way that it is possible for the data base operators to send out one copy of the update and have that copy spread throughout the entire scientific community.

For those scientific users who need a particular molecular structure data set for display or further modeling, the global scientific networks are ideal sources of information. Only recently, the Brookhaven protein structure file was tested at the National Research Council in Ottawa. A simple mail request to a BITNET server at the National Research Council produced one or more of the protein structure data sets in a few minutes.

The small molecule organic crystal structure file from Cambridge University in England is being used by scientists for molecular modeling and calculation. The Cambridge crystal file provides an ideal data source for ligand conformations. The data file and a search program have been available on the international commercial computer network for the past 15 years.

Technology moves so fast that even while this report is being prepared, the panorama with respect to data base distribution has changed. For several years, 5 1/4-inch laser disks have been on the market for audio. Now this highly developed consumer technology has been applied to the storage and retrieval of molecular structure data. Each laser disk, which costs about $2,000 to master and $10 to reproduce, can hold a complete update for the DNA sequence, protein sequence, protein structure, and small molecule data files. The laser disk and associated software will be produced by a small start-up company associated with the University of Wisconsin (Fred Blattner, DNAstar, Inc., at the University of Wisconsin, 1987, personal communication).
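These economics favor disk pressing over tape generation as the user base grows. A minimal sketch of the per-copy arithmetic, using only the two dollar figures given above (the copy counts are illustrative):

```python
# Per-copy cost of laser-disk distribution, amortizing the one-time
# mastering charge over the number of copies pressed.
# The two dollar figures come from the text; copy counts are illustrative.

MASTERING_COST = 2_000   # dollars, one time
COPY_COST = 10           # dollars per disk

for copies in (10, 100, 1_000):
    per_copy = MASTERING_COST / copies + COPY_COST
    print(f"{copies:>5} copies: ${per_copy:,.2f} each")
# =>    10 copies: $210.00 each
#      100 copies: $30.00 each
#     1000 copies: $12.00 each
```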

COMPETITIVENESS

America has a world-recognized ability to transfer ideas from their development in an academic setting to practice through the formation of small commercial enterprises. Then, by the infusion of capital in several stages, these small companies can be transformed into stable industrial corporations. These corporations are then able to absorb the supply of trained scientific personnel produced by the universities. The position of the United States in the world economy is changing very dramatically at present, and it certainly will continue to change in the next 5 to 10 years. Our overall competitiveness will be determined by our ability to form links between previously separate activities. It is already clear that biotechnology, as an offshoot of our national expertise in molecular biology, will be increasingly shaped by the way we use computers in computational chemistry, macromolecular modeling, and the design of proteins. We are in the midst of two revolutionary tendencies: genetics and silicon. Computational chemistry is the glue that will bring these tendencies together in a stable form.
