9
Hardware
Four functions are essential to computer modeling of molecules:
molecular energy computation
configurational control
graphics
reasoning
Until recently, the standard hardware configuration of a VAX
and an Evans and Sutherland display terminal could achieve only
the second and third items. Molecular energy calculation on a
VAX was very slow, although these computers were used to develop
the programs. The advent of Cray-type supercomputers connected
by national communications networks has given scientists access
to more computer power for molecular energy calculations. More
recently, the development of special purpose array processors made
it possible to have in the laboratory computational power roughly
comparable to that of the supercomputers. Until recently, reasoning
about molecular structure could be done only with special purpose
machines that run the programming language LISP.
As the power of computers available to individual scientists
increases, we expect that these four functions will be brought
together. The early VAX computers (for example, the 11/780)
typically provide 0.5 megaflops (million floating point operations
per second) and 1.0 MIPS (million instructions per second). Typical
array processors provide 100 megaflops, while typical LISP machines
provide 2.0 MIPS. In recent years it was necessary to have one of
each of these types of machines in order to have reasonable amounts
of computational power for the four molecular modeling functions.
The next generation of computer, described as a personal
supercomputer (PSC), will have between 40 and 60 megaflops of
number crunching power and between 15 and 20 MIPS of general
(i.e., logical) computational power. With this level of numeric and
logical computational power available at a scientific workstation
in the next year, there will be little need for separate machines to
perform special functions.
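The throughput figures above can be compared directly. The Python sketch below uses only the numbers quoted in this chapter; the dictionary and function names are illustrative, and the PSC values are the projected ranges, not measurements.

```python
# Performance figures quoted in the text (megaflops = millions of
# floating point operations per second; MIPS = millions of instructions
# per second). PSC entries are projected ranges stored as (low, high).
VAX_11_780 = {"megaflops": 0.5, "mips": 1.0}
ARRAY_PROCESSOR = {"megaflops": 100.0}
LISP_MACHINE = {"mips": 2.0}
PSC = {"megaflops": (40.0, 60.0), "mips": (15.0, 20.0)}


def ratio_range(projected, baseline):
    """Return (low, high) ratios of a projected range to a baseline value."""
    low, high = projected
    return low / baseline, high / baseline


if __name__ == "__main__":
    print("PSC vs VAX, floating point:",
          ratio_range(PSC["megaflops"], VAX_11_780["megaflops"]))
    print("PSC vs VAX, general instructions:",
          ratio_range(PSC["mips"], VAX_11_780["mips"]))
    print("PSC vs array processor, floating point:",
          ratio_range(PSC["megaflops"], ARRAY_PROCESSOR["megaflops"]))
    print("PSC vs LISP machine, general instructions:",
          ratio_range(PSC["mips"], LISP_MACHINE["mips"]))
```

On these figures a PSC would offer roughly 80 to 120 times the floating point rate of an 11/780 and 7.5 to 10 times the instruction rate of a LISP machine, while reaching only 40 to 60 percent of an array processor's rated 100 megaflops; its advantage lies in combining the functions on one machine.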
The national supercomputers, however, already in place and
operational, constitute a very real scientific resource. As
scientists learn that the supercomputers can effectively carry out
molecular energy calculations, these machines will be used to their
fullest capacity. However, the technology of the supercomputers
is advancing rapidly, and the manufacturers promise that systems
with three orders of magnitude more computational power will be
available in the next few years.
While the supercomputers grow more powerful, the power of
workstations and the PSCs is also increasing. Current worksta-
tions have the power of VAXs, but lack the capacity to run all
four functions simultaneously. As the PSCs emerge, they will offer
a combination of capabilities that will make it possible to run all
four functions at once. The PSCs should create the possibility of
a new computational and graphic plateau:
1988 - 1995: personal supercomputer
1977 - 1987: E&S display coupled to a MicroVAX II
1970 - 1976: Tektronix display coupled to a DECsystem-10.
The Tektronix display and a scientific mainframe gave us the
first plateau seventeen years ago. On this plateau it was possible
for many scientists to view and manipulate molecules. The VAX
computers, and more recently the even less expensive MicroVAX
II computers coupled to an Evans and Sutherland display, have
established over the last ten years a plateau of graphic capability
that has enabled scientists to move from the physical modeling
of macromolecules to completely electronic modeling. The PSCs
expected to emerge in the next years will permit scientists to
compute and to visualize molecules in much more powerful ways.
Using the PSCs, it should be possible to shape molecular mod-
els easily using joystick controls, creating stereo color graphics in
multiple modes of representation, while doing energy calculations
and molecular reasoning. The only foreseeable problem with the
personal supercomputers is that scientists' appetites for energy
calculations may exceed the computational capacities of the PSCs.
Configurational control should make it possible to sketch protein
models. Using collections of rules, we should be able to use
molecular reasoning to generate and evaluate large numbers of
possible model states.
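The idea of using collections of rules to generate and evaluate model states can be illustrated with a generate-and-test loop. The Python sketch below is hypothetical throughout: the three discrete residue states, the single rule forbidding two consecutive "L" states, and the scoring table are invented for illustration and are not taken from any modeling program. It enumerates every combination of states, discards those that break the rule, and ranks the survivors by a toy score.

```python
from itertools import product

# Hypothetical discrete conformational states for each residue
# ("A" helical, "B" extended, "L" left-handed; labels are illustrative).
STATES = ("A", "B", "L")

# Hypothetical per-state scores (lower is better), standing in for an energy.
STATE_SCORE = {"A": -1.0, "B": -0.5, "L": 0.5}


def obeys_rules(model):
    """Hypothetical rule: reject any model with two consecutive "L" states."""
    return all(not (a == "L" and b == "L") for a, b in zip(model, model[1:]))


def score(model):
    """Toy additive score summed over residues."""
    return sum(STATE_SCORE[s] for s in model)


def generate_and_evaluate(n_residues, keep=3):
    """Enumerate all state combinations, filter by the rule, return the best few."""
    candidates = (m for m in product(STATES, repeat=n_residues) if obeys_rules(m))
    return sorted(candidates, key=score)[:keep]


if __name__ == "__main__":
    for model in generate_and_evaluate(4):
        print("".join(model), score(model))
```

The enumeration grows as 3^n, which is why large numbers of model states arise quickly; a practical system would apply its rules during generation to prune the search rather than after it.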
Because of the rapidly changing technology of computers, displays,
workstations, and PSCs, national effort should be directed to
guaranteeing that these devices conform to the various levels of
standards of the International Organization for Standardization (ISO).
Standardization in the United States is achieved by interested
parties working together in committees under the auspices of agen-
cies and organizations such as the National Bureau of Standards,
American Society for Testing and Materials (ASTM), Institute of
Electrical and Electronics Engineers (IEEE) or ISO. Considerable
standardization at the level of the computer operating system must
be done to make the ISO model work. Hardware vendors must
choose between product uniqueness for sales and market development,
on the one hand, and intervendor product compatibility, on the
other. Compatibility has many benefits. Adherence to the standards
will make it possible to move programs quickly and easily from one
device to another, as well as to construct a complete system from
components supplied by many vendors. The ISO model has several
levels, represented below:
1. Ethernet
2. TCP/IP communications protocol
3. NFS- Network File System
4. UNIX operating system
5. VAX/VMS and Cray FORTRAN compatibility
6. X-windows
7. DIALOG-like application program window and functionality
specification
The Ethernet originated at the XEROX Palo Alto Research
Center. The TCP/IP protocol was developed for the DARPAnet,
operated for the Department of Defense, and so is in the public
domain. The NFS was developed by SUN Microsystems and
placed in the public domain. Bell Laboratories developed UNIX.
VAX/VMS FORTRAN was originated by Digital Equipment
Corporation (DEC). X-windows originated at the Massachusetts
Institute of Technology, where it was developed to specify a
machine-independent windowing system. DIALOG is an Apollo
product that is a first attempt to answer the question of how to
write high level mouse-driven application programs in a high level
specification language.
Standards are really the key to future progress in molecular
modeling. If all investigators adhere to the ISO standards, then
it will be possible to mix various workstations and special pur-
pose computers on a laboratory network. Adherence to standards
should lower the price of equipment to end users by enlarging the
market. Similarly, with adherence to the standards, it will be pos-
sible to send and receive molecular structure data sets all over the
world using global communications networks such as BITNET,
CSnet, DARPAnet, Japan Universities net (JUnet), and
Commonwealth Scientific and Industrial Research Organization net
in Australia (CSIROnet).
Special purpose computers offer many possibilities for molecular
modeling. Over the years, the National Institutes of Health
(NIH) has funded facilities that developed molecular graphics,
computation, and control devices. The control systems laboratory
at Washington University Medical School developed the MMSX
molecular display. The molecular graphics laboratory at the
University of North Carolina at Chapel Hill has been instrumental in
exploring the development of a variety of stereo, configurational
control, and display devices. The molecular graphics laboratory
at Columbia University is in the process of developing FASTRUN,
a special purpose computer attached to an ST-100 array processor
that boosts the processor's molecular dynamics power by a factor
of 10. The molecular graphics laboratory at the University of
California at San Francisco Medical School has developed stereo
and color representation techniques.
Special and general purpose graphics devices are increasingly
easy to produce. General Electric in Research Triangle, North
Carolina, has produced a very fast surface graphics processor that
can be used to display different types of objects, including molecules.
At least one of the PSCs will have a sphere graphics primitive
embedded in a silicon chip. Every effort should be made to
encourage the development of special purpose processors. However,
these processors should be required to adhere to the emerging
computer standards, so that they can be easily integrated into
existing laboratory networks.
The last few years have seen the emergence of array processors
for laboratory use. The ST-100 array processor from Star
Technologies, Inc. has been programmed by microcoding to produce
molecular dynamics calculations at a rate comparable to that of a
Cray X-MP. The ST-100 is rated at a peak of 100 megaflops, while
its sustained calculation rate is about 30 megaflops. The ST-100
costs about one-thirtieth as much as the Cray X-MP/48. The FASTRUN
device currently under development in the laboratory of Cyrus
Levinthal at Columbia University will increase the power of the
ST-100 by a factor of 10, from 30 average megaflops to 300 average
megaflops. Floating Point Systems Inc. is discussing the delivery of
a 10-processor FPS-264 system with a peak rate measured in
gigaflops. Multiprocessor machines, including the hypercube
machines from Intel and NCUBE, could be added to this list. All are
laboratory machines. The power of supercomputers will obviously be
increasing at approximately the same rates.
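The ST-100 figures imply a few derived quantities. This short Python sketch computes the fraction of peak that is sustained, the projected FASTRUN rate, and, assuming (as the text states) that the ST-100's molecular dynamics rate is comparable to a Cray X-MP's, the resulting gain in throughput per dollar. Treating the two rates as exactly equal is a simplifying assumption; the constant and function names are illustrative.

```python
# Figures quoted in the text for the ST-100 array processor.
ST100_PEAK_MFLOPS = 100.0
ST100_SUSTAINED_MFLOPS = 30.0
FASTRUN_FACTOR = 10       # projected speed-up from the FASTRUN device
XMP_TO_ST100_COST = 30    # a Cray X-MP/48 costs about 30 times as much


def sustained_fraction(sustained_mflops, peak_mflops):
    """Fraction of the rated peak rate that is sustained in practice."""
    return sustained_mflops / peak_mflops


def throughput_per_dollar_gain(speed_ratio, cost_ratio):
    """Throughput per dollar of the cheaper machine relative to the dearer one.

    speed_ratio: cheaper machine's speed divided by the dearer machine's speed.
    cost_ratio: dearer machine's price divided by the cheaper machine's price.
    """
    return speed_ratio * cost_ratio


if __name__ == "__main__":
    print("Sustained fraction of peak:",
          sustained_fraction(ST100_SUSTAINED_MFLOPS, ST100_PEAK_MFLOPS))
    print("Projected FASTRUN rate (megaflops):",
          ST100_SUSTAINED_MFLOPS * FASTRUN_FACTOR)
    # Assumption: molecular dynamics speed comparable to the X-MP (ratio 1.0).
    print("Throughput per dollar vs X-MP:",
          throughput_per_dollar_gain(1.0, XMP_TO_ST100_COST))
```

The gap between peak and sustained rates (about 30 percent here) is why the text quotes FASTRUN's gain in average rather than peak megaflops.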
A very strong relationship exists between the architecture of a
special purpose computer and the structure of the scientific problem
to be solved. The question is, how much computational power
does molecular modeling really need? The protein folding problem
seems to be the gauge of this question, since molecular dynamics
programs calculate changes in atom positions in time steps of
10^-15 second. If proteins really take minutes to fold, then the
computation will have to span from 10^-15 to 10^2 seconds. The most
powerful array processors available today make it possible to
calculate and examine molecular trajectories three orders of
magnitude longer than hitherto possible. Extending these trajectories
an additional three orders of magnitude might bring us to the range
where appropriate protein-folding actions can take place. There is
some indication that if amino acids were synthesized at the rate of
one per microsecond, then folding would be possible. Then, computing
would only have to range from 10^-15 to 10^-5 seconds. This would
be seven orders of magnitude less computing. If this estimate is
close to correct and computing power increases at a rate of 50
percent per year, then current computer processor development
will give us the necessary amount of power in 5 to 10 years.
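The scaling argument above is mostly bookkeeping with powers of ten: the number of time steps is the simulated time divided by the step length. The Python sketch below restates it using the exponents given in the text and adds the compound-growth formula implied by a 50 percent annual increase in computing power. Storing times as base-10 exponents rather than floating point values keeps the arithmetic exact; the names are illustrative.

```python
# Orders-of-magnitude bookkeeping for the protein folding estimate.
# Times are stored as base-10 exponents of seconds, so 10^-15 s -> -15.
MD_STEP_EXPONENT = -15      # molecular dynamics time step, 10^-15 s
FOLD_MINUTES_EXPONENT = 2   # folding in minutes, about 10^2 s
FOLD_FAST_EXPONENT = -5     # faster folding scenario, 10^-5 s


def steps_exponent(duration_exponent, step_exponent=MD_STEP_EXPONENT):
    """Power of ten giving the number of time steps needed to cover the duration."""
    return duration_exponent - step_exponent


def growth_factor(years, annual_rate=0.5):
    """Compound growth in computing power after `years` at `annual_rate` per year."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    slow = steps_exponent(FOLD_MINUTES_EXPONENT)
    fast = steps_exponent(FOLD_FAST_EXPONENT)
    print(f"Folding in minutes: 10^{slow} time steps")
    print(f"Faster scenario:    10^{fast} time steps")
    print(f"Reduction: {slow - fast} orders of magnitude")
    for years in (5, 10):
        print(f"{years} years at 50%/year: x{growth_factor(years):.1f}")
```

The sketch reproduces the text's seven orders of magnitude (10^17 versus 10^10 time steps). It also shows that 5 and 10 years of 50 percent annual growth correspond to factors of about 7.6 and 58 in computing power.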
CENTRAL VERSUS DISTRIBUTED COMPUTING
The National Science Foundation (NSF) supercomputer ini-
tiative again brings to the forefront the relationship between cen-
tral computational services and distributed or personal services.
Proponents of centralization argue that certain types of very large
calculations can be performed only on centralized machines. The
personal computer revolution showed how profoundly scientists
respond to decentralized computation. The capabilities of personal
machines increase at the same pace as those of the supercomputers,
but the baseline machines are a market of 10^5 to 10^6 machines,
whereas the supercomputers are a market of 10^2 to 10^3. Special
purpose boards added to the baseline machine can raise its
capabilities for specific functions (e.g., energy calculation,
sequence comparison, or graphics) to levels approaching those of
supercomputers.
The distribution of personal computation is driven totally by
market forces and is not subject to centralized planning. Scientists
buy laboratory computers with funds previously allocated
for glassware. Postdoctoral students returning to their country
of origin bring their personal computers. Floppy disks containing
data files and even whole books form a new type of currency in
countries operating centrally planned economies.
These modes of behavior form a valuable dichotomy. We
need a balance between centralizing and decentralizing efforts.
Individual scientists can participate in the planning and use of
national supercomputers, while simultaneously helping to specify
and buy smaller machines for their personal and laboratory use.
COMPUTER UTILIZATION IN THE NEXT 5 TO 10 YEARS
In the next 10 years, workstations will become ordinary scien-
tific tools, like pocket calculators and balances. The workstations
will become more popular with scientists as they acquire larger,
faster, and more complex working programs; better graphics; more
storage and access to other computers; and new data sources. A
few years ago, only specialists searched DNA sequence data bases;
now, because many workers have PCs in their laboratories, almost
all molecular biologists search these data bases.
Workstation use is likely to follow the same pattern. Now,
molecular graphics techniques are used only by departmental or
laboratory specialists. In years to come, as all workstations begin
to acquire adequate graphics capabilities, all scientists will
routinely do molecular graphics, modeling, and energy calculations.
One of the strongest effects in the computer marketplace is
the trade-off between constant dollars and constant performance.
Because computer power is doubling every two to three years, the
manufacturers tend to supply their customers with new models
that cost the same but have increasing computational power. A
customer, then, can expect to purchase a given level of computa-
tional power for a decreasing amount of money.
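The trade-off can be made concrete with a doubling-time model. Assuming, as the text does, that computer power doubles every two to three years, the Python sketch below computes how much more power a constant budget buys after a given number of years, and how the price of a fixed level of power falls. The initial price of 100 is an arbitrary unit, not a figure from the text, and the function names are illustrative.

```python
def power_at_constant_price(years, doubling_years):
    """Factor by which a fixed budget's computing power grows after `years`."""
    return 2.0 ** (years / doubling_years)


def price_at_constant_power(initial_price, years, doubling_years):
    """Price of a fixed level of computing power after `years`."""
    return initial_price / power_at_constant_price(years, doubling_years)


if __name__ == "__main__":
    for doubling in (2, 3):
        gain = power_at_constant_price(6, doubling)
        price = price_at_constant_power(100.0, 6, doubling)
        print(f"Doubling every {doubling} years, after 6 years: "
              f"power x{gain:.0f} at constant price, "
              f"or {price:.1f} units for constant power (from 100)")
```

The two functions are reciprocals of each other: the same doubling shows up either as more power for the same money or as the same power for less money.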
Twenty years ago, one needed a DEC PDP-10 to search protein
or DNA data bases, while 10 years ago one used DEC PDP-11s
or DEC VAXs. Now, one can use an IBM PC or one of its many
clones to do the same job. In several years, one should be able to
do DNA sequence searches on a pocket machine.
The brevity of the computer design and manufacture cycles
has begun to overtake our ability to use these machines adequately.
Twenty years ago, both manufacturers and consumers could rea-
sonably expect a computer to sell and be worth buying for about
10 years; today, a given level of computational power has a life
cycle of 3 years. The cycle length appears to be shortening even
further in the sense that special purpose boards can be added to a
small general purpose machine to make it functionally equivalent
to a machine that costs up to 100 times as much. Why buy a Cray
when a PC with a special purpose board will do the same thing?
The cure for this problem will probably be a balance of market
forces favoring the small, mass-distributed computers. PCs will
rise in power to become general purpose workstations.
THE NATIONAL SUPERCOMPUTER NETWORK
The national supercomputer initiative sponsored by NSF allo-
cates available computer time by a peer-review process. Individual
scientists' requests for time must meet granting requirements for
the quality of the proposed work and the size of the allocation.
From the scientist's viewpoint, the supercomputer network must
perform tasks that cannot be done either in the laboratory or at
local institutions. Since the network communication rates are
9,600 baud,
only a limited amount of data can be passed between the scientist
and the supercomputer. Essentially, this means that only batch
computing can be run on the supercomputers. Large jobs run in
the batch mode of computing are only one form of computing.
The highly interactive forms of computing and graphics available
on workstations will be even more competitive with the super-
computer network when the next generation of high performance
workstation, the PSC, becomes available.
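The effect of a 9,600 baud link is easy to quantify. The Python sketch below assumes one bit per baud and asynchronous framing of 10 line bits per byte (one start bit, eight data bits, one stop bit), giving about 960 bytes per second. The data set sizes are hypothetical: 80 bytes per atom, as in 80-column coordinate records, for a 5,000-atom structure, and 1,000 saved frames for a trajectory.

```python
BAUD = 9600
LINE_BITS_PER_BYTE = 10   # assumed framing: 1 start + 8 data + 1 stop bit
BYTES_PER_ATOM = 80       # assumed: one 80-character record per atom
N_ATOMS = 5000            # hypothetical structure size
N_FRAMES = 1000           # hypothetical number of saved trajectory frames


def transfer_seconds(n_bytes, baud=BAUD, bits_per_byte=LINE_BITS_PER_BYTE):
    """Time to send n_bytes over a serial link, assuming one bit per baud."""
    bytes_per_second = baud / bits_per_byte
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    one_structure = N_ATOMS * BYTES_PER_ATOM
    trajectory = one_structure * N_FRAMES
    print(f"One structure: {transfer_seconds(one_structure) / 60:.1f} minutes")
    print(f"Trajectory:    {transfer_seconds(trajectory) / 86400:.1f} days")
```

Under these assumptions a single structure takes about 7 minutes and a full trajectory nearly 5 days, which illustrates why interactive work over such links is impractical and batch use dominates.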
The use of national supercomputers can be left to the discretion
of individual scientists, as it is in this country, or the use
of these resources can be mandated.
depends on the type of the economy or pattern of interaction be-
tween scientists and the government. The Australian scientists
are also in the midst of this type of central planning (personal
communication, 1987, trip to Australia). The government wants
scientists throughout Australia to use the centralized supercom-
puter by paying for the use with funds from the scientists' grants;
the scientists see this as a form of taxation. The market forces in
Australia will probably dominate when the scientists realize that
superior computing and graphics performance can be obtained by
purchasing a machine. Once a machine is in a department or
laboratory, the problem of centralized national supercomputer access
and allocation is essentially ended.
LOCAL AREA NETWORKS
Molecular modeling in the future will probably be done on
local networks of computers and displays. For the past 5 to 10
years, advanced scientific laboratories have had one or more mini-
computers. Five years ago, laboratory officials, for the most part,
took the first hesitant steps to link these computers in a network.
In the last two or three years, networking of laboratory computers
has become much more common. Laboratory networks contain
computers acting as hosts for terminal and computational servers
for other workstations. The workstations range in power from the
smallest PC to powerful PSCs. As older computers stop working or
become too expensive to maintain, they will be replaced by networks
of a variety of computers and displays.
DATA BASE USE
Access to molecular structure and sequence data bases through
global communications networks is an opportunity that will be
available in the near future. Currently, most data bases are
updated by magnetic tape every three to six months, including the
DNA sequence data bases at the Los Alamos National Laboratory
and at the European Molecular Biology Laboratory (EMBL) in
Heidelberg, the protein sequence data base at the National Biomedical
Research Foundation (NBRF) in Washington, D.C., the protein
structure data base at the Brookhaven National Laboratory, and
the small organic molecule crystal structure data base at Cambridge
University. Generating tapes for institutional and random
scientific users is becoming an increasing burden for the data base
operators. The global scientific networks are organized in such a
way that it is possible for the data base operators to send out
one copy of the update and have that copy spread throughout the
entire scientific community.
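The one-copy distribution scheme works because each node that receives the update passes it on to its neighbors, so the operator transmits only once. The Python sketch below simulates this as a breadth-first flood over a small network; the node names and topology are hypothetical, and real store-and-forward networks such as BITNET used their own transfer mechanisms rather than anything like this code.

```python
from collections import deque

# Hypothetical network: each node lists the nodes it has links to.
NETWORK = {
    "operator": ["hub_a"],
    "hub_a": ["operator", "site_1", "site_2", "hub_b"],
    "hub_b": ["hub_a", "site_3", "site_4"],
    "site_1": ["hub_a"],
    "site_2": ["hub_a"],
    "site_3": ["hub_b"],
    "site_4": ["hub_b"],
}


def flood(network, origin):
    """Relay one copy outward from origin, breadth first.

    In this simulation a node forwards only to neighbors that do not yet
    hold the update. Returns (hops, transmissions): hops maps each reached
    node to the number of relays the copy took; transmissions counts copies sent.
    """
    hops = {origin: 0}
    queue = deque([origin])
    transmissions = 0
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                transmissions += 1
                queue.append(neighbor)
    return hops, transmissions


if __name__ == "__main__":
    hops, sent = flood(NETWORK, "operator")
    direct = sum(1 for h in hops.values() if h == 1)
    print(f"{len(hops) - 1} sites reached with {sent} transmissions")
    print(f"Copies sent directly by the operator: {direct}")
```

Each site receives exactly one copy, so the total is one transmission per site, and in this topology the operator itself sends a single copy instead of one tape per subscriber.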
For those scientific users who need a particular molecular
structure data set for display or further modeling, the global sci-
entific networks are ideal sources of information. Only recently,
the Brookhaven protein structure file was tested at the National
Research Council in Ottawa. A simple mail request to a BITNET
server at the National Research Council produced one or more of
the protein structure data sets in a few minutes.
The small molecule organic crystal structure file from Cambridge
University in England is being used by scientists for molecular
modeling and calculation. The Cambridge crystal file provides
an ideal data source for ligand conformations. The data file and a
search program have been available on the international commer-
cial computer network for the past 15 years. Technology moves so
fast that even while this report is being prepared the panorama
with respect to data base distribution has changed. For several
years 5¼-inch laser disks have been on the market for audio. Now
this highly developed consumer technology has been applied to
the storage and retrieval of molecular structure data. Each laser
disk, which costs about $2,000 to master and $10 to reproduce,
can hold a complete update for the DNA sequence, protein sequence,
protein structure, and small molecule data files. The laser
disk and associated software will be produced by a small start-up
company associated with the University of Wisconsin (Fred Blattner,
DNAstar, Inc. at the University of Wisconsin, 1987, personal
communication).
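The laser disk costs quoted above combine a one-time mastering charge with a small per-copy charge, so the average cost per disk falls as more copies are made. A minimal Python sketch using the text's figures ($2,000 to master, $10 per copy); the copy counts are arbitrary examples.

```python
MASTERING_COST = 2000.0  # dollars, one-time charge quoted in the text
COPY_COST = 10.0         # dollars per reproduced disk, quoted in the text


def cost_per_copy(n_copies):
    """Average cost per disk when the mastering charge is spread over n copies."""
    if n_copies <= 0:
        raise ValueError("n_copies must be positive")
    return MASTERING_COST / n_copies + COPY_COST


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} copies: ${cost_per_copy(n):,.2f} per disk")
```

Beyond a few hundred copies the average approaches the $10 reproduction charge, so wide distribution of a full data base update becomes inexpensive.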
COMPETITIVENESS
America has a world-recognized ability to transfer ideas from
their development in an academic setting to practice by the
formation of a small commercial enterprise. Then by the infusion
of capital in several stages these small companies can be trans-
formed into stable industrial corporations. These corporations are
then able to consume the supply of trained scientific personnel
produced by the universities. The position of the United States in
the world economy is changing very dramatically at present, and
certainly will continue to change in the next 5 to 10 years. Our
overall competitiveness will be determined by our ability to form
links between previously separate activities. It is already clear that
biotechnology, as an offshoot of our national expertise in molecular
biology, will be increasingly determined by the way we use
computers in computational chemistry, macromolecular modeling, and
the design of proteins. We are in the midst of two revolutionary
tendencies: genetics and silicon. Computational chemistry is the
glue that will bring these tendencies together in a stable form.