
Session II
What New Developments Are in the Wind?
Session Chairman: William R. Busing
New Computational Techniques, Particularly For Refinement (1)
Carroll K. Johnson
The two principal numerical techniques used to refine crystal structures (2)
are the Fourier transform method and the method of linearized least-squares.
The following remarks will be restricted to the least-squares approach;
however, significant developments are also occurring in the Fourier field,
the Fast Fourier Transform algorithm being used to decrease computing time
substantially.
An important preliminary for any crystal structure refinement is
the selection of an appropriate mathematical model for the structure under
study. The selection is usually influenced by the following three consi-
derations.
1. What is the relative importance to the investigator of the dif-
ferent types of information that can be obtained from a structure refine-
ment?
2. Are there any unusual problems involved, such as major disorder
in the structure or poor quality diffraction data?
3. Are the available computer hardware, program software, and
computing budget adequate to handle the proposed refinement?
Ideally, consideration number 1 is of greatest importance, and the
refinement model should be based on the particular type of chemical or
physical information that the investigator wants to gain from the structure
refinement. There seem to be two different areas of interest to crystal-
lographers doing crystal structure analysis. The first area concerns the
geometrical properties of the idealized configuration of point atoms (i.e.,
metrical properties such as distances and angles), and the second area
concerns the elucidation of atomic density function properties such as
electron density.
There are two different schools of thought concerning what is the
best method to use in refining a crystal structure. These two schools may
be termed the free-model school and the constrained-model school.
The free-model school reasons that we should refine a structure in
the least restrictive way possible, with independent parameters for each
atom so that the final results are unbiased by preconceived chemical con-
cepts incorporated into the model. The most commonly used model, with 3
positional parameters and 6 anisotropic temperature factor parameters for
each atom, is an example of an unconstrained model.
The constrained-model school argues that we should put as much chem-
ical information as possible into the model so that the variables to be
determined are reduced to the basic parameters of direct interest to
the investigator. Examples of constrained models are the rigid-body
model, the segmented-body model, and the models which force chemically
symmetrical groups to be geometrically symmetrical even though they are
not crystallographically equivalent. Such constraints can be applied
to both positional and thermal-motion parameters.
The majority of the crystallographers seem to follow the free-
model school of reasoning. The advantage of the unconstrained model is
its simplicity and easy direct application to a wide variety of problems.
A disadvantage is often the large number of variable parameters that
must be handled when crystal structures of even modest complexity are
refined. For example, a full-matrix refinement with anisotropic thermal
parameters for a 45-atom structure will involve at least 406 variables
and will require 82 621 words of core storage for the least-squares
matrix alone.
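The storage figure follows directly from the parameter count; a quick worked check (Python used here purely for arithmetic):

```python
# Worked check of the counts quoted above: 45 atoms with 3 positional and
# 6 anisotropic temperature-factor parameters each, plus one overall scale
# factor; the symmetric normal-equations matrix is stored as a packed triangle.
atoms = 45
params_per_atom = 3 + 6              # positions + anisotropic thermal parameters
n = atoms * params_per_atom + 1      # + overall scale factor -> 406 variables
words = n * (n + 1) // 2             # packed symmetric matrix -> 82 621 words
print(n, words)                      # 406 82621
```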
The economic importance of the least-squares calculation is empha-
sized by the survey taken by Dr. Hamilton for this symposium. The sur-
vey shows that 80 to 90% of the computing time used by U.S. crystallo-
graphers is spent in the structure-refinement step. Furthermore, the
greater part of this computer time is used in forming the matrix of the
least-squares normal equations; consequently, it is often worthwhile and
sometimes essential to approximate the matrix by an alternate matrix
requiring less computer time and less computer memory.
Table 1 lists some old and some new methods for approximating the
crystallographic least-squares matrix. The principal approach used to
minimize computer core requirements is to omit as many off-diagonal terms
as possible, thus transforming the full matrix to a sparse matrix. The
block-diagonal matrix with one atom per block is the most commonly used
sparse-matrix approximation although further reduction is possible. Diag-
onal matrix approximations are of little value for general crystallographic
refinement because of the oblique coordinate systems used for trigonal,
monoclinic, and triclinic crystals.
TABLE 1 Approximations For The Crystallographic Least-Squares Matrix
1. Sparse matrix approximations
(a) Block diagonal with one atom per block
(b) Cross-word puzzle (block diagonal + first
neighbor interaction terms)
2. Recycle and update approximations
(a) Use the same full matrix unchanged for
several cycles
(b) Recalculate only the block-diagonal sub-
matrices and simply rescale the rest of
the old full matrix
(c) Recalculate only the matrix elements
influenced by parameters which undergo
appreciable shifts
3. Analytical matrix approximations
An untried but seemingly logical extension from the one-atom block-
diagonal matrix is the "cross-word puzzle" matrix where all interaction
terms between close-neighbor atoms are added to the block-diagonal matrix.
It is well known that close-neighbor atoms have a greater least-squares
interaction than distant-neighbor atoms. The cross-word matrix would
have to be stored by blocks and inverted with a partitioned-matrix inver-
sion scheme.
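The cross-word puzzle storage pattern can be sketched in a few lines. This sketch is not from the paper: the function name, the 9 x 9 block size (3 positional plus 6 thermal parameters per atom), and the distance cutoff are assumptions made for illustration.

```python
import math

# Illustrative "cross-word puzzle" storage: keep only the 9x9 block of each
# atom with itself (block diagonal) and with its close neighbors, keyed by
# the atom-index pair.  All other interaction blocks are dropped.
def crossword_blocks(coords, cutoff):
    """Return {(i, j): 9x9 zero block} for i == j and close pairs i < j."""
    blocks = {}
    for i, xi in enumerate(coords):
        blocks[(i, i)] = [[0.0] * 9 for _ in range(9)]   # diagonal block
        for j in range(i + 1, len(coords)):
            if math.dist(xi, coords[j]) <= cutoff:       # first neighbors only
                blocks[(i, j)] = [[0.0] * 9 for _ in range(9)]
    return blocks

# Three collinear atoms 1.5 apart: only adjacent pairs fall inside a 2.0 cutoff.
b = crossword_blocks([(0, 0, 0), (1.5, 0, 0), (3.0, 0, 0)], 2.0)
print(sorted(b))   # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
```

Only the stored blocks would be formed and inverted, with a partitioned-matrix scheme handling the coupling between them.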
The Recycle and Update Procedures listed in Table 1 assume that the
complete matrix has been calculated and stored once and that the changes
in it from cycle to cycle are small. The option of using the same matrix
unchanged for several cycles was available in the original Busing and
Levy least-squares program for the IBM-704. Unfortunately, there is no
recorded evaluation of the usefulness of this approximation; however,
most of the verbal reports received indicate erratic behavior. There are
several rather obvious modifications of the Recycle Procedure which might
prove to be useful. For example, the atom-block-diagonal submatrices
might be recalculated each cycle and the rest of the matrix simply rescaled
by the new over-all scale factor. Alternatively, an algorithm might be
devised whereby the only matrix elements to be updated would be those in-
volving parameters that shifted appreciably in the preceding cycle.
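The selective-update idea in item 2(c) can be sketched as follows. The paper leaves the bookkeeping algorithm open, so the function and its shift-threshold rule are hypothetical:

```python
# Hypothetical sketch of the selective-update idea: after a cycle, recompute
# only the matrix blocks that involve an atom whose largest parameter shift
# exceeded a threshold; every other block is kept (or merely rescaled).
def blocks_to_update(shifts, threshold, pairs):
    """shifts: {atom: max |shift|}; pairs: block keys (i, j) of stored blocks."""
    moved = {a for a, s in shifts.items() if abs(s) > threshold}
    return [p for p in pairs if p[0] in moved or p[1] in moved]

pairs = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
print(blocks_to_update({0: 0.0001, 1: 0.02, 2: 0.0002}, 0.001, pairs))
# -> [(0, 1), (1, 1), (1, 2)]
```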
The final method listed in Table 1 utilizes a completely different
approach to reduce computer time. It replaces the time-consuming numer-
ical summations over the thousands of reciprocal lattice points by anal-
ytical integrations. The results on analytical matrix approximations pre-
sented here are from my own work; however, I recently learned that Pro-
fessor Verner Schomaker at the University of Washington has derived in-
dependently a number of related results.
There are five factors which are functions of the scaled reciprocal-
lattice vector t (i.e., t = 2πh) in each term of the sum for any particular
matrix element. The five factors for a centrosymmetric structure with
refinement based on F² are listed in Table 2.
TABLE 2 Factors In The P1̄ Crystallographic Least-Squares Matrix Sums
Which Are Functions Of The Reciprocal Lattice Vector t = 2πh*

(1) F_c²(t)/σ²[F_o²(t)]
(2) f_m(t) f_n(t)
(3) exp{-(1/2) t′[(b_m + b_n)/2π²] t}
(4) t_i t_j or t_i t_j t_k or t_i t_j t_k t_l  (i,j,k,l = 1,2,3)
(5) cos or sin [t′(x_m - x_n)] ± cos or sin [t′(x_m + x_n)]

*x_m, x_n are positional vectors and b_m, b_n are anisotropic
thermal-motion matrices for atoms m and n.
The first factor in Table 2 contains the calculated squared structure
factor divided by the variance of the observed squared structure factor.
This factor can be eliminated from the list by making the following approx-
imation:
Approximation 1 - The magnitude of the calculated squared structure
factor is assumed to be proportional to the variance of the observed
squared structure factor. Consequently, the ratio F_c²/σ²(F_o²) becomes
a constant for all reciprocal lattice points.
Approximation 1 is completely valid for the special case where variances
are based on counting statistics alone with no correction for background
counts.
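The counting-statistics case can be illustrated numerically: a Poisson-distributed count has variance equal to its mean, which is exactly the proportionality Approximation 1 assumes. The sampler below is a standard Knuth-style construction, used here only for illustration:

```python
import math
import random

# For pure counting statistics an intensity measurement is Poisson
# distributed, so its variance equals its mean.  Simple Knuth-style sampler.
def poisson(mean, rng):
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(1)
counts = [poisson(50.0, rng) for _ in range(20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(round(var / mean, 2))   # close to 1: variance tracks the intensity
```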
The second factor in Table 2, a product of atomic scattering factors
for atoms m and n, may be replaced by an analytical expression.
Approximation 2 - The product of two atomic scattering
factors is assumed to be approximated adequately by a short sum
of spherical Gaussian functions.
Sums of 3 to 5 Gaussian functions currently are used successfully to re-
place scattering-factor table-look-up procedures in crystallographic
programs. The same tabulated Gaussian coefficients could be used in a
double summation; however, a more efficient procedure is to fit new
Gaussian coefficients directly to the scattering factor product, taking
care to make the fit acceptable over the entire range of |t|.
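One reason the direct fit works well is that the product of two Gaussian sums is itself exactly a Gaussian sum, just with more terms; refitting replaces that longer sum with a shorter one. A sketch with made-up coefficients (not real scattering-factor tables):

```python
import math

# The product of two scattering factors, each written as a sum of Gaussians
# f(s) = sum_p a_p * exp(-b_p * s**2), is itself exactly a sum of Gaussians:
# a_p*a_q terms with exponents b_p + b_q.  Coefficients here are invented
# for illustration; real tables would contain fitted values.
fm = [(2.0, 10.0), (1.0, 3.0)]          # (a_p, b_p) pairs for atom m
fn = [(1.5, 8.0), (0.5, 1.0)]           # (a_q, b_q) pairs for atom n

def gsum(terms, s):
    return sum(a * math.exp(-b * s * s) for a, b in terms)

product = [(am * an, bm + bn) for am, bm in fm for an, bn in fn]

s = 0.37
assert abs(gsum(fm, s) * gsum(fn, s) - gsum(product, s)) < 1e-12
print(len(product))   # 4 terms; one would refit a shorter sum to this product
```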
The third factor in Table 2, the product of anisotropic Gaussian
temperature factors for atoms m and n, presents no difficulty. The
fourth factor t_i t_j ··· is an n-th degree product of the components of
the three-dimensional reciprocal-lattice vector t. The 2nd-, 3rd-, and
4th-degree products occur in position-position, position-thermal, and
thermal-thermal matrix elements respectively. The fifth factor is a
product of trigonometric terms which can be rewritten as a sum of trig-
onometric terms with arguments containing inner products of t with an
interatomic vector between atoms m and n. When the periodic properties
of the trigonometric functions and the crystal lattice are considered,
this factor is seen to contain the Patterson vectors between all atoms
of types m and n in the crystal.
By incorporating Approximations 1 and 2, we can write a simplified
equation for any element in the crystallographic least-squares matrix L.
For example, for space group P1̄, the equation for the interaction term
relating the i-th component of the position vector x_m for atom m and the
j-th component of the position vector x_n for atom n (i,j = 1,2,3) is

    L(x_m^i, x_n^j) = K Σ_p Σ_r α_p Σ_t t_i t_j exp(-t′ M_p t / 2) cos(t′ r)    (1)

with the matrix M_p defined as

    M_p = β_p G⁻¹ + (b_m + b_n)/2π² .    (2)

In these equations, K is a constant, r is an interatomic vector between
atoms on crystallographic sites m and n (i.e., x_m - x_n, -x_m + x_n,
x_m + x_n, and -x_m - x_n), b_m and b_n are anisotropic temperature-factor
matrices for atoms m and n, G⁻¹ is the contravariant metric matrix, and α_p, β_p are
coefficients in the Gaussian expansion for the scattering-factor product
for atom pair m,n.
The main step in the approximation procedure involves replacing
the summation over t in Eq. (1) by an integration over t.
Approximation 3 - We assume that enough reciprocal-lattice points
are included in the reciprocal-lattice summation to justify the
replacement of the summation by an integration without including
higher-order correction terms.
Approximation 3 is based on the classical Euler-MacLaurin summation for-
mula suitably generalized to the three-dimensional case. The one-
dimensional Euler-MacLaurin summation formula is
    Σ_{k=0}^{m} f(a + kh) = (1/h) ∫_a^b f(t) dt + (1/2)[f(b) + f(a)] + ··· ,    (3)
where h = (b - a)/m. The higher-order terms (not shown) involve powers
of h and odd-order derivatives of f at the limits a and b.
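A quick numerical check of the leading terms of Eq. 3, with a Gaussian integrand standing in for the reciprocal-lattice sums (a fine midpoint rule below plays the role of the exact integral):

```python
import math

# Numerical check of the leading Euler-Maclaurin terms for
# f(t) = exp(-t**2/2) on [a, b]: the grid sum is approximated by
# (1/h) * integral + (1/2)[f(a) + f(b)].
f = lambda t: math.exp(-t * t / 2)
a, b, m = -4.0, 4.0, 80
h = (b - a) / m

grid_sum = sum(f(a + k * h) for k in range(m + 1))

# Fine-grained midpoint rule stands in for the exact integral.
n = 200000
integral = sum(f(a + (i + 0.5) * (b - a) / n) for i in range(n)) * (b - a) / n

approx = integral / h + 0.5 * (f(a) + f(b))
# The residual is of the order of the next correction term, h*f'(b)/6 ~ 1e-5.
print(abs(grid_sum - approx) < 1e-4)   # True
```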
A number of special cases now occur, depending upon the integration
limits. In the simplest case, the integration extends over all of the
reciprocal space and we obtain the result,
    L(x_m^i, x_n^j) ≈ K′ Σ_p Σ_r α_p |M_p|^(-1/2) H_ij(r; M_p⁻¹) exp(-r′ M_p⁻¹ r / 2)    (4)

with H_ij(r; M⁻¹) = z_i z_j - (M⁻¹)_ij and z = M⁻¹ r.
The tensor component H_ij(r; M⁻¹) is a second-order three-dimensional Hermite
polynomial. Corresponding formulas for the position-thermal and the
thermal-thermal interactions have the same form as the position-position
interaction equation shown in Eq. 4 except that the second-order H_ij is
replaced by the third- and fourth-order polynomials H_ijk and H_ijkl respectively.
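A one-dimensional analogue shows the structure of Eq. 4: a lattice sum of t² times a Gaussian times cos(tr) is reproduced by a Gaussian in r multiplied by a second-order Hermite-type polynomial in z = r/M, with signs and constants absorbed into K′. The step size and parameter values below are arbitrary:

```python
import math

# One-dimensional analogue of Approximation 3 / Eq. 4: the dense sum
#   sum_t t**2 * exp(-M*t**2/2) * cos(t*r)   (spacing h)
# equals the integral
#   sqrt(2*pi/M) * (1/M - z**2) * exp(-r**2/(2*M)),   z = r/M,
# a Hermite-type polynomial times a Gaussian in the interatomic vector r.
M, r = 0.8, 1.3
h = 0.01                       # dense "lattice" so the replacement is accurate
lattice_sum = h * sum(
    (k * h) ** 2 * math.exp(-M * (k * h) ** 2 / 2) * math.cos(k * h * r)
    for k in range(-3000, 3001)
)
z = r / M
integral = math.sqrt(2 * math.pi / M) * (1 / M - z * z) * math.exp(-r * r / (2 * M))
print(abs(lattice_sum - integral) < 1e-6)   # True
```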
Equation 4 represents the asymptotically limiting situation which
is approached only when the entire reciprocal-lattice data set is inclu-
ded. A more common experimental practice is spherical truncation of the
data set at some fixed value of |t|. An exact solution for this general
case with anisotropic temperature factors and spherical truncation is
quite difficult, but some success has been achieved with empirical correct-
ion factors applied to Eq. 4. If the temperature factor for each atom is
isotropic and the truncation is spherical, the finite summation over t in
Eq. 1 can be replaced by an integration which has an analytical solution
involving Legendre polynomials and error functions.
Equations 1 and 4, which are specialized for space group P1̄, can
be generalized to include any centrosymmetric space group by incorporating
a double summation over the symmetry operations of the space group. The
task of obtaining an analytical formulation for the noncentrosymmetric
space groups is less straightforward because the first factor given in
Table 2 (i.e., F_c²(t)/σ²[F_o²(t)]) is replaced by three terms. For example,
in the noncentrosymmetric space group P1, the matrix element for a position-
position interaction may be written
    L(x_m^i, x_n^j) = K Σ_t [t_i t_j / σ²(F_o²)] exp(-t′ M t / 2)
                      × { (A² + B²) cos[(x_m - x_n)′t]
                        + (A² - B²) cos[(x_m + x_n)′t]
                        + 2AB sin[(x_m + x_n)′t] }    (5)

where A and B are the real and imaginary parts of the structure factor
and F_c² = A² + B². The problem is to predict the behavior of the factors
(A² - B²)/[σ²(A² + B²)] and 2AB/[σ²(A² + B²)]. Intuitively, it seems
that the terms containing these factors may tend to integrate to zero
if the entire reciprocal lattice is included and if the structure is a
"random structure"; however, the conjecture has not been proven. The
integral behavior of these terms for a real structure with a truncated
data set seems rather unpredictable.
Evaluation of the analytical matrix technique is underway. With
favorable conditions (i.e., low crystallographic symmetry and extensive,
but finite, diffraction data) the computing time required to form the
matrix has been reduced by an order of magnitude. For cases where the
symmetry is very high and the data collected are not extensive, there
may be no saving of time.
The principal testing of the procedure to date has been for an
application rather different from least-squares refinement. We use the
inverted analytical matrix to calculate the complete variance-covariance
matrix for a published structure without computing structure factors or
their derivatives. The only data needed to generate the analytical matrix
are the structural parameters, a matrix scale factor, and a truncation
parameter. The latter two parameters can usually be obtained quite
accurately from the standard deviations, which usually are published
with the crystal-structure paper.
In addition to the matrix approximations described above, there
are also possibilities for saving computer time by utilizing some special
redundancy properties of the full crystallographic least-squares matrix
in space group P1̄. The basic approach is simply to examine the equations
for the elements in the matrix. An example is a hypothetical structure
displaying space group symmetry P1̄, with two atoms (m and n) in the asym-
metric unit. If we write out the equations for the 171 supposedly unique
elements in the symmetric 18 by 18 matrix L for positional and aniso-
tropic thermal parameters, we quickly discover that considerable redun-
dancy is present, only 103 elements actually being unique. The remaining
68 elements are simple multiples of other elements. For example, we find
that L(b_m^12, b_n^12) = 4 L(b_m^11, b_n^22).
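The element counts are easy to verify:

```python
# Counting check for the redundancy example: two atoms with 9 parameters
# each give an 18 x 18 symmetric matrix with 18*19/2 = 171 distinct-looking
# elements, of which only 103 are found to be independent.
n = 2 * 9
upper_triangle = n * (n + 1) // 2
print(upper_triangle)            # 171
print(upper_triangle - 103)      # 68 elements are multiples of others
```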

DISCUSSION
Sparks: Concerning the calculation of correlation coefficients from
position parameters and thermal parameters, I always thought that
wouldn't work if you had a situation where you had refined a set
of data having not only the termination problem you mention but
also a lot of weak reflections that are just left out of the data
set. It would seem to me this would tend, in some peculiar way, to
bias the results.
Johnson: The numerical agreement between the variances calculated from
the regular inverted least-squares matrix and those calculated from
the inverted analytical matrix is usually quite good for any data
set. The agreement for the covariances becomes much better if the
data set is fairly extensive. Missing reflections may present a
serious problem if the data set is quite sparse, but we have not
examined this aspect numerically or theoretically.
Templeton: This sounds like magic until you think about it, but there's
a way of restating it which I think makes evident what you're doing.
If you have a published structure and use it to calculate structure
factors, this is a data set more or less like the experimental data
set: more so where the structure is right and less so where it is wrong.
From that data set you select, depending on your knowledge, which
reflections have been left out. For example commonly people say, "We
observed 1600 reflections of which 400 were zero." If you simply
chop off the 400 smallest ones then you would have a very good replica.
One thing I noticed in your suggestion that one leave out matrix ele-
ments not affected by the parameter shifts: it is not evident how you
know which ones these are, because they are not necessarily just the ones
labeled by the subscripts of the parameter that is shifted.
Johnson: I have to admit that I have not thought this through in detail
and cannot at present describe an algorithm that would keep track of
the major changes from cycle to cycle.
Templeton: Then part of my next question is, how do you know which ones
they are, because all of the derivatives include in them the structure
factor?
Johnson: In this formulation we have eliminated the structure factor. But
you're right, very good point. The structure factor was
eliminated in the analytical formulation and perhaps, as an approxima-
tion, the same reasoning could be extended to this approach.
Hamilton: Would you predict that this may be the answer for people who
are refining protein structures and that they should really be think-
ing seriously about this method?
Johnson: I must admit I harbor the fond hope that the analytical approach
might someday be applicable to protein refinement. Unfortunately, I
cannot at present see how to handle the non-centrosymmetric problem
properly. Luckily, you scheduled me in a session where I can discuss
what should be done and not necessarily what can be done. I think
it is certainly worthwhile to put some additional effort into this
approach to see if it might be a feasible solution for proteins.
Busing: You say this speeds up the computation of the matrix by perhaps
a factor of 10. If the number of observations and parameters is very
large, does this method become even more favorable?
Johnson: The approximation improves as the number of observations increases
and we are in best shape if the data set includes everything that can
possibly be measured. In this case we also obtain our maximum time
advantage. Additional parameters may also improve the time advantage
because the sum over the Patterson vectors converges rapidly as a
function of interatomic separation; consequently the long vectors can
safely be omitted from the summation.
The Role of the Minicomputer in the Crystallography Laboratory
Robert A. Sparks
It is universally recognized that the minicomputer plays an important
role in the crystallographic laboratory. The semi-automatic and auto-
matically controlled diffractometers have offered welcome relief for
crystallographers who previously had to measure large amounts of data
manually.
As with most computer-controlled instruments, early programs
tended to collect data in much the same way that the crystallographer
would have operated the instrument manually. Later
programs have taken advantage of the flexibility of the computer to
perform tasks that would have been virtually impossible to perform with
the manual or semi-automatic diffractometer.
Thus, it is now possible to:
1. Automatically center reflections.
2. Sample the profile for each reflection during data collection.
3. Choose different scan speeds dependent on the intensity of
the reflections.
4. Search for peak maxima for all but the weakest reflections.
5. Measure reflections at many azimuthal angles about the dif-
fraction vectors.
6. Measure regions of reciprocal space in a three-dimensional
fashion to obtain diffuse-scattering information.
7. Automatically redetermine the crystal orientation if the
crystal should move during data collection.
8. Obtain information about crystal quality and crystal symmetry.
All of this can be done with a slow computer having a minimal amount
of core (4000 words). For subsequent processing of the collected data a
magnetic tape drive is desirable. Magnetic tape is chosen because it is
an inexpensive means of storing large amounts of data in formats that
are universally recognized by large and small computers.
As collection methods become more complex one soon realizes that
the limiting factor is the amount of core available. Thus, it is becoming
fairly common for computer-controlled diffractometers to have more than
4000 words of core or to have a disk for program overlays.
For reasons of economy, manufacturers of computer-controlled dif-
fractometers have chosen the least expensive computers that can easily
be interfaced to the many control and acquisition functions of the diffrac-
tometer. Almost all commercial instruments are of the one instrument-
one minicomputer type. Other configurations, such as several instruments-
one medium-size computer or one instrument-one minicomputer-communication
link-large computer, have the possible advantage that more computer capa-
bility becomes available for the diffractometer for at least part of the
time. However, such approaches are expensive because they are almost
always one of a kind. The advantage of the one instrument-one minicomputer
is that the development cost, of which a large part is for software, can
be spread over many identical instruments.
Although the diffractometer experiments are slow, the inexpensive
minicomputers used to control the diffractometers are not slow. The state
of computer technology is such that computers with memory-cycle times of
about one microsecond and execution times of one to three microseconds
for most commands are no more expensive to build than computers that are
one-half or one-tenth as fast. Therefore, the minicomputer is used to
perform some calculations that do not use the diffractometer-control
features of the computer. Thus, the minicomputer is used to determine
indices for reflections, best least-squares unit-cell parameters, and
Lorentz-polarization factors. All of these tasks can be performed by the
large computer but are more conveniently done by the minicomputer.
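As an example of such a task, a Lorentz-polarization correction is only a few lines of code. The formula below is the textbook expression for an equatorial four-circle geometry with an unpolarized incident beam; a crystal monochromator would modify the polarization term, so treat this as a sketch rather than the routine any particular instrument used.

```python
import math

# Combined Lorentz-polarization factor for the common equatorial
# four-circle geometry with an unpolarized incident beam.
def lorentz_polarization(two_theta_deg):
    t = math.radians(two_theta_deg)                # t is the angle 2*theta
    lorentz = 1.0 / math.sin(t)                    # Lorentz factor, 1/sin(2*theta)
    polarization = (1.0 + math.cos(t) ** 2) / 2.0  # unpolarized-beam factor
    return lorentz * polarization

# Corrected intensity = measured intensity / Lp
print(round(lorentz_polarization(30.0), 3))   # 1.75
```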
Some crystallographers have used the small computer for tasks more
traditionally performed on the large computer. Thus, Eric Gabe uses the
PDP-8 which controls his Picker diffractometer to do structure-factor
calculations. Shiono for many years has used the IBM 1130 to do almost
all types of crystallographic calculations. For the most part, however,
crystallographic computations are done on the most powerful computer
available. Why this is so is illustrated in the first two columns of
Table 1, which compares the characteristics of the Nova 1200, used to
control the Syntex P1̄ Autodiffractometer, with the characteristics of
the CDC 6600, one of the most powerful computers used for crystallo-
graphic calculations. Although core speeds are not very different, the
CDC 6600 can achieve effective speeds of up to 100 nanoseconds because
the memory has been divided into independent blocks of 4096 words each.
In every other respect the CDC 6600 is a much more powerful computer.
Because of the large core memory, all structure factor data and the nor-
mal equations of a least-squares program can be resident. Because of
the fast instruction registers, tight loops can be executed with no need
to continually reference slow core. Because of the many arithmetic units
and addressing and indexing registers, many operations can take place
simultaneously. Not of least importance is the fact that CDC has an ex-
cellent FORTRAN compiler which makes efficient use of all this sophisti-
cated hardware.
On the other hand, the minimal Nova 1200 configuration does not have
enough core and is so slow that many of the important crystallographic
programs would be virtually impossible to run for all but the smallest
structures. I believe, however, that the most serious limitation is the
unavailability of compilers for higher-level languages that produce programs
minimizing the amount of core needed at run time. This deficiency has
meant that the crystallographer has not easily been able to tailor
his data-collection programs to meet his requirements.
The minicomputer industry, however, is advancing rapidly. The
industry is extremely competitive and prices for all parts of the hard-
ware are decreasing at a phenomenal rate. New innovations - for example,
semiconductor memory - are introduced into minicomputers almost simul-
taneously with introduction in large computers. Good compilers with
full FORTRAN IV capability are available for computers with larger core
memories (12 000 words or more for the NOVA computers). Disk operating
systems that are as flexible and easy to use as those found on large
computers are now available. Finally, fast floating-point hardware is
now optional from some of the minicomputer manufacturers and also from
several independent firms.
The third column of Table 1 lists the characteristics of a system
that would satisfy all or almost all of the crystallographer's computing
needs. In addition to the basic 4000-word NOVA with a magnetic tape
drive required for the P1̄ Autodiffractometer, this system has an additional
12 000 words of core, 131 000-word fixed-head disc, and floating-point
hardware. Software consists of FORTRAN IV and a Disc Operating System.
Crystallographic data-processing programs would be written in FORTRAN IV.
Programs would reside on magnetic tape reels and be loaded on to disc
when needed. Large programs would consist of several overlays. Large
arrays would also reside on disc and be brought in to core one sector
at a time. Diffractometer programs would also be written in FORTRAN IV
but using machine-language subroutines for driving the goniometer axes,
reading the encoders and scaler, opening and closing the shutter, etc.
Table 1 Comparison of Nova 1200 and CDC 6600

                                             Minimal         Nova 1200 with Structure
                          CDC 6600           Nova 1200       Determination Package
Magnetic Tape Drive       many               1               1
Core Speed                1.0 μs             1.2 μs          1.2 μs
Word Size                 60 bits            16 bits         16 bits
Core Size                 131 000 words      4000 words      16 000 words +
                                                             131 000-word disk
Table 1 continued
                                             Minimal         Nova 1200 with Structure
                          CDC 6600           Nova 1200       Determination Package
Operand, Addressing, and
  Indexing Registers      24                 4               4
Fast Instruction
  Registers               8 (60 bits each)   None            None
Floating Multiply         (60 bits)          2 ms            15.6 μs (32 bits)
                                             (32 bits,       24.2 μs (64 bits)
                                             software)
Arithmetic Units          10                 1               1
FORTRAN IV &
  Operating System        Very Good          No              Good
Almost all crystallographic programs could be run on such a system.
It is hard to justify the cost of a plotter for the infrequent use
crystallographers would make of it. Therefore, it would probably be most
economical to generate plotting information on magnetic tape on this sys-
tem and then have the actual plotting done at central facilities. Fourier
maps would be generated on magnetic tape and either printed at central
facilities or printed on the slow printer by the NOVA 1200. At ten char-
acters per second a large Fourier map could take several hours to print.
In many cases, good peak-picking programs exist that eliminate the need
to print the maps.
There is no question that such a system is feasible for almost all
crystallographic calculations. The structure of vitamin B12 was solved
and refined on a computer with a configuration closer to the basic NOVA 1200
than to the system proposed here. Indeed much of the philosophy of disc
(or drum) usage and external plotting and printing of large files is
identical to that used on the large computers of 5 and 10 years ago.
Time-sharing of data collection and data processing presents prob-
lems not associated with the amount of core or the arithmetic processing
speed, but rather with the allocation of peripherals to the two tasks.
Data collection must have the magnetic tape drive available for output of
the intensity data. Therefore, production of a Fourier map could not be
done simultaneously with data collection. However, Fourier calculations
are quite fast (except for printing) and interrupting data collection for
the few minutes necessary to generate the map and output it to magnetic
tape is not a serious limitation. Happily, the least-squares calculation
which takes the bulk of time for structure determination requires the
magnetic-tape drive only for the brief time necessary to dump all the
data onto disc. After this, several operations could be performed with-
out using the magnetic-tape drive and could be effectively overlapped with
data collection.
The proposed system in the crystallographer's laboratory is clearly
more convenient than a centralized computing facility. It is, in most
cases, also more economical.
The inner loop of a least-squares program (namely the generation of
the normal equations) was written in FORTRAN and executed on a number of
different computers. The program is shown in Figure 1 and the results of
the test in Table 2. The FORTRAN compilers that produce the most effi-
cient code were used on the CDC 6600 and IBM 370/155. Because the floating-
point hardware is fairly new for the NOVA machines, the FORTRAN compiler
has not yet been modified to produce code for this feature. Reasonable
substitutions were made in the assembly listing generated for the soft-
ware floating-point version in order to produce the "FORTRAN-like" code.
The hand-optimized version was an assembly language program written to be
executed as efficiently as possible. If the matrix is large enough to
require that it be stored on disc, the data-channel transfers would in-
crease the NOVA 1200 times in this example by about 0.8. If 64-bit
floating-point numbers are required, an increase of about 25% is required
for the NOVA 1200 times.
Table 2  Comparison of Time for Least-Squares Inner Loop

Computer (configuration)                           Time
CDC 6600                                           0.93 s (60-bit words)
IBM 370/155                                        7.5 s  (32-bit words)
HP 2100 A (hardware multiply/divide,
  software floating point)                         150 s  (32-bit words)
HP 2100 A (hardware floating point)                29 s   (32-bit words)
Nova 800 (software floating point)                 206 s  (32-bit words)
Nova 800 (hardware floating point,
  FORTRAN-like code generation)                    16.8 s (32-bit words)
Nova 800 (hardware floating point,
  hand-optimized code)                             13.2 s (32-bit words)
Nova 1200 (software floating point)                360 s  (32-bit words)
Nova 1200 (hardware floating point,
  FORTRAN-like code generation)                    24.2 s (32-bit words)*
Nova 1200 (hardware floating point,
  hand-optimized code)                             17.5 s (32-bit words)

*Calculated from Nova 800 performance.
Typically, we at Syntex use about one hour of CDC 6600 computer
time for a structure with 40-50 non-hydrogen atoms in the asymmetric
unit. If the FORTRAN test in Figure 1 is typical, the "FORTRAN-like"
time on the NOVA 1200 would be 26 hours for this same structure. This
amount of time is small compared to typical data collection times of
one to two weeks. Even without overlap of data collection and data
processing there would not be a serious deterioration of diffractometer
usage. With simultaneous least-squares calculations and data collection,
diffractometer servicing will be negligibly affected.
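The 26-hour figure follows directly from Table 2; as a quick check (Python is used here purely as a calculator, with the times taken from the table):

```python
# Inner-loop times from Table 2, in seconds.
cdc_6600 = 0.93      # CDC 6600, 60-bit words
ibm_370_155 = 7.5    # IBM 370/155, 32-bit words
nova_1200_hw = 24.2  # NOVA 1200, hardware floating point, FORTRAN-like code

# Slowdown of the NOVA 1200 relative to the two large machines.
vs_cdc = nova_1200_hw / cdc_6600     # about 26x
vs_ibm = nova_1200_hw / ibm_370_155  # about 3.2x

# One hour of CDC 6600 time for a 40-50 atom structure therefore
# scales to roughly 26 hours of NOVA 1200 time.
nova_hours = 1.0 * vs_cdc
```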
      N = 64
      NREF = 100
      M = N + 1
      MM = M + 1
      DO 6001 IP = 1, NREF
      DO 20 I = 1, M
   20 DV(I) = I*IP*0.9
      K = 1
      DO 5001 J = 1, N
      B = DV(J)
      IF (B.NE.0.) GO TO 5002
      K = K + MM - J
      GO TO 5001
 5002 DO 5003 L = J, M
      A(K) = A(K) + DV(L)*B
 5003 K = K + 1
 5001 CONTINUE
 6001 CONTINUE
Figure 1  FORTRAN test program
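For readers more at home in a modern language, the Figure 1 loop can be sketched in Python (an illustrative translation, not part of the original package): for each reflection, the derivative vector is accumulated into the upper triangle of the normal-equations matrix, stored as a packed linear array, and a zero derivative lets the loop skip an entire row of products.

```python
# Sketch of the Figure 1 inner loop.  DV holds N parameter derivatives
# plus one right-hand-side element per reflection; A is the packed upper
# triangle of the normal-equations matrix, N*(N+3)/2 elements in all.
N = 64        # number of parameters
NREF = 100    # number of reflections
M = N + 1     # derivative vector length (parameters + RHS column)

A = [0.0] * (N * (N + 3) // 2)

for ip in range(1, NREF + 1):
    dv = [i * ip * 0.9 for i in range(1, M + 1)]  # dummy derivatives
    k = 0
    for j in range(N):
        b = dv[j]
        if b == 0.0:
            k += M - j          # skip the whole row for a zero derivative
            continue
        for l in range(j, M):
            A[k] += dv[l] * b   # accumulate one row of the triangle
            k += 1
```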
Even though the NOVA 1200 with floating point hardware is 26 times
slower than the CDC 6600 and 3.2 times slower than the IBM 370/155, turn-
around time will in many cases favor the dedicated computer because it
is located in the crystallographer's laboratory.
Another important feature of the small dedicated system, compared with
the large very fast computer, is that on the former it is impossible to
find out one day that an error made by a student has exhausted the
year's computer budget.
Because of the above arguments, Syntex has decided to make avail-
able to customers a Structure Determination Package which would consist
of a 131 000-word fixed head disc, 12 000-word core, and floating point
hardware for those who already have a P1 Autodiffractometer or AD-1
Autodensitometer, and a stand-alone unit consisting of a NOVA 1200, a 131 000-
word fixed head disc, 16 000-word core, floating point hardware, and a
magnetic tape drive for those who do not have the Syntex instruments.
Software will consist of a FORTRAN IV compiler modified to make efficient
use of the floating point hardware, a Disc Operating System modified to
allow time-sharing of data collection with data processing, machine lan-
guage subroutines for the diffractometer, FORTRAN versions of the current
diffractometer programs, and FORTRAN programs properly broken up into
overlays for the basic crystallographic programs. The user will be able
to add his own FORTRAN or assembly language programs to the library. At
this early stage, it looks quite probable that the selling price would
be $30 000 for hardware and software for the attachment to existing in-
struments, and about $45 000 for the stand-alone option. First deliveries
are scheduled for the second quarter of 1973.
A comparison of the cost of the system proposed here with existing
costs at centralized computing facilities is difficult to
make. University computing centers may charge the scientist anywhere from
nothing up to the actual cost of the computing service, depending on what
other sources of funds are available to the centers. Commercial rates
are set to provide a profit for the company providing the service, but
are usually complex functions of CPU time, amount of core used, amount of
input and output, and job priority. In Palo Alto the Control Data Center
provides the most economical service for crystallographic type problems.
Syntex pays about $1000 per structure for their service. Clearly, for
us, the break-even point would be 30 structures for the $30 000 attachment
or 45 structures for the $45 000 stand-alone configuration.
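The break-even figures are simple arithmetic on the numbers just quoted, with the $1000-per-structure rate being the Control Data Center charge mentioned above:

```python
# Break-even calculation from the figures in the text.
cost_per_structure = 1000.0  # Control Data Center charge per structure

attachment_price = 30_000.0  # add-on to an existing Syntex instrument
standalone_price = 45_000.0  # complete stand-alone NOVA 1200 system

# Number of structures at which the purchase pays for itself.
breakeven_attachment = attachment_price / cost_per_structure  # 30 structures
breakeven_standalone = standalone_price / cost_per_structure  # 45 structures
```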
In conclusion, whether crystallographers are inclined to buy the Syntex
package or prefer to buy directly from the computer manufacturers and
provide their own software, I believe that serious consideration should
be given to the small dedicated computer.
Not only does it provide the desirable features of FORTRAN data-collection
programs and the convenience of having one's own computer, but it also
provides, in many cases, a substantial cost saving compared to the cen-
tralized computer approach used by most crystallographers today.
DISCUSSION
Young: If you are doing full-matrix least-squares, what is the maximum
number of parameters you can handle with this sort of adorned mini-
system?
Sparks: It turns out to be the same figure Jim Ibers quoted, 240, be-
cause the disc size is 131 000 words. A good suggestion by Mike
Murphy is that instead of using a fixed-head disc as we are here,
we ought to be using a movable-head disc which costs quite a bit
less for the amount of disc space that would be available. Then
the capacity would be something like two million words.
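One plausible way to reconcile the 240-parameter limit with the 131 000-word disc (an assumption on our part, since the storage layout is not given in the text) is a full square normal matrix whose elements each occupy two 32-bit words. The hypothetical helper below shows the scaling, including the jump in capacity with a two-million-word movable-head disc:

```python
# Assumed layout (not stated in the text): a full n-by-n normal matrix,
# each element a 64-bit value held in two 32-bit disc words.
import math

def max_parameters(disc_words, words_per_element=2):
    # Largest n such that n*n matrix elements fit on the disc.
    return math.isqrt(disc_words // words_per_element)

fixed_head = max_parameters(131_000)      # ~255, near the quoted 240
movable_head = max_parameters(2_000_000)  # ~1000 with the larger disc
```

The gap between 255 and the quoted 240 would then be working space for data and program overlays, which this sketch ignores.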
Young: If you put all those core packages and discs plus an extra arith-
metic unit on the mini-computer, why do you bother with putting a
diffractometer on it?
Sparks: I've given you the choice. $45 000 or $30 000.
Young: No. My point is that what you've done is build a separate com-
puter system, and the fact that the diffractometer is hooked on is
incidental.
Sparks: It does give some capability for the collection programs that
we do not now have. A couple of years ago you made a strong point
that we ought to be writing these collection programs in FORTRAN.
Lowrey: Professor S. H. Bauer at Cornell University has an extensive
system for electron diffraction that is built around the PDP-8
and he has made extensive use of cathode-ray-tube display. He is
able to search his electron diffraction data and his radial distri-
butions and look at very fine portions. With respect to Fourier
maps, instead of having to print them out you can set up a graphic
interaction display for picking out the things you want. Bauer is
able to do a great deal of electron diffraction using solely the
small computer. He considers the advantage is that not only is it
cheap but it is under his direct control so that he can run all
night and have a guarantee of getting his programs back, and not
have the problems of priorities on commercial computing systems.
Sparks: We also sell a three-dimensional display.
Ibers: Two points might be kept in mind. (1) It is easier to get com-
puting money in a grant than it is to get $45 000 to buy a small
computer. (2) It may be possible to sneak small computers into
laboratories throughout a campus by claiming that these computers
are controlling experiments, but their presence makes computer cen-
ter directors very nervous, for good reason. If the small computer
proliferates throughout the campus you are in trouble. Suppose we
have 20 computers of the type you have discussed. In effect a
million dollars has been spent and it has not benefited the central
computing facility at all. For the good of the university community
it might have been more reasonable to put that million dollars into
the central facility. In any event there are obviously political
problems that are by no means negligible.
Sparks: Yes. I am aware of this. My feeling is that the instrument
ought to be treated as having a very special application. It is
not by any means a general purpose computer.
Fritchie: Do you have any idea what the annual maintenance costs are
on this $45 000 system? Computer alone perhaps?
Sparks: I do not have that figure. What is it on the diffractometer?
Dewar: It will be around 7%.
Coppens: What is the capacity of the system? In other words, how many
crystallographers can it handle?
Sparks: It depends on how productive those crystallographers are. It's
better to say, how many structures could you reasonably hope to
do on a system like this. We think that for a 40-50 atom struc-
ture it would take twenty-six hours for the structure determination.
It certainly takes quite a bit longer to collect the data. So
really, you are still limited by the amount of time it takes to
collect the data.
Coppens: So the system has over-capacity for one crystallographic
group.
Sparks: Yes, it has indeed.
Corfield: I think this system is not totally unreasonable, but what makes
it reasonable is the availability of inexpensive hardware floating-
point arithmetic units. We've had at Ohio State University for the
past two or three years a system rather more sophisticated than this
but that does not have hardware floating-point arithmetic. Present-
ly we do all our least-squares and all our Fourier summations in-
house, but once we get up to a couple of hundred variables, it would
be worth our while to use a larger computer because of the limita-
tions of the software floating-point arithmetic on our in-house
machine.
Medrud: If this kind of approach is attractive to other crystallographers,
there is another encouraging factor in the change in attitude of some
of the minicomputer manufacturers. Our first contacts with them, with
regard to our application, were met with disdain. The most recent
contacts other people in our group have had with them indicate much
more interest in systems development. They formerly wanted to hand
you a computer and a bag of hardware for interfacing and say "go to
it", but now they are willing to discuss a system comparable to yours.