Session II

What New Developments Are in the Wind?

Session Chairman: William R. Busing

New Computational Techniques, Particularly for Refinement

Carroll K. Johnson

The two principal numerical techniques used to refine crystal structures are the Fourier transform method and the method of linearized least squares. The following remarks will be restricted to the least-squares approach; however, significant developments are also occurring in the Fourier field, the Fast Fourier Transform algorithm being used to decrease computing time substantially.

An important preliminary for any crystal structure refinement is the selection of an appropriate mathematical model for the structure under study. The selection is usually influenced by the following three considerations.

1. What is the relative importance to the investigator of the different types of information that can be obtained from a structure refinement?

2. Are there any unusual problems involved, such as major disorder in the structure or poor-quality diffraction data?

3. Are the available computer hardware, program software, and computing budget adequate to handle the proposed refinement?

Ideally, consideration number 1 is of greatest importance, and the refinement model should be based on the particular type of chemical or physical information that the investigator wants to gain from the structure refinement. There seem to be two different areas of interest to crystallographers doing crystal structure analysis. The first area concerns the geometrical properties of the idealized configuration of point atoms (i.e., metrical properties such as distances and angles), and the second concerns the elucidation of atomic density function properties such as electron density.

There are two schools of thought concerning the best method to use in refining a crystal structure; they may be termed the free-model school and the constrained-model school. The free-model school reasons that we should refine a structure in the least restrictive way possible, with independent parameters for each atom, so that the final results are unbiased by preconceived chemical concepts incorporated into the model. The most commonly used model, with 3 positional parameters and 6 anisotropic temperature-factor parameters for each atom, is an example of an unconstrained model. The constrained-model school argues that we should put as much chemical information as possible into the model so that the variables to be determined are reduced to the basic parameters of direct interest to the investigator.
Examples of constrained models are the rigid-body model, the segmented-body model, and models which force chemically symmetrical groups to be geometrically symmetrical even though they are not crystallographically equivalent. Such constraints can be applied to both positional and thermal-motion parameters.

The majority of crystallographers seem to follow the free-model school of reasoning. The advantage of the unconstrained model is its simplicity and easy, direct application to a wide variety of problems. A disadvantage is often the large number of variable parameters that must be handled when crystal structures of even modest complexity are refined. For example, a full-matrix refinement with anisotropic thermal parameters for a 45-atom structure will involve at least 406 variables and will require 82 621 words of core storage for the least-squares matrix alone.
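The arithmetic behind those two figures, assuming nine parameters per atom (three positional, six thermal) plus a single overall scale factor and triangular storage of the symmetric matrix:

$$n = 9 \times 45 + 1 = 406, \qquad \tfrac{1}{2}\,n(n+1) = \tfrac{1}{2}(406)(407) = 82\,621 \text{ words}.$$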

The economic importance of the least-squares calculation is emphasized by the survey taken by Dr. Hamilton for this symposium. The survey shows that 80 to 90% of the computing time used by U.S. crystallographers is spent in the structure-refinement step. Furthermore, the greater part of this computer time is used in forming the matrix of the least-squares normal equations; consequently, it is often worthwhile and sometimes essential to approximate the matrix by an alternate matrix requiring less computer time and less computer memory.

Table 1 lists some old and some new methods for approximating the crystallographic least-squares matrix. The principal approach used to minimize computer core requirements is to omit as many off-diagonal terms as possible, thus transforming the full matrix to a sparse matrix. The block-diagonal matrix with one atom per block is the most commonly used sparse-matrix approximation, although further reduction is possible. Diagonal-matrix approximations are of little value for general crystallographic refinement because of the oblique coordinate systems used for trigonal, monoclinic, and triclinic crystals.

TABLE 1  Approximations for the Crystallographic Least-Squares Matrix

1. Sparse-matrix approximations
   (a) Block diagonal with one atom per block
   (b) Cross-word puzzle (block diagonal + first-neighbor interaction terms)

2. Recycle and update approximations
   (a) Use the same full matrix unchanged for several cycles
   (b) Recalculate only the block-diagonal submatrices and simply rescale the rest of the old full matrix
   (c) Recalculate only the matrix elements influenced by parameters which undergo appreciable shifts

3. Analytical matrix approximations

An untried but seemingly logical extension of the one-atom block-diagonal matrix is the "cross-word puzzle" matrix, in which all interaction terms between close-neighbor atoms are added to the block-diagonal matrix. It is well known that close-neighbor atoms have a greater least-squares interaction than distant-neighbor atoms. The cross-word matrix would have to be stored by blocks and inverted with a partitioned-matrix inversion scheme.
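To make the sparse-matrix entries concrete, here is a minimal sketch of the one-atom-per-block accumulation of approximation 1(a) in Table 1. The subroutine name, the array layout, and the nine-parameters-per-atom convention are illustrative assumptions, not code from any program mentioned in this paper.

C     SKETCH OF BLOCK-DIAGONAL NORMAL-EQUATION ACCUMULATION.
C     DERIV(9,NATOM) HOLDS THE NINE PARAMETER DERIVATIVES OF THE
C     CALCULATED STRUCTURE FACTOR FOR ONE REFLECTION; BLOCK
C     ACCUMULATES ONE 9 BY 9 BLOCK PER ATOM, SO STORAGE GROWS
C     LINEARLY WITH NATOM INSTEAD OF QUADRATICALLY.
      SUBROUTINE ACCBLK (DERIV, BLOCK, WGT, NATOM)
      REAL DERIV(9,NATOM), BLOCK(9,9,NATOM), WGT
      DO 30 MA = 1, NATOM
      DO 20 J = 1, 9
      DO 10 I = 1, 9
      BLOCK(I,J,MA) = BLOCK(I,J,MA) + WGT*DERIV(I,MA)*DERIV(J,MA)
   10 CONTINUE
   20 CONTINUE
   30 CONTINUE
      RETURN
      END

Interaction blocks between different atoms are never formed at all; the cross-word variant, 1(b), would additionally accumulate off-diagonal blocks for close-neighbor atom pairs only.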

The Recycle and Update procedures listed in Table 1 assume that the complete matrix has been calculated and stored once and that the changes in it from cycle to cycle are small. The option of using the same matrix unchanged for several cycles was available in the original Busing and Levy least-squares program for the IBM 704. Unfortunately, there is no recorded evaluation of the usefulness of this approximation; however, most of the verbal reports received indicate erratic behavior. There are several rather obvious modifications of the Recycle procedure which might prove to be useful. For example, the atom-block-diagonal submatrices might be recalculated each cycle and the rest of the matrix simply rescaled by the new overall scale factor. Alternatively, an algorithm might be devised whereby the only matrix elements to be updated would be those involving parameters that shifted appreciably in the preceding cycle.

The final method listed in Table 1 utilizes a completely different approach to reduce computer time. It replaces the time-consuming numerical summations over the thousands of reciprocal-lattice points by analytical integrations. The results on analytical matrix approximations presented here are from my own work; however, I recently learned that Professor Verner Schomaker at the University of Washington has independently derived a number of related results.
There are five factors which are functions of the scaled reciprocal-lattice vector $\mathbf{t}$ (i.e., $\mathbf{t} = 2\pi\mathbf{h}$) in each term of the sum for any particular matrix element. The five factors for a centrosymmetric structure with refinement based on $F^2$ are listed in Table 2.

TABLE 2  Factors in the $P\bar{1}$ Crystallographic Least-Squares Matrix Sums Which Are Functions of the Reciprocal-Lattice Vector $\mathbf{t} = 2\pi\mathbf{h}$

(1) $F_c^2(\mathbf{t})/\sigma^2[F_o^2(\mathbf{t})]$

(2) $f_m(\mathbf{t})\,f_n(\mathbf{t})$

(3) $\exp\{-\tfrac{1}{2}\,\mathbf{t}'[(\mathbf{b}_m + \mathbf{b}_n)/2\pi^2]\,\mathbf{t}\}$

(4) $t_i t_j$, $t_i t_j t_k$, or $t_i t_j t_k t_\ell$  $(i,j,k,\ell = 1,2,3)$

(5) $\cos[\mathbf{t}'(\mathbf{x}_m - \mathbf{x}_n)] \pm \cos[\mathbf{t}'(\mathbf{x}_m + \mathbf{x}_n)]$, or the corresponding combinations with sines

Here $\mathbf{x}_m$, $\mathbf{x}_n$ are positional vectors and $\mathbf{b}_m$, $\mathbf{b}_n$ are anisotropic thermal-motion matrices for atoms m and n.

The first factor in Table 2 contains the calculated squared structure factor divided by the variance of the observed squared structure factor. This factor can be eliminated from the list by making the following approximation.

Approximation 1 - The magnitude of the calculated squared structure factor is assumed to be proportional to the variance of the observed squared structure factor. Consequently, the ratio $F_c^2/\sigma^2(F_o^2)$ becomes a constant for all reciprocal-lattice points.

Approximation 1 is completely valid for the special case where variances are based on counting statistics alone with no correction for background counts.
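The counting-statistics remark can be made explicit with a one-line sketch (scale and Lorentz-polarization constants suppressed): for a Poisson-distributed count the variance equals the mean, so

$$\sigma^2(I) = I \propto F_o^2 \approx F_c^2 \quad\Longrightarrow\quad F_c^2/\sigma^2(F_o^2) \approx \text{constant}.$$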

The second factor in Table 2, a product of atomic scattering factors for atoms m and n, may be replaced by an analytical expression.

Approximation 2 - The product of two atomic scattering factors is assumed to be approximated adequately by a short sum of spherical Gaussian functions.

Sums of 3 to 5 Gaussian functions currently are used successfully to replace scattering-factor table-look-up procedures in crystallographic programs. The same tabulated Gaussian coefficients could be used in a double summation; however, a more efficient procedure is to fit new Gaussian coefficients directly to the scattering-factor product, taking care to make the fit acceptable over the entire range of $|\mathbf{t}|$ covered by the data.

The third factor in Table 2, the product of anisotropic Gaussian temperature factors for atoms m and n, presents no difficulty. The fourth factor, $t_i t_j \cdots$, is an $n$th-degree product of the components of the three-dimensional reciprocal-lattice vector $\mathbf{t}$; the 2nd, 3rd, and 4th degree products occur in position-position, position-thermal, and thermal-thermal matrix elements, respectively. The fifth factor is a product of trigonometric terms which can be rewritten as a sum of trigonometric terms with arguments containing inner products of $\mathbf{t}$ with an interatomic vector between atoms m and n. When the periodic properties of the trigonometric functions and the crystal lattice are considered, this factor is seen to contain the Patterson vectors between all atoms of types m and n in the crystal.

By incorporating approximations 1 and 2, we can write a simplified equation for any element in the crystallographic least-squares matrix $\mathbf{L}$. For example, for space group $P\bar{1}$, the equation for the interaction term relating the $i$ component of the position vector $\mathbf{x}_m$ for atom m and the $j$ component of the position vector for atom n $(i,j = 1,2,3)$ is

$$L(x_m^i, x_n^j) = K \sum_{\mathbf{r}} \sum_p \gamma_p \sum_{\mathbf{t}} t_i t_j \exp(-\mathbf{t}'\mathbf{M}_p\mathbf{t}/2) \cos(\mathbf{t}'\mathbf{r}) \qquad (1)$$

with the matrix $\mathbf{M}_p$ defined as

$$\mathbf{M}_p = \alpha_p \mathbf{G}^{-1} + (\mathbf{b}_m + \mathbf{b}_n)/2\pi^2. \qquad (2)$$

In these equations, $K$ is a constant, $\mathbf{r}$ is an interatomic vector between atoms on crystallographic sites m and n (i.e., $\mathbf{x}_m - \mathbf{x}_n$, $-\mathbf{x}_m + \mathbf{x}_n$, $\mathbf{x}_m + \mathbf{x}_n$, and $-\mathbf{x}_m - \mathbf{x}_n$), $\mathbf{b}_m$ and $\mathbf{b}_n$ are anisotropic temperature-factor matrices for atoms m and n, $\mathbf{G}^{-1}$ is the contravariant metric matrix, and $\alpha_p$, $\gamma_p$ are the coefficients in the Gaussian expansion for the scattering-factor product for atom pair m,n.
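A sketch of why such a Gaussian expansion of the product exists: if each scattering factor has already been fitted as a Gaussian sum in the radial variable $s$ (the coefficients below are generic, not a particular published fit), the product is again a sum of Gaussians, one per cross term,

$$f_m = \sum_p a_p e^{-\alpha_p s^2}, \quad f_n = \sum_q c_q e^{-\beta_q s^2} \;\Longrightarrow\; f_m f_n = \sum_{p,q} a_p c_q\, e^{-(\alpha_p + \beta_q)s^2},$$

so refitting 3 to 5 Gaussians directly to $f_m f_n$ simply compresses the $p \times q$ cross terms back into a short sum.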

The main step in the approximation procedure involves replacing the summation over $\mathbf{t}$ in Eq. (1) by an integration over $\mathbf{t}$.

Approximation 3 - We assume that enough reciprocal-lattice points are included in the reciprocal-lattice summation to justify the replacement of the summation by an integration without including higher-order correction terms.

Approximation 3 is based on the classical Euler-Maclaurin summation formula suitably generalized to the three-dimensional case. The one-dimensional Euler-Maclaurin summation formula is

$$\sum_{k=0}^{m} f(a + kh) = \frac{1}{h}\int_a^b f(t)\,dt + \frac{1}{2}\,[f(b) + f(a)] + \cdots, \qquad (3)$$

where $h = (b - a)/m$. The higher-order terms (not shown) involve powers of $h$ and odd-order derivatives of $f$ at the limits $a$ and $b$.

A number of special cases now occur, depending upon the integration limits. In the simplest case, the integration extends over all of reciprocal space and we obtain the result

$$L(x_m^i, x_n^j) = K' \sum_{\mathbf{r}} \sum_p \gamma_p\, |\mathbf{M}_p|^{-1/2}\, H_{ij}(\mathbf{r}; \mathbf{M}_p^{-1})\, \exp(-\mathbf{r}'\mathbf{M}_p^{-1}\mathbf{r}/2) \qquad (4)$$

with

$$H_{ij}(\mathbf{r}; \mathbf{M}^{-1}) = z_i z_j - (M^{-1})_{ij}, \qquad \mathbf{z} = \mathbf{M}^{-1}\mathbf{r}.$$

The tensor component $H_{ij}(\mathbf{r}; \mathbf{M}^{-1})$ is a second-order three-dimensional Hermite polynomial. Corresponding formulas for the position-thermal and thermal-thermal interactions have the same form as the position-position interaction equation shown in Eq. (4), except that the second-order $H_{ij}$ is replaced by the third- and fourth-order polynomials $H_{ijk}$ and $H_{ijk\ell}$, respectively.

Equation (4) represents the asymptotically limiting situation which is approached only when the entire reciprocal-lattice data set is included. A more common experimental practice is spherical truncation of the data set at some fixed value of $|\mathbf{t}|$. An exact solution for this general case with anisotropic temperature factors and spherical truncation is quite difficult, but some success has been achieved with empirical correction factors applied to Eq. (4). If the temperature factor for each atom is isotropic and the truncation is spherical, the finite summation over $\mathbf{t}$ in Eq. (1) can be replaced by an integration which has an analytical solution involving Legendre polynomials and error functions.
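The integral that produces Eq. (4) is the standard three-dimensional Gaussian-cosine integral; differentiating it twice with respect to the components of $\mathbf{r}$ brings down the $t_i t_j$ factor and generates the Hermite polynomial (a sketch of the step, with the sign and the $(2\pi)^{3/2}$ factor absorbed into $K'$):

$$\int e^{-\mathbf{t}'\mathbf{M}\mathbf{t}/2} \cos(\mathbf{t}'\mathbf{r})\, d^3t = (2\pi)^{3/2}\,|\mathbf{M}|^{-1/2}\, e^{-\mathbf{r}'\mathbf{M}^{-1}\mathbf{r}/2},$$

$$\int t_i t_j\, e^{-\mathbf{t}'\mathbf{M}\mathbf{t}/2} \cos(\mathbf{t}'\mathbf{r})\, d^3t = -(2\pi)^{3/2}\,|\mathbf{M}|^{-1/2}\, H_{ij}(\mathbf{r}; \mathbf{M}^{-1})\, e^{-\mathbf{r}'\mathbf{M}^{-1}\mathbf{r}/2}.$$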

Equations (1) and (4), which are specialized for space group $P\bar{1}$, can be generalized to include any centrosymmetric space group by incorporating a double summation over the symmetry operations of the space group. The task of obtaining an analytical formulation for the noncentrosymmetric space groups is less straightforward because the first factor given in Table 2 (i.e., $F_c^2(\mathbf{t})/\sigma^2[F_o^2(\mathbf{t})]$) is replaced by three terms. For example, in the noncentrosymmetric space group $P1$, the matrix element for a position-position interaction may be written

$$L(x_m^i, x_n^j) = K \sum_p \gamma_p \sum_{\mathbf{t}} t_i t_j \exp(-\mathbf{t}'\mathbf{M}_p\mathbf{t}/2)\,\Big\{\cos[(\mathbf{x}_m - \mathbf{x}_n)'\mathbf{t}] + \frac{A^2 - B^2}{A^2 + B^2}\cos[(\mathbf{x}_m + \mathbf{x}_n)'\mathbf{t}] + \frac{2AB}{A^2 + B^2}\sin[(\mathbf{x}_m + \mathbf{x}_n)'\mathbf{t}]\Big\} \qquad (5)$$

where $A$ and $B$ are the real and imaginary parts of the structure factor and $F^2 = A^2 + B^2$. The problem is to predict the behavior of the factors $(A^2 - B^2)/(A^2 + B^2)$ and $2AB/(A^2 + B^2)$. Intuitively, it seems that the terms containing these factors may tend to integrate to zero if the entire reciprocal lattice is included and if the structure is a "random structure"; however, the conjecture has not been proven. The integral behavior of these terms for a real structure with a truncated data set seems rather unpredictable.

Evaluation of the analytical matrix technique is underway. With favorable conditions (i.e., low crystallographic symmetry and extensive, but finite, diffraction data) the computing time required to form the matrix has been reduced by an order of magnitude. For cases where the symmetry is very high and the data collected are not extensive, there may be no saving of time.

The principal testing of the procedure to date has been for an application rather different from least-squares refinement. We use the inverted analytical matrix to calculate the complete variance-covariance matrix for a published structure without computing structure factors or their derivatives. The only data needed to generate the analytical matrix are the structural parameters, a matrix scale factor, and a truncation parameter. The latter two parameters can usually be obtained quite accurately from the standard deviations, which usually are published with the crystal-structure paper.

In addition to the matrix approximations described above, there are also possibilities for saving computer time by utilizing some special redundancy properties of the full crystallographic least-squares matrix in space group $P\bar{1}$. The basic approach is simply to examine the equations for the elements in the matrix. As an example, consider a hypothetical structure displaying space-group symmetry $P\bar{1}$, with two atoms (m and n) in the asymmetric unit. If we write out the equations for the 171 supposedly unique elements in the symmetric 18 by 18 matrix $\mathbf{L}$ for positional and anisotropic thermal parameters, we quickly discover that considerable redundancy is present, only 103 elements actually being unique. The remaining 68 elements are simple multiples of other elements. For example, we find that $L(b_m^{12}, b_n^{12}) = 4L(b_m^{11}, b_n^{22})$ and $L(b_m^{12}, b_n^{13}) = 2L(b_m^{11}, b_n^{23})$. For other centrosymmetric space groups, the redundant linear combinations are fewer and more complicated.
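A sketch of where these integer ratios come from: in the quadratic form $\mathbf{t}'\mathbf{b}\mathbf{t}$ each off-diagonal parameter $b^{ij}$ appears with coefficient 2, so the derivative factor entering the matrix sums is $t_i^2$ for a diagonal parameter but $2t_it_j$ for an off-diagonal one, while every other factor in the sum is identical for the two elements compared:

$$L(b_m^{12}, b_n^{12}) \propto \sum_{\mathbf{t}} (2t_1t_2)(2t_1t_2)(\cdots) = 4\sum_{\mathbf{t}} (t_1^2)(t_2^2)(\cdots) \propto 4\,L(b_m^{11}, b_n^{22}).$$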

DISCUSSION

Sparks: Concerning the calculation of correlation coefficients from position parameters and thermal parameters: I always thought that wouldn't work if you had a situation where you had refined a set of data having not only the termination problem you mention but also a lot of weak reflections that are just left out of the data set. It would seem to me this would tend, in some peculiar way, to bias the results.

Johnson: The numerical agreement between the variances calculated from the regular inverted least-squares matrix and those calculated from the inverted analytical matrix is usually quite good for any data set. The agreement for the covariances becomes much better if the data set is fairly extensive. Missing reflections may present a serious problem if the data set is quite sparse, but we have not examined this aspect numerically or theoretically.

Templeton: This sounds like magic until you think about it, but there's a way of restating it which I think makes evident what you're doing. If you have a published structure and use this to calculate structure factors, this gives a data set more or less like the experimental data set: more so where the structure is right and less so where it is wrong. From that data set you select, depending on your knowledge, which reflections have been left out. For example, commonly people say, "We observed 1600 reflections, of which 400 were zero." If you simply chop off the 400 smallest ones, then you would have a very good replica.

One thing I noticed in your suggestion that one leave out matrix elements not affected by the parameter shifts: it is not evident how you know which ones these are, because they are not necessarily just the ones labeled by the subscripts of the parameter that is shifted.

Johnson: I have to admit that I have not thought this through in detail and cannot at present describe an algorithm that would keep track of the major changes from cycle to cycle.

Templeton: Then part of my next question is, how do you know which ones they are, because all of the derivatives include in them the structure factor?

Johnson: But we have in this formulation eliminated the structure factor. You're right, though; that is a very good point. The structure factor was eliminated in the analytical formulation, and perhaps, as an approximation, the same reasoning could be extended to this approach.

Hamilton: Would you predict that this may be the answer for people who are refining protein structures, and that they should really be thinking seriously about this method?

Johnson: I must admit I harbor the fond hope that the analytical approach might someday be applicable to protein refinement. Unfortunately, I cannot at present see how to handle the noncentrosymmetric problem properly. Luckily, you scheduled me in a session where I can discuss what should be done and not necessarily what can be done. I think it is certainly worthwhile to put some additional effort into this approach to see if it might be a feasible solution for proteins.

Busing: You say this speeds up the computation of the matrix by perhaps a factor of 10. If the number of observations and parameters is very large, does this method become even more favorable?

Johnson: The approximation improves as the number of observations increases, and we are in the best shape if the data set includes everything that can possibly be measured. In this case we also obtain our maximum time advantage. Additional parameters may also improve the time advantage because the sum over the Patterson vectors converges rapidly as a function of interatomic separation; consequently, the long vectors can safely be omitted from the summation.

The Role of the Minicomputer in the Crystallography Laboratory

Robert A. Sparks

It is universally recognized that the minicomputer plays an important role in the crystallographic laboratory. The semi-automatic and automatically controlled diffractometers have offered welcome relief for crystallographers who previously had to measure large amounts of data manually. As with most computer-controlled instruments, early programs tended to be written to collect data in much the same way that the crystallographer would have used to operate the instrument manually. Later programs have taken advantage of the flexibility of the computer to perform tasks that would have been virtually impossible to perform with the manual or semi-automatic diffractometer. Thus, it is now possible to:

1. Automatically center reflections.
2. Sample the profile for each reflection during data collection.
3. Choose different scan speeds dependent on the intensity of the reflections (see the sketch following this list).
4. Search for peak maxima for all but the weakest reflections.
5. Measure reflections at many azimuthal angles about the diffraction vectors.
6. Measure regions of reciprocal space in a three-dimensional fashion to obtain diffuse-scattering information.
7. Automatically redetermine the crystal orientation if the crystal should move during data collection.
8. Obtain information about crystal quality and crystal symmetry.
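As an illustration of item 3, here is a minimal sketch of how a control program might choose a scan rate from a fast prescan count. The routine name, the threshold counts, and the rates in degrees per minute are hypothetical, not values from any actual diffractometer program.

C     CHOOSE A SCAN RATE (DEGREES PER MINUTE) FROM A FAST PRESCAN
C     PEAK COUNT NPRE.  STRONG REFLECTIONS ARE SCANNED QUICKLY;
C     WEAK ONES SLOWLY, FOR BETTER COUNTING STATISTICS.  ALL
C     NUMERICAL VALUES ARE ILLUSTRATIVE ONLY.
      REAL FUNCTION SCNRAT (NPRE)
      SCNRAT = 2.0
      IF (NPRE .GT. 500) SCNRAT = 8.0
      IF (NPRE .GT. 5000) SCNRAT = 24.0
      RETURN
      END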

All of the tasks in the list above can be done with a slow computer having a minimal amount of core (4000 words). For subsequent processing of the collected data a magnetic tape drive is desirable. Magnetic tape is chosen because it is an inexpensive means of storing large amounts of data in formats that are universally recognized by large and small computers. As collection methods become more complex, one soon realizes that the limiting factor is the amount of core available. Thus, it is becoming fairly common for computer-controlled diffractometers to have more than 4000 words of core or to have a disk for program overlays.

For reasons of economy, manufacturers of computer-controlled diffractometers have chosen the least expensive computers that can easily be interfaced to the many control and acquisition functions of the diffractometer. Almost all commercial instruments are of the one instrument-one minicomputer type. Other configurations, such as several instruments-one medium-size computer or one instrument-one minicomputer-communication link-large computer, have the possible advantage that more computer capability becomes available for the diffractometer for at least part of the time. However, such approaches are expensive because they are almost always one of a kind. The advantage of the one instrument-one minicomputer configuration is that the development cost, of which a large part is for software, can be spread over many identical instruments.

Although the diffractometer experiments are slow, the inexpensive minicomputers used to control the diffractometers are not slow. The state of computer technology is such that computers with memory-cycle times of about one microsecond and execution times of one to three microseconds for most commands are no more expensive to build than computers that are one-half or one-tenth as fast. Therefore, the minicomputer is used to perform some calculations that do not use the diffractometer-control features of the computer. Thus, the minicomputer is used to determine indices for reflections, best least-squares unit-cell parameters, and Lorentz-polarization factors. All of these tasks can be performed by the large computer but are more conveniently done by the minicomputer.

Some crystallographers have used the small computer for tasks more traditionally performed on the large computer. Thus, Eric Gabe uses the PDP-8 which controls his Picker diffractometer to do structure-factor calculations. Shiono for many years has used the IBM 1130 to do almost all types of crystallographic calculations. For the most part, however, crystallographic computations are done on the most powerful computer available. Why this is so is illustrated in the first two columns of Table 1, which compare the characteristics of the Nova 1200, used to control the Syntex P1̄ Autodiffractometer, with those of the CDC 6600, one of the most powerful computers used for crystallographic calculations. Although core speeds are not very different, the CDC 6600 can achieve effective speeds of up to 100 nanoseconds because the memory has been divided into independent blocks of 4096 words each. In every other respect the CDC 6600 is a much more powerful computer. Because of the large core memory, all structure-factor data and the normal equations of a least-squares program can be resident. Because of the fast instruction registers, tight loops can be executed with no need to continually reference slow core. Because of the many arithmetic units and addressing and indexing registers, many operations can take place simultaneously. Not of least importance is the fact that CDC has an excellent FORTRAN compiler which makes efficient use of all this sophisticated hardware.

On the other hand, the minimal Nova 1200 configuration does not have enough core and is so slow that many of the important crystallographic programs would be virtually impossible to run for all but the smallest structures. I believe, however, that the most serious limitation is the unavailability of compilers of higher-level languages producing programs that minimize the amount of core needed at run time. This deficiency has meant that the crystallographer has not easily been able to tailor his data-collection programs to meet his requirements.

The minicomputer industry, however, is advancing rapidly. The industry is extremely competitive, and prices for all parts of the hardware are decreasing at a phenomenal rate. New innovations, semiconductor memory for example, are introduced into minicomputers almost simultaneously with their introduction in large computers. Good compilers with full FORTRAN IV capability are available for computers with larger core memories (12 000 words or more for the NOVA computers). Disk operating systems that are as flexible and easy to use as those found on large computers are now available. Finally, fast floating-point hardware is now optional from some of the minicomputer manufacturers and also from several independent firms.

The third column of Table 1 lists the characteristics of a system that would satisfy all or almost all of the crystallographer's computing needs. In addition to the basic 4000-word NOVA with a magnetic tape drive required for the P1̄ Autodiffractometer, this system has an additional 12 000 words of core, a 131 000-word fixed-head disc, and floating-point hardware. Software consists of FORTRAN IV and a Disc Operating System. Crystallographic data-processing programs would be written in FORTRAN IV. Programs would reside on magnetic tape reels and be loaded onto disc when needed. Large programs would consist of several overlays. Large arrays would also reside on disc and be brought into core one sector at a time. Diffractometer programs would also be written in FORTRAN IV but using machine-language subroutines for driving the goniometer axes, reading the encoders and scaler, opening and closing the shutter, etc.
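Before turning to Table 1, a sketch of the sector-at-a-time staging just described. Standard FORTRAN direct-access records stand in for the Nova disc calls here, and the 256-word sector size, the file name, and the record count are assumptions for illustration.

C     STAGE A LARGE DISC-RESIDENT ARRAY THROUGH A ONE-SECTOR CORE
C     BUFFER.  EACH 256-WORD RECORD PLAYS THE ROLE OF ONE DISC
C     SECTOR.  (RECL IS IN BYTES FOR MOST COMPILERS.)
      PROGRAM STAGE
      REAL BUF(256), SUM
      OPEN (1, FILE='MATRIX.DAT', ACCESS='DIRECT', RECL=1024,
     1      FORM='UNFORMATTED')
C     WRITE SIXTEEN TEST SECTORS, THEN READ THEM BACK ONE AT A
C     TIME, ACCUMULATING A SUM OF SQUARES AS A STAND-IN FOR A
C     REAL CALCULATION ON THE STAGED DATA.
      DO 20 K = 1, 16
      DO 10 I = 1, 256
   10 BUF(I) = FLOAT(I + 256*(K-1))
   20 WRITE (1, REC=K) BUF
      SUM = 0.0
      DO 40 K = 1, 16
      READ (1, REC=K) BUF
      DO 30 I = 1, 256
   30 SUM = SUM + BUF(I)*BUF(I)
   40 CONTINUE
      CLOSE (1, STATUS='DELETE')
      WRITE (*,*) 'SUM OF SQUARES =', SUM
      END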

Table 1  Comparison of Nova 1200 and CDC 6600

                                   CDC 6600          Minimal Nova 1200   Nova 1200 with Structure
                                                                         Determination Package
Magnetic tape drives               many              1                   1
Core speed                         1.0 µs            1.2 µs              1.2 µs
Word size                          60 bits           16 bits             16 bits
Core size                          131 000 words     4000 words          16 000 words +
                                                                         131 000-word disk
Operand, addressing, and
  indexing registers               24                4                   4
Fast instruction registers         8 (60 bits each)  None                None
Floating multiply                  hardware          2 ms (32 bits)      15.6 µs (32 bits),
                                   (60 bits)                             24.2 µs (64 bits)
Arithmetic units                   10                1                   1
FORTRAN IV and operating system    Very good         No                  Good

Almost all crystallographic programs could be run on such a system. It is hard to justify the cost of a plotter for the infrequent use crystallographers would make of it. Therefore, it would probably be most economical to generate plotting information on magnetic tape on this system and then have the actual plotting done at central facilities. Fourier maps would be generated on magnetic tape and either printed at central facilities or printed on the slow printer by the NOVA 1200. At ten characters per second, a large Fourier map could take several hours to print. In many cases, good peak-picking programs exist that eliminate the need to print the maps.

There is no question that such a system is feasible for almost all crystallographic calculations. The structure of vitamin B12 was solved and refined on a computer with a configuration closer to the basic NOVA 1200 than to the system proposed here. Indeed, much of the philosophy of disc (or drum) usage and external plotting and printing of large files is identical to that used on the large computers of 5 and 10 years ago.

Time-sharing of data collection and data processing presents problems associated not with the amount of core or the arithmetic processing speed, but rather with the allocation of peripherals to the two tasks. Data collection must have the magnetic tape drive available for output of the intensity data. Therefore, production of a Fourier map could not be done simultaneously with data collection. However, Fourier calculations are quite fast (except for printing), and interrupting data collection for the few minutes necessary to generate the map and output it to magnetic tape is not a serious limitation.

Happily, the least-squares calculation, which takes the bulk of the time for structure determination, requires the magnetic-tape drive only for the brief time necessary to dump all the data onto disc. After this, several operations could be performed without using the magnetic-tape drive and could be effectively overlapped with data collection.

The proposed system in the crystallographer's laboratory is clearly more convenient than a centralized computing facility. It is, in most cases, also more economical. The inner loop of a least-squares program (namely, the generation of the normal equations) was written in FORTRAN and executed on a number of different computers. The program is shown in Figure 1 and the results of the test in Table 2. The FORTRAN compilers that produce the most efficient code were used on the CDC 6600 and IBM 370/155. Because the floating-point hardware is fairly new for the NOVA machines, the FORTRAN compiler has not yet been modified to produce code for this feature. Reasonable substitutions were made in the assembly listing generated for the software floating-point version in order to produce the "FORTRAN-like" code. The hand-optimized version was an assembly-language program written to be executed as efficiently as possible. If the matrix is large enough to require that it be stored on disc, the data-channel transfers would increase the NOVA 1200 times in this example by about 0.8. If 64-bit floating-point numbers are required, an increase of about 25% is required for the NOVA 1200 times.

Table 2  Comparison of Times for the Least-Squares Inner Loop

Computer                                                Time
CDC 6600                                                0.93 s (60-bit words)
IBM 370/155                                             7.5 s (32-bit words)
HP 2100A (hardware multiply/divide,
          software floating point)                      150 s (32-bit words)
HP 2100A (hardware floating point)                      29 s (32-bit words)
Nova 800 (software floating point)                      206 s (32-bit words)
Nova 800 (hardware floating point):
          FORTRAN-like code generation                  16.8 s (32-bit words)
          hand-optimized code                           13.2 s (32-bit words)
Nova 1200 (software floating point)                     360 s (32-bit words)
Nova 1200 (hardware floating point):
          FORTRAN-like code generation                  24.2 s (32-bit words)*
          hand-optimized code                           17.5 s (32-bit words)

*Calculated from Nova 800 performance.

Typically, we at Syntex use about one hour of CDC 6600 computer time for a structure with 40-50 non-hydrogen atoms in the asymmetric unit. If the FORTRAN test in Figure 1 is typical, the "FORTRAN-like" time on the NOVA 1200 would be 26 hours for this same structure. This amount of time is small compared to typical data-collection times of one to two weeks. Even without overlap of data collection and data processing there would not be a serious deterioration of diffractometer usage. With simultaneous least-squares calculations and data collection, diffractometer servicing will be negligibly affected.

C     A HOLDS THE PACKED UPPER TRIANGLE OF THE 65 BY 65 SYMMETRIC
C     NORMAL MATRIX (2144 ELEMENTS); ROWS WITH A ZERO DERIVATIVE
C     ARE SKIPPED BY ADVANCING THE PACKED INDEX K.
      DIMENSION A(2144), DV(65)
      N = 64
      NREF = 100
      M = N + 1
      MM = M + 1
      DO 15 I = 1, 2144
   15 A(I) = 0.0
      DO 6001 IP = 1, NREF
      DO 20 I = 1, M
   20 DV(I) = I*IP*0.9
      K = 1
      DO 5001 J = 1, N
      B = DV(J)
      IF (B .NE. 0.0) GO TO 5002
      K = K + MM - J
      GO TO 5001
 5002 DO 5003 L = J, M
      A(K) = A(K) + DV(L)*B
 5003 K = K + 1
 5001 CONTINUE
 6001 CONTINUE
      END

Figure 1  FORTRAN test program

Even though the NOVA 1200 with floating-point hardware is 26 times slower than the CDC 6600 and 3.2 times slower than the IBM 370/155, turn-around time will in many cases favor the dedicated computer because it is located in the crystallographer's laboratory. Another important feature of the small dedicated system compared to the large, very fast computer is that it is impossible on the former system to find out one day that an error made by a student has exhausted the year's computer budget.

Because of the above arguments, Syntex has decided to make available to customers a Structure Determination Package, which would consist of a 131 000-word fixed-head disc, 12 000-word core, and floating-point hardware for those who already have a P1̄ Autodiffractometer or AD-1 Autodensitometer, and a stand-alone unit consisting of a NOVA 1200, a 131 000-word fixed-head disc, 16 000-word core, floating-point hardware, and a magnetic tape drive for those who do not have the Syntex instruments. Software will consist of a FORTRAN IV compiler modified to make efficient use of the floating-point hardware, a Disc Operating System modified to allow time-sharing of data collection with data processing, machine-language subroutines for the diffractometer, FORTRAN versions of the current diffractometer programs, and FORTRAN programs properly broken up into overlays for the basic crystallographic programs. The user will be able to add his own FORTRAN or assembly-language programs to the library. At this early stage, it looks quite probable that the selling price would be $30 000 for hardware and software for the attachment to existing instruments, and about $45 000 for the stand-alone option. First deliveries are scheduled for the second quarter of 1973.

The comparison of the cost of the system proposed here with existing costs at centralized computing facilities is difficult to make. University computing centers may charge the scientist anywhere from nothing up to the actual cost of the computing service, depending on what other sources of funds are available to the centers. Commercial rates are set to provide a profit for the company providing the service, but are usually complex functions of CPU time, amount of core used, amount of input and output, and job priority.
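As a quick check, the speed ratios and the 26-hour estimate quoted above follow directly from the Table 2 timings and the one-hour CDC 6600 figure:

$$24.2/0.93 \approx 26, \qquad 24.2/7.5 \approx 3.2, \qquad 26 \times 1\ \text{h} = 26\ \text{h}.$$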

In Palo Alto the Control Data Center provides the most economical service for crystallographic-type problems. Syntex pays about $1000 per structure for their service. Clearly, for us, the break-even point would be 30 structures for the $30 000 attachment or 45 structures for the $45 000 stand-alone configuration.

In conclusion, whether crystallographers would be inclined to buy the Syntex package or whether they would wish to buy directly from the computer manufacturers and provide their own software, I believe that serious consideration should be given to the small dedicated computer. Not only does it provide the desirable features of FORTRAN data-collection programs and the convenience of having one's own computer, but it also provides, in many cases, a substantial cost saving compared to the centralized-computer approach used by most crystallographers today.

DISCUSSION

Young: If you are doing full-matrix least squares, what is the maximum number of parameters you can handle with this sort of adorned minisystem?

Sparks: It turns out to be the same figure Jim Ibers quoted, 240, because the disc size is 131 000 words. A good suggestion by Mike Murphy is that instead of using a fixed-head disc as we are here, we ought to be using a movable-head disc, which costs quite a bit less for the amount of disc space that would be available. Then the capacity would be something like two million words.

Young: If you put all those core packages and discs plus an extra arithmetic unit on the minicomputer, why do you bother with putting a diffractometer on it?

Sparks: I've given you the choice: $45 000 or $30 000.

Young: No. My point is that what you've done is build a separate computer system, and the fact that the diffractometer is hooked on is incidental.

Sparks: It does give some capability for the collection programs that we do not now have. A couple of years ago you made a strong point that we ought to be writing these collection programs in FORTRAN.

Lowrey: Professor S. H. Bauer at Cornell University has an extensive system for electron diffraction that is built around the PDP-8, and he has made extensive use of cathode-ray-tube display. He is able to search his electron-diffraction data and his radial distributions and look at very fine portions. With respect to Fourier maps, instead of having to print them out you can set up a graphic interaction display for picking out the things you want. Bauer is able to do a great deal of electron diffraction using solely the small computer. He considers the advantage to be that not only is it cheap, but it is under his direct control, so that he can run all night and have a guarantee of getting his programs back, and not have the problems of priorities on commercial computing systems.

Sparks: We also sell a three-dimensional display.

Ibers: Two points might be kept in mind. (1) It is easier to get computing money in a grant than it is to get $45 000 to buy a small computer. (2) It may be possible to sneak small computers into laboratories throughout a campus by claiming that these computers are controlling experiments, but their presence makes computer center directors very nervous, for good reason. If the small computer proliferates throughout the campus, you are in trouble. Suppose we have 20 computers of the type you have discussed. In effect a million dollars has been spent, and it has not benefited the central computing facility at all. For the good of the university community it might have been more reasonable to put that million dollars into the central facility. In any event there are obviously political problems that are by no means negligible.

Sparks: Yes. I am aware of this. My feeling is that the instrument ought to be treated as having a very special application. It is not by any means a general-purpose computer.

Fritchie: Do you have any idea what the annual maintenance costs are on this $45 000 system? Computer alone, perhaps?

Sparks: I do not have that figure. What is it on the diffractometer?

Dewar: It will be around 7%.

Coppens: What is the capacity of the system? In other words, how many crystallographers can it handle?

Sparks: It depends on how productive those crystallographers are. It's better to ask how many structures you could reasonably hope to do on a system like this. We think that for a 40-50 atom structure it would take twenty-six hours for the structure determination. It certainly takes quite a bit longer to collect the data. So really, you are still limited by the amount of time it takes to collect the data.

Coppens: So the system has over-capacity for one crystallographic group.

Sparks: Yes, it has indeed.

Corfield: I think this system is not totally unreasonable, but what makes it reasonable is the availability of inexpensive hardware floating-point arithmetic units. We have had at Ohio State University for the past two or three years a system rather more sophisticated than this, but one that does not have hardware floating-point arithmetic. Presently we do all our least-squares and all our Fourier summations in-house, but once we get up to a couple of hundred variables, it would be worth our while to use a larger computer because of the limitations of the software floating-point arithmetic on our in-house machine.

Medrud: If this kind of approach is attractive to other crystallographers, there is another encouraging factor in the change in attitude of some of the minicomputer manufacturers. Our first contacts with them, with regard to our application, were met with disdain. The most recent contacts other people in our group have had with them indicate much more interest in systems development. They formerly wanted to hand you a computer and a bag of hardware for interfacing and say "go to it," but now they are willing to discuss a system comparable to yours.
