THE LIFE SCIENCES

considerations are important. The computer becomes widely and successfully used in a field only as the scientists in that field come to understand and use the computer. Otherwise application remains isolated and peripheral. Despite superficial similarity of tasks among various fields, each field proves to have its own unique problems in processing information. This is not to deny the importance of an autonomous computer science, or of attempts at interdisciplinary efforts (especially when they serve to seduce those with computer expertise ultimately to become life scientists!). It does, however, establish the essential condition of successful assimilation and where the ultimate responsibility rests.

6. Computer facilities must specialize. The generality of the computer leads naturally to the view that a single computational facility can service all needs. The establishment of large university computation centers in the last decade reflects this philosophy in part. But so great is the diversity of computer use that, in fact, each computer facility, whether large or small, serves only a fraction of the whole range of needs. Every existing computer facility substantiates this notion. Time-sharing systems are not efficient for processing large numerical calculations; large systems for statistical calculations cannot accommodate laboratory monitoring, and so on. The generality of the computer is that it can be adapted to any symbolic task, not that it can be all things for all users simultaneously.

THE STATE OF COMPUTER APPLICATION IN THE LIFE SCIENCES

How quickly and thoroughly, and in what respects, computer application becomes effective in a given field will be determined by the interplay of the general facts cited above.

Extent of Use

Computing is widespread in the life sciences. Approximately one in three life scientists computes, and the total cost of their computing is in excess of $18 million a year.*
DIGITAL COMPUTERS IN THE LIFE SCIENCES

* Unless otherwise specified, all data concerning the state of computing in the life sciences come from the census of individual life scientists conducted by the Committee and reflect use in academic year 1966-1967. See Appendix A, Individual Questionnaire, Questions 22-27. Information was gathered on the number of hours of usage by type of machine (A, B, C, D, using the classification of the Rosser report†). Hours can be converted (for some purposes) to their equivalents on a type B machine (e.g., an IBM 7090): 1 type B machine is equal to 0.25 type A, 4 type C, and 20 type D machines. One B-hour provides approximately 300 million basic operations (in practice divided among input, compiling, computing, editing, output, etc.). The total hours of computer time reported were equivalent to 90,000 hours on type B machines. This amounts to approximately $18 million a year in rental. Extrapolation to the total computing population increases this amount by a factor of 1.5 to 2. Large users who own computers that are used 24 hours a day partially offset the cost extrapolation, as they can purchase machine time at less than the standard rental.

Unfortunately we do not have comparable figures for other fields. The Rosser report, which confined itself to academic institutions, estimated that for 1968 the physical sciences would require 90, engineering 20, and the life sciences 21 type B computers. For all biologists the estimate from our census approximates 32 type B computers, with 17 of them in universities. A guess must be made as to how many B-hours equal a type B machine, as defined in the Rosser report. It must range between 2,000 and 7,000 hours (one to four shift operations); we chose 4,000 hours. Actual usage in the other fields undoubtedly also exceeded earlier estimates. The two-decade head start in the physical sciences still remains. In any case, we can be sure that biologists are doing large amounts of computing, and the life sciences are no longer to be viewed as a "computationally undeveloped country."

As would have been anticipated, actual computer use is very unevenly distributed. This is illustrated by dividing the users into "light users" (less than 10 B-hours per year), "medium users" (10 to 99 B-hours per year), and "heavy users" (more than 100 B-hours per year).
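The conversion to B-hours and the three user categories can be illustrated with a minimal sketch. The function and variable names are hypothetical, and the handling of the boundary at 100 B-hours is an assumption, since the text leaves values between 99 and 100 undefined. The per-type hours are those the chapter reports for the sample (14,000 type A, 23,000 type B, 19,000 type C, and 100,000 type D).

```python
# B-hour equivalents of one hour on each machine type, from the stated
# equivalence "1 type B machine = 0.25 type A = 4 type C = 20 type D".
B_HOURS_PER_MACHINE_HOUR = {"A": 4.0, "B": 1.0, "C": 0.25, "D": 0.05}


def to_b_hours(hours_by_type):
    """Convert a mapping of machine type -> hours into total B-hours."""
    return sum(B_HOURS_PER_MACHINE_HOUR[t] * h for t, h in hours_by_type.items())


def user_category(b_hours_per_year):
    """Classify a scientist by annual B-hours.

    Assumption: the text gives "less than 10", "10 to 99", and "more than
    100"; this sketch places the heavy threshold at 100 so that every
    value falls into exactly one category.
    """
    if b_hours_per_year <= 0:
        return "nonuser"
    if b_hours_per_year < 10:
        return "light"
    if b_hours_per_year < 100:
        return "medium"
    return "heavy"


# Hours claimed by the sample, by machine type.
sample_hours = {"A": 14_000, "B": 23_000, "C": 19_000, "D": 100_000}
total_b_hours = to_b_hours(sample_hours)
print(f"total: about {total_b_hours:,.0f} B-hours")

# Rental rate implied by the rounded figures in the text.
implied_rate = 18_000_000 / 90_000
print(f"implied rental: about ${implied_rate:,.0f} per B-hour")
```

Under these factors the sample total comes to roughly 88,750 B-hours, in line with the rounded figure of 90,000, and the quoted $18 million corresponds to about $200 per B-hour.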
As Figure 40 shows, 69 percent of life scientists do no computing at all; 21 percent are light users and do 5 percent of the computing in the aggregate; 8 percent are medium users, doing 25 percent of the computing; and the remaining 2 percent are heavy users, accounting for 70 percent of the computing. One consequence of this skewed distribution is to make average values somewhat meaningless, since there is no "central tendency."‡ The total amount of computing is markedly dominated by the few heavy users.

† Digital Computer Needs in Universities and Colleges, A Report of the Committee on Uses of Computers, NAS-NRC Publ. 1233, National Academy of Sciences, Washington, D.C., 1966.

‡ More precisely, if we imagine the present sample to be drawn from some underlying continuous distribution p(x) that gives the probability of a person using x B-hours of computing, then it appears that this distribution does not have a finite mean value, i.e., that ∫ x p(x) dx does not converge.

FIGURE 40 Distribution of computation use. Categories shown: nonusers; light users (<10 hr/yr); medium users (10-99 hr/yr); heavy users (>100 hr/yr). (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)

The diversity underlying this distribution is impressive. Seemingly equal amounts of computing power are not equivalent at the small and large ends of the scale. The composite figures summarized above include the use of vastly different computers, differing in power and facility by up to a factor of 100. Our sample of almost 4,000 computer users claimed 14,000 hours of type A computer (like an IBM 360/65 or a CDC 6600), 23,000 hours of type B computer (like an IBM 7090), 19,000 hours of type C computer
(like an IBM 360/40 or an SDS 940), and 100,000 hours of type D computer (like a DEC PDP-8 or an IBM 1130). This last category is especially noteworthy because it shows how widespread the use of small laboratory computers has become in the life sciences. Although accounting for only a small fraction of the total computing power (about 5,000 effective B-hours, or 5 percent of the total), it accounts for 60 percent of the "contact hours." Much of this form of computer use, as in on-line data acquisition, cannot meaningfully be exchanged for hours on larger machines.

In the data from our sample, diversity of arrangement meets one at every turn. The amount of actual type D computer use in effective time (equivalent to 5,000 B-hours) is of the order of the amount of computing time used by all light users (4,500 B-hours). However, light users utilized all types of machines; thus, approximately 40 percent of their computing was done on type A machines and 20 percent each on types B, C, and D. Conversely, approximately half of all time on the small computers (type D) was occupied by individuals who were heavy users.

Types of Use

Some information is available concerning the types of use of computers by broad functional categories. Almost all users performed some data analysis (93 percent), but half (47 percent) also did some other type of computing, and a substantial number (17 percent) engaged in several types. Table 57 shows the distribution of hours and number of scientists for each functional type. The indicated categories, other than "Data Analysis Only," imply that both data analysis and the specified activity were conducted, but nothing else. The "Multiple Uses" category indicates that data analysis and at least two other types of computation were reported. The table shows that most of the computing hours are used by those in the "Multiple Uses" category, reflecting the fact that most heavy users use computers in several ways. Surprisingly, the light users were not much more likely to engage only in data analysis than was the population as a whole (59 to 52 percent). Other similarities among the three user categories are illustrated in Figure 41.

TABLE 57 Tasks for Which Computers Were Used

APPLICATION                          PERCENTAGE OF      PERCENTAGE OF
                                     LIFE SCIENTISTS    COMPUTING (B-HOURS)
ALL USES                             100                100
Data Analysis Only                    54                 21
Information Storage and Retrieval     16                  7
Data Acquisition                       3                  3
On-line Experimental Control           1                  1
Simulation                             3                  2
Theoretical Analysis                   6                  3
Multiple Uses                         17                 63

Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.

Likewise, the machines themselves are not specialized for specific types of computing. As can be seen from Figure 42, only a few general aspects show through, e.g., type D computers are used relatively heavily for experimental control (which seems natural), and type A computers are used somewhat less than others for storage and retrieval. The reason for the latter is not so clear. However, it was unlikely that our gross categories could reveal the specialization of particular facilities to do particular jobs.

This description of functional types of computer use, based on answers to our questionnaire, is rather abstract and fails to describe the remarkable diversity of computer use in the life sciences. Only a few uses can be noted here. Included are very large numerical calculations, such as the processing of statistical data or deciphering the structure of an organic molecule from crystallographic data. Small data analysis may include relatively simple routine calculations from instrumental analysis, the striking of a nutritional balance, or the calculation of relatively simple reaction rates. The widest possible variety of functions is found within the category of "data acquisition."
This may be an experiment in which electrical signals are obtained and converted to digital records for later processing, possibly with concurrent display to check whether a good record has been obtained. It could mean equally well the use of a currently existing system for gathering data about the feeding and milk-producing behavior of cows from all over the country, which (along with their genealogy) permits the evaluation of both feeding plans and the worth of bulls. Simulation could mean a study of enzyme kinetics or a study of the life cycle of a salmon. The relatively small amount of effort indicated for on-line experimental control reflects how recently such exercises have become practical. The use of small laboratory computers to "run the show" is, qualitatively, a completely different use of computers from all the other categories. The variation is such that no general pattern emerges, and the variety will surely expand in response to future demands for information-processing tasks.
FIGURE 42 Tasks for which different sizes of computers were used by questionnaire respondents. Task categories: Data Analysis Only; Information Storage and Retrieval; Data Acquisition; On-Line Experimental Control; Simulation; Theoretical Analysis; Multiple Uses. Machine types: A, B, C, D. (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)
Computer Use in Research Areas of the Life Sciences

Within each research area there occurs the same skewed distribution of computing, with many small users shading down to a very few heavy users. The more investigators active in a research area, the more computing they do; beyond that, no generalizations emerge. It is not useful to compute an average number of hours per scientist. Figure 43 displays this rather curious situation. This figure shows, for our sample, the number of computing scientists in a research area versus the amount of computing done by that area. Although the amount clearly rises linearly as the number of scientists who compute increases, the points become widely scattered. In terms of a computed mean value these "wild" points almost completely determine the slope, due to the small number of heavy users in each subfield.

Our survey shows that, for all research areas, the light users consume about 2.4 B-hours per year, the medium users about 30.6, and the heavy users about 295. Six people (0.2 percent of the population) indicated they used more than 1,000 B-hours of computing time per year. These "superheavy" users averaged 2,390 B-hours per year apiece. Further, for any one research area, the percentage of light users is about 67, the percentage of medium users about 27, and the percentage of heavy users about 5. As consumers of computing power, all subareas of the life sciences appear much the same. The deviations among areas do not appear to have any meaning.* Notwithstanding the uniformity of the distribution, there is a large variation in the amount of computing, depending on the exact behavior of the few heavy users. However, because these were truly very few, and subject to a large sampling bias, their exact values are not meaningful. Hence the data suggest that all research areas of the life sciences are engaged in computing.
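As a rough cross-check, the per-category averages quoted above can be combined with the shares of life scientists in each category (21, 8, and 2 percent, from the discussion of Figure 40) to estimate each category's share of total computing. This is a sketch under the assumption that those rounded figures can be multiplied directly; the variable names are illustrative only.

```python
# Share of all life scientists in each user category (Figure 40 discussion)
# and the average B-hours per user per year quoted in the survey results.
share_of_scientists = {"light": 0.21, "medium": 0.08, "heavy": 0.02}
mean_b_hours = {"light": 2.4, "medium": 30.6, "heavy": 295.0}

# Expected B-hours contributed by each category per life scientist.
contribution = {k: share_of_scientists[k] * mean_b_hours[k] for k in mean_b_hours}
total = sum(contribution.values())
share_of_computing = {k: v / total for k, v in contribution.items()}

for category, share in share_of_computing.items():
    print(f"{category:6s} {100 * share:5.1f}% of B-hours")

# Six "superheavy" users at 2,390 B-hours each, set against the
# 90,000 B-hour total reported for the sample.
superheavy_share = 6 * 2390 / 90_000
print(f"superheavy users: {100 * superheavy_share:.1f}% of B-hours")
```

With these inputs the shares come out near 6, 28, and 67 percent for light, medium, and heavy users, in the neighborhood of the 5, 25, and 70 percent cited for Figure 40, and the six superheavy users alone account for roughly 16 percent of the sample total. The gap between the two sets of figures is the kind of discrepancy one would expect from rounded inputs.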
There is no specialized subarea that is the "computing part" of the life sciences. Surely this reflects the considerable general advancement of the life sciences in making use of this major tool of modern research.

* More precisely, the deviations appear to be due entirely to sampling variation. The deviations from the mean for each area are completely uncorrelated at the four levels of use. Furthermore, much of the seeming scatter results from the data for research areas involving relatively small numbers of scientists, so unusual behavior by a few scientists seems to create a large deviation.

FIGURE 43 Number of computing scientists versus hours of computing, by field. Horizontal axis: Number of Computing Scientists in Research Area (Answering Both Questions). (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)

The Growth of Computer Usage

The movement into the use of the computer has been rapid and recent. Figure 44A shows the percentages of our sample that had been using the computer for various lengths of time: five years or less, six to ten years, etc. Seventy-seven percent of current users had begun their use of computers within the last five years, and 19 percent had been computing six to ten years. Thus four times as many biologists began computing within the most
recent five-year period as had begun during the previous five-year interval. When due account is taken of the steep curve, this is an entry rate of almost 20 percent per year. This certainly cannot continue. Eventually the rate must approach the growth rate of the life scientist population (currently about 8 percent). By now (early 1970), the proportion of life scientists computing is already between 40 and 50 percent, instead of the 30 percent reported by our sample in 1966-1967. At such time as the percentage approaches 50, it is likely that the rate of growth will have slowed markedly. Figure 44B shows the years-of-use curve for the two areas in which approximately 50 percent of the computing scientists commenced their computing within the last five years. These are genetics and nutrition, which have identical years-of-use distributions. Their curves have already begun to "bend over," and their entry rate is about 10 percent per year.

[Figure 44, panels A and B. Axis label: Percentage of Biologists Computing.]

Table 58 illustrates that, at least for our sample of biologists, the assertion that "computing is a young man's game" is not so. An examination of the age distributions of all biologists, whether they compute or not, and of those who compute shows that they are essentially the same. Also, all ages, whether light, medium, or heavy users, do their equivalent share of computing. Furthermore, as seen in Table 59, the distribution of computing effort (percentage B-hours) for biologists in each age group is proportionate to the number of individuals in that group. This proportionality holds for the three types of users: light, medium, and heavy. Table 60 shows that as they become more experienced, computer users tend to shift into the higher use categories, a trend quite in keeping with expectations.*

TABLE 58 Age Distribution of All Biologists versus Computing Biologists

AGE GROUP        PERCENTAGE
(YEARS)          All Biologists   Biologists Who Compute
ALL AGES         100              100
<30                4                3
30-39             39               39
40-49             37               38
50-59             16               17
>60                4                3

Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.

TABLE 59 Age Distribution of Computing Biologists versus Extent of Computing

                 PERCENTAGE DISTRIBUTION OF COMPUTING TIME IN B-HOURS
AGE GROUP        All Biologists   Light Users    Medium Users     Heavy Users
(YEARS)          Who Compute      (<10 hours)    (10-99 hours)    (>100 hours)
ALL AGES         100              100            100              100
<30                1                4              3                1
30-39             37               41             36               37
40-49             41               39             43               40
50-59             16               14             17               15
>60                5                2              1                7

Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.

TABLE 60 Shift of Percentage Distribution of Computing (Percent B-Hours) with Years of Computing Experience

YEARS OF COMPUTING    LIGHT     MEDIUM    HEAVY
EXPERIENCE            USERS     USERS     USERS
TOTAL                 100       100       100
<5                     79        72        63
6-10                   18        23        31
11-15                   2         4         5
16-20                   1         1         1

Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.

However, research areas do differ in the percentage of their active scientists who compute (Figure 45). Here percentage participation for each field is plotted horizontally, and the percentage of these computing scientists who have used computers five years or less is plotted vertically. Thus genetics has the highest participation, with 49 percent of the field computing; and morphology has the lowest participation, with 18 percent. Hence, variation between different biological fields is considerable.† Furthermore, a field with very little participation should contain a high proportion of individuals just making the acquaintance of the computer. The regular decreasing sequence of Figure 45 clearly shows such a relationship. Extrapolation of the data to 100 percent participation indicates that about 30 percent of the users commenced computing within the last five years. This is equivalent to an approximate annual growth rate for computing participation of 6 percent. Such a growth rate is in tolerable agreement with the growth rate of biology as a whole. Hence it is plausible (though hardly conclusive from the evidence) to view all subareas as migrating down the curve of Figure 45, reinforcing the impression that all areas of biology are assimilating the computer. Their rate of assimilation differs only because of the point in time at which they commenced computing; their rate depends upon their position on the curve.

* However, the magnitude of this effect is not enough to help predict the amount of computing used in a research area by knowing its age distribution, even though the average number of B-hours per scientist is about 25 for a new user (less than five years) and about 50 for an old user (greater than five years).

† Again, there is no correlation with how much computing is done per scientist.
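The annual rates quoted in this section can be compared with a simple linear reading, in which the share of current users who began within the last five years is divided by five. This reading is an assumption, not a method stated in the text, and the function name is hypothetical.

```python
def linear_entry_rate(share_last_five_years, years=5):
    """Average annual entry rate as a fraction of current users.

    Assumption: entries are spread evenly over the window, so the rate is
    the window's share divided by its length in years.
    """
    return share_last_five_years / years


cases = [
    ("all users in the sample", 0.77),
    ("genetics and nutrition", 0.50),
    ("extrapolated to full participation", 0.30),
]
for label, share in cases:
    print(f"{label}: {100 * linear_entry_rate(share):.1f}% per year")

# Ratio of entrants in the latest five years to those in the prior five.
recent_to_prior = 0.77 / 0.19
print(f"recent-to-prior ratio: {recent_to_prior:.1f}")
```

On this reading the 50 percent and 30 percent shares give 10 and 6 percent per year, matching the figures in the text. The 77 percent share gives about 15 percent per year, below the "almost 20 percent" that the text reaches after taking account of the steepness of the curve, since an even spread understates entry in the most recent years.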
FIGURE 45 Percentage of biologists computing in a field versus percentage of recent entries to computing within that field. Horizontal axis: Percentage of Field Computing. (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)

Institutional Arrangements for Computer Use

It remains only to look briefly at institutional arrangements. Here again, the main impression, as seen in Table 61, is diversity; all arrangements are used heavily for multiple purposes. The table shows the percentages of A, B, C, and D type computer hours used by researchers using three major types of computation facilities: (1) laboratories that own their own computers, (2) laboratories that use a life sciences computing center, and (3) laboratories that use some other computation center, usually a university center. Table 61 also shows corresponding percentages for light, medium, and heavy users. In this table it is possible to verify some facts that one might have expected. Thus, most type D computers belong to the scientists' own laboratories; but much computing by light users is done at university computation centers. The overall impression is one of multiple arrangements.

TABLE 61 Percentage of Computing Done Using Different Types of Computer Facilities

                                     COMPUTER SIZE                       EXTENT OF USE
SOURCE OF COMPUTER                   All Sizes   A     B     C     D     Light   Medium   Heavy
ALL SOURCES                          100         100   100   100   100   100     100      100
Investigator's Laboratory             31          23    47    29    46    20      25       36
Life Sciences Computing Center        27          31    18    26    28    17      22
Other (Including University
  Computing Center)                   42          46    35    45    26    63      53       34

Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.

Funding of Computer Use

Table 62 shows the sources of funding for computing in the various research areas of biology. Overall, 42 percent of the support came from research grants to individual scientists; 29 percent from federal funds specifically allocated for life sciences computing; 9 percent from non-life-sciences funds (e.g., a university's own computing budget); 11 percent from other funds (e.g., state life sciences funds); and 9 percent from support whose source was unknown to the individual scientist. Given that about 75 percent of research grants are also provided by the federal government, and that some fraction of the non-life-sciences funds and funds of unknown source undoubtedly comes from National Science Foundation computer-facility grants to universities, it is clear that the federal government supports the great bulk of computation. Furthermore, Table 62 reveals that different areas of biology meet their computing costs in different ways. For