The Transmission of Scientific Information: A User’s Analysis
As a contribution to the present Conference this paper may seem out of place. It does not set out the results of any enquiry on the uses of scientific information nor does it confine its interest to the specific theme of the Conference on the Storage and Retrieval of Scientific Information. I understand very well why the Conference had to be so limited, for the subject of scientific information as a whole, which was dealt with at the Royal Society Conference ten years ago, has now grown so large that it can be attacked only piecemeal. Nevertheless, I hope that my contribution may still be acceptable because what I have tried to do is to present, from the point of view of a user of scientific information, the particular aspects of storage and retrieval (or what might be called the memory function of scientific information) on the background of the whole problem of communication between scientists, technologists, and the interested public.
It is perfectly admissible, indeed often necessary, to concentrate on the problems of storage and retrieval as forming part of a closed field. We may take the pieces of scientific information to be put into and taken out of store as simply given by the working of the rest of the scientific machine. The problem then is simply how to handle them most quickly, accurately, and economically by human or mechanical methods. This, no doubt, will be done in the fourth, fifth, and sixth sections of the Conference. There is, however, a danger, and necessarily a growing one, that the service of scientific information will develop as an activity entirely in its own right, ever more and more competent to take in, store, and hand out information regardless of whether this information is superfluous, inaccurate, or unwanted. Fortunately the organisers of the Conference did stipulate that this activity had as its ultimate aim the service of the user and, therefore, it is legitimate to ask not only how the information is to be dealt with but also what that information is, for whom it is intended, and to what degree the process of transmission of information helps in the advancement and use of science.
J. D. BERNAL, Professor of Physics, Birkbeck College, University of London, London.
Now I must admit at the outset that my views on this matter may be biassed because although I have been definitely a user of scientific information for the better part of forty years, I am not a representative user—though there may be no such creature—and not even a model user of scientific information. For it is evident that those engaged largely in fundamental research represent a small minority, possibly as small as five per cent, of the users, though, owing to the fact that they are obliged to read more individually, they may represent as much as twenty per cent of the actual uses of scientific information. Later I will propose statistical enquiries on the composition of the population of users and uses of scientific information. (See also Appendix.)
Even if my contribution is taken as representative of fundamental research users—and strictly it can be so only for crystallographers and some allied branches of chemistry, physics, and biology—it needs must express personal opinions and not objective statistical facts about such users. Here at the outset I would defend this position because I consider for various reasons, some of which I will touch on later, that there are fields of enquiry of such complexity, variability, and novelty that verbal and qualitative analysis should precede numerical analysis whether objective or by questionnaire.
I am here proposing only to indicate the nature of the enquiry, not to answer questions that might be raised in it. My concrete contribution will be to propose a series of enquiries all tending to elicit the user aspects of the problems of storage and retrieval of scientific information. (These will be described as they arise in the text and listed together in the Appendix.)
My main reason for presenting this paper at such a Conference is that I believe that the whole subject of transmission of scientific information needs an analysis of a descriptive or natural historical kind before we can hope to find the right figures to look for or the right questions to ask. This is not only to ensure that the answers we get are significant, statistically or otherwise, but also to determine whether the answers that prove to be significant and true are really relevant to the total situation that we hope to understand and control, namely an improved flow of scientific information.
The main reason behind this implied criticism is that if the matter be treated as one of operational research, it follows that all enquiries as to present uses of scientific information services, though a necessary background, can by themselves tell us little of use for improving the service. They tell us what people do with an admittedly very imperfect service, not what they would do with a better one (which would naturally include proper training for its users). A certain amount could be learned by a comparison between different systems in use, and some lessons from this quarter may emerge from our Conference, but we cannot hope to learn much until it becomes possible to carry out trials involving considerable variations under strictly comparable conditions.
The essential difficulty is that, though the user may well know what he wants from an information service, he is in no position to know what he needs from it, namely what variation in the system would help most to further his work. Consequently, any action based on analysis of present user habits is unlikely to produce impressive results.
What I will try to do in this paper is first to give my own natural historical account of the whole process of the transmission of scientific information, indicating in the course of this what further enquiries or actions, such as analysis, would seem to be called for. I do not attach as much importance to this latter aspect as I do to the former as I am more concerned with stimulating discussion than with the carrying out of actions which will in any case require much deeper consideration than I can give them before they are likely to be useful.
It is difficult, especially in these days, to visualise or describe the process of transmission of scientific information as a whole. I tend to see it as a complicated irrigation system which is continually fed from many sources and in which the individual plants (the users) depend on what reaches them at any given time. Ideally, each should receive just the right amount of water at the right time but in practice, owing to the sluggishness and irregularity of the system, some never reaches the plants in time, and much of it evaporates or runs into the ground on the way. At other times the flow of water is so abundant that the plants are waterlogged and cannot absorb what they need. The simile is too crude, for it misses two essential features of scientific communication: first, that the receivers are in their turn sources of information, and secondly, that it is not generalised but highly specialised information that is wanted. However, it may serve to bring out two related defects of the present communication system, its viscosity and its wastage. In general the path from the provider of a piece of information to any one of his recipients is so long that the latter gets it too late to obtain the full value from it, often too late to be of any value at all, for scientific information is a particularly perishable commodity; further, the longer the time the greater the chance that he does not get it at all. This is the factor of relative wastage of information which prevents everybody capable of profiting from a piece of information from getting it. More serious, though also still unmeasured, is absolute wastage in which nobody capable of profiting from a piece of information gets it.
If this were all, then science as a whole would gradually settle down to a slow rate of progress determined by the amount of information that managed to get through. Actually science is advancing very rapidly because there are other sources of information which can be found all along the line from examining nature itself by experiment. A working scientist or technologist needs information. He has the choice of getting that information through information services, or of finding it by experiment, or of working it out for himself. In that way the same fact, or method, more or less, may be rediscovered many times. The detailed history of science is so inadequate that we have no measure of how often this occurs, but that it does occur I am certain, for I have myself both discovered experimentally several things already known and had my own published work rediscovered experimentally by others (Appendix, Item 10).
The reason I stress this waste of effort and knowledge now is that, in default of any reform of the system of communication, it is bound to grow with the growth of science. The sluggishness of the system is, however, in my opinion, more serious. Information, even vitally needed information, takes months or years to reach those most in need of it. We should reflect on our common experience when visiting our colleagues’ laboratories, even in the same country. It is rarely that we do not learn something of importance to us that we did not know and as often impart a useful piece of information to our hosts; and this is usually already published information. In principle, therefore, neither of us has an excuse; if we had read everything, we would have known the facts already. But now nobody does read everything, and indeed nobody could, even if he did nothing else. The basic fact remains: the amount to be read increases exponentially, and the time anyone has for reading it remains the same; therefore a smaller proportion of what is written is read by any one person.
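The last step of this argument can be put as a small calculation. The sketch below assumes, purely for illustration, that the volume of literature doubles every ten years (the rate quoted later in this paper as normal) and that a reader's capacity stays fixed; the function name, its parameters, and the starting fraction are hypothetical and not drawn from any survey.

```python
def fraction_readable(years_elapsed, initial_fraction=1.0, doubling_time_years=10.0):
    """Share of the literature one reader can still cover after some years.

    Assumes reading capacity is constant while the volume published
    doubles every `doubling_time_years` (an illustrative assumption).
    """
    if doubling_time_years <= 0:
        raise ValueError("doubling_time_years must be positive")
    # Volume grows by a factor of 2 for every doubling period elapsed.
    growth = 2.0 ** (years_elapsed / doubling_time_years)
    return initial_fraction / growth


if __name__ == "__main__":
    # Print the shrinking share at ten-year intervals.
    for years in (0, 10, 20, 30):
        print(years, fraction_readable(years))
```

Under these assumptions the share read by any one person halves with each doubling period, whatever the reader's diligence.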
One escape from this is through increasing specialisation. Reading surveys show (1) that one scientist can keep up with the work of some two hundred others in active production, but a field which contains only two hundred workers is necessarily a very narrow one. Now this may not matter if the field in question is a newly developed break-through in the front of ignorance such as “subgenetic analysis of bacterial viruses,” but these can represent only a small fraction of specialist fields which are far more often pedestrian collections of knowledge or skills. Here what science loses by such enforced specialisation is the cross fertilisation of ideas from different fields such as lead to all great discoveries.
Another way round the difficulty is to limit reading not so much by specialised fields as arbitrarily by reading only a small selection of journals. Dr. Urquhart (2) has shown that out of 9100 periodicals taken by the Science Library in London, 4300 were not consulted at all in a given year. Now it is difficult to believe that nothing of interest to the 87,000 readers at the Science Library was to be found in these 4300 periodicals. If so, the sooner they cease publication the better. Rather it would seem that these journals were simply unfamiliar to the readers and inadequately abstracted.
Still another way is to recognise frankly that not everything relevant can be read and to dip into the literature almost at random, qualified by hints given by colleagues or information officers. Thus at least the reader can hope to get a statistical view of the current state of knowledge.
Finally there is the wise and instructed reader (I wonder what proportion of scientists fall into this category) who reads a half-dozen journals regularly, another half-dozen annual or quarterly reports in different fields, and keeps up with the rest of scientific information through the systematic reading and indexing of abstracts (Appendix, Item 6). It is just now that this last task is itself becoming too time-consuming and forcing even the conscientious reader towards specialist or eclectic courses.
Now to me it seems evident that it is here that the information services should step in and so process the raw information that each user will receive just so much and no more than he can cope with. It should at the same time give him the necessary breadth of information and answer the specific questions that he has been able to formulate, even if only vaguely.
Before we can see how this might best be done, I feel it would be worth while looking more closely both into the users and the uses of scientific and technical information. I must include technical information here because nowadays no scientist can do his work well without technical information and vice versa. Now from the point of view of the kind of information services required, users can be divided into the following categories which are strictly functional, for the same man can at different times belong to any of these categories:
(a) Workers in fundamental research.
(b) Workers in applied research or development, including medical and agricultural research.
(c) Technologists, including engineers, architects, medical practitioners, and agriculturalists.
(d) Writers of reports, textbooks, teachers, students, etc.
(e) Scientific and technical journalists.
(f) The interested public.
(g) Historians of science.
These categories differ very much in their relative numbers and in the degree to which they make use of information services. It should be one of the first tasks of a systematic study of such services to get some estimates of these quantities as a whole, broken down into subject fields (Appendix, Items 1 and 2). I should be prepared to guess that categories (b) and (c) are predominant, and of these (b) will be the most important for, though there are far more technologists, in the sense given above, than workers in applied research, the latter need to use far more information. I should suspect, however, that though the information wanted by technologists is less in total quantity, it is of a kind more difficult to find—where it exists at all—and gives a disproportionate amount of work to the retrieval service.
Category (a), the fundamental research workers, probably make the most use of information, but they are relatively few in number and, as I have already estimated, probably account for less than twenty per cent of the load on the service. Category (d), which includes most academic scientists at some stages of their careers, is not large numerically, but their special requirement for thorough coverage of a particular field subjects information services to a serious and probably salutary strain. However, all such workers are not merely users of the information services but also contributors to it—they should effectively give back to it as much, or more, than they take out. Categories (e) and (f) require more personal attention than mechanical retrieving services. As to (g), historians of science, retrieval of information is part of their professional skill. I believe, moreover, that our information services and particularly our libraries effectively provide them with far more material than they need or indeed want. For, as the Urquhart (2) study shows, the demand for periodicals more than ten years old is small, and that for more than thirty years old, negligible. Consequently, much would be gained by clearing shelves of these and leaving them to central libraries or archives.
The question of the users of scientific information must be distinguished from that of the uses which are made or hoped to be made of this information when obtained. This question is an extremely complex one and can be approached in a number of different ways, each of which has some bearing on the performance of information services. We may distinguish between an analysis based on the nature of the material used and one based on the activity of the user.
In the first mode we can roughly divide the information sought into (a) data to be used in practice or incorporated in research; (b) procedures, techniques and methods, including descriptions of apparatus; (c) conceptual frameworks, theories or ideas—these are not only presented to be used but also to provide inspiration: positively by extending them or negatively by criticising them. To this may be added the most elusive category of transmitted information, namely absence of information, the suggestion which does not arise of itself, of gaps in knowledge, often the most fruitful impetus to new work.
Now it is evident that these categories offer very different problems to the information transmission mechanism. The data are far the simplest, and it is natural to think of them as the only material to be handled by information services. Here all that seems to be required is that they should be as accurate, as fresh, and as complete as possible. Data, indeed, are the only types of information which can be safely detached from their original sources. The transmission of techniques is much more difficult and deserves very special consideration. It is a field in which abstracts and reports are almost necessarily inadequate. At the moment the difficulty and slowness in the transmission of techniques furnishes in itself adequate justification for visits and exchanges of scientists between laboratories.
It is the last category, that of theories and ideas, which is the most difficult of all to deal with by any means other than through original papers or personal contact. Few people are capable of understanding a genuinely new idea or theory. Very few, possibly only one or two, are capable of profiting by it. Here the essential function of the information service is to see that these people hear about it. This is where reports are more valuable than abstracts. Naturally, this difficulty does not occur at the growing points of science where there is the greatest proliferation of new ideas and all the workers are alive to their possibilities and anxious to catch at them. It is rather at the boundaries and dead ends of fields of research that ideas are apt to be lost, especially as the first germ of a new theory is bound to be, or at least seems to be, of a crazy character.
If we turn now to the second mode, the activities of the user in his capacity of wanting information, we may break this down first into general and special uses. The first is the task of keeping up with knowledge in general over the whole field of science, with a greater and greater concentration on the particular field of interest. This need is met for the outer field by the service of books and science magazines, less or more specialised, and in the central field, by half a dozen or fewer journals. The service is by no means perfect, but short of reorganising the whole of scientific publications, which will have to be done some time, there is little that can be done about it now by the information services as such. Indeed, the very chaos of present-day publications and the unpredictability of their contents automatically ensures a more or less random sampling of the field of science by the average scientific user. This might be systematised by a reader picking and reading through one paper in ten in his own field, one in a hundred in neighbouring fields, and one in ten thousand in the rest of science, at least of those parts whose language he could read. I doubt whether for most people this would be any better than the casual picking up of papers that goes on now. Far more important would be an improved, graded, up-to-date and not too abundant service of reports of progress in the various branches of science.
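The systematised sampling mentioned above (one paper in ten, one in a hundred, one in ten thousand) can be sketched in a few lines. The rates are those given in the text; the field-class labels, the function names, and any publication counts supplied to them are hypothetical and serve only to show the reading load such a scheme would imply.

```python
import random

# One paper in N is read for each class of field, as suggested in the text.
# The class labels themselves are illustrative names, not the author's.
ONE_IN = {"own": 10, "neighbouring": 100, "rest": 10_000}


def expected_reading_load(papers_published):
    """Expected number of papers read, given counts published per class."""
    return sum(count / ONE_IN[field_class]
               for field_class, count in papers_published.items())


def sample_papers(papers, rng):
    """Pick papers at random; `papers` is a list of (title, field_class).

    Each paper is kept with probability 1 / ONE_IN[field_class].
    """
    return [(title, field_class)
            for title, field_class in papers
            if rng.randrange(ONE_IN[field_class]) == 0]


if __name__ == "__main__":
    # Hypothetical yearly output per class, chosen only for illustration.
    counts = {"own": 1_000, "neighbouring": 10_000, "rest": 1_000_000}
    print(expected_reading_load(counts))
```

A scheme of this kind fixes the reader's load in advance, but it selects without regard to content, which is why a graded service of progress reports would serve him better.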
The scientific and technical user, besides these general or background requirements, has need of specific pieces of information. This need has been the raison d’être of special information services beyond the scope of the distribution of periodicals and the functions of libraries. This special information the user hopes to incorporate in his own work, or at least to have the chance of so incorporating it. For the same reasons he wants it quickly, and he wants to be able to rely on it or at least to know the evidence by which he is expected to rely on it.
It is at this point that the needs of different kinds of users diverge, though there is still much overlapping. The fundamental research worker and, to a lesser degree, the applied research worker, require scientific information as a central and essential part of their work. Their need for it, indeed, is greater than for the new results which they are themselves getting out. For inevitably, except in completely new fields, all they can hope to add is a small part to an already considerable edifice.
However, this problem of acquiring what might be called central information is not one which calls much on information services except indirectly. For the research worker would not, in general, be doing the work he does unless he already had, by his own efforts or through those working with him in the same project, mastered existing sources of information in his field and knew where to look for new results as they come out.
The exception is when he is entering the field for the first time, not so much for the student in his first research, where he can usually count on guidance, but for the mature worker who wants to branch out into a new line. This is fairly infrequent in fundamental research but extremely common in applied research where projects are often embarked on in fields quite unfamiliar to the research teams. The lack of provision for what might be called pilot information is one which I have often felt myself. I am unfamiliar with any steps that are taken to meet it, but I imagine that something of the kind is a regular practice in the larger research and development departments of industry and government.
It is probable that the main call on information services—though this point deserves to be tested—is not in respect of pieces of information that are central to the work of the enquirer but rather ancillary to it. The working technologist is generally occupied in work for which he already has the essential knowledge, whereas the research worker, as already indicated, has already his own means of finding the information in his special field of study.
Where technicians and scientists alike need information is in aspects of their work in fields with which they are not necessarily familiar. This is particularly true of apparatus, methods, and the properties of substances, where advances in one field could be utilised by workers in others as soon as they can find out about them. Here the initial difficulty is that the enquirer is likely to know only vaguely what he wants, and most of those who need the information do not even know it exists and consequently never get as far as enquiring. The limited success of Dr. Urquhart’s “Unanswered Questions” scheme might make it appear that the need here is not great. However, I am convinced, particularly through my visits to laboratories, that here is a case where real needs greatly exceed felt wants.
Some years ago the British National Physical Laboratory sent round some of its expert staff to laboratories of research associations covering different aspects of applied science, such as rubber, flour milling, pottery, and leather. They found that, while in their own fields the knowledge of equipment in these establishments was well ahead, this was not the case where it lay outside their special field of competence. Here, equipment was often devised with great ingenuity to answer questions as they arose without the knowledge that similar equipment had already been developed in some quite different field of science. Given adequate information, it could have been used with slight adaptations and saved much trouble and time.
Cases of this sort must be far more frequent than we know and emphasise the necessity, particularly in applied science, for a positive information service that provides needed information unasked. I know that such services are provided in some of the larger research laboratories, but what we should seek is to spread them over the whole field, particularly to the smaller, two- or three-man laboratories so common in industry and in the fields of agriculture and public health.
So far in discussing users and uses of scientific and technical information I have treated all kinds of scientific information indifferently, without reference to subject divisions. This was necessary in order to compress this study within manageable bounds. However, I am fully aware that in any practical scheme for improving the flow of scientific information, the adaptation of the method to the field is perhaps the first requirement. Indeed, as information services have grown up, over the years, largely independently in different fields, they have developed different and even divergent methods of dealing with their problems.
I realise, for instance, that my own experiences, lying in the field of the mathematical-mechanical-physical sciences where much depends on the discovery and application of relatively few principles, are biased and do not adequately take account of the needs of the biological-geological descriptive sciences where the problem is to find the relations of vast numbers of originally unconnected facts. The problems of storage and retrieval are probably much greater in these fields. This is not only on account of their extensiveness in material but also on account of their much wider time range.
that the effective life of a piece of scientific information in the different fields of science is vastly different. The true half-life of a particular piece of information can be defined as the time after publication up to which half the uses (references) or enquiries about the piece of information are made. This is naturally extremely difficult to evaluate, though it would be well worth doing. Instead we are obliged to use what might be called the back half-life of a group of similar pieces of information—papers in a given journal for instance. This can be defined as the time counted back from a given date within which half the requests for, or references to, information have occurred. This period is about two years for physics and fifteen for biology.
This in itself indicates that the storage and retrieval apparatus in biology will always have to be much greater than in physics. It also indicates that it is quite possible in view of the ephemeral nature of information in physics that there would be some room here for economy in storage, and this might balance the need for greater speed in retrieval. In general my plea here is that in remodelling storage and retrieval, no effort should be made to achieve uniformity, but rather that a set of interlocked systems should be perfected by making full use of experience in each field.
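The back half-life defined above lends itself to a simple computation: sort the ages of the items requested or cited and take the age below which half of them fall. The following is a minimal sketch under that reading of the definition; the function name is hypothetical, and the ages passed to it in the usage lines are invented for illustration, not taken from any library's records.

```python
def back_half_life(ages_in_years):
    """Smallest age T such that at least half of all requests (or
    references) concern items published no more than T years back."""
    if not ages_in_years:
        raise ValueError("at least one request is needed")
    ordered = sorted(ages_in_years)
    # Number of requests that make up "half", rounded up for odd totals.
    half = (len(ordered) + 1) // 2
    return ordered[half - 1]


if __name__ == "__main__":
    # Invented ages (years between publication and request) for one journal.
    requests = [0, 1, 1, 2, 3, 5, 8, 20]
    print(back_half_life(requests))
```

Because the measure depends only on the ages of what is asked for at a given date, it can be read off existing loan or citation records without following any single paper through its whole life.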
Before discussing the second of my general questions as to the nature of the scientific information to be stored and retrieved, I must say something more about the scientific user in his other capacity as a producer of information. Now this aspect, as such, lies outside the scope of the conference, but it should not, in my opinion, be altogether excluded. We may be prepared to accept—but only for the purpose of these discussions—that there is just nothing to be done about the growing chaos of scientific publication. Our business here is to take the results as we find them, if we can find them, put them in the best order for storage, and hand them out where they are wanted.
However, I think that, even for this limited purpose, it is necessary to look a little closer into the production and fate of the material that is being so handled. The writer of a scientific or technical paper is trying to fulfil a duty to science and, incidentally, to establish his own reputation, by making his results known to the scientific world. This is at least his hope and belief, but it is one which it is increasingly difficult to realise, basically because there are so many thousands of other scientists doing the same thing. I pass over here the abuses of unnecessary, inflated, and multiple publication indulged in for prestige or simply to secure jobs. Certainly everyone, even former offenders, would gain if all scientific publications were genuinely new, concise, and accurate accounts of work done. It would also be possible for the reader to cover more ground if the length of normal scientific papers was shortened in something of the way practiced for centuries in the Comptes rendus of the French Academy or along the lines of the proposals of the Royal Society’s Information Conference (3). It might be possible by such reforms to prune the material to be read by as much as half or three-quarters, but as it normally doubles every ten years or less, these steps, however welcome, would be temporary palliatives. The central difficulty, the growing plethora of scientific publication, would remain. It has already gone a long way to bury new work under the mass of print which there is no time to read.
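Why such pruning is only a palliative can be shown by a short calculation. If the material doubles every ten years, a single reduction merely sets the clock back by the time the literature takes to regain its former bulk. The sketch below assumes steady exponential growth at exactly that rate; the function name and parameters are hypothetical.

```python
import math


def years_gained_by_pruning(fraction_removed, doubling_time_years=10.0):
    """Years until the pruned literature regains its unpruned volume.

    Assumes steady exponential growth with the given doubling time
    (ten years is the figure quoted in the text as the normal rate).
    """
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must lie in [0, 1)")
    retained = 1.0 - fraction_removed
    # The volume must grow by a factor 1/retained to undo the pruning.
    return doubling_time_years * math.log2(1.0 / retained)


if __name__ == "__main__":
    for removed in (0.5, 0.75):
        print(removed, years_gained_by_pruning(removed))
```

On these assumptions, halving the material buys one doubling period and cutting it to a quarter buys two, after which the reader is where he started.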
The average scientific author of today—I am not talking of men of established reputations or easily recognised young geniuses—has less chance of having his work understood and made use of than at any previous period in the history of science. There is just not time to read all the papers, and the chance that any particular one will find the reader or readers who will make use of it within the very short effective life of scientific publications is small; exactly how small, it would be very interesting to know (Appendix, Item 9). Many knowledgeable people would put that chance very low; indeed few would suppose it to be a certainty. I have myself suffered more than once for publishing papers twenty years before their time and having them justly ignored or misunderstood.
Now this brings us to an illuminating remark of the organisers of this conference which might mark a great departure in the handling of scientific information. In the paper on Conference plans (Area 2) we find “The primary reason for publishing (original scientific work) is to disseminate scientific information, not to store it.” I was very struck with this statement, and for all that it may appear to some as a truism, I feel that it was highly significant. However, the conclusion I draw from it, which does not seem to be quite that intended, is that what is revealed here is a contradiction in terms. It is in present conditions growing more and more difficult, and may soon be impossible, to disseminate scientific information unless this is done in a fashion that permits its easy storage—or at least its easy processing through a storage and retrieval mechanism.
There was a time when scientific information was not disseminated—it was communicated by word of mouth or letter to those deemed by the original writer likely to be interested in it. Dissemination through scientific journals was an indication of a scientific world which had extended beyond the sphere of personal acquaintance. Now in its turn dissemination, or in English broadcast scattering, is failing on account of a still vaster growth of the scientific public. To use a biological simile, the method of transmission of scientific information is on the most primitive level of wind-blown pollen. The more pollen—above a certain level—the more miss the stigmata waiting to catch them. We ought to advance at least to the more selective stage of insect-borne pollination where with far less pollen more gets to the right flowers. What we need now is to return to scientific communication on a vastly greater numerical scale and to make use of complex organisations and mechanisms in place of inefficient dissemination.
The contradiction I cited above is only an apparent one; its resolution can be found by treating original scientific publications no longer as the main exchangeable currency of scientific information but as the raw material for processing storable and retrievable information for the general reader. This implies in no way a restriction of the amount of direct contact between the original worker and his reader. Papers would be written as before and would be read in their original form by as many, if not more, readers than at present, that is, if the secondary publications—abstracts and reports—can be improved in coverage and speed. The only difference would be that the people benefiting by this information would no longer be limited to this circle. By passing all original material through the appropriate processes of classification and condensation, it would be possible to make at least their factual content also available to a much larger set of users. Further, by the use of positive information services it should be possible to ensure that at least a perceptible fraction of those that should benefit from the new information should hear of it.
There is an additional reason, quite apart from the compelling one of crowding of papers, for extending the range of the original publication. It seems to be a fact that the practical technological reader finds original scientific papers and even original technological papers very difficult to read and does not read them much. I can justify this statement only by hearsay, and a special enquiry ought to be useful here (Appendix, Item 5). It might also find, if the statement is true, why he does not read them or thinks he cannot understand them. It would seem, therefore, that those who have the greatest need to follow the advances of science have the least opportunity to do so. This situation could be remedied by adapting the results of original investigations to the interests of the practical users. This is a task undertaken today by the technical press and possibly it is as well done in detail as it can be. However, the problem of plethora is now beginning to be felt not only with original papers but also with specially written articles. Indeed, uncontrolled multiplication of scientific information, when the simple information itself is too much to get round, is, however well-intended, self-defeating.
So far I have indicated only the need for the processing of scientific information. Now I should like to put forward some tentative ideas as to how it might be done, or at least set out some conditions that any such schemes ought, in my opinion, to conform to if they are to be of use and not merely one further addition to the mass of secondary scientific literature. What I have in mind goes somewhat beyond what has hitherto been considered the function of information services, although for some thirty years it has been done in the field I know best, that of crystallography. In principle its purpose is to supplement the present mechanisms of storage and retrieval through abstracts and indices which treat the object to be retrieved as the original paper, by another system of reports which takes this object to be the facts contained in a number of papers.
Casting back to earlier remarks on the various uses made of scientific information, I distinguished three main categories: (a) data, (b) procedures and methods, (c) conceptual framework, theories, and ideas. Only the first of these, data, is at present suited to simple storage and retrieval techniques, and over many fields these are being so stored, while better and better methods are being designed for their complete and rapid retrieval.
The problem of storing procedures and methods including descriptions of equipment is intrinsically much more difficult, but it is one which would be of enormous value to solve, for I am convinced that the present slow rate of transmission of techniques, particularly instrumental techniques, forms at the moment the most restrictive single factor to scientific advance.
The third problem, that of dealing with ideas, is the most difficult of all, indeed it might be argued that it is intrinsically insoluble. The reasoning by which such a conclusion can be reached was given by a distinguished mathematician who, at one of the Royal Society’s Conference discussions, objected to any form of abstracting. His contention was “No one can abstract my papers; all they could show by trying to is that they did not understand them.” “Then why not abstract them yourself?” “Impossible; if they could have been written in a shorter space I would have done so.” “Well then, if they are not abstracted they will be read by no one but your friends.” “These are the only people I write them for.”
Without going so far as this, I would consider that we are still so far from being able to abstract and reproduce in classifiable and retrievable form this aspect of scientific achievement, that at this stage it is not worth trying to do so. Further, as nearly every scientific paper contains some ideas or theories new, at least to the author, this in itself furnishes a justification for preserving the access to all original papers through a complete abstracting and indexing system. In other words, this direct channel of transmission must be kept open, for even if only a small proportion of the total of scientific information flows through it, this amount is essential to the maintenance of the growth of science.
To admit that, whatever changes are made, the established means of access to original work is preserved does not mean that I would grant that the present methods of publishing and disseminating information in that form are ideal. I still feel, for reasons already given, that instead of the present intermediate length paper of ten to twenty pages, it would be better to have a short, pointed paper of some two pages in the form of what has been called an informative abstract. This would be supplemented by a longer, more detailed paper, not printed and published, but available in duplicated form, on microfilm, or by some other modern method of reproduction, to all those thought to be interested in it or who requested it.
Such proposals, however, are outside the scope of the present Conference; here I want only to discuss the subject of secondary publication covered in Area 2. It seems to me that for this purpose we can divide the field into two aspects of the recording of science, what might be called the differential or current, and the integral or cumulative, both of which need to be served by different forms of secondary publication.
The first of these aims at bringing to the user an ordered picture of the present activities of science. This would be less immediate than the existing type of science magazines like Nature, but more so than the normal reports of Progress in ______ which are proliferating today, though like them it should carry enough indication to find the original literature. A time delay of six months from the latest entry, to eighteen for the earliest, is, I think, technically achievable. It should be recognised, more than it has been, that these kinds of reports, like the original papers on which they are based, are ephemeral publications. They would be aimed, on one hand, at giving notice of the working hypotheses and new discoveries of the day, but in no sense presenting a record either for reference beyond five years or so, or for history which requires quite a different technique. They must also be written by active and interested workers in the field, taking precautions to avoid or neutralise personal bias.
The cumulative secondary publications by contrast will not be intended for reading, but for reference. Already, over all the easily reduced parts of science they exist as Mathematical Tables, Data Tables, Floras, and Handbücher, etc. There is little that need be said about these except that they need more money and help to keep up to date. Further, as no doubt will be fully discussed at this Conference, they need to be adapted to modern methods of search and retrieval, even to the extent of superseding the old printed volumes.
There remains, however, the much more serious problem of extending the ideas of cumulation of data or facts in fields where hitherto there has only been accumulation of publications. We have to find out whether this is mainly because there has been no demand for such data or merely because the problem of collection of such information and its reduction to standard forms suitable for data compilation has proved too difficult. These two aspects are related. The real need for data compilation cannot be fully felt as a want until there is some possibility of meeting it.
The two great difficulties that have held up the extension of data collections to other fields of science have been those of classification, which grows with the complexity of the particular system of knowledge, and of reliability of the data themselves, which varies both with the difficulty of the subject and the rapidity with which it is advancing. I would think for instance, though I have no competence in this field, that in many branches of medicine standardised and statistical information would be of considerable value, though I can see how difficult it would be to prepare them and keep them up to date.
In another field where I do have some experience, that of scientific instruments, I believe that collections of data would be of value. I ground this belief on my own experience in visiting laboratories and on that of the National Physical Laboratory, already referred to. Here the difficulty is not that of complexity nor of classification, but of speed of advance and consequent obsolescence. The old method of printing data tables or dictionaries would here be quite useless, except for the history of science, because the information is wanted within a period reckoned in months rather than decades. However, I believe it would be an admirable field to try out rapid electronic sorting and retrieval devices once a sound and flexible method of classification had been worked out. Data on scientific instruments have one advantage not shared by many other branches of science. Because an instrument has to work and have a measurable performance, the original data are largely self-checking. In other words it would here be relatively easy to set up a semi-automatic filter against repetitive, irrelevant, and incorrect information.
The inability to do this is the great curse of all data compilation, for experience shows that the inclusion of even a small proportion of bad information in any set of tables results in a feeling of untrustworthiness that spreads over the whole collection.
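The kind of semi-automatic filter described above for instrument data can be sketched in a few lines. What follows is only an illustration under stated assumptions: the record fields, the list of wanted quantities, and the plausible ranges are all hypothetical, chosen to show the three rejections (repetitive, irrelevant, incorrect) and not to describe any existing service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentRecord:
    """One reported performance figure for an instrument (hypothetical layout)."""
    instrument: str
    quantity: str   # name of the measured performance figure
    value: float
    source: str     # where the figure was reported


def filter_records(records, wanted_quantities, plausible_ranges):
    """Keep records that are new, relevant, and within a plausible range.

    wanted_quantities: set of quantity names the collection covers (assumed).
    plausible_ranges: dict mapping quantity name to (low, high) bounds (assumed).
    """
    seen = set()
    kept = []
    for record in records:
        key = (record.instrument, record.quantity, record.value)
        if key in seen:
            continue  # repetitive: the same figure has already been stored
        if record.quantity not in wanted_quantities:
            continue  # irrelevant: outside the scope of the collection
        low, high = plausible_ranges[record.quantity]
        if not low <= record.value <= high:
            continue  # probably incorrect: fails the performance check
        seen.add(key)
        kept.append(record)
    return kept


# Illustrative data only; none of these figures come from a real instrument.
sample = [
    InstrumentRecord("spectrometer A", "resolution", 0.5, "report 1"),
    InstrumentRecord("spectrometer A", "resolution", 0.5, "report 2"),   # repeat
    InstrumentRecord("spectrometer A", "colour of case", 1.0, "report 3"),  # irrelevant
    InstrumentRecord("spectrometer B", "resolution", -4.0, "report 4"),  # implausible
]
kept = filter_records(sample, {"resolution"}, {"resolution": (0.0, 10.0)})
```

The filter is only semi-automatic in the sense intended in the text: the bounds and the scope must still be set, and revised, by someone who knows the instruments.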
Data collections are too often considered only in their passive aspect but, except as the subject of historical studies, they need continual working over, of which they never get enough, as the bodies responsible for them are generally so short of means that they have all they can do to get the new data. Good data tables need to be reclassified and rearranged at decennial intervals, or more frequently in rapidly moving fields. The switch over from printing or card methods to newer techniques should make such checkings and rearrangements not the annoying interruptions of routine they are now, but a normal part of the process of information storage, in ways of which we may hope to hear much during this Conference.
Such tables would have a value beyond their use for reference. They should suggest by their mere systematic arrangement, by the use of statistical or other methods, new generalisations and laws in the fields they cover.
As time goes on, we might expect a change in balance between the uses of the two types of secondary publications already discussed, that is, between that based on the storage and retrieval of primary publications by abstract and index systems and that based on the current and cumulative handling of facts given by reports and data compilations. As the former become more and more unmanageable by mere bulk, the latter will, in my opinion, gain in importance. It would be interesting to enquire to what degree this is already occurring (Appendix, Item 7).
I have now covered the main objectives of this paper in presenting an analysis of scientific information services from the point of view of the user. I must apologise for its qualitative and personal character, but it may for that very reason form a useful corrective to the predominantly quantitative and mechanical character of the Conference. The object we all have in mind is not that of processing a material or even a set of figures, but rather of effecting the largest measure of communication between human minds.
Before I conclude, however, I would like to add certain remarks and suggestions referring to the areas of the Conference concerned with enquiries on the use of information and on methods of search and retrieval.
I am a strong believer in the value of operational analysis in the field of information transfer. Indeed, I had a striking example of this in the conclusions from the rapid pilot survey of reading habits carried out for the Royal Society Information Conference (3). Before that, I had been so much impressed, through the experience of my own work, with the importance of reprints that I had proposed a scheme for substituting a rational distribution of these for the traditional scientific periodicals. This scheme roused much feeling and was even castigated in a Times leader as “Professor Bernal’s insidious and cavalier proposals.” However, the result of the pilot survey showed me that scientists as a whole did not work the way I did, but rather made use primarily of libraries where the disadvantages of the bound periodicals largely disappeared. Consequently, I immediately abandoned my original proposals and publicly withdrew them at the Conference.
Here I might have been precipitate and wrong, for the fact that working scientists used libraries only proved that with the existing system of distribution this was the best they could do and that they might well change their habits to advantage in the new conditions. However, operational proof that this was not the case was furnished by the results of the use of a scheme, similar to mine, but more limited, by the Physical and Chemical Societies in Britain. Both schemes failed and for a reason I had not anticipated, namely that working scientists cannot be bothered even to make a mark on a form to get the paper they are interested in. Whether it would have worked in the way I originally suggested by which they would have received the papers in specified fields automatically remains to be seen, but I am inclined to doubt it.
Despite the proved value of such enquiries, backed by operational variations, I would like here to put in a very general plea against treating these enquiries on the level of ordinary scientific investigations and accepting the results at their face value. In all matters involving human behaviour, even that of professionally rational scientists, it is very difficult to infer from measured behaviour in one set of circumstances, even subject to some variation, what it would be under very different circumstances. In mathematical terms one method of measurement with slight variations may help to determine the local maximum of performance, but there may be other and much higher maxima which can only be reached by a considerable jump.
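The remark about local and higher maxima can be made concrete with a toy calculation. The sketch below is purely illustrative: the `performance` curve, its two peaks, and the `climb` helper are invented for the example and correspond to no actual survey. It shows only that a search by slight variations stops at the nearest peak, while widely separated starting points (the "considerable jump") can find a higher one.

```python
def performance(x: int) -> float:
    """Hypothetical performance curve: a low peak at x = 2, a higher one at x = 8."""
    low_peak = 3.0 - (x - 2) ** 2
    high_peak = 10.0 - (x - 8) ** 2
    return max(low_peak, high_peak)


def climb(start: int, step: int = 1, lo: int = 0, hi: int = 10) -> int:
    """Improve by slight variations only, stopping when no neighbour is better."""
    x = start
    while True:
        neighbours = [n for n in (x - step, x + step) if lo <= n <= hi]
        best = max(neighbours, key=performance)
        if performance(best) <= performance(x):
            return x  # a local maximum: small changes no longer help
        x = best


# Slight variations from the existing arrangement (x = 0) stop at the low peak.
local_result = climb(0)

# Starting again from widely separated points allows the higher peak to be found.
jump_result = max((climb(s) for s in range(0, 11, 5)), key=performance)
```

With this invented curve, `local_result` is 2 and `jump_result` is 8, the higher of the two peaks.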
Applied to the problem of scientific communication, it means that we ought not to spend too much effort on routine studies of user behaviour except for pilot surveys to bring out striking but unrecognised features, such as that of Dr. Urquhart’s library request analysis. We should rather try to find or create extreme conditions which should test our conceptions of the whole function of information. For instance, we might see how scientific research, undertaken without any information at all, would go in comparison with that using the whole battery of modern information services (Appendix).
My last suggestion is one which might be considered to run counter to the whole trend of interest of the Conference in favour of the mechanisation of information services. I do not think, however, that, in proposing a greater use of the experience and knowledge of individual, active scientists in the whole procedure of information services, I am in fact taking an opposing view, but rather proposing a very necessary complement to the increased use of mechanical methods. What I feel we need to do is to find the areas most suited to the capacities of the human mind and those most suited to mechanical processes.
If I may anticipate, for argument’s sake, the results of such enquiry, it seems to me that the human mind excels in judgment and in search. As to the former I would use people, preferably several of them to avoid bias, in the first or filtering stage of processing of information in order to remove at the start the repetition and nonsense that is in danger of choking scientific information to death. This I would suggest would be best done by associating groups of competent and, for the most part, young scientists with the first stage of examination of the literature to extract the facts or alleged facts from it. This is indeed already an old practice and is used on a large scale in the preparation of the abstracts for abstract publication of the Soviet Academies (1). Every paper and its abstracts are read by three such auxiliary abstractors.
If the service were properly organised and sufficiently widespread, the load on individual scientific workers would be small. Where it might bear most heavily, in the preparation of reports, it would be of value to the generally younger scientists on whom it would fall. There is no better way of learning a subject than in preparing reports on it, as I have experienced in the years (1) in which I helped to prepare the Annual Reports of the Chemical Society. There is also no better way of appreciating the shortcomings of the scientific literature and the working of scientific information services and acquiring a desire to see them reformed. The association of all junior scientists (not just a selected few) in documented work in this positive and useful way would, in my opinion, be a most valuable introduction to them of modern documentary science. Many of the enquiries reported on or proposed at this Conference would give very different results if the scientists whose behaviour is being studied had a thorough grounding in the use of libraries and information services.
It is, however, in the other capacity of search that the greatest potential and least used capacity of the human mind lie. Being evolved largely for the selection of significant material and with a memory adapted for the purpose, the human mind skips over the field of choice and arrives at the desired point—if it gets there at all—quicker than a machine of many times the intrinsic speed which must work exhaustively.
But this is only the least important of its services in the matter of search. A machine can in principle find everything that has been written about a specified subject. For lack of judgment it may find too much, and most of what it finds may be nonsense, but that could, if necessary, be sorted out by the recipient. What the machine cannot do, however, and the human mind can, is to answer questions that have not been asked. The first sensible answer to any enquiry, except to the few who really know precisely what they want, is to ask in turn “Why do you want to know that?” If the questioner persists and swallows the insult, it is usually possible to give him what he really needs to know, rather than what he thinks he wants to know, and this all the more because such enquiries are usually made in fields so unfamiliar to the questioner that he literally does not know what to ask for.
Now I know that much of this function of eliciting information needs is already performed by the officers of the larger information services today. My plea is for an extension of this to make use particularly of the experience of older scientists who do not perhaps always know the answers or even where the answers will be found, but who do know the man who will certainly put them on the track. I remember well how, during the war, at a Conference to unify military information services, the Chairman, the Librarian of the Admiralty himself, showed us the futility of our efforts by remarking that in all his forty years of experience he had never found that anyone had failed to get the information he wanted, provided he knew the right person to ask! Without going quite as far as that I would like to add a human who’s who selector to the battery of mechanical information searching devices.
And now in bringing this paper to an end, I shall recall for the benefit of those who have lost the thread in its discursiveness only the main point I wished to make, which is that a knowledge of the requirements of the different users of scientific information and the uses to which they wish to put the information they secure should be the ultimate determining factor in the designing of methods of storage and retrieval of scientific information.
1. RALPH R. SHAW. Pilot Study on the Use of Scientific Literature by Scientists, Rutgers University, New Brunswick, New Jersey, 1956.
2. D. J. URQUHART. “Use of Scientific Periodicals,” this conference.
3. Royal Society Scientific Information Conference, June-July, 1946. Published 1948.
APPENDIX
List of enquiries in the field of scientific information proposed in this paper
1. Statistical estimates of the number of enquiries for scientific information in the categories of (a) fundamental research, (b) applied research and development, (c) technologists, (d) writers of reports, textbooks, teachers, students, etc., (e) scientific and technical journalists, (f) the interested public, (g) historians of science.
2. Statistical estimates of the number of enquiries emanating from the above categories of users. (It would be preferable to combine 1 and 2 and to add a further breakdown into subject fields.)
3. What proportion of information required is needed for purposes ancillary to the main work in which the user is engaged?
4. What proportion of information required here, not only in enquiries but in books and periodicals asked for, is for general and what for special purposes?
5. How far does the technological world read (a) original, (b) secondary articles?
6. What proportion of users of scientific information has any special knowledge of bibliographic techniques?
7. How has the reading of primary and secondary sources of information varied within recent years?
8. How do the true and back half-lives of original papers differ in the different subjects?
9. What are the chances of a scientific paper in different fields finding one reader who will make good use of it?
10. Can any measure be made of the proportion of facts in different sciences that are rediscovered?
In addition I have proposed a competition in scientific research in the same field of three teams (a) with best available information services, (b) with present average information services, (c) with no information services at all.