A Combined Indexing-Abstracting System
ISAAC D. WELT, Director, Cardiovascular Literature Project, Division of Medical Sciences, National Academy of Sciences—National Research Council, Washington, D.C.
The Cardiovascular Literature Project (1) was established in 1955 to collect, index, and disseminate detailed experimental and clinical information on the effects of chemical agents upon the anatomy, physiology, and pathology of the cardiovascular system. The “raw” data are drawn from the extensive world literature.
The end result of this accumulation will be a series of handbooks in index form, designed to serve as a clear and authoritative guide to the world literature published during the last few decades.
The indexing procedures employed in this compilation are the result of a rather novel approach to the problems of storing and retrieving scientific and technical information. They are based upon the assumption that existing methods do not meet the increasing needs of scientists for detailed, informative indexing. An attempt has been made to ensure the retrieval of isolated “bits” of information of interest to the prospective user from among the complex, voluminous, and polyglot literature of a specific area of medical science. In addition, such information is pinpointed with a great deal of precision, so that the number of scientific papers which must be examined during a search is minimized.
The conventional approach to the control of the scientific literature is almost invariably based upon the use of abstracts.
An abstract usually consists of a relatively brief and concise summary of the contents of a scientific paper. It may be mainly descriptive, that is, it may present a general résumé of the main conclusions reached by the authors of the communication; such an abstract is essentially a detailed, expanded annotation. On the other hand, the abstract may be informative, presenting the experimental data, including quantitative information, in some detail. The informative abstract is best represented in the Organic Chemistry section of Chemical Abstracts. In some cases, the abstract is so complete that an examination of the original paper by a searcher is almost superfluous. Such abstracts are usually compiled by subject matter specialists.
Within recent years, as a result of the explosive growth of the scientific literature, the informative abstract has assumed great importance. As it has become virtually impossible for the active scientist to read all of the original papers which are pertinent to his field of endeavor, he has begun to rely more heavily on secondary publications as an information source. The value of informative abstracting, carried on by individuals who are highly qualified scientists themselves, therefore becomes readily apparent.
Abstracts, however, also tend to become too numerous for efficient and regular perusal by the busy researcher. One of the complicating factors involves the classification of abstracts in the abstract journal. In many cases, multiple entries are indicated, since it is frequently impossible to classify satisfactorily an abstract of a paper dealing with a number of dissimilar subjects. Thus, for example, there is bound to be a difference of opinion concerning the placement of a particular CA abstract in Section 11F—Physiology rather than in 11G—Pathology or in 11H—Pharmacology. It seems to be largely a matter of the highly subjective decision of the editor in charge.
A number of scientific journals and abstracting periodicals make use of so-called authors’ abstracts: in addition to summarizing their work within the body of the article, authors are asked to provide a separate abstract of it. Since most authors are not skilled in the art of abstracting, such abstracts are bound to be rather uneven in style as well as in content. They may be nothing more than brief annotations a few sentences long. They may be purely descriptive, emphasizing what the authors believe they have succeeded in demonstrating. On the other hand, they may be very informative indeed, but limited to a small segment of the publication. By their nature, they are highly subjective. It is a normal human failing to de-emphasize, subconsciously, the significance of negative results or of results not in accordance with theoretical preconceptions; conversely, data assumed to be significant may be overemphasized. The user of authors’ abstracts therefore cannot feel quite sure that they represent fair, undistorted résumés of the original publications. A meticulous literature searcher may fear that potentially important items have been overlooked by the author, engrossed as he may have been in trying to establish his main thesis. This is particularly true in the numerous instances in which scattered, seemingly unimportant bits of data, accumulated as by-products of the main experiment, turn out to be of permanent importance to workers in seemingly unrelated fields.
In general, too much dependence on informative abstracts, even when they are written by impartial observers of the scientific scene, may lead to a false sense of security. In extreme cases, the lack of an abstract may even prove advantageous. To illustrate this point, consider a searcher seeking a highly specific piece of information, such as whether a certain chemical compound has ever been tested for a specific pharmacological action. He may find only one abstract reporting this compound, and it may be concerned solely with its purely chemical aspects. He may then naturally assume that no biological tests have ever been performed and therefore neglect to look up the original paper. That paper may have been abstracted by a chemist with a natural bias in favor of strictly chemical facts, who unwittingly overlooked a single sentence buried deep within the manuscript recording that the compound had indeed been tested biologically and found to be totally inactive. We thus have a case of information made irretrievable by the searcher’s false sense of security. Many similar examples of overlooked data come readily to mind, each resulting in completely unnecessary, time-consuming, and expensive duplication of effort.
Because the relatively high information content of the average scientific paper must be compressed into the brief format of the usual abstract, abstractors must make many subjective decisions about what to include and what to exclude. In the language of information engineers, this is bound to generate a goodly amount of “noise.” Better abstracts would certainly result if each paper were abstracted separately by several abstractors and their efforts combined by a completely impartial editor. This is rarely done, however, because of economic factors and the need to keep up with an ever-increasing flow of papers. This state of affairs has led to the all too prevalent belief that any abstract is better than none at all. It has also given rise to the often-heard complaint that too many abstracting services duplicate each other in their journal coverage. Such duplication is not only inevitable but beneficial, up to a point: the abstracts that a chemically oriented service prepares from journals such as Science or Nature are markedly different from those compiled from the same journals by a biological abstracting service.
Indexing constitutes, without doubt, the most important avenue of retrieval of scientific literature. Most abstracting journals provide comprehensive indexes to their abstracts, based upon the realization that without them, the utility of the abstracts would be greatly impaired.
The compilation of adequate indexes in any field of intellectual endeavor presents a number of highly complex problems to the documentalist. In the field of science, these difficulties are multiplied as a result of obscure and esoteric terminology, extensive subject specificity and, as mentioned above, the relatively high content of indexable information in comparison with the non-scientific literature. The indexer of scientific and technical publications cannot easily justify the omission of any bit of indexable information. There is always a good chance that someone may sometime wish to have this specific piece of data available to him.
“Word” indexing, as opposed to “subject” indexing, is of dubious value in the indexing of scientific periodicals. Subject indexing, on the other hand, requires a more or less rigid standardization of semantic factors or terms. A lack of standardized subject headings, or undue flexibility or inconstancy in them from one year to the next, is bound to lead to situations in which sought-for information becomes irretrievable. The searcher’s “threshold of frustration” is most easily exceeded when he is obliged to search under a plethora of possible subject headings without any assurance that he will eventually find the proper one. A standard subject heading authority list should therefore always be accompanied by as many “see” references as are necessary. Such references act as a glossary or dictionary and direct the user to the proper heading from among a number of synonyms or “near synonyms.” The distinction between a synonym and a “near synonym” is a function of the analytical level of the index, that is, of how specific and detailed the subject headings themselves are. It is believed that the number of subject headings should remain relatively constant as the scope of the indexing endeavor is narrowed. In other words, if the area of information to be covered is decreased, the subject headings pertaining specifically to the new field of concentration must be further subdivided and refined, so that as many of them are used to describe the restricted area as were formerly employed for the broader field. The construction of a suitable subject list is a task for the subject specialist only and cannot be based solely upon correct terminology; a compromise must be made with accepted usage in the interest of easy accessibility. Since the prospective user must constantly be kept in mind, adequate provision of “see also” references must be assured. Such aids are based upon the “association of ideas” concept, not necessarily upon obvious word relationships. The “see also” reference is a means of teaching the user how to use the index to his best advantage. It gives him the all-important sense of security that he is quite likely to find the information he seeks, even if he is not quite certain exactly what he is looking for when he first approaches the index. It follows that the individual who sets up subject headings, with their accompanying “see” and “see also” references, must himself be a good representative of the prospective users of the index in subject matter background and in literature searching habits, practices, and approaches.
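As a modern illustration only, an authority list with its “see” and “see also” references can be sketched as two lookup tables: one mapping synonyms or near synonyms onto the preferred heading, the other linking headings by association of ideas. The headings and cross-references below are hypothetical examples, not entries from the project’s authority list, and the function name `lookup` is likewise an assumption.

```python
# Hypothetical authority list (illustrative headings only).
PREFERRED = {"Blood pressure", "Epinephrine", "Hypertension", "Kidney"}

# "See" references: synonym or near synonym -> preferred heading.
SEE = {
    "Adrenaline": "Epinephrine",
    "Arterial pressure": "Blood pressure",
    "Renal": "Kidney",
}

# "See also" references: links based on association of ideas.
SEE_ALSO = {
    "Blood pressure": ["Hypertension"],
    "Hypertension": ["Blood pressure"],
}


def lookup(term: str) -> tuple[str, list[str]]:
    """Return (preferred heading, "see also" headings) for a search term.

    Raises KeyError if the term is neither a preferred heading nor
    covered by a "see" reference.
    """
    heading = term if term in PREFERRED else SEE[term]
    return heading, SEE_ALSO.get(heading, [])


print(lookup("Arterial pressure"))  # ('Blood pressure', ['Hypertension'])
```

A searcher who starts from “Arterial pressure” is sent to the preferred heading and reminded of a related one. A term missing from both tables raises an error, which is the point at which a real authority list would need a new “see” reference.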
Many indexes, especially those accompanying abstract publications, are compiled from the abstracts alone. The indexer does not examine the original papers but relies on the information contained in the abstracts. This leads to a rather unsatisfactory state of affairs: any subject slanting, omissions, or errors on the part of the abstractor are perpetuated and magnified. The information provided by the index entry is now twice removed from reality, that is to say, from the original paper, and additional “noise” generated by the indexer is added to the information system without prior removal of the noise contributed by the abstractor. This practice of indexing abstracts is, of course, again traceable to the great mass of scientific publications that must be handled and to financial and personnel limitations. For the same reasons, the entries of trained, qualified indexers are rarely checked or duplicated by others with similar backgrounds and training. Index entries are usually telegraphic in style and are kept as brief as possible, so here again some of the information contained in the abstract may be lost in the indexing process. Everything said above about irretrievability resulting from abstracting holds true for indexing as well, and the situation is even less satisfactory. All scientists must use an index as a means of preliminary scanning of the pertinent literature. Therefore, when an indexer misses a potentially important item, or indexes it inaccurately or obscurely, that item is rendered irretrievable for most intents and purposes.
When a relatively circumscribed area of science is indexed, the use of even the most detailed subject headings may still not be specific enough. The searcher may be directed to a dozen or so papers only to find that the information which he seeks is contained in no more than one or two of them. There is thus an avoidable loss of time on the part of the busy scientist and a certain dissatisfaction with the index. In addition to the use of standardized subject headings, therefore, other means of achieving the pinpointing of data must be used.
A possible solution to the problems discussed above can now be suggested. Essentially, it involves the combination of abstracts with index entries, thereby leading to the elimination of the conventional abstracts and the all too brief index entries. Instead, a detailed and relatively lengthy hybridized index entry results, containing much more information than the conventional entry and easily accessible by means of the alphabetized subject heading approach.
Conventional informative abstracts, containing all indexable items, can be transformed relatively easily into detailed index entries. The procedure involves the “dissection” of the abstract into a number of complete sentences, each standing alone as a distinct “line of data” and contributing a unique item of information. The word order of each sentence can then be rearranged so that the standardized subject headings or subheadings appear in a predetermined position. For example, consider the sentence: “Compound A, when injected intravenously into anesthetized dogs, produces damage to their kidneys.” Extracted from the body of an abstract, this sentence provides a useful item of information concerning the effects of Compound A in a specific situation and under special circumstances. The obvious subject headings are “Compound A” and “Kidney.” The sentence is rewritten in indexable form as follows: “Compound A → Kidney, produces damage to, when injected intravenously into anesthetized dogs.” The arrow (→), which can also be written in the reverse direction (←), stands for “effect upon,” a phrase commonly used in conventional abstracts; it always points from the subject of the sentence to the object. The same information can therefore also be written in a slightly modified form: “Kidney ← Compound A, produces damage to, when injected intravenously into anesthetized dogs.”
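The dissected sentence can be modeled as a small record holding the four parts of a line of data, with one method for each direction of the arrow. This is a hypothetical Python sketch (the class `LineOfData` and its field names are assumptions, not part of the project’s procedure). It only reproduces the entry format of the Compound A example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineOfData:
    """One self-contained statement dissected from an abstract."""

    subject: str     # usually the chemical agent, e.g. "Compound A"
    obj: str         # the biological entity affected, e.g. "Kidney"
    verb: str        # active verb phrase, e.g. "produces damage to"
    conditions: str  # circumstances under which the effect was observed

    def forward(self) -> str:
        """Entry approached from the subject (chemical) side."""
        return f"{self.subject} → {self.obj}, {self.verb}, {self.conditions}."

    def reciprocal(self) -> str:
        """Entry approached from the object (biological) side."""
        return f"{self.obj} ← {self.subject}, {self.verb}, {self.conditions}."


line = LineOfData(
    subject="Compound A",
    obj="Kidney",
    verb="produces damage to",
    conditions="when injected intravenously into anesthetized dogs",
)
print(line.forward())
# Compound A → Kidney, produces damage to, when injected intravenously into anesthetized dogs.
```

Keeping the headings in separate fields, rather than in one free-text string, makes it trivial to emit either orientation of the same line of data.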
As we have seen, accurate indexing is best done by individuals who use the original paper, rather than an abstract of it, as their primary source of information. Since such an indexer is reading the original paper in any case, it is not much more time-consuming or expensive to index a piece of information by the above method than to use the time-honored “effect upon,” which would lead to the following conventional entries: “Compound A—kidney, effect upon, in dogs” and “Kidney—Compound A, effect upon, in dogs.” The detailed entry, by contrast, makes additional information available to the searcher: the positive nature of the effect (Compound A actually produces damage), the route of administration (which matters, since the compound may not produce damage when given by mouth), and the anesthetized state of the animal (which may alter the response to the compound compared with that of a normal, unanesthetized dog). These data enable him to preselect, to a relatively fine degree, the papers he would like to consult in the original. The simple “effect upon” entry might refer to some twenty papers. Upon detailed inspection, however, it might be found that only five of them describe a positive effect of Compound A upon the dog kidney; in the others, the proper experimental conditions were not present, and the results therefore appeared negative or equivocal. The time saved by pinpointing information in this way, and the frustration avoided, cannot be overemphasized.
A binary (i.e., subject → object; object ← subject) indexing procedure logically gives rise to two index entries for each “bit” of independent information. In our present undertaking, the subject is usually a chemical compound or group of compounds, and the object is the biological entity, which may be an organ, an organ system, a physiological function, a disease, or a symptom of a disease. The system of what we have come to call “reciprocal entries” enables the user to approach the indexed information either from the chemical side or from the biological one. Obviously, the binary or reciprocal system is not limited to such subject headings; it can easily be used in the indexing of any cause-and-effect relationship or, for that matter, of any two terms which are to be brought into coincidence. The second, reciprocal entry need not be prepared by the indexer himself. It is essentially an editorial job which can readily be done by clerical personnel. Because the standardized subject headings can function either as main headings in their own right or as subheadings, that is, can be placed either to the left or to the right of the arrow, a clerical worker can construct the reciprocal entry with the help of a list of standardized headings.
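Because producing the reciprocal entry is a mechanical editorial step, it can be sketched as a function that checks both headings against the standardized list, emits the forward and reciprocal forms, and sorts the result by main heading as a printed index would. The heading list and the name `with_reciprocals` are hypothetical.

```python
# Hypothetical standardized heading list; any heading may serve as a main
# heading (left of the arrow) or as a subheading (right of the arrow).
STANDARD_HEADINGS = {"Blood pressure", "Compound A", "Epinephrine", "Kidney"}


def with_reciprocals(entries: list[tuple[str, str, str]]) -> list[str]:
    """Expand (subject, object, text) entries into forward and reciprocal
    index entries, sorted by main heading and then by subheading.

    Raises ValueError if a heading is not on the standardized list.
    """
    rows = []
    for subject, obj, text in entries:
        for heading in (subject, obj):
            if heading not in STANDARD_HEADINGS:
                raise ValueError(f"non-standard heading: {heading!r}")
        rows.append((subject, "→", obj, text))  # forward entry
        rows.append((obj, "←", subject, text))  # reciprocal entry
    rows.sort(key=lambda row: (row[0].lower(), row[2].lower()))
    return [f"{main} {arrow} {sub}, {text}" for main, arrow, sub, text in rows]


for entry in with_reciprocals([
    ("Compound A", "Kidney", "produces damage to, in dogs"),
    ("Epinephrine", "Blood pressure", "increases"),
]):
    print(entry)
```

Run as shown, this prints four entries filed under Blood pressure, Compound A, Epinephrine, and Kidney, in that order. Rejecting non-standard headings is what lets a clerical worker do the job safely: a heading spelled differently from the list would otherwise scatter entries under two forms.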
Negative data can, of course, also be handled by this method. One need only insert the words “does not” or “did not” between the subheading and the active verb to designate a negative result.
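The negative form can be produced mechanically from a positive verb phrase. The sketch below is hypothetical and rests on a naive assumption, stated in its docstring: the leading verb is a regular third-person form whose base form is obtained by dropping the final “s.” That holds for preferred verbs such as “alters,” “increases,” and “produces,” but not for English verbs in general.

```python
def negate(verb_phrase: str) -> str:
    """Turn a positive verb phrase into its "does not" form.

    Naive assumption: the first word is a regular third-person verb whose
    base form is obtained by dropping the final "s" (true of "alters",
    "increases", "inhibits", "produces"; not of English verbs in general).
    """
    verb, _, rest = verb_phrase.partition(" ")
    if not verb.endswith("s"):
        raise ValueError(f"expected a third-person verb, got {verb!r}")
    return f"does not {verb[:-1]} {rest}".rstrip()


# Hypothetical negative entry: the modifier sits between subheading and verb.
print(f"Compound A → Kidney, {negate('produces damage to')}, when given by mouth")
```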
The extreme importance of recording negative data in an easily retrievable form has been stressed by many scientists. Unfortunately, however, even when such information is available in a published paper, relatively few abstracting or indexing publications make it a point to emphasize it or even to abstract or to index it. A good deal of lost motion, duplication of effort and the resulting expense and delay would be obviated if negative results were treated in the same manner as are the more popular positive reports.
Equivocal information is of value to the scientist insofar as it alerts him to the presence of “unfinished business” in the literature. Terms such as “increases somewhat” or “produces a slight degree of” can be used to index data which are still not well established.
The use of active rather than passive verbs must be encouraged in the interests of clarity and freedom from ambiguity. In the above example, if the entry were expressed as “Compound A → kidney damage produced by, upon intravenous administration in anesthetized dogs,” it would lose much of its positive impact; the point of the arrow would, as it were, have been blunted. Our extensive experience with this system indicates that the use of active verbs does not in any way reduce the speed or productivity of the indexer, and it certainly helps the user to understand the index entry more clearly.
It is possible, and indeed advisable, to construct a list of active verbs to be used preferentially in describing the effects of the subject upon the object. General verbs such as “alters” and “affects” may be used where the author of a paper does not furnish more detailed effects. Far preferable are verbs such as “increases,” “decreases,” “produces,” etc., with the accompanying negative modifier (“does not”) whenever necessary.
Another advantage of such a list of preferred verbs is that it permits sub-alphabetization of entries. In many instances, especially when a relatively restricted field of knowledge is under consideration, so much information is available on a small number of subject headings that, no matter how detailed these headings may be, a large number of entries accumulates under them. For example, in our own field of interest there are hundreds of entries under the chemical epinephrine as a main heading and the physiological function blood pressure as a subheading. If the word sequence in the index entry is standardized so that the verb invariably follows the subheading, entries alphabetized first under the main heading and then under the subheading can be further alphabetized under the verb. This practice requires a preferred list of verbs, and synonyms (e.g., “increases” and “augments”) must be eliminated as far as possible. As a result, all entries concerned with an “increase” in the physiological function affected by the drug will be grouped together. Furthermore, all negative entries fall under the “does not” heading, with “alter,” “decrease,” “increase,” and “inhibit” following in that order.
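Sub-alphabetization by verb can be sketched as a sort key of (main heading, subheading, verb), applied after synonyms have been mapped onto the preferred list. The verb list, the synonym table (apart from “augments,” which the text names), and all names in the code are assumptions for illustration. The base form of a negated verb is formed naively by dropping the final “s.”

```python
from typing import NamedTuple

PREFERRED_VERBS = {"alters", "decreases", "increases", "inhibits", "produces"}
# Non-preferred synonym -> preferred verb ("augments" is the text's example;
# the other pairs are hypothetical).
SYNONYMS = {"augments": "increases", "raises": "increases", "lowers": "decreases"}


class Entry(NamedTuple):
    main: str        # main heading, e.g. "Epinephrine"
    sub: str         # subheading, e.g. "Blood pressure"
    verb: str        # third-person verb as indexed, e.g. "augments"
    negative: bool   # True for a "does not" result
    conditions: str  # species, route, dose, etc.


def verb_text(entry: Entry) -> str:
    """Map the verb onto the preferred list and add "does not" if negative."""
    verb = SYNONYMS.get(entry.verb, entry.verb)
    if verb not in PREFERRED_VERBS:
        raise ValueError(f"verb not on preferred list: {entry.verb!r}")
    # Naive base form for negatives: "increases" -> "increase".
    return f"does not {verb[:-1]}" if entry.negative else verb


def sort_key(entry: Entry) -> tuple[str, str, str]:
    """Alphabetize by main heading, then subheading, then verb."""
    return (entry.main.lower(), entry.sub.lower(), verb_text(entry))


def render(entries: list[Entry]) -> list[str]:
    """Format entries in sub-alphabetized order."""
    return [f"{e.main} → {e.sub}, {verb_text(e)}, {e.conditions}"
            for e in sorted(entries, key=sort_key)]
```

Plain alphabetical order on the rendered verb then produces the grouping the text describes. Under one main heading and subheading, “decreases” comes first, then every “does not” entry (alter, decrease, increase, inhibit), then “increases,” with entries indexed as “augments” filed among the “increases.”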
The provision of information concerning some of the conditions under which the specific effect was obtained further reduces the number of papers which the searcher must examine. The species of animal used, the route of administration, etc., are examples of these special conditions.
The use of standard abbreviations serves to reduce the physical size of each index entry without at the same time cutting down its information content.
Admittedly, the average scientific paper will generate a large number of entries if indexing is carried out in such detail. On the other hand, abstracts will have been abolished without irretrievable loss of their unique function, which is to summarize the paper as a whole. Each indexed paper is given an accession number, and each index entry derived from the paper is referred back to the original by means of this key number. It then becomes possible to assemble all entries referring to a specific paper, either by manual methods (each entry having been entered on a separate file card) or by machine methods in which a separate punch card is used for each index entry. Once the entries are assembled, they can easily be edited into an acceptable abstract. Therefore, although the combined indexing-abstracting approach does “short-circuit” the abstracting stage, abstracts can still be made available where necessary without consulting the original paper.
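Reassembling a paper’s entries by accession number amounts to grouping records on a key. In this hypothetical sketch, each (accession number, entry text) pair stands in for one file card or punch card, and reciprocal entries (marked “←”) are skipped on the assumption that each merely repeats a forward entry. The accession numbers and entries are invented for illustration.

```python
from collections import defaultdict


def assemble_by_paper(cards: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Group index entries by the accession number of their source paper.

    Each (accession number, entry text) pair stands for one card.
    Reciprocal entries (marked "←") are skipped, on the assumption that
    each repeats the information of a forward entry.
    """
    papers = defaultdict(list)
    for accession, text in cards:
        if "←" in text:
            continue  # reciprocal duplicate of a forward entry
        papers[accession].append(text)
    return dict(papers)


cards = [
    (1041, "Compound A → Kidney, produces damage to, when injected intravenously into anesthetized dogs"),
    (1041, "Kidney ← Compound A, produces damage to, when injected intravenously into anesthetized dogs"),
    (2210, "Epinephrine → Blood pressure, increases, conditions 1"),
    (1041, "Compound A → Blood pressure, does not alter, conditions 2"),
]
for accession, lines in sorted(assemble_by_paper(cards).items()):
    print(accession, lines)
```

The lines gathered under one accession number are the raw material an editor would turn into an abstract of that paper.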
This last point brings out yet another unique function of the detailed informative index entry: such a compilation may, to a limited degree, also serve as a “handbook.” A handbook may be considered a terminal source of information; that is, its use eliminates the need to consult the original paper. In many areas of science, particularly where quantitative information is of great importance (e.g., physical chemistry, toxicology), the index entry may furnish all the information necessary. Thus, for example, an entry such as “Compound A—toxic effects, produces, in the dog at a dose of 500 mg. per kg., upon intravenous injection. LD50 is 700 mg. per kg. Symptoms are nausea, incoordination, etc.,” provides valuable information in its own right, without the need to consult the original literature. The addition of the usual reference number may then be superfluous. This is particularly true of the many thousands of items of scientific information obtained from preliminary “screening” tests, such as those used in research on cancer chemotherapy and on the antibiotic control of infectious diseases.
The recording of negative data by this means would result in a great saving of time. Users of such an index would rarely find it necessary to consult the original paper, since reference to the index entry would assure them that the particular compound in question had indeed been tested in a certain animal species under specific conditions, by a particular route of administration, at one or more dosage levels and found to be inactive with respect to the biological response under study.
The combined “abstracting-indexing” approach to the control of the scientific literature is now being used successfully in the Cardiovascular Literature Project of the National Research Council. To date (May 1958), it has handled some 13,000 separate papers in numerous languages. These documents are largely reports of original laboratory and clinical findings but also include a number of reviews, monographs, and advanced textbooks.
In other words, our methods have been put to the test and have been found to be practical. Although much more research is obviously indicated, it is possible to handle a relatively large body of information by means of our present approach.
The retrieval question, which is of course at the heart of the matter, cannot as yet be fully answered. As with most new approaches, the user must be taught how to take full advantage of the benefits inherent in the system. Its limitations must also be pointed out. If the proper questions are asked of the system, there is no doubt that it can function more effectively than existing abstracting and indexing services.
Instead of attempting to construct a means of literature control applicable to a wide range of subject matter, the present endeavor has been limited to a relatively well-defined area. This was done in order to establish the efficacy of this novel method in an area where the subject matter itself can most easily be handled by it.
On the other hand, however, there is good reason to believe that, with adequate modification, this approach could be extended to other areas of science and is not necessarily restricted to the chemical-biological sphere. Neither is it restricted to a binary system (i.e., chemical → biological; biological ← chemical). With relatively minor changes, three, four, or more items per “line of data” can be accommodated, each functioning, in turn, as main headings and subheadings. Thus a form of “coordinate indexing” can be achieved, with the important proviso that the index itself furnishes the necessary coordination instead of the user. A truly multidimensional index can therefore ultimately be achieved.
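One way to picture the extension to three or more items is to generate an entry for every ordered pair of items in a line of data, the first serving as main heading and the second as subheading. For two items this yields exactly the forward and reciprocal entries. This pairing scheme and the name `coordinate_entries` are our assumptions about how the idea might be realized, not a procedure specified in the text.

```python
from itertools import permutations


def coordinate_entries(items: list[str], text: str) -> list[tuple[str, str, tuple[str, ...], str]]:
    """Generate (main heading, subheading, remaining items, text) entries
    for every ordered pair of distinct items in one line of data.

    Two items give the familiar forward and reciprocal entries; n items
    give n * (n - 1) entries. Items are assumed to be distinct.
    """
    entries = []
    for main, sub in permutations(items, 2):
        others = tuple(x for x in items if x not in (main, sub))
        entries.append((main, sub, others, text))
    entries.sort(key=lambda e: (e[0].lower(), e[1].lower()))
    return entries


line = ("Compound A, when injected intravenously into anesthetized dogs, "
        "produces damage to their kidneys.")
for main, sub, others, text in coordinate_entries(["Compound A", "Kidney", "Dog"], line):
    print(f"{main} / {sub} ({', '.join(others)}): {text}")
```

Because every combination is filed in advance, a searcher looking up any one item finds the others already paired with it. The index does the coordinating, at the cost of n(n - 1) entries per line of data.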
Although the present undertaking will result in a conventional printed index, it is easily amenable to machine methods. The coding problems have been minimized by the use of standard main headings, subheadings, and active verbs, so their translation into meaningful machine language is a relatively simple problem. Mechanization of a combined indexing-abstracting system would be most helpful in facilitating the dissemination of the information which it can store, in addition to being an invaluable aid to its efficient retrieval. Printed lists of index entries can be produced most easily by using punch cards which carry both the typewritten entry and its encoded version. Selection of desired entries can then be accomplished mechanically, with a consequent saving of much time and effort.
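Because main headings, subheadings, and verbs come from standardized lists, encoding reduces to a table lookup, and mechanical selection reduces to matching on chosen fields. The code numbers, names, and placeholder conditions below are hypothetical. Each (encoded fields, typed entry) pair plays the role of a punch card carrying both versions of an entry.

```python
from typing import Optional

# Hypothetical code numbers for standardized headings and verbs.
HEADING_CODES = {"Blood pressure": 201, "Compound A": 101,
                 "Epinephrine": 102, "Kidney": 202}
VERB_CODES = {"decreases": 1, "does not alter": 2, "increases": 3}

Card = tuple[tuple[int, int, int], str]  # (encoded fields, typed entry)


def make_card(main: str, sub: str, verb: str, conditions: str) -> Card:
    """Carry both the encoded version and the typewritten entry."""
    code = (HEADING_CODES[main], HEADING_CODES[sub], VERB_CODES[verb])
    return code, f"{main} → {sub}, {verb}, {conditions}"


def select(cards: list[Card], main: Optional[str] = None,
           sub: Optional[str] = None, verb: Optional[str] = None) -> list[str]:
    """Return typed entries whose coded fields match every field given;
    a field left as None matches anything."""
    wanted = (
        HEADING_CODES[main] if main is not None else None,
        HEADING_CODES[sub] if sub is not None else None,
        VERB_CODES[verb] if verb is not None else None,
    )
    return [typed for code, typed in cards
            if all(w is None or w == c for w, c in zip(wanted, code))]


cards = [
    make_card("Epinephrine", "Blood pressure", "increases", "conditions 1"),
    make_card("Epinephrine", "Blood pressure", "does not alter", "conditions 2"),
    make_card("Compound A", "Blood pressure", "decreases", "conditions 3"),
]
print(select(cards, main="Epinephrine", verb="increases"))
```

Selection works entirely on the numeric fields, while the typed text travels along unchanged, so the selected entries can be printed directly.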
In summary, the present contribution outlines a new system for the control of the scientific literature which combines the best features of conventional methods of abstracting and indexing. It has been demonstrated, at least to our satisfaction, to be a practical method for the storage of a relatively large volume of information concerning chemical effects upon biological entities, and its use to facilitate the retrieval of indexed data appears, at this time, most promising. A great deal of effort has gone into the mechanization of literature control in recent years. As a result, significant advances have been made, and numerous machine installations are now in operation. However, it seems to us that relatively little has been accomplished in the important area of abstracting and indexing, which provides the raw material for machine processing. Mechanization ought to permit a much deeper and more detailed type of indexing than the manual approach, without a proportional increase in costs. The savings effected by conversion to machines might well be used to provide more adequate indexing for, after all, no system of literature control, however intensively mechanized, can ever be better or more efficient than the accuracy of the raw material it is called upon to process and the meticulous detail with which that material is indexed for storage and retrieval. This restores the “human factor” to the problem of controlling the ever-increasing mass of scientific information, which is the basic problem of the present conference.
1. WELT, ISAAC D. Indexing of Chemical-Biological Data in a Restricted Field of Medical Science. Presented at American Documentation Institute, 1955.
2. WELT, ISAAC D. Subject Indexing in a Restricted Field. Science 123, 723 (1956).
3. WELT, ISAAC D., and JUDITH T. MACMILLAN. The World Literature on Cardiovascular Drugs. Bull. Med. Lib. Assoc. 46, 60–72 (1958).
4. WELT, ISAAC D. The Detailed Indexing of Biological Effects of Chemical Substances. Bull. Med. Lib. Assoc. 46, July (1958).