The Basic Types of Information Tasks and Some Methods of Their Solution
General information about retrieval tasks
THE POSING OF THE TASK
The defined set of information among which it is necessary to carry out a search is represented in the form of a universe E of individual elements of information ek (1−3). The answers to questions asked are particular subsets of E. The volume of E can grow by means of the addition of new ek, that is, E is considered as defined only for a definite interval of time. Data from the elements of information are linked more or less closely to one another, since the condition is usually assumed that they deal with one and the same “subject” (concept, experiment, process, substance, animal, person, etc.). On the basis of considerations which will be set forth below, the universe E is, as a rule, made up of homogeneous ek. Proceeding from the defined set of information, the scope of the questions asked is also determined more or less accurately. In general, this does not mean that all the possible questions are previously known, but that for each question it is usually known whether or not it can be asked. The general conditions for the selection of ek for the questions asked are also known.
The solution of the task “Does it answer the question asked?” for each ek, generally speaking, does not evolve clearly from the information contained in the ek and the question. It can require either a processing of that information according to definite rules or the referring only to information from other ek (if the set of the information is closed), or even the referring to a more general set of information than E, or, finally, it may go beyond the limits of searching for certain information and may require the carrying out of scientific research work. Thus the work of carrying out searches is in its immediate, original form one of the varieties of mental labor requiring not only great erudition and memory in order to cut down the number of references to the data which
V.P.CHERENIN All-Union Institute of Scientific and Technical Information, Academy of Sciences of the USSR, Moscow.
are not included in the information element being immediately considered, but also great art in processing the data from the ek being considered and from the question according to definite rules. In order not to resort repeatedly to such a complex and slow process during each search (it is assumed that searches will be made many times), it is obviously desirable either to record, by some method, the results of searches previously made, or to create a special information language and to use it to record the content of the information in all the ek, as well as the content of the questions asked, in such a form that will make it possible, to a greater or lesser extent, to automatize the process of solution, without at the same time leading to an inadmissibly large increase in the volume of initial data or time expended for the solutions.
The first method of solution cannot be applied to all possible questions (of the defined scope), if there are very many of them, and therefore the entering of all old and all newly incoming ek is done only for a limited number of rather general questions or for more limited, more important questions. This work can be done periodically (as a sufficient amount of new ek is accumulated) in the form of bibliographic reviews, reference books, tables, etc., prepared by specialists on the individual questions. This work is done continuously as new questions come into the information section, by having the section personnel examine all the existing material, with attempts being made here to use in part the results of searches made by the users themselves, either on the basis of the content of the requests (library) for sets of ek which they found as answers to the questions in which they were interested, or on the basis of lists of sources which they cited (ek) in articles which they published (citation indexes (4)). Subject classifiers also carry on continuous work in recording each incoming ek on a more or less established list of questions according to subject headings, from which subject indexes are then prepared. A secondary information task, however, develops as a result of this work: since the answers are found not for all, but usually only for a small number of all possible questions, it is often necessary to find for an arbitrary question (once again, from the definite scope) all the similar questions “corresponding” to it from among the already processed number of questions, which thus form a new set of elements of information E. For example, a system of cross references in the subject index is planned for this purpose. The solution of this secondary task has, as a rule, already had to be done (with the large number of all possible questions) by the second method, which is the only one that we shall consider in the future.
With the second method of solution the content of the information in each ek and the content of any question asked is, in general, approximated more or
less on a one-to-one basis by the set of standard semantic units—characteristics—with an indication of the standard synthetic relations between the latter (1–3). Such a process is usually called indexing. By synthetic relations we understand the connections which can be established between the characteristics directly on the basis of the content of the ek being indexed or the question. A particular degree of accuracy and one-to-one correspondence (the most that can be permitted is a few approximations) of indexing depends upon our knowledge of the ek and the question (it is impossible to approximate something with greater accuracy than the accuracy of the assignment of that something), upon our possibilities with respect to the scope of the work necessary to carry out a particular approximation and the time, complexity of equipment, and volume of entries necessary for solution by means of comparison of approximations of the particular accuracy. It is also necessary for the degrees of accuracy of approximations of ek and the question to be approximately identical. Therefore, it is often necessary, even for tasks in which a high degree of approximation is applicable, to limit oneself to lower degrees of the latter.
The demand for a one-to-one correspondence of approximation is, I dare say, more important than the accuracy, if by a shortcoming of accuracy we understand the searching for superfluous ek which do not answer the question asked, but not the loss of necessary ek, since when the one-to-one correspondence of approximation is violated, necessary ek will deliberately be lost. Therefore, it is sometimes better to approximate ek and the question less accurately, but on a more nearly one-to-one basis.
In general it can be said that the content of each ek and question must be approximated on a one-to-one (or almost one-to-one) basis by the logical function of logical variables, after which the selection conditions to be satisfied must be checked by means of definite logical operations on the functions of ek and the question. In this process the conditions for selection can in general be approximately described as the requirement that the function of the question be the logical consequence of the function of the ek. We shall meet an illustration of this approach below.
Thus, in the final analysis the special information language must be a variety of that “language of science” about the creation of which many scientists have dreamed. Without posing the task too broadly, but remaining on the ground of reality, we shall limit ourselves in the future only to the simplest degrees of approximation which are applied in practice and which require a not-too-complicated information language, which is so called, rather than the “language of science,” in order to emphasize its simplicity and its immediate practical applicability. It is on the basis of these degrees of approximation that the basic types of information tasks will be introduced.
Getting somewhat ahead of ourselves, it can be said that with an increase in the number of the type there is an increase in the complexity of approximation and the entire information task. Therefore, it is obvious that for any information task it is necessary to select the lowest type providing for a one-to-one correspondence and the minimally acceptable accuracy of approximation. The acceptable accuracy is determined on the basis of the practical requirements, and the possibility of carrying out the particular one-to-one correspondence and accuracy of approximation, on the basis of the content of the ek and the questions and on the basis of the results of experimental searching. The greater the accuracy of approximation, the more difficult it is to observe its one-to-one correspondence. This fact, which is well known to any classifier, also forces one to get by with the minimally acceptable accuracy.
The simplest method of indexing is the direct processing of the content of the ek and the questions according to definite rules without referring to any other data (with the exception of those rules). With the least accuracy of approximation this comes down to extracting from the content itself for each ek some one particular characteristic (or several independent characteristics) used as the standard name for that ek (type I). Examples of such names could be: the last name of an author or the name of an article, the name of a substance in some system of designations, or the principal subject of an article recorded in a definite form. When this degree of accuracy proves to be inadequate or the names obtained cannot be considered as standard ones (and it is difficult to standardize them), then the content of the ek and the questions is semantically subdivided into smaller units, for each of which the name described above may prove possible, on the basis of the characteristics which are encountered in them and which are accepted as the standard. This possibility is facilitated by the lesser concreteness of each part and, as a rule, the lesser number of all possible parts as compared with the total number of all ek or questions, and this makes it possible to standardize these parts more easily. This subdivision must be done on a one-to-one basis and must be carried out either according to definite sections (type II), or with the aid of other sufficiently unambiguous rules expressed with the aid of a standard structural formula for content. Since it is assumed that ek is taken as the sets of information which are not able to be subdivided into completely unrelated parts, the parts within the formula are either related by simple conjunctive relationships (type III) or by synthetic relationships (type IV). Examples of these parts for type II could be: year of publication, name of magazine, country, language, author’s last name, or principal subject, which are extracted from an article; headings of questionnaire data for a person; properties of substances from various sections. For type III, individual subjects, processes, conditions, methods, etc., for an ordinary sub-
ject heading could be extracted. Type IV can be illustrated by structural formulae of organic compounds.
With an insufficient degree of accuracy, such a subdivision can be further continued, provided that the first subdivision can be made on a one-to-one basis; in this process we shall come sooner or later to a small number of easily standardized characteristics. But if this subdivision, on the basis of meaning, on the basis of content of the ek or the question, proves to be an impossible one or cannot be carried out on a one-to-one basis (or an almost one-to-one basis, since several different subdivisions are acceptable), then it is necessary, when approximating the total ek or the question, or parts of them which were isolated as a result of the acceptable subdivisions, to resort to a more complicated method of indexing when referring to information not contained directly in the ek. This indexing is actually a classification, that is, a referring of the entire ek or the question, or parts of them, to one classification characteristic (heading)—or several independent ones—in a single system of standard characteristics of classification (type I), to one characteristic (or several independent ones) in each of several sections (Ranganathan facets) or classification systems (type II), or to several characteristics in one system jointly describing the content being indexed (type III).
Thus, there is a certain similarity between indexing and classifying (5). Indexing can be defined as direct, inductive classifying, which is more accurate than ordinary classifying, since it is also possible to take synthetic relations into consideration. Classifying can be viewed as less accurate, but a more nearly one-to-one, deductive indexing, where the role of the information language is played by the system of classification headings. Below we shall consider information language and types of information tasks from a more general point of view.
CONDITIONS FOR SELECTION
The conditions for selection evolve from the ordinary posing of the task of searching for information, when it is necessary to find—to collect—information devoted to each of the “subjects” making up the volume of the concept characterizing the question asked. This, by the way, is usually emphasized in the very way the questions are asked. For example, a person requests literature on the subject “properties of antibiotics,” thus wishing to obtain information on the properties of penicillin, or streptomycin, etc., and does not formulate the question as “the properties of an antibiotic,” which would mean the desire to find out information dealing only with properties common to all antibiotics. In other instances the way the task is posed cannot be reflected in
ordinary language, when, for example, information is requested on the subject “the effect of radiant energy upon bacteria,” although even here it is understood that it would be more correct to say “various types of action of various types of radiation upon various types of bacteria.” With the aid of a stricter formulation of questions it is possible to reduce all the varieties of questions to such a generic-specific scheme. For example, when a person requests information on the “effect of acceleration upon an aircraft,” he usually wishes to obtain information about “types of action of accelerative forces upon various parts of an aircraft,” not upon various types of aircraft. Therefore, not “aircraft,” but “part of an aircraft” should be taken as one of the concepts characterizing the question.
Thus, the content of the ek being sought as those dealing with the subjects which are part of the scope of the question-concept must be characterized by concepts whose content includes this question-concept. In this sense the logical function of the question, when satisfying the condition of selection, is the consequence of the logical function of the ek. The conditions for selection will be described in more detail when the individual types of information tasks are examined.
The characteristics pn, which form the universe P, and synthetic relations are analogous to the words and grammatical relations of ordinary language. Like the degree of approximation and the conditions for selection, they are determined by the nature of information contained in ek. However, the appearance of new ek may require the appearance of new characteristics to describe them, but the relations remain the same.
In order to observe the one-to-one correspondence of approximation it is desirable to have the characteristics (as well as the relations) not only standard, but also independent, that is, to assure that between them there do not exist any analytic relations which make it possible to substitute for some one characteristic another semantically similar characteristic or several characteristics which are united by means of the synthetic relations used in language. By analytic relations we understand here the constant semantic relations established between the characteristics on the force of a broader and older set of information than E. Obviously, the fewer relations that the language has, the more concrete and more varied the words must be to preserve the same degree of approximation, and, conversely, with the existence of many relations the number of characteristics and their concreteness can be fewer, and this makes it possible more easily to standardize them and ascertain their independence. The demand for independence of characteristics leads in the first instance to a com-
plicate representation of the less concrete, dependent characteristics in the form of disclosing their scope with the aid of enumerating a whole series of initial characteristics. In the second instance it leads to a complicated representation of very concrete characteristics in the form of compound complexes of a large number of initial characteristics (the content is disclosed). Ordinary language is rich both in relations and in words, and both are not independent, that is, use is made, for example, not only of a word designating a certain concept, but also words designating more general concepts (for example, generic for initial, which results in the non-one-to-one correspondence of representation of the content (without even considering synonyms), since instead of the first word it is possible to use its definition, consisting of other words which are related to one another. However, the presence of words for more general concepts does not make it necessary to describe them by means of a listing of words for the concepts making up their scope. Conversely, the presence of words for less general concepts does not make it necessary each time to give their complete definition by means of disclosing their content through words for more general concepts. Thus, the demand for the independence of characteristics leads to great difficulties in information language.
In order to surmount the difficulties mentioned, limitations are first imposed on the heterogeneity of content of the ek themselves. One information task does not include ek with contents which differ greatly with respect to degrees of concreteness. For example, searches for literature are not made simultaneously among books and articles. Analogous limitations are imposed on the questions asked, that is, it is required for the degree of concreteness of content of the questions to be within certain limits. Then choice is made of the most appropriate level of concreteness of the independent characteristics forming the universe P. With the aid of the generally used method of approximation, the characteristics from P must describe the content of ek and the questions with an accuracy to any permissible degree of concreteness. If during this process the above-indicated difficulties are not eliminated then one selects two or more sets of independent characteristics P′, P″, P″′, etc. without an intersection which correspond to various increasing levels of concreteness, and an approximation is made of the content of ek by means of the union of sets P=P′∪P″∪P″′· · ·, which thus consists of dependent characteristics. The content of the question is also approximated by means of a union of sets. In this process the above-mentioned difficulties are eliminated by using in the approximation of characteristics the most appropriate level of concreteness. That is, the particular content is described by characteristics which are as concrete as possible without using the listing of characteristics for disclosing the scope of the more general concept, if for some general concept this still proves impos-
sible even with the aid of characteristics at a lower level of concreteness, then it is considered to be beyond the limits of the accepted degree of concreteness of content and it is ignored, and, on the other hand, without the use of too complicated complexes of characteristics for revealing the content of a very concrete concept (if this proves impossible, the concept is replaced by a generic concept close to it). This method of approximation will be called the “natural” method.
Having avoided these difficulties, we nevertheless again come up against the possibility of the non-one-to-one representation of the content. Actually, the condition that characteristics which are as concrete as possible will be used requires the knowledge of all possible methods of representing them through less concrete characteristics, which (methods) can occur as a result of the dependence of the characteristics from P. On the other hand, the knowledge of these relations between characteristics is also necessary when searching for ek on the basis of generic characteristics, when what is requested is not a listing of all concrete characteristics, but a description of a somewhat broader content with the aid of less concrete characteristics. Thus, the manifestation, in an information task, of a universe of characteristics P and analytic relations between them is closely related to the establishment of the method of approximating the content of ek and the questions asked. This means that the development of a special information language is a necessary and, at the same time, the basic and most complicated, phase of making the process of task solution automatically controlled (1–3). This phase is obviously linked with works on terminology and classification and requires the knowledge of a broader scope of information than E, if the latter is not closed. The analytic links revealed between characteristics from P must be recorded in a form that makes it possible to locate them quickly and automatically when searching for ek, and when approximating the contents of ek and questions. The summarizing of these relations is similar to the set of cross references and other rules for the use of subject index. Thus, the locating of characteristics which are analytically related to one another constitutes an auxiliary information task (1), which, as a rule, deals with a broader content than E and which can be solved: (a) separately from the basic task when approximating the question asked, if only this does not result in a too complicated representation of the question, for example, in the form of listing a large number of characteristics revealing the volume of less concrete characteristics; (b) in the process of solution, when, for the content of the ek being examined, more general representations of it than that which is recorded are found very quickly in a special memory device and all the representations are compared with the question, which is approximated, like the content of ek in instance (a), by the natural method; (c) one time before recording the contents
of the ek with the subsequent recording of the result of solution of the auxiliary task when recording the content of the ek itself, that is, by limiting ourselves to natural approximation for the question, we shall make for the ek several approximations at various levels of concreteness, or, putting it more exactly, with a drop of one degree in the level of concreteness for each characteristic (a rise in the level of concreteness would, once again, require a complicated representation of the volume of less concrete characteristics, and a drop by several degrees of concreteness would again lead to a very complicated representation of very concrete characteristics, although sometimes this can be used with successful coding). For complicated information tasks with a broad volume of information from E, the first two methods of making the solution of the auxiliary information task automatically controlled are not very applicable, (a) because of the large amount of time expended when searching for ek with a complicated and, as a rule, unnatural representation of the questions, and (b) because of the absence of a high-speed memory device of sufficient capacity for auxiliary searches. The third method has found applicability in cases when, with the aid of proper coding, the recording of approximations at various levels of concreteness can be made economic enough. Actually, the principal shortcoming of the third method of solution of the auxiliary information task is the necessity of recording each characteristic as many times as it is encountered, not only in its natural form, but also in the form of representation through characteristics of a lower level of concreteness, which reveal its content; that is, it is necessary to give its definition each time.
Let us note that the first two methods of solving the auxiliary information task presuppose that the transition to approximation of the content of ek at a lower level of concreteness of characteristics can be made by means of representation of each individual characteristic through less concrete characteristics. In the third method of solution this cannot be presupposed; however, then this approximation will be made too labor-consuming and nonautomatic and will require a still greater increase in the number of entries. It will not be possible to reduce the number of entries by means of coding, since it will be necessary to give, for the entire content of the ek, several parallel, individual approximations. Therefore we shall assume in the future that this transition is allowed by the method of approximation itself.
Almost everything that has been said with respect to characteristics is applicable to synthetic relations; however, taking into consideration the fact that the number of different relations in a language is usually not great and therefore the standardizing of them and the ascertaining of their dependence upon one another do not represent any great labor, we shall not dwell on them in greater detail.
Having thus ascertained that the consideration of analytic interdependencies among characteristics, as well as among synthetic relations, leads to an auxiliary information task that can have parallel solutions (separately or jointly), we can consider that the universe P consists of standard characteristics pn which, when the basic task is solved separately from the auxiliary task, can be viewed as independent characteristics, with the approximation by them of the content of the ek and the questions being carried out on a one-to-one basis—by the natural method. An analogous assumption can also be made with respect to synthetic relations. The types of information tasks revealed below by this assumption will obviously also occur for auxiliary information tasks, which, if they are viewed separately, do not in any way differ in principle from the basic tasks (some of the characteristics of the basic task begin to play the role of elements of information in the auxiliary task). The type of any information task will then be determined by the combination of types of the basic and auxiliary tasks into which the information task is broken down.
Basic types of information tasks
This is the simplest type of task, to which the other tasks can also be reduced, with certain tasks belonging exclusively to this type as a result of the insignificance of the information contained in the ek. Here the content of each ek is characterized simply by one or several standard names. These name-characteristics can be completely independent of one another. They can be given to elements of information to a considerable degree arbitrarily. Names which are similar (or even identical) in spelling can correspond to ek which are completely dissimilar to one another. In instances when the ek is characterized by several names, these latter are linked purely accidentally and other, different ek can correspond to each of them.
The scope of the questions consists of the names encountered in all the ek and also of other names which can be constructed analogously to the former. Having in mind, here and elsewhere in the future, only elementary questions which cannot be represented in the form of several questions which can be solved by independent, separate searches, it is possible to say that each question consists of one name. The conditions for selection lie in the requirement of complete correspondence between the question and the name of the ek.
All semantic, classification relations are absent here. Examples of this could be: searches for telephone numbers, names of books, addresses, and the like, by persons’ last names; translations or definitions of words by words; descriptions
of magazines according to their standard names; and in general, code values by codes.
It is obvious that with complete independence of the name-characteristics, no auxiliary information searches are necessary and all that remains is the basic information task in its pure form.
Standard name-characteristics can also be nonindependent and can reflect the content of the ek somewhat more thoroughly, that is, they can be given in a nonarbitrary manner. The solution of the basic task here remains the same as before. However, in this instance the purpose of the auxiliary information task will be to reveal the (analytic) relations between the characteristics, and, as has already been stated, this task can belong to any of the types introduced. These relations can also find their reflection in the very spelling of the characteristics, that is, characteristics which are similar in spelling can correspond to dependent characteristics and to ek which are similar in content, and this can be used in coding to solve the auxiliary information task by the third method. This similarity is revealed partially in words characterizing similar concepts (searches for translations or definitions of words according to the words themselves). In every scientifically elaborated terminology and nomenclature, this similarity is reflected still more completely. Another example could be the searching for some substances according to the values of some one of their properties or, in general, the values of a continuous function according to the values of one argument.
In the second type of information tasks each ek is characterized not by one standard name, which cannot be the case, but more thoroughly by several sections—by one or several standard characteristics in each section, with several characteristics in one section being related purely accidentally, as formerly. Insofar as an elementary question, once again, contains here no more than one characteristic or its negation for each section, then, as in type I, any ek can be subdivided into several ek having no more than one characteristic in each section. The synthetic relations between characteristics are lacking here, if one does not include in them logical conjunctive and disjunctive relationships between predicate-characteristics and their negations. Actually, the content of any ek here (after the subdivision into ek containing no more than one characteristic in each section) is represented in the form of conjunction of the predicate-characteristic, and any nonelementary question in the form of a certain expression consisting of predicate-characteristics and their negations, united by conjunctive and disjunctive relations. The reduction of a nonelementary question to elementary ones is equivalent to the representation of that expression
in the normal disjunctive form, so that each elementary question is represented in the form of the conjunction characteristics and their negations. In conformity with the general understanding of conditions of selection, it is required here that the conjunction representing an elementary question, or the expression representing a nonelementary question, be the logical consequence of the conjunction describing the element of information ek which is selected out during the search. Since characteristics not included in the ek can be introduced with a negation sign into a conjunction describing it, this demand can be reduced to having all the characteristics of the question and their negations contained among the characteristics and their negations from the element of information. That is, the set of the non-negated characteristics of the question must be included in the set ek, but no negated characteristic of the question must be included in the ek. This could be expressed in another form as follows. By assigning the symbol 1 to each characteristic which is contained in the ek and the symbol 0 to each characteristic which is not contained in the ek, and substituting these values into an expression for the question (not necessarily an elementary one) which is viewed as a Boolean function, we shall select the ek only in the case that this function is reduced to 1.
Characteristics from various sections do not always depend (analytically) upon one another here. Within an individual section the characteristics may be not only dependent, but also independent. If the latter occurs for all sections, then obviously, auxiliary searches to determine analytically related characteristics will, as before, not be necessary.
Information tasks of the second type are a general instance with respect to tasks of the first type and are encountered much more often in practice. Only, as a result of the great complexity of such tasks, they have been reduced to tasks of the first type, when, for elements of information, we limited ourselves to the characteristics of the information contained in them only in one section, disregarding the other, occasionally no less important aspects. This corresponds to classification according to some one characteristic. However, as has been pointed out by many scientists, each event or substance is determined in the real, existing world by many coordinates, and a true classification must be multidimensional.
Searches for names and complete descriptions of substances, plants and bacteria, animals, and people according to the values of certain of their properties, names of diseases according to symptom complexes, bibliographical or even more complete descriptions of books, articles, and other publications according to the sets of characteristics from several sections (year, month, volume, country, language, author, type of publication, principal and secondary subject, and the like) and many other tasks (particularly statistical ones) pertain to this type.
They can briefly be characterized as the finding of values of functions of many independent variables.
The third type of tasks differs from the second type of tasks only by the fact that it is impossible to record all the characteristics for a small number of sections (the total number of sections, as it were, greatly increases), in each of which it is possible to have just one or several completely unrelated characteristics. Therefore, without dwelling on such tasks in more detail, let us note, however, that the isolation of them into an independent type is linked with the appearance of a whole series of complications as compared with the second type of tasks, which complications, of both a logical and technical nature, will be noted below.
This type of task includes, for example, searches for names of diseases according to groups of symptoms, among which it is difficult to carry out a classification for a small number of groups of mutually exclusive (in the question) symptom-characteristics. Pain, fatigue, general debility, loss of weight, tendency to perspire, etc., can serve as examples of such symptoms. Here too one can include searches for complex concepts according to the simpler concepts comprising them, if one ignores the relations existing between the latter within the complex concept. Such searches can be encountered when preparing classifications, elaborating terminology, determining subject headings for subsequent basic searches of literature, etc. Let us note only that even here we assume that the characteristics have been properly standardized.
The fourth type of tasks differs from the second and third type by the fact that for elements of information and the questions asked, not only the characteristics are indicated, but also the synthetic relations between them within these elements and the questions. Once again, the relations, like the characteristics, are assumed to be strictly standardized.
The recording of the content of elements of information and the questions asked by means of interrelated characteristics is the most complete and most nearly perfect. The phrases of one language constitute one of the forms of this recording, although, true, a non-one-to-one and nonstandardized recording. Word order, expressions set off by commas, word combinations consisting of a modified word and modifying words agreeing with it in person, case, or tense, and other methods of recording the relations between the words of the sentence are the subject of a special science—grammar. It is also possible to cite examples where the characteristics are not necessarily situated consistently
one after the other, but, as it were, correspond to points in space forming, together with the depictions of the relations, complicated spacial complexes. Such, for example, are structural formulae of organic compounds or geometric complexes themselves which characterize mathematical spaces.
A special variety of the fourth type of tasks is the search for structures where no distinction whatsoever is made between the characteristics in and of themselves, but consideration is made only of the relations among characteristics of the same type—points in space. If we do not make the distinction between relations, that is, if we consider that between each pair of characteristics, for example, there either exists a relation of a completely definite type or there does not, then we shall obtain an extreme instance, an example of which can be searched for linear complexes—graphs. Let us note that when searching for structures the distinction between characteristics and relations is beginning to become lost, and sometimes it proves more profitable to change over to a dual representation of mutually related characteristics, assuming the relations to be characteristics, and vice versa.
The questions in the fourth type of tasks consist, as a rule, of mutually related characteristics. A certain difficulty here lies in determining more accurately the conditions for selection. Apparently it is most natural, apart from the simple inclusion of the characteristics of the question or their negations among the characteristics or negations of the sought elements of information, to require that the relations existing in the question between the coincided characteristics of the question and the element of information be the consequence of the relations existing among the same characteristics in the ek. Let us recall that we, as before, feel that with the existence of dependency among standard relations (as between characteristics), expanded conditions of coincidence are checked when solving the auxiliary information task.
By means of introducing new characteristics instead of relations, the fourth type of information task can sometimes be reduced to the second or third type of task. It is possible, first, in those instances when any relation between any characteristics can be replaced, without violating the accuracy of approximation, by assigning to each of these characteristics a so-called “role indicator” (3) describing the relation of the characteristic to the entire content of the ek. For example, instead of an indication in a sentence with relations of the subject-object type with the aid of prepositions or case endings, it is possible to add the term “subject” to the word which is the subject, and to add the term “object” to the word which is the object, taking both words in the basic grammatical form (for example, for nouns, the nominative singular) and disregarding the prepositions. Introducing instead of each characteristic from P a group of new characteristics, each of which characteristics consists of the former
characteristic and one of the “role indicators,” we shall obtain a new set of characteristics, with the aid of which the content of ek is approximated according to type II or III, but no longer with any indication of synthetic relations. The second instance when it is possible to eliminate the relations is the instance when the relation is established not between concrete characteristics, but between whole sections existing in the second type of tasks, ignoring the specific characteristics which exist in these sections for the ek being examined. Such relations are actually new characteristics describing the entire content of the ek. Putting it more exactly, each new characteristic is made up of the symbol for one of the former relations and the names of the sections united by means of that relation. Joining these new characteristics to P, we shall also reduce our task here to the second or third type.
Thus, in the future we can include in the fourth type only those information tasks which cannot be reduced, by means of changing the set of characteristics P, to type II or III tasks. Since in the fourth type of tasks the relations characterize not directly the ek themselves, but are built up on the characteristics from P, it is no longer possible here, in contrast to types II and III tasks, to limit oneself to one-place predicate calculus, and in order to describe the logical nature of these tasks it is necessary to have a more complicated logical calculus. An example of the fourth type of task could be searches for structural formulae of organic compounds according to assigned fragments of those formulae, as well as, in general, searches for ek with rather concrete content, when it is approximated by very general, independent characteristics. This situation requires the introduction of relations at least of the type of groupings in order first to describe smaller and less concrete parts of the content of the ekby means of these characteristics, and then to describe the entire content of the ek with the aid of groups of characteristics thus united. The latter task is encountered especially frequently when making a combined solution, by the third method, of a basic and auxiliary information task, when the dependent, more concrete characteristics are represented by less concrete characteristics.
CODE DESIGNATIONS AND SYMBOLS
Before describing some methods of solving information tasks of the types introduced above, we should like to discuss a few general considerations regarding the coding of standard characteristics and synthetic relations. This coding is necessary for making the process of solution—the comparison of the contents of the ek and of the question—automatically controlled. In order to
carry out the searches we must obviously record the approximations of contents of all ek obtained during the process of indexing. This information must be recorded on some medium: in the lines of the subject index of the table of contents or table; in paragraphs of reference books, encyclopedias, etc.; on library catalogue cards; on punched cards, punched tape, magnetic tape, photographic film, etc. In addition to recording the approximation of the ek, it is sometimes possible to record the content itself, although frequently one limits himself merely to recording a standard name or number assigned to each ek. According to the names or numbers found in the process of searches, one can then get to the content of the ek itself, that is, once again the first type of information task is solved. Therefore, this method of describing the ek must be avoided as much as possible if the task itself belongs to the first type. The approximation of the question must also be recorded somewhere: on a question list or in the memory of an information machine. Whereas during nonautomatically controlled searches the characteristics and synthetic relations are recorded in both approximations in the usual standard form—in the form of words, numbers, or other symbols—during automatic searches use is made not of this form, but of code designations consisting of groups of code symbols situated one way or another on the medium and in the memory. Code symbols can be, for example, notches or perforations on punched cards or punched tape, magnetized spots on magnetic tape, transparent squares on photographic film, and the like. In the memory these symbols can be represented by raised rods, closed contacts, etc. Ignoring for the time being in this discussion of coding the creation of code symbols, we shall consider that, in conformity with generally accepted considerations (according to the “yes or no” principle), use is made, both on the medium and in the memory, of only two possible states, one of which is assumed as the presence of the code symbol and can be designated by the figure 1, and the other as the absence of the code symbol and is then designated by 0. Thus, the code designations of the characteristics (and sometimes of the relations, about which more details will be given below) are represented in the form of binary numbers. In addition to the value of the number for the configuration of the code symbols within the field, various characteristics can also be distinguished by the location of that field on the carrier or in the memory set aside for the recording of the code designation.
The following types of codes are distinguished according to the location of the fields set aside for code designations of the individual characteristics:
Local code, when one knows beforehand the place where the field set
aside for recording some individual characteristic from P must be located; obviously, this is possible in the case of type I and type II tasks, assuming that ek has just one standard name or no more than one characteristic in each section.
Direct code, which is a particular instance of a local code, when in each section it is possible to record only the presence or absence of a single characteristic. In this instance, the extent of the field is reduced to the minimum necessary for recording only either of the two possible code symbols 1 or 0. This code is applicable to type II tasks with an increase in the number of sections and a simultaneous decrease in the number of all possible characteristics capable of being found in each separate section to 1, or for the type III tasks, when the sections are absent but the total number of characteristics is not great and is equal to the number of the fields of the direct code.
Nonlocal code, when one does not know beforehand the location of the fields set aside for recording the individual characteristics, as is typical for type III and type IV tasks. The fields are situated consecutively so that they can be examined consecutively.
Superpositional code, which serves to eliminate the difficulties linked with the use of the nonlocal code, when all fields for the characteristics whose location cannot previously be anticipated are no longer situated one after the other, as in the case of the nonlocal code, but are superimposed on one another, that is, they coincide. This code can be used in the case of type III and type IV tasks only with certain limitations deduced with the aid of the theory of probabilities, since when fields are superimposed, there are formed parasitical configurations of symbols, which correspond to characteristics which are not actually included in the approximation of the ek, and this leads to the selection of excessive, unnecessary ek.
BASIC DEMANDS MADE UPON CODING
As we have seen, the use of certain of the above-enumerated codes is linked with the types of information tasks. In addition, it is also linked with one of the demands made upon coding—the demand of economy of recording. Actually, the establishment of a mutually one-to-one correspondence between sections and fields in the case of local code makes it possible not to record in the form of a configuration of symbols for the name of the sections themselves. In the case of direct code, this leads even to minimum configurations—to the use of just one of two symbols (0 or 1). On the other hand, in the case of the presence in the ek of only an insignificant portion of the characteristics recorded by the direct code, we should have to reserve on the medium much additional place, and this is specifically precluded when converting to nonlocal code.
The second group of demands made upon coding evolves from the demand for simplicity in making automatically controlled the very process of comparing the code values of the characteristics from ek and the question. From this point of view the most complicated is the nonlocal code, since, when it is used, it is necessary in general to compare each characteristic from the ek with each characteristic of the question, whereas the knowledge of the place where the particular characteristic can be situated makes it possible to avoid this when using the other codes. Therefore, when using simpler devices for searches, the nonlocal code is replaced by the superpositional code. But this group of requirements also imposes limitations on the configurations of symbols within the fields of the local and nonlocal code (for the superpositional code there exist still greater limitations which are obtained, as has already been mentioned above, with the aid of the theory of probabilities, and which we shall not consider here). Assuming that code values of all characteristics recorded in some field of the local code are binary numbers with equal number of bits, this limitation can be formulated as follows. A binary number for any characteristic from one and the same field (section) must contain an identical number of 1’s (1, 2, 6). This requirement is formulated analogously for all characteristics recorded by the nonlocal code. This code is called a “uniform” or “selector” code, as distinct from simply a binary code, when this limitation is absent.
The third group of demands made upon coding evolves from the demand of simplicity and automatic control in finding the code values themselves. In general the finding of the code designations of characteristics according to their standard characteristics represents a type I information task and can be arbitrarily called “tabular” coding. Therefore, for type I tasks a method of finding the code designations analogous to the recording of the content of the ek on the medium in the form of a reference number, which was mentioned above, is not very applicable. An alternative to this method is the direct obtaining of the code designation with the aid of definite rules from the characteristic’s standard name itself, which can be called “order” coding. Such, for example, is the usual order coding of the characteristic where the orders are understood as definite places upon which the standard symbols—letters, figures, and the like—stand in the characteristic. Since there usually are not many symbols used for recording a particular characteristic, it is easy to remember them or automatically obtain their code designation, from which a code designation of the entire characteristic is then made up. Before obtaining the code designations of the individual symbols, it is possible to have a standard conversion of the characteristic, for example, the reduction to a definite number of orders by means of the rejection of certain symbols, transposition and combination of
symbols, etc. Standard conversion is also possible on the obtained code designations of symbols from the already converted characteristic. For example, a binary-decimal representation of numbers recorded in a decimal system of numeration can be converted into a purely binary representation, and to this it is possible to include the superimposing of code designations for individual symbols one upon the other, which is analogous to superpositional coding. All these conversions must be such as will preserve the mutually one-to-one correspondence between the characteristics and their code values and such that, at the same time, the number of orders in the code designations will be as small as possible. Here it is necessary to note that the third group of demands made upon coding lies, as in part also for the second group, in the contradiction with the demand of economy; and therefore in practice it is necessary to limit oneself to a compromise solution.
We spoke earlier only about coding the content of ek, approximated by standard characteristics accepted when viewing the basic types of information tasks as independent ones. With the simultaneous solution of the basic and auxiliary information tasks by the third method, analogous demands also occur with respect to the coding of the dependent characteristics, since the less concrete characteristics approximate such more concrete characteristics for one of the four types considered above. With the aid of fortuitous coding it is sometimes possible to avoid a considerable increase in the volume of recording each dependent characteristic, if one reduces its former code designation to a combination of code designations approximating its less concrete characteristics.
CODING OF RELATIONS
Leaving to one side other, more detailed demands made upon coding, let us dwell in conclusion upon the coding of synthetic relations (1, 2). These relations are reflected onto the medium by the procedure of following the characteristics recorded by the nonlocal code and by the proper location alongside of the related characteristics of the code designations of those relations, each of which, once again, can be represented in the form of a binary number, or simply by a successive indication of the ordinal numerals of the related characteristics together with a code designation of this relation itself. The first method is similar to the local code for characteristics, and the second for nonlocal. In the memory the relations are reflected by the method of introducing into the comparison the question characteristics contained in it, by means of a particular cross-plugging of the detecting elements which detect the result of the comparison of each question characteristic with the characteristics of ek, by the procedure in which these detecting elements must operate, and also, as for the relations recorded on the medium, by the proper location of the code designa-
tions of the relations in the memory. In view of the fact that the types of synthetic relations have not as yet been sufficiently ascertained, the more detailed consideration of the methods of coding them in abstracting from concrete information tasks is a difficult one.
Some methods of solving information tasks
TYPE I TASKS
To solve such tasks, most often one uses ordinary dictionaries, reference books, encyclopedias, printed subject indexes (with standard subject headings), tables, card files, and the like. The characteristics pn∈ P assigned to a particular ek∈ E in its standard, uncoded form, are situated here in lexicographical order, and alongside of each characteristic one finds the answer En⊂ E consisting of all the ek approximated by that characteristic. Therefore, in order to find for some question characteristic the answer En, that is, the characteristic pn coinciding with it, there is no need to examine successively the whole set of elements of information E, which is reduced here to a successive examination of the characteristics from P, but it is sufficient, for a small number of steps, to subdivide successively the entire E (or P) into as many subsets as there are different symbols which can be in the first order of the characteristic , and then, having selected the subset corresponding to the symbol contained in that order of , to subdivide it in conformity with symbols from the second order of , etc., until one finds the required answer in the form pn, alongside of which the content of all the ek from the subset En is directly recorded. In this process, in order to avoid duplication of descriptions of all ek which are approximated by several pn, each such description is placed only under one characteristic, to which references under other pn are then given. In order to facilitate the process of subdividing E (or P) into subsets, the latter are placed in card files in different drawers or are subdivided by special inserts: similar arrangements are used in printed publications.
The described “devices” are the simplest parallel-action devices, since the question asked is simultaneously compared with many ek, not one at a time, as occurs in consecutive-action devices. The former devices, when carrying out the search in a small number of steps, prove, as a rule, to have a much higher speed than the latter (even extremely improved ones), but are more cumbersome, since the comparison is made simultaneously with many ek. Therefore, the simplest “devices” prove to be well adapted for solving type I tasks with large numbers of elements of information ek (on the order of millions or hundreds of thousands), if it is not necessary for the time spent for the search to be very small and the searches themselves to be made automatically.
If such demands are present, one uses the most diverse varieties of parallel-and consecutive-action decoders. The characteristics pn and are represented in coded form, and, as has been mentioned already, in the event of type I tasks the coding must be done “by order” or otherwise we would again return to a type I task solved nonautomatically with the aid of the simplest “devices.” For the same reason the answer must be given by the decoder in the form of the complete content of the information from ek∈ En and not in the form of the intermediate names or numbers of those ek, if only the number of all the ek∈ E is not many times less than the number of all pn∈ P, which usually does not occur, since each ek is approximated, as a rule, only by one pn. Examples of such decoders could be, in the case of very small numbers of ek (on the order of several tens), numerous varieties of mechanical and electrical decoders which very quickly give the value of the symbols encoded by their codes, and, in the case of small numbers of ek (on the order of several thousands), various types of internal high-speed memory devices of modern numerical computers which produce on the “number lines” the content of the ek almost immediately after the is fed to the “address lines.” For medium numbers of ek (on the order of tens of thousands), the creation of devices which are just as high-speed proves to be extremely difficult, and, in addition, the applicability of such devices to solve general type I tasks is greatly limited by the small capacity of the memory cells set aside for the recording of the content of the ek and by the necessity of spending time and foreseeing the method for extracting and decoding these records (recording of nonliteral and nonnumerical content here is almost completely precluded). Consecutive-action decoders, which use punched cards, punched tape, magnetic tape, photographic film, etc., are only partially free of the shortcomings mentioned and, although they do possess the proper automation of searches, in the case of medium numbers of ek, despite the complexity of the decoders, they do not produce a reduction in the time of search as compared with the simplest “devices.”
Mechanical and electromechanical parallel- or parallel-consecutive-action decoders prove to be better adapted for solving general type I tasks. These decoders are sufficiently automatic and possess, in the case of medium and large numbers of ek, higher speed of searching than the above-mentioned consecutive-action decoders or the simplest “devices,” and, in the case of small numbers of ek, higher speed than the simplest “devices,” but a somewhat slower speed than the high-speed memory devices of computers. Examples of such decoders could be the Amfis system proposed by Avakian (7) for large numbers of ek and a simplified variety of the decoder, proposed in the next section, with superimposed punched cards for medium numbers of ek. With the aid of these decoders the carrying out of that small number of steps into which the
searches, in the case of the simplest “devices,” are broken down, are made immediately automatic and are accelerated, and the frame of microfilm that is found as a result of the searches, with the content of ek∈En recorded in the usual form, appears on the screen of a microfilm viewer or on a television screen (for transmission to a distance). The use of microfilm to record the content of the ek makes it possible, when using these rather simple decoders, to obtain, in addition to the automation and acceleration of searches, a considerable decrease in the volume of the entries themselves. Such devices can therefore be considered the ones with best prospects for the rapid and automatic solution of type I tasks and, in particular, for finding translations of words in other languages, for the rapid obtaining of information concerning addresses, telephone numbers, and the like, and for other similar tasks.
TYPE II TASKS
An estimate of the number of all possible questions which can be made up for all ek∈E in the case of type II tasks shows that this number usually proves to be much greater than the number of all possible ek and pn. Therefore, the use of the simplest “devices” in which all these questions are located in lexicographical order becomes here, as a rule, practically impossible as a result of the excessive increase in their volume.
When using parallel- and consecutive-action decoders, the characteristics can now be encoded not only by “order,” but also by the “tabular” method, and it is possible to obtain as answers intermediate names or numbers of the ek sought, according to which intermediate answers one can then, as in a type I task, make an additional search of the contents of the ek themselves. In other words, in order to facilitate the solution of type II tasks, it is considered justified to reduce the individual stages to the solution of tasks of a simpler type, type I.
The process of solution with the aid of parallel-action decoders is broken up, as in the case of the simplest “devices,” into a not very large number of steps, during which a search is made for all the smaller subsets from a set of elements of information E. However, because of the absence of the completely systematic location of the ek (the impossibility of locating in lexicographical order all the questions pertaining to them), it is not possible here to limit oneself only to the consecutive isolation of subsets inserted in one another, but it is necessary also to search for the intersection of rather large subsets from E which correspond to the individual characteristics of the question. And this does not make it possible to use, in the case of type II tasks, certain of the simpler parallel-action decoders, as, for example, the Amfis system. Putting it more exactly, the process of searching can be described as the finding of the answer E★⊂E in the form:
where En1, En2,· · ·, Ena are subsets from E which correspond to the characteristics of the question , forming the set P★⊂P.
The subsets En1, En2,· · ·, Ena can be easily found by the characteristics with the aid of the simplest “devices,” as was done in type 1 tasks, if all the ∈P are situated in lexicographical order. If this situation is made for each section separately, then it is also possible to carry out the simultaneous search for all assigned characteristics. These subsets must, however, contain not the complete descriptions of the ek included in them, but only the intermediate names or numbers of the latter, in order to avoid the duplication of those descriptions (each ek is usually approximated by characteristics from all sections) and to facilitate the finding of the intersection of the subsets obtained. The nonautomatic finding of the intersection is used in Taube’s coordinate indexing system (8) and is actually the sole possibility of using the simplest “devices” for solving type II tasks. Such a method of solution proves to be not only labor-consuming, but also requires a rather large amount of time, without even mentioning the necessity of solving several type I tasks (one task for each question characteristic, and one task for each finding of the complete descriptions of the ek according to their intermediate names or numbers), and therefore it cannot be considered satisfactory.
Other parallel-action decoders used to solve type II tasks include punched cards with notches on the edges and superimposed punched cards (11) (aspect cards). These devices are to a certain degree dual one for another, as can be revealed from an examination of the principle of comparison of characteristics (1), which is common not only to them, but also to the majority of other decoders.
Let us assume that two characteristics are compared: pn⊂Pk, where Pk⊂P is a set of characteristics approximating the corresponding ek, and ∈P★, and let us assume that the code designations having the identical number of bits, the designations of pn on the medium and of in the memory, contain an identical number of code symbols (represented in the form 1 on the medium and in the form in the memory), that is, the code is a “selector” code. Then it is not difficult to show that the characteristics coincide in the instance, and only in the instance, that
where M1 and M0 are sets of bits of the code designation for pn which contain 1 and 0 respectively, and and are analogous sets for . In this process, the bits of the medium and the memory which mutually correspond to one another in a one-to-one relation are considered to be identified. Depending upon how one realizes the pairs of states 0 and 1, and and , and the signals θ and I characterizing respectively the absence or presence of intersection of sets M, one obtains different designs for decoders.
In the case of punched cards with notches on the edges, 1 is taken as the notched cell, is a raised or inserted rod, and θ is the absence of meeting between the rod and the unnotched cell causing the drop or falling out of punched cards corresponding to the sought ek. Characteristics from Pk and P★ are encoded here according to local (in particular, direct) selector code, and the comparison can be done immediately for all characteristics from P★.
In the case of superimposed punched cards, 1 is taken as the punched-out cell, is the superimposed card, and θ is the passage of light through the cell, with the absence of an obstacle in that cell on the superimposed punched cards , which passage of light detects the ek corresponding to the open cells. Characteristics from Pk and P★ are coded here usually only according to direct code, although later (1, 9) it also proved possible to use a local selector code.
Thus, the pair consisting of rod (light) and moving punched card in the first instance realizes the pair , and in the second instance the pair , in which the above-mentioned duality of the devices considered manifests itself.
The devices described do not provide for complete automation of searches or for a considerable reduction in the time required for searching, as compared with the simplest “devices,” although they are more effective than these latter. Therefore, the question arose concerning the development of more nearly perfect parallel-action decoders, and the use of consecutive-action decoders become justified (as distinct from type I tasks). More details about these decoders will be given in the next section, but here the only thing remaining is to recall that with medium numbers of ek (on the order of tens of thousands) consecutive-action decoders do not have a sufficiently high speed of search. This speed, although it does exceed the speed of the devices just described, is still, even for extremely complicated decoders, on the same order as the speed of the simplest “devices” when solving type I tasks. For large numbers of ek this ratio between the speeds of searching becomes still more unfavorable for consecutive-action decoders; their speed of searching must be almost the same, for example, as that of the last of the devices described in this section. Thus, the most suitable device for solving type II tasks must apparently be some kind of improved parallel-action decoder, which is somewhat more complicated than the Amfis system, but is simpler and faster than high-speed consecutive-action decoders. This finding is also supported by the fact that the locating of subsets En1, En2,· · ·,Ena is reduced to the solving, by an already known method, of a type I tasks, and only the finding of the intersection of those sets can cause difficulties.
nal high-speed parallel-action memory devices in present-day numerical computers. As distinct from type I tasks, the problem solved here is the extraction of the sought ek not in the form of a complete description of the information contained in them, but in the form of intermediate number, which situation corresponds more to the possibilities (volume of cells in the memory) of those decoders, but, on the other hand, now it is necessary to have a separate output for each ek, and it is impossible to get by with general output “number lines.” Actually, whereas formerly one answer E★, consisting, as a rule, of a small number of elements of information ek, was prepared for each question, and this answer was produced on the general output lines, now, as a result of the extremely large number of all possible questions, each ek is recorded separately on the “address lines,” and since all the ek from E★ are searched simultaneously, it is no longer possible to lead them out onto the general output lines. In such decoders can be taken as the presence of signals on the input “address lines,” 1 as the presence of a connection of the output line corresponding to a separate ek with an input line by means of a diode, capacitance, resistance, or the like, and θ is the absence of signals on the output line , which absence detects the necessary ek (having taken as 1, and also for the opposite state, we shall come to a still more obvious analogy with punched cards which are notched on the edges).
For medium numbers of ek, the described decoders produced have as yet, unfortunately, been too complicated, and this makes it possible in this instance to attempt to improve immediately the system of superimposed punched cards, where the finding of the intersection of sets En1, En2,· · ·, Ena is successfully made automatic, requiring, however, the recording of each set according to a direct code, that is, on each card (or set of cards) cells are set aside for all possible ek. The essence of the improvement proposed lies first of all in the fact that the characteristics pn are now recorded not according to the direct code, when each pn has its own card, but according to a local selector code, which makes it possible to reduce sharply the total number of cards (1, 9)—to reduce them to the number of bits in the code designation of the characteristics from all sections, and secondly, lies in the replacement of the superimposition of punched cards by a slight shift (1) of the always superimposed cards into an examination position similar to that used, for example, in the Card Translator. If each ek is described by no more than one characteristic from each section, as occurs or can be achieved in type II tasks, then after the shifting of cards corresponding to the set of bits , for each characteristic of the question those, and only those, cells which correspond to the necessary ek will allow light to pass through. Since the cards can be shifted automatically, for example, by means of electromagnets, the entire process of finding E★
proceeds extremely rapidly. With order encoding of characteristics from individual sections it is not necessary first to solve a type I tasks, since they are automatically solved on the decoder itself, and with the use of the position of each open cell on the surface of the card as a direct indication of the corresponding frame of microfilm it is possible automatically to receive on the screen a complete description of the ek found.
If in the described device one limits himself only to one section, in which any characteristic is assigned completely, the latter can also be encoded by a binary, but not necessarily a selector, local code, which leads to a still greater decrease in the total number of cards, but which requires, however, two cells for the recording of each ek. Such a simplified device can serve, as was mentioned above, for solving type I tasks. Let us note in conclusion that a still greater decrease in the number of cards can be achieved if one refrains from using just two code symbols; however, in this process, in order to record each ek it is necessary to have a number of cells which is one greater than the number of code symbols, and, instead of the simple shifting of each card, it is necessary to shift it into one of several different positions, the number of which is equal to the number of code symbols.
For large numbers of ek (on the order of millions or hundreds of thousands) the use of the device described proves to be a difficult one, since, in order to set aside on the card a cell for each ek, it is necessary to use cards which are too large (the size of the cell is limited from the bottom, in order to provide for the unhindered passage of light through all the punched cards). The replacement of each card by a series of cards leads to a manyfold duplication of the entire device. The way out of this position must apparently be sought (if one does not return again to complex nonmechanical decoders) in rejecting the direct code for recording sets En1, En2,· · ·, Ena, and in recording the answer En for each characteristic pn according to a nonlocal code on a group of punched cards or on a piece of punched tape, photographic film, or the like, in the form of a sequence of code designations of numbers of all ek∈En. Having found consistently the En corresponding to the individual characteristics of the question, by means of the simplest “devices” or a device of the type of the Amfis system we isolate the En which contains the smallest number of ek. Then the numbers ek from that En are introduced into the memory device of a consecutive-action decoder, and the content of each other found En is consecutively examined by that decoder. Those numbers in the memory of the decoder for which the coincidence in examination of each En will be detected will correspond to the sought ek forming the intersection of all the located En. Such a parallel-consecutive method of searches will obviously prove more profitable than the purely consecutive method if the number of ek from all the
examined En, multiplied by the quotient obtained when the number of ek from the En placed in the memory is divided by the number of numerical designations in that device, is much less than the total number of ek∈E. With a large number of characteristics pn in each section and a not very large number of the sections themselves, such a situation ordinarily occurs. Actually, considering that the characteristics in each section were chosen to be as homogeneous as possible in their concreteness and that approximately identical numbers of ek correspond to each of them, and remembering that each ek is described by no more than one characteristic in each section, we can approximately evaluate the capacity of the En as the quotient obtained when the number of ek∈E is divided by the number pn from the corresponding section. Obviously, the highest degree of automation will be achieved in such a device if the input of En into the memory and the examination of the remaining En are carried out directly from the screen of a system on the type of Amfis, for example, with the aid of an iconoscope.
In conclusion let us note that with the presence of a negated characteristic in the question, the methods of solutions remain approximately the same, except that the ek found for the negated characteristic, if it is viewed as nonnegated, are not left in the intersection of the En corresponding to the nonnegated characteristics, but, on the contrary, are rejected. Sometimes such ek are revealed when this intersection is compared with a new intersection obtained when a negated characteristic viewed as a nonnegated one is added.
TYPE III TASKS
In order to solve these tasks one may use the parallel-action decoders described in the preceding section, only in the instance that the characteristics of pn from the set Pk which describes the corresponding element of information ek, are recorded by direct code, that is, each pn∈P is given its own place in the simplest “devices,” its separate card in a system of superimposed punched cards, its cell on punched cards with notches on the edges, and its input line in electrical decoders. The attempt to use a local code to record the characteristics would lead to a situation in which, not knowing in what field a particular question characteristic could be found, we would have to ask the question by very many methods, the number of which, at best (with a systematized sequence of characteristics), is equal to the number of all possible combinations from among the fields of the local code for the number of question characteristics. With, as a rule the large number of all possible pn the last two decoders just listed cannot be considered practically feasible. Thus, the only methods remaining are the first two, relatively nonautomatic methods, and of the improved decoders, only the last of those described in the preceding section.
In order to make use of the remaining parallel-action decoders it is necessary to record the characteristics from Pk according to a superpositional code. As was already mentioned earlier, it is possible during this process, however, for superfluous, unnecessary ek to appear during the searches. Without considering here that possibility, which has been worked out in detail by many researchers, let us change over to a brief description of consecutive-action decoders. Although, in conformity with what has previously been set forth, they can be considered to be satisfactory only for medium numbers of ek, as a result of the absence at the present time of the last of the improved parallel-action decoders described in the previous section, they are the only devices for the automatic solution of type III tasks (without the use of superpositional code).
The basic shortcoming of a consecutive-action decoder is the necessity of a very rapid, consecutive examination of the content of each ek. Sectors of the medium which contain the code designations of the characteristics from Pk which are recorded by a nonlocal code are consecutively passed or switched up to the scanning device of the decoder. Then each pn from the Pk is compared with all the from the set of question characteristics P★. If the P★ is included in the Pk (P★⊂Pk), the corresponding ek is selected.
If the medium is a discrete one (punched cards, cards made out of film, or the like), it is possible to make a consecutive comparison first with one ∈P★, then to compare the ek selected with another ∈P★, etc. Taking into consideration the fact, which was already mentioned above, that usually only a small number of all the ek∈E proves to be approximated by a certain characteristic, it may be considered that the number of repeatedly examined ek will decrease rapidly, that is, it is completely admissible, with a discrete medium, to use the simplest decoders with a memory only for one question characteristic in order to solve type III tasks (an example could be the Samain (12) selector—the Filmorex system).
For a continuous medium it would be necessary by such a method to rerecord the content of the ek selected during each examination. Therefore, the comparison of each pn∈Pk is carried out immediately with all ∈P★. The decoder required for such a comparison is analogous to the electric, electronic, or other decoder described in the preceding section, in which decoder the role of the question characteristic is played by an individual characteristic pn∈ Pk, and the role of the code designations of the pn is played by the code designations of all ∈P★ which are recorded on the input lines with the aid of diodes, resistances, etc. Thus for 0 one can here take the presence of a signal on the input line, for the presence of connection of the output line of the corresponding individual to the input line with the aid of a diode, resistance, etc. (or directly in the case of a single in the question), and θ is the absence
of a signal on the output line , which absence detects the coincidence of the corresponding with the pn being examined. If, after the examination of a certain ek, the coincidence will be recorded for all ∈P★, then this ek is selected or its content is re-recorded. Since the number of in the question is usually not large, the decoder proves to be extremely simple, although it does require a special detecting element for each . The outputs of the detecting elements can be joined by means of switches in such a way as to detect not only the value of the conjunction of the nonnegated characteristics, but also the value of an arbitrary Boolean function of the question characteristics (1, 2, 6, 13, 14), that is, the ek can be selected during one examination even for a nonelementary question (in particular it is possible to search simultaneously for many questions if the decoder has a sufficiently large memory).
Depending upon the type of medium, the signals 0 can be fed to the input lines of the decoder or directly from the brushes through the holes in the punched card or punched tape, or, after amplification, from the heads picking the magnetized spots of the magnetic tape or drum, or from photocells reacting to the transparent spots of photographic film or photodisc, etc. Other designs of decoders are also possible, provided they are of such a high speed that they do not limit the speed of examination of the ek. Since they contain a small number of , a certain complication of them (both conditions, and , are checked) is also possible, and this complication makes it possible to record characteristics pn and pn according to a binary, but not selector, code. The medium with the best prospects is apparently photographic film (frames of film, photodiscs), not only as a result of its high degree of resolution, but also because of the possibility of recording and extracting complete descriptions of ek, instead of just intermediate numbers, which would require an additional solution of a type I task.
Thus, with medium numbers of ek the devices that can be considered the best adapted for the automatic solution of type III tasks are consecutive-action decoders or an improved decoder with superimposed punched cards and with the use of a superpositional code, but with large numbers of ek the best device is the improved parallel-consecutive-action device which was described at the end of the preceding section.
TYPE IV TASKS
Generally speaking, only consecutive-action decoders can be employed for the solution of these tasks, since the synthetic relations do not characterize directly the ek themselves, but are built up on characteristics from ek. Parallel-action decoders could be employed here only for the preliminary rough selection of ek according to which are assigned, as in a type III task, without any
indication of the relations. Although the number of ek found during this process proves to be less than the number of all ek∈E, nevertheless it is much greater than the number of the ek sought, or otherwise there would be no reason to show the relations at all. For medium numbers of ek the automatic solution of a type III task (with the aid of a parallel-action device) produces, as we have seen, only the numbers of these ek, and therefore the locating by numerical designations of a rather large number of complete descriptions of these ek, as in type I tasks, can occupy too much time for subsequent searches taking the relations into consideration.
Consecutive-action decoders used to solve type IV tasks differ from those mentioned in the preceding section only by the presence in the memory of additional detecting elements, switches, and counters for comparing the relations. Taking into consideration the certain lack of definiteness and lack of study of this type of tasks, it is for the time being difficult to give a general description of the requirements made upon the composition of such memory elements. For example, with very complicated relations it may be necessary to have a whole arsenal of facilities existing in modern numerical computers (15). With the existence of only the simplest relations of the grouping type (the uniting of some characteristics from Pk into groups which are then viewed as independent characteristics), it is sufficient to combine the outputs of the detecting elements corresponding to the characteristics grouped in the question, through the switch “and” and then to feed the output from this switch to an additional element which detects the inclusion of that group into a grouping of characteristics from Pk after the examination of that latter grouping, which situation is signalized by means of special symbols, “brackets”—the code symbols are recorded on the medium in definite places at the beginning and end of the grouping. Devices possessing such possibilities are, for example, the Experimental Information Machine (1, 2, 6) of the Institute of Scientific Information, Academy of Sciences USSR (EIM), the WRU selector (13), and the planned Kodak Minicard system (14).
In the event that the characteristics from P are too general, as can occur with the presence of a complicated system of relations, it sometimes proves possible to have very many methods of establishing the mutually one-to-one correspondence between ∈P★ and the pn∈Pk coinciding with them. Since only for certain of these methods can the relations between be generated by relations existing between the characteristics from Pk which correspond to those , it is necessary to analyze all the possible instances of establishment of this mutual one-to-one correspondence. For this purpose it may be necessary to have a repeated reproduction of the content of each ek. In order to be spared the repeated passage of a moving medium, in this instance it is necessary to rerecord the content of the ek on an extremely high-speed auxiliary operational
memory (16), and also to set aside a portion of this device for recording the intermediate results of the analysis and the program of its execution. Thus, the solution of type IV tasks in general requires the application of modern universal or even special digital computers. It must, however, be noted that at the present time there exist only a very few information tasks in which the approximation of ek and the questions could be carried out on a one-to-one basis with such great accuracy and complexity as to require the solution of those tasks by means of such highly improved devices.
1. V.P.CHERENIN, Nekotorye problemy dokumentatsii i mekhanizatsiya informatsionnykh poiskov [Certain Problems of Documentation and Mechanization of Information Searches], Moscow, 1955.
2. B.M.RAKOV, V.P.CHERENIN, Eksperimental’naya informatsionnaya mashina Instituta nauchnoy informatsii AN SSSR [Experimental Information Machine of the Institute of Scientific Information, Academy of Sciences USSR], Moscow, 1955 (there also is a publication in the English language).
3. J.W.PERRY, A.KENT, M.M.BERRY, Machine Literature Searching, Interscience, New York-London, 1956.
4. W.C.ADAIR, Citation Indexes for Scientific Literature? American Documentation, 6, , 1955.
5. B.C.VICKERY, Developments in Subject Indexing, The Journal of Documentation, 11, , 1955.
6. B.M.RAKOV, V.P.CHERENIN, Byulleten YUNESKO dlya bibliotek; Machines for Retrieving Information in the USSR, Unesco Library Bulletin, 11, [8–9], 1957. (Also published in English, French, and Spanish languages; the article was reprinted in the German language in the journal Nachrichten fur Dokumentation, 9 , 1958.
7. E.A.AVAKIAN, E.GARFIELD, AMFIS—The Automatic Microfilm Information System, Special Libraries, April, 1957.
8. M.TAUBE and ASSOCIATES, Studies in Coordinate Indexing, Vol. I, 1953.
9. Same, Vol. II, 1954.
10. Same, Vol. III, 1956.
11. R.S.CASEY, J.W.PERRY, Punched Cards—Their Applications to Science and Industry, Reinhold, New York, 1951.
12. J.SAMAIN, Filmorex—Une nouvelle technique de classement et de selection des documents et des informations, Paris, 1952.
13. J.W.PERRY and ALLEN KENT, Das lesende Selektiongerät der Western Reserve University, Nachrichten für Dokumentation, No. 2, 1957.
14. A.W.TYLER, W.L.MYERS, J.W.KUIPERS, The Application of the Kodak Minicard System to Problems of Documentation, American Documentation, 6, , 1955.
15. A.OPLER, T.R.NORTON, New Speed to Structural Searches, Chemical and Engineering News, June 4, 1956.
16. W.H.T.DAVISON and M.GORDON, Sorting for Chemical Groups Using Gordon-Kendall-Davison Ciphers, American Documentation, 8, , 1957.