PROPOSED SCOPE OF AREA 5
SYSTEMS AND PROCEDURES for storing and searching recorded knowledge are designed for the purpose of making the information available when needed. Such systems are often rendered complex by the requirement that information is to be drawn from a variety of sources and also by the fact that in some instances systems must provide both for the recall of individual facts and for their effective correlation. This area is concerned with the design of effective systems, with the problems encountered in processing recorded information for subsequent search, and with the possibilities of using machines or devices in the processing, storage, and search operations.
The effective organization of recorded knowledge requires more understanding than we now have of the relationships between scientific concepts and the words and other symbols in which they are described or expressed. The processing of information for retrospective searching must also take into account various practical considerations such as the purpose to be served, the capabilities of equipment that is already available or that could be constructed by exploiting present-day technological possibilities, and the costs involved in applying the various methods of processing information.
We must consider not only the effectiveness of traditional procedures, such as alphabetical indexing and hierarchical classification, but also recent experimental work and research trends directed toward the development of new or modified methods of indexing or organizing material as well as the development of equipment to aid in the rapid processing and searching of information.
Principal Subjects for Discussion
The principal subjects for discussion might be grouped under the following headings:
Linguistics—Structure and Meaning;
Problems Encountered in the Design and Application of Systems;
Application of Machines in Storage and Search Processes;
Comparative Evaluation of Experimental Systems.
1 LINGUISTICS—STRUCTURE AND MEANING
The relationships between scientific concepts and the words and symbols in which they are expressed may be highly complex because symbols are entities in their own right and frequently more than one concept is associated with a single symbol. It is necessary to consider the problems involved in defining the meanings of words and also the relationships between words whose scope of meaning is narrow and words of more generic significance. The possibility of making use of generic concepts in correlating recorded information is particularly worthy of attention. The knowledge and methodology of linguistic scientists might be examined also for contributions in the effort to deal with the problems that occur in attempting to organize information. It may be helpful to study the devices used in natural language for indicating relationships between words—devices such as word order, directional prepositions, and various types of syntactic structure.
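The use of generic concepts in correlating recorded information might be sketched as follows. This is a minimal illustration only: the table `broader_of`, its terms, and the function `narrower_terms` are hypothetical and are not drawn from any existing system. Each narrow term is linked to one more generic term, and a question put in generic terms is expanded to include every narrower term beneath it.

```python
def narrower_terms(broader_of, term):
    """Return the term together with every term that lies beneath it."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for narrow, broad in broader_of.items():
            # A term belongs to the result if its broader term already does.
            if broad in found and narrow not in found:
                found.add(narrow)
                changed = True
    return found


# Hypothetical table: each narrow term points to one more generic term.
broader_of = {
    "sodium": "alkali metal",
    "potassium": "alkali metal",
    "alkali metal": "metal",
    "copper": "metal",
}

expanded = narrower_terms(broader_of, "alkali metal")
```

A search for "alkali metal" expanded in this way would also reach records indexed only under "sodium" or "potassium". A real vocabulary would need to allow for a term having more than one generic term, which this sketch does not.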
2 PROBLEMS ENCOUNTERED IN THE DESIGN AND APPLICATION OF SYSTEMS
Whether subject matter is to be recorded for storage as a total text, an abstract of that text, key words selected from the text, or some other form depends upon the type of service to be provided. Whether the information to be stored and later searched involves ideas or notions, on the one hand, or established facts, on the other, will influence the design of methods for organizing the subject matter and the selection of equipment to be used with the storage and search system. The problems encountered in designing and using the following types of indexing and classification schemes, and the capabilities and limitations of the various schemes, should be explored.
A Conventional Indexing
The word “index,” according to Webster’s dictionary, is sometimes used to denote “a directing sign or instrument, that which points out; that which shows, indicates, manifests or discloses; a token or indication.” Taken in this broad sense, the establishment and use of an index is essential in the retrieval and correlation of recorded knowledge.
In a narrower sense, the word “index” may be used to denote “a list, usually alphabetical, of topics, etc., in a book, giving the numbers of the pages where each subject is treated, and commonly placed at the end of the volume;” also, “a similar list of references to a series of volumes, or literature in general, usually printed separately.”
The basic question of how words are most effectively used to provide “directing signs” and also the possibilities of using equipment to facilitate the preparation and use of indexes should be considered.
B Coordinate Indexing
When a system of coordinate indexing is used, the key words are used as “coordinates” and in the searching process it is possible to search for any desired combination of these “coordinates” or index entries. Scientific inquiries are usually expressed in terms of multiple concepts arranged according to the inquirer’s particular needs of the moment.
Various devices for storage and search can be used effectively with methods of coordinate indexing. The work of combining categories as needed to satisfy an inquiry is left to the machine or device. The high speed at which desired patterns can be detected is the basis for the usefulness of such equipment.
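The combining of "coordinates" at the time of search can be sketched in a few lines. The sketch assumes that each document has already been assigned a set of key words. The sample file, the document labels, and the function names `build_index` and `coordinate_search` are hypothetical, and set intersection is used here merely as one simple way of detecting the desired pattern.

```python
from collections import defaultdict


def build_index(documents):
    """Map each key word to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, key_words in documents.items():
        for word in key_words:
            index[word].add(doc_id)
    return index


def coordinate_search(index, *terms):
    """Return ids of documents indexed under every one of the given terms."""
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample file: document id -> key words assigned by an indexer.
documents = {
    "D1": {"reactor", "cooling", "sodium"},
    "D2": {"reactor", "shielding"},
    "D3": {"cooling", "sodium", "corrosion"},
}

index = build_index(documents)
result = coordinate_search(index, "reactor", "cooling")
```

The point of the design is that no combination of terms has to be foreseen when the documents are indexed; the combination is formed only when the inquiry is made.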
C Indexes that Specify Relationships
Most conventional and coordinate indexing systems are based on key words gleaned from the scientific records to be stored. A combination of the key words and the explicitly-stated relationships between the words forms a telegraphic style abstract that gives the subject content of a document at a glance. The merits of such abstracts for indexing purposes, the methods of preparing and coding the abstracts, and characteristics of equipment needed to conduct searches when information is so organized need to be explored.
The broad purpose of classification is to draw together those records that have common features of subject content. The growing number of scientific records has required the use of increasingly narrow subject divisions in classification schemes, meaning that finer distinctions must be made in analyzing subject matter and that the task of identifying documents with common characteristics has become more and more complicated. Broad classification schemes that attempt to cover all knowledge are being supplemented or even supplanted by specialized classification systems covering relatively narrow areas of subject matter. The use of carefully prepared specialized classification schemes or of a multiplicity of such specialized schemes may be very effective in organizing information for subsequent searching.
3 APPLICATION OF MACHINES IN STORAGE AND SEARCH PROCESSES
The type of information stored and the types of questions to be directed to the stored record determine the methods to be used and the kind of equipment that will be selected for a given system. It is sometimes desirable to perform some operations on the information stored in the record, such as determining connections or relations between items. On the other hand, it is sometimes sufficient merely to recognize the presence or absence of items of information or of combinations of items recorded as such. Consideration needs to be given to general characteristics, capabilities, and limitations of the various types of equipment, including examples of such equipment now available which have not been exploited to full advantage.
Reviewing the stored record, rearranging the subject matter, and continually expediting searches can be accomplished mechanically as machines “learn” by experience. Feedback can be employed to establish search priorities, optimum search patterns, and so forth. The mechanics of conducting such maintenance steps should be considered here, and the experience of others who use automatic data-processing equipment should be examined for applicable techniques.
During the searching operation, it is possible to use the answers that first result from the search to direct further searching. This can be a simple process of counting the items yielded by various modifications of the original question, and analysis of the results of such counting to guide additional searchings. On the other hand, auxiliary memory devices can be used with the searching equipment—devices capable of looking for associated items, of recognizing the frequency of occurrence of such associations, and of changing the direction of the search while it is going on. The techniques of searching, the development of search strategy, and the practical problems of conducting searches should be examined here.
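The simpler of the two processes described above, counting the items yielded by various modifications of the original question, might look like the following sketch. It assumes an index that maps each term to a set of document labels. The index contents and the function names are hypothetical, and dropping one term at a time is only one of many possible ways of modifying a question.

```python
def count_hits(index, terms):
    """Count the documents indexed under every term in the list."""
    postings = [index.get(term, set()) for term in terms]
    return len(set.intersection(*postings)) if postings else 0


def counts_for_modifications(index, question):
    """Count hits for the full question and for each version with one term dropped."""
    counts = {tuple(question): count_hits(index, question)}
    for i in range(len(question)):
        relaxed = question[:i] + question[i + 1:]
        if relaxed:
            counts[tuple(relaxed)] = count_hits(index, relaxed)
    return counts


# Hypothetical index: term -> labels of the documents that carry it.
index = {
    "reactor": {"D1", "D2", "D4"},
    "cooling": {"D1", "D3", "D4"},
    "sodium": {"D1", "D3"},
}

counts = counts_for_modifications(index, ["reactor", "cooling", "sodium"])
```

An inquirer who finds the full question too restrictive can compare the counts to see which term, when dropped, widens the search most, and so decide how the next search should be directed.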
In all the discussions of this problem area, emphasis needs to be placed on total information systems, methods as well as equipment. Experiences within particular situations need to be discussed in terms of their applicability to the problems of other situations.
It should be borne in mind also that machines can be used most effectively to prepare statistical analyses of a given body of information and that such analyses could be most helpful in determining the best method for the mechanized handling of that information. First a statistically meaningful sample must be selected from the total body of information. It should then be analyzed in various ways for its characteristics, such as the frequency of occurrence of certain kinds of information or the frequency of occurrence of certain combinations of items. The methods of employing present equipment, or the requirements for new equipment, to conduct such statistical analyses have not yet been fully recognized. Research in the field of mechanical translation offers one example of the use of such analytical methods in designing mechanized information systems.
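The analysis of a sample for frequency of occurrence of items and of combinations of items can be sketched as follows. The records and the function name `analyze_sample` are hypothetical. Simple random sampling stands in here for whatever procedure would actually yield a statistically meaningful sample, a question the sketch does not attempt to settle.

```python
import random
from collections import Counter
from itertools import combinations


def analyze_sample(records, sample_size, seed=0):
    """Draw a random sample and count single items and pairs of items."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(records, min(sample_size, len(records)))
    item_counts = Counter()
    pair_counts = Counter()
    for items in sample:
        unique = sorted(set(items))  # count each item once per record
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return item_counts, pair_counts


# Hypothetical body of information: each record is a list of index items.
records = [
    ["reactor", "cooling"],
    ["reactor", "shielding"],
    ["cooling", "reactor", "sodium"],
]

item_counts, pair_counts = analyze_sample(records, sample_size=3)
```

Counts of this kind could suggest, for instance, which combinations of items are frequent enough to justify separate provision in the design of a mechanized system.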
4 EVALUATION OF EXPERIMENTAL SYSTEMS
It appears that persons and groups who are engaged in the development of systems embodying new principles for organizing subject matter could effectively compare their systems by applying them to a common set of documents. This may be a small sample, to be agreed upon by the cooperating investigators. It is suggested that the papers from the August 1955 Conference at Geneva, on peaceful uses of atomic energy, might constitute a useful sample. Use of the same sample for all systems would make comparison and evaluation easier. The systems should be discussed in terms of:
Size of the file to be covered;
Rate of growth of file and system;
Range of inquiries to be serviced, or the purposes to be served;
Range of subject matter to be covered;
Kinds of concepts to be represented;
Specificity and type of analysis;
Personnel required to do the analysis;
Cost of processing information and conducting searches;
Reliability of results, or probability of retrieval;
Form of system.
It is recognized that discussions in this problem area cannot be as factual as the descriptions of systems in actual operation could be. But the discussions should be practical and point out capabilities and limitations of the systems insofar as they can be determined.
5. Papers on the bearing of studies in linguistics on storage and search systems, papers on problems encountered in the design and application of systems, papers on the application of machines in storage and search processes, and papers evaluating experimental systems are especially invited.