Maze Structure and Information Retrieval1
In a paper (1) delivered by E.F.Moore, he established a group of algorithms for finding the shortest path through a maze. These algorithms resulted from pure research related to puzzles and games and an alertness to possible applications to complex communications systems.
The mazes considered were defined by a set of key points and connections between them. The links connecting them could be weighted by a cost function pertinent to the system represented by the maze. The simplest of the algorithms concerned a maze in which all links were of unit length.
A particular maze is illustrated in Fig. 1. The most elementary problem posed by Moore is: Given an origin point and a destination point in the maze, find a path from origin to destination containing the minimum number of links.
The algorithm established by Moore for this problem may be described in the following way:
Initial step. Select the origin point and tag it with a zero. Enable the recognition of the destination point whenever it may be selected.
Step i. Select all points which have not been tagged (i=1, 2, · · ·, k) but which are adjacent to the points which have been tagged and provide them with the tag i. Repeat this process until the destination point is selected and given the tag k. The tag represents the number of links in the minimum path.
Step j (j=k+1, k+2, · · ·, 2k). The second half of the process explicitly defines a minimum path starting back from the destination point. At each of these steps, select an adjacent point which has a smaller i tag. Provide that point with an additional tag, i′=i, and use the newly tagged point as the
GERALD ESTRIN Department of Engineering and Department of Mathematics Numerical Analysis Research, University of California, Los Angeles, California.
base for the next step. Continue this process until the origin is reached and given the tag i′=0. All points carrying an i′ tag lie on a minimum path.
The simplicity of the algorithm makes it amenable to mechanization by existing digital techniques. One may, as usual, make compromises between the parallelism of mechanization and the speed or number of steps.
Moore’s algorithm concentrates on the abstraction of a minimum path between two specified points in a maze. However, the first half of the process is unrelated to the destination point except for the last step.
If the network is regraphed in accordance with the step by step application of the algorithm, one achieves a reorientation of the maze with respect to the origin point. This process is carried out for the maze of Fig. 1 with A as the origin point and the result is displayed in Fig. 3. Comparison of Figs. 1 and 3 shows strikingly the relative ease with which the many paths may be followed after regraphing. For every point in Figure 3, the vertical distance from the origin represents the number of links between the point and the origin along a minimum path.
Figure 4 displays a graph of the same maze with respect to a different origin point and indicates how severely the structure of a network may change depending upon the point of entry.
Information retrieval systems
Supported by the success of this very simple procedure for unraveling complex networks, examples of information retrieval systems are considered according to the following criterion: When the system is characterized graphically, can the maze organizing algorithm or a modification thereof be used to systematize the network with respect to search requirements?
THE SUBJECT CATALOG
The subject catalog achieves its maze attributes as a result of the sets of cross references linking the subject headings. Although it was developed as a search aid, the difficulty of keeping in mind much more than point to point search properties is a limitation to its usefulness.
Given an initial subject heading related to the search requirements, the
searcher would most certainly be helped in forming a strategy of search if he were able to study the character of the subject heading maze in a region around the point of entry.
If the point of entry is chosen as the origin point and the procedure described under “Maze orientation” is applied for a defined number of steps, it can be used to produce a graph of subject headings as linked by cross references.
A simple example is illustrated in Fig. 5. The searcher then chooses a most desirable path among the subject headings and proceeds through the subject catalog in a planned fashion. The graph of the localized region is extendable by selecting the extremity of a desired path as a new origin point.
Experiments leading to a preliminary evaluation of the effectiveness of this method are easily formulated. Two groups are assigned to a set of selected search problems. One of them follows conventional techniques, the other begins the search by graphing a section of the subject heading catalog as described. The number of references, pertinency, time of search, experience of searchers may then be included as parameters in determining whether to evaluate the engineering design of automatic devices to permit larger scale experiments. Such studies should be carried on in both special and general library collections.
The Term-Document Index
The peek-a-boo retrieval system utilizes a card for each term in the index vocabulary and a fixed location on all cards for each document in the system.
One may visualize the system as having the highly linear and shallow structure of Fig. 6. The peek-a-boo technique is a superstructure which observes those documents which are intersected by at least all of the selected terms.
The effectiveness of the search is dependent upon the choice of terms attached to each document when it enters the system and the choice of terms defining the search request. One may expect therefore the development of a maze of interconnections in the vocabulary itself to aid the description of ambiguities.
Three types of difficulty are obvious.
If no significance is attached to the ordering of terms, some of the documents may not be pertinent to search requirements. This type of difficulty is usually not too serious since the searcher can quickly reject such documents.
If the set of search terms is not sufficiently specific, an excessively large group of documents may be retrieved. The search may then be continued with the addition of new significant term cards.
If the set of search terms is overly specific relative to the collection, pertinent documents will be excluded—in the extreme no documents will be retrieved.
In the last two cases, it is evident that the effectiveness of this method will be dependent on one’s ability to make perturbations of the initial selection of search terms. In general the way in which the selected set of documents converges to its final selection as term cards are added to the stack, is dependent on their order.
A point (set of documents) in the system is uniquely selected by the choice of the full set of search terms. However, if judgment is to be exercised about a perturbation on this set of search terms, it would be helpful to have some information about the nature of the system in the locality of the selected point.
The highly compressed graph of Fig. 6 may be expanded into a set of alternating term and document levels starting from a particular set of search terms.
One way of accomplishing this expansion is the simple application of the
maze organizing method, using the complete set of search terms as a starting level 0; level 1 contains all documents linked to any one of the initial search terms; level 2 contains those vocabulary terms not already in level 0 but linked to the documents in level 1; level 3 contains those documents linked to the search terms of level 2 but not already contained in level 1; etc. This structure does not offer much help. Its first levels do not include the selectivity of the peek-a-boo technique, and therefore one cannot recognize a central region of the network corresponding to conditions resulting from the peek-a-boo operation.
Figure 7 displays an expansion of the network which has more selectivity in each of the steps and establishes with levels 0 and 1 the result of peek-a-boo selection. It is developed in the following way.
Level 0 contains the set of n search terms.
Level 1 contains those documents indexed with at least all of the n search terms.
Level 2 contains the set of terms, assigned to the documents in level 1, but not already contained in level 0.
Level −1 contains the documents indexed with at least n−1 of the search terms, but not already contained in level 1.
Level −2 contains the set of terms assigned to the documents of level −1 but not already in level 0.
A further modification which might be made to increase the probable pertinency of documents in level −1 would require that k of the n−1 search terms must be included in the index set of all documents represented.
Levels 0 and 1 are obtained with the standard peek-a-boo technique. Level 2 implies the extraction of the index set for each document selected and the subtraction of the set of search terms. Study of level 2 will guide the choice of additional terms. Level −1 is obtained by the application of the peek-a-boo technique n (or n−k) times with a single term card withdrawn from the stack. Level −2 is developed at each of the steps of forming level −1 by extracting the index set for each document selected and subtracting the set of search terms. Study of level −2 will guide the choice of document selected in level −1.
As in the case of the discussion of the subject catalog, a set of small scale experiments may be defined to evaluate this procedure. If extensive cross referencing is utilized with the dictionary of key terms, the procedure outlined for the subject catalog may be used to aid in the selection of search terms.
The Term-Associative Index
In an attempt to enlarge the scope of the search and particularly to have it directed along “association trails” formed within the vocabulary, Taube conceived of a modification of the peek-a-boo system which would define a network of two-term associations such that, “the mechanical display of such networks of association effectively solves the challenge of Vannevar Bush and provides the ‘coincidences’ of ideas which Benier called the most important characteristic of an information system” (3).
In addition to the Term-Document cards discussed in the previous section, Taube established a card for each term on which there is a position defined for each term in the vocabulary.
N of these term association cards are stacked together (with the restriction that each term card added to the stack corresponds to a term which permitted transmission through the stack in the previous step). The peek-a-boo technique is applied and the outputs imply the existence of documents classed
within the following extremes: single documents indexed by at least all the n+1 terms used or groups of (n+1) documents each indexed by a different one of the search terms.
Taube conceives of the use of this device in a man machine process in which the human selects the next term card from the outputs of the previous group. It is apparently Taube’s hope that the “association machine” will lead to an optimum selection of search terms because the searcher is restricted to selection of terms from a group which has some possibility of producing a document. However, there is nothing in the process which either guarantees a high probability of retrieval or eliminates the problems raised in the discussion of the Term-Document Index. Taube’s procedure may lead to a better selection of search terms than the less guided experience of the searcher but is unlikely to dispense with the need for a directed perturbation of that set of terms.
The above discussion has considered the structure of three retrieval systems in an attempt to define search aids which are capable of displaying the local properties of a retrieval system maze. The approach taken here has assumed existing retrieval systems to represent a first approximation to efficient search with maze organizing techniques applied as a superstructure.
Experiments may easily be formulated with existing subject catalogs and mechanized search systems to test the relative effectiveness of the suggested methods. If initial tests indicate further interest the engineering evaluation of mechanical devices may be carried out.
1. MOORE, E.F. The Shortest Path Through a Maze, International Symposium on Switching Theory, Harvard University, April 1957. (To be published in the Annals of the Harvard Computation Laboratory.)
2. WILDHACK, W., STERN, J., SMITH, J. Documentation in Instrumentation, American Documentation, 5, 223–237 (October 1954).
3. MORTIMER TAUBE and ASSOCIATES. Storage and Retrieval of Information by Means of the Association of Ideas, American Documentation, 6, 1–18 (January 1955 ).
4. BRANTLINGER, R., JENKINS, N.P.H., MACK, C.D. List of Subject Headings for Glass Technology, Pittsburgh Plate Glass Co., Glass Division Research Laboratories, Creighton, Pa., 1954.