is that the space of possible analogies is quite large. For example, restricting the search to the U.S. Patent database alone still leaves ~10^7 patents to consider.

One approach to organizing a search through possible analogies is to put possible sources into a structure. But on what basis should the structure be developed: logical, lexical, or conceptual? Schunn and his colleagues used a new Bayesian approach developed by Griffiths, Kemp, and Tenenbaum (2008) to empirically discover the structure in data. Their process allowed them to “bottom-up discover” that color is best represented as a wheel, animals as a tree structure, and the U.S. Supreme Court as a line. Schunn’s research team augmented this approach using Latent Semantic Analysis, developed by Landauer, Foltz, and Laham (1998), to capture the basic semantics of text in patent descriptions. They then used Kemp and Tenenbaum’s algorithm to determine which kind of structure best organizes the data. It turns out that a tree structure provides a good fit to the text similarity data.
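As a rough illustration of organizing sources by text similarity, the following Python sketch builds a small tree from toy descriptions. It is not the method Schunn's team used: plain word-count vectors stand in for Latent Semantic Analysis, average-linkage agglomerative clustering stands in for the Bayesian structure-discovery algorithm, and the four descriptions and all function names are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Bag-of-words term counts (a crude stand-in for an LSA vector)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def build_tree(docs):
    """Average-linkage agglomerative clustering.

    Returns a nested tuple of document names; each tuple is one merge.
    """
    vecs = {name: vectorize(text) for name, text in docs.items()}
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in docs]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(cosine_distance(vecs[x], vecs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Toy "patent descriptions", made up for this example.
toy_docs = {
    "spring": "coiled spring stores mechanical energy from motion",
    "flywheel": "rotating flywheel stores mechanical energy",
    "valve": "valve controls fluid flow in pipe",
    "pump": "pump moves fluid through pipe",
}
tree = build_tree(toy_docs)
print(tree)
```

In this toy case the two energy-storage descriptions merge first and the two fluid-handling descriptions merge second, so the tree groups sources by shared vocabulary.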

Schunn then explained how they tested the approach on an actual engineering design problem: design a device to collect energy from human motion. They took a random set of “mechanical” patents from the U.S. Patent database and organized them using their algorithm. As part of the experiment, 72 undergraduate mechanical engineers were randomly assigned to one of three conditions:

Near: see 5 patents considered close to the design problem in the semantic tree
Far: see 5 patents considered far from the design problem in the semantic tree
Control: given no patents prior to being asked to solve the problem
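One way to picture how the Near and Far patent sets could be drawn is to rank patents by their distance from the design problem and take the two ends of the ranking. The sketch below assumes such distances are already available as plain numbers; the function name, patent labels, and distance values are all hypothetical and not taken from the study.

```python
def select_conditions(distances, k=5):
    """Pick the k nearest and k farthest patents by distance.

    distances: dict mapping a patent label to its (assumed, precomputed)
    distance from the design problem. The control condition gets no patents.
    """
    if len(distances) < 2 * k:
        raise ValueError("need at least 2*k patents so the sets do not overlap")
    ranked = sorted(distances, key=distances.get)  # nearest first
    return {"near": ranked[:k], "far": ranked[-k:], "control": []}


# Made-up distances for six hypothetical patents.
toy_distances = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.3, "E": 0.7, "F": 0.2}
picked = select_conditions(toy_distances, k=2)
print(picked)
```

With `k=5`, as in the conditions listed above, the same call would return five patents for each of the Near and Far sets.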

Design solutions that were generated by the experimental subjects were coded for both novelty and quality.

Describing the findings of his project, Schunn said that the “Far” condition did worse than the “Control” or the “Near” conditions on both aspects of assessment, novelty and quality. This finding was the reverse of the result of a previous study by his team, in which they used hand-selected patents instead of randomly chosen patents (see Chan et al., 2011). To understand the difference in outcomes, Schunn and his colleagues added the previously used hand-selected patents to their 45 randomly selected patents and re-ran the algorithm. They discovered that their near and far hand-selected patents were all closer to the design problem than the randomly selected patents that had been organized into relatively near and far groups. Thus, they essentially identified an inverted U-shape in the design quality and novelty data: very near patents are of no help, medium-distance patents can be useful, but very far patents are also of no help.

In conclusion, Schunn remarked that the result of his work expands notions of distance beyond the simple near/far binary distinction that is commonly used. His study also provides an objective distance function that can be used in research and practice to guide optimal analogy searches.³³

³³ For more details on this research, see: Fu, Chan, Schunn et al. (2013); Fu, Chan, Cagan et al. (2013); and Chan et al. (2011).
