examples of the latter. Ani Nenkova (University of Pennsylvania) begins with a survey of some of the techniques that have been successfully applied to automatic text understanding and points out some of the outstanding challenges. She also sheds light on the impact that text quality has on semantic processing algorithms.
The proliferation of Internet use has led to the creation of large bodies of knowledge such as Wikipedia. Furthermore, the social aspect of the Web has resulted in collaboratively generated content (e.g., Yahoo! Answers). Accurate semantic processing of such sources of knowledge can lead to knowledge-rich approaches to information access that go far beyond the conventional word-based methods. Evgeniy Gabrilovich (Yahoo! Research) describes using collaboratively generated content for representing the semantics of natural language and presents new information retrieval algorithms enabled by this representation.
Images and video form a key component of the overall Internet experience. Accurate semantic understanding of images and video can lead to faster and better search. Samy Bengio (Google Research) discusses algorithms that learn how to “embed” images and their descriptions (labels or annotations) within a common space. Such a space can be used to find the nearest annotations to a given image. He shows how one can construct a “visio-semantic” tree from such annotations.
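To make the idea of a common space concrete, here is a minimal sketch of the lookup step only: given an image vector and a set of annotation vectors that already live in the same space, rank the annotations by cosine similarity. It is not the algorithm presented in the talk. The learning of the embeddings is omitted, and the function names, the three-dimensional vectors, and the labels are hypothetical values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_annotations(image_vec, label_vecs, k=2):
    """Return the k labels whose embeddings are closest to image_vec."""
    ranked = sorted(
        label_vecs,
        key=lambda name: cosine(image_vec, label_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical, hand-picked embeddings; a real system would learn these.
LABELS = {
    "dolphin": [0.9, 0.1, 0.0],
    "whale": [0.8, 0.3, 0.1],
    "car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    image = [1.0, 0.1, 0.0]  # assumed embedding of a query image
    print(nearest_annotations(image, LABELS, k=2))
```

Because images and annotations share one space, the same similarity function could in principle be used in the other direction, ranking images for a given annotation.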
Tables, plots, graphs, and diagrams are yet another way information is represented on web pages. These data-driven images are complicated objects that have a close relationship with the surrounding text. For example, they may be used to illustrate the text’s conclusions or provide additional data. Unfortunately, state-of-the-art algorithms treat diagrams in the same way as photos or illustrations. As a result, searching for a relevant diagram online often yields very poor results. Michael Cafarella (University of Michigan) covers smart semantic processing algorithms for plots, graphs, and diagrams. He also discusses ways such data can be summarized to make it easier for end users to consume.