Introduction
Aleksandar Kuzmanovic
Northwestern University
Amarnag Subramanya
Google Research
Semantics is the study of meaning. A large number of naturally occurring
phenomena follow certain semantic rules, for example, the semantics of human
speech, semantics associated with an image of a scene, and the semantics of natural
language. Accurate semantic processing is required for a number of high-level
information-understanding tasks such as inferring author sentiment given a blog
or review; searching through a collection of documents, images, and videos; and
translating text from one language to another. For example, it may be hard to infer
the positive sentiment expressed by the statement, “The Prince of Egypt succeeds
where other movies have failed,” without the aid of semantics-based inference.
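
As a rough illustration of why this sentence is hard, the sketch below scores it with a naive word-counting approach. The word lists and the function name `lexicon_score` are hypothetical, chosen only for this example; they are not taken from any system discussed in this session.

```python
# Hypothetical toy lexicons, chosen only for illustration.
POSITIVE = {"succeeds", "good", "great"}
NEGATIVE = {"failed", "bad", "poor"}


def lexicon_score(text: str) -> int:
    """Count positive words minus negative words, ignoring order and context."""
    words = [w.strip('.,!?"').lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


sentence = "The Prince of Egypt succeeds where other movies have failed."
score = lexicon_score(sentence)
print(score)
```

Under these assumed word lists, "succeeds" and "failed" cancel each other out, so the counter sees no clear sentiment. Recognizing that "failed" describes the other movies, not The Prince of Egypt, requires the kind of semantics-based inference described above.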
In the past few years, there has been an explosion in the amount of human-
generated content on the Internet and exponential growth in the number of times a
user turns to the Internet to perform a daily activity. It is estimated that we create
about 1.6 billion blog posts, 60 billion emails, 2 million photographs, and 200,000
videos on the Internet every day. These days, users read the news, watch television,
and stay connected to their friends and family via the Internet, and users’
need for Internet-based applications is now greater than ever before. Satisfying
these ever-increasing demands requires a deeper semantic understanding of all
the content on the Web. This session focuses on semantic processing algorithms
for natural language and images since they constitute a large majority of the data
on the Internet.
In the context of natural language, there are many different levels of semantic
processing, ranging from word- and sentence-level analysis to more complex
analysis of discourse. The task of understanding the meaning of words and their
relationships falls under the former, whereas inferring the meaning of
pronouns (e.g., he, she) and inferring the sentiment expressed by a paragraph are
48 FRONTIERS OF ENGINEERING
examples of the latter. Ani Nenkova (University of Pennsylvania) begins with a
survey of some of the techniques that have been successfully applied to automatic
text understanding and points out some of the outstanding challenges. She also
sheds light on the impact that text quality has on semantic processing algorithms.
The proliferation of Internet use has led to the creation of large bodies of
knowledge such as Wikipedia. Furthermore, the social aspect of the Web has
resulted in collaboratively generated content (e.g., Yahoo! Answers). Accurate
semantic processing of such sources of knowledge can lead to knowledge-rich
approaches to information access that go far beyond the conventional word-based
methods. Evgeniy Gabrilovich (Yahoo! Research) describes using collaboratively
generated content for representing the semantics of natural language and presents
new information retrieval algorithms enabled by this representation.
Images and video form a key component of the overall Internet experience.
Accurate semantic understanding of images and video can lead to faster and better
search. Samy Bengio (Google Research) discusses algorithms that learn how to
“embed” images and their descriptions (labels or annotations) within a common
space. Such a space can be used to find the nearest annotations to a given image.
He shows how one can construct a “visio-semantic” tree from such annotations.
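
The idea of looking up the nearest annotations in a common space can be sketched as follows. This is a minimal illustration, not the algorithm presented in the talk: the three-dimensional vectors, the labels, the use of cosine similarity, and the names `ANNOTATION_VECTORS`, `cosine`, and `nearest_annotations` are all assumptions made for this example. In practice the image and annotation vectors would be learned from data.

```python
import math

# Hypothetical embeddings, made up for illustration only.
ANNOTATION_VECTORS = {
    "dolphin": [0.9, 0.1, 0.0],
    "ocean": [0.7, 0.3, 0.1],
    "eiffel tower": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_annotations(image_vector, k=2):
    """Return the k annotation labels closest to the image in the shared space."""
    ranked = sorted(
        ANNOTATION_VECTORS,
        key=lambda label: cosine(image_vector, ANNOTATION_VECTORS[label]),
        reverse=True,
    )
    return ranked[:k]


# Assumed embedding of a single image, placed in the same space as the labels.
image_vector = [0.9, 0.1, 0.05]
top = nearest_annotations(image_vector)
print(top)
```

Because images and annotations live in the same space, annotating an image reduces to a nearest-neighbor search over the label vectors.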
Tables, plots, graphs, and diagrams are yet another way information is represented
on web pages. These data-driven images are complicated objects that have
a close relationship with the surrounding text. For example, they may be used to
illustrate the text’s conclusions or provide additional data. Unfortunately, state-
of-the-art algorithms treat diagrams in the same way as photos or illustrations.
As a result, searching for a relevant diagram online often yields very poor-quality
results. Michael Cafarella (University of Michigan) covers smart semantic processing
algorithms for plots, graphs, and diagrams. He also discusses ways such
data can be summarized to make them easier for end users to consume.