. The system may have only very specific ways of receiving
input, finding solutions, and returning them to the
user.
How Agents and System Intelligence Can Help
A solution to the above problems is to have a software system
between the user and the network that deals with the user in a
convenient manner and that interacts with the network and its many
facilities in the languages that the network requires. This is
analogous to the task of an operating system that may receive a
command to "print file1" (either typed or via a direct manipulation
command) and that may issue an array of commands to machine
facilities to find file1, format it for printing, allocate space
for its transfer, open communication to a printer, manage a file
transfer to it, receive messages back from the printer as it does
its job, and so forth. The user is only aware of the simple command
and the fact that the desired file was printed. Yet the
complexities involved in servicing the request can be tremendous.
In the current situation, however, the job of the intermediary may
be considerably more complicated. The facilities on the network may
have greater diversity, it may be necessary to decide how many
resources can reasonably be invested in a task, there may be a need
for common-sense reasoning to decide the relevance of one facility
or set of information versus another, and so forth.
The tasks of the agent are (1) to interact with the user to
determine the nature of the request, (2) to interact with the
network to obtain the best-possible solution, and (3) to present to
the user the response that has been obtained in the most useful
form. Figure 6.1 shows these functions in a diagram.
Following the flow arrows around the loop, the first step is
interaction with the user. This could involve any of the media
and/or modalities described in previous chapters: speech, graphic
inputs, typing, or others. It could also involve a full dialogue
because the user may have a request that is complex. Then the
system must translate the results of this interaction to an
internal form, which could be an extensive data structure. Next it
executes a variety of computations to obtain a response. These may
result in a series of additional interactions with the user.
Finally, the internal form from the computation is translated into
user-appropriate terms. Again, one or several of the media and/or
modalities could be used, and this could involve more user
interactions.
FIGURE 6.1 The intelligent system mediates between the user and the
network.
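The loop just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a system described in this chapter: the names Request, parse_request, service_request, present_response, and run_agent are hypothetical, and a lookup in a small list stands in for interaction with the network.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical internal form of a user request."""
    action: str
    target: str
    notes: list = field(default_factory=list)


def parse_request(text):
    """Step 1: translate user input into the internal form.

    Returns None when the input cannot be parsed, signalling that a
    clarifying dialogue with the user is needed.
    """
    words = text.strip().split()
    if len(words) != 2:
        return None
    return Request(action=words[0].lower(), target=words[1])


def service_request(request, catalog):
    """Step 2: compute a response (here, a lookup in a toy catalog)."""
    if request.action != "find":
        request.notes.append("unsupported action")
        return []
    return [item for item in catalog if request.target in item]


def present_response(results):
    """Step 3: translate the internal result into user-appropriate terms."""
    if not results:
        return "Nothing was found."
    return "Found: " + ", ".join(results)


def run_agent(text, catalog):
    """Run one pass around the loop for a single typed request."""
    request = parse_request(text)
    if request is None:
        return "Please rephrase your request as '<action> <target>'."
    return present_response(service_request(request, catalog))
```

A call such as run_agent("find file1", ["file1.txt", "file2.txt"]) passes through all three steps, while an unparseable input returns a request for clarification instead of a result.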
The type of system that delivers these kinds of behaviors is
called an agent. Such a system has the properties that it can
undertake goals on behalf of its sponsor (presumably the user), and
it can act autonomously and initiate actions according to its own
agenda. Such a system usually has an ability to undertake
responsibility over time, persistently seeking its assigned goals
and accounting for significant historical events. It may be
designed to handle information searches, communication jobs,
educational or recreational functions, commercial buying or selling
tasks, or any other function that may be available on the network.
It may guide the user through complex information spaces, for
example, in the way a travel agent guides a customer through the maze
of possible itineraries or a research librarian guides a library
user through the various reference search facilities. Agents may
monitor information sources and inform the user of events that
match the user's interest profile or may actively search for new
information to call to the user's attention. Agents may look for
opportunities to assist the user or to teach the user new things.
The definition of an "agent" is, in fact, a controversial issue.
Various sources emphasize its ability to perceive and act (Russell
and Norvig, 1995), its
role in doing a specialized task (Minsky and Riecken, 1994), its
network interactive capabilities (Genesereth and Ketchpel, 1994),
and its ability to carry out an agenda (Maes, 1994).
The technologies used to deliver these functions may come from
traditional computer science with standard programming languages
and methodologies or from more contemporary technologies, such as
neural networks or rule-based deduction.
Diagnosing the User's Needs
The first convenience a system can offer to a user is some
flexibility in the form of the input. The user may wish to make a
request by speaking into a telephone, by pointing to items on a
displayed menu, by typing a command, by some combination of all of
these, or by some other method. It may also be true that the
hardware device that is locally available will not allow all media
or modes of input. A very attractive feature of an intelligent
system is that it may be able to function properly regardless of
the input mode or device. Whatever the means of input, the task of
such a system is to convert it into an internal form that can be
used by the machine.
It is common that a user's input will be inadequate from some
point of view. Perhaps the user's syntax cannot be meaningfully
parsed, or there may be ambiguity in the request. The system may
need to enter into a clarifying dialogue. It must generate a proper
internal message to be returned to the user and then translate it
into the appropriate media modalities for presentation.
Corresponding to the given input, there may be a particularly
appropriate output: a menu clarification of some kind for a menu
input, a spoken language output for a spoken input, and so forth.
The clarification dialogue may continue for several iterations to
address various aspects of the request. Some clarifications may
come as the task goes forth and problems are encountered.
The agent may have some information about the user. This is
called a user model, and it may contain a history of the user's
typical requests, any special input/output preferences of the user,
and a list of information gathered in the current interaction. The
user model may be constructed explicitly by asking the user
questions or implicitly by monitoring keystrokes or selections
made. All of this information can be used by the system in the
interaction to reduce redundancy and to efficiently move toward the
goal. The system may be adaptive to improve efficiency. If the user
often makes the same request, the interaction might be able to jump
over the repetitive parts and move directly to the desired result.
If particular interactions with the network prove to be
unsatisfactory, the system may vary its behavior to avoid them.
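A user model of the kind described above might be represented as follows. This is a sketch only; the class UserModel, its field names, and the threshold of three repeated requests are assumptions made for illustration and do not come from any system cited in this chapter.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Hypothetical user model: request history, preferences, session facts."""
    history: Counter = field(default_factory=Counter)
    preferences: dict = field(default_factory=dict)
    session_facts: list = field(default_factory=list)

    def record_request(self, request):
        """Implicit construction: note each request the user makes."""
        self.history[request] += 1

    def set_preference(self, key, value):
        """Explicit construction: store an answer the user gave."""
        self.preferences[key] = value

    def shortcut(self, min_count=3):
        """Return the most frequent request if it has been made at least
        min_count times, otherwise None."""
        if not self.history:
            return None
        request, count = self.history.most_common(1)[0]
        return request if count >= min_count else None
```

An adaptive system could consult shortcut() at the start of an interaction and offer the returned request directly, skipping the repetitive parts of the dialogue.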
Servicing the User's Request
Upon receiving the user's request, the agent must then undertake
a variety of actions such as those described at the beginning of
this chapter. This requires an ability to search diverse databases
at distant places and to assemble discovered information into a
useful form. It could require extensive calculations, calling other
agents, and/or resorting to contacting other people and asking them
for help. It could also involve making some kind of judgments
concerning the reliability of the data being acquired. For example,
are data provided by a generic source of scholarship information as
reliable as the data presented by the institution making the offer?
Simultaneously, the system should be able to inform the user about
the nature of the search taking place. The user may want to know
what sources are being accessed and where successes and failures
have occurred. The user should be informed of the extent of the
resources being accessed and the time likely to be required. If the
undertaking is not of the nature desired, the user must be able to
modify it appropriately.
Presenting the User with a Response
The system response could be a list of 10,000 documents, a
complex diagram with extensive annotations, an audio signal, or
some other collection of complex objects. The system's task is
to present this response in a comprehensible form. It must undertake
content selection and decide which information will best satisfy
the user's needs. Then it must do presentation design to structure
and format the information in a comprehensible form. This includes
media allocation, that is, the decision as to which media will best display
which information. Finally, it must realize individual media,
ensuring coherency across media (e.g., cross-modal referring
expressions, temporal coordination) and a layout that is consistent
with the content and intent of the communication.
Current Agents
Some Examples
Many types of agents have been developed and tested in recent
years. An example is the entertainment ratings agent by Maes
(1994), which requires the user to rate a series of items on which
he or she has an opinion. The system then compares the local
user's responses with those of many other users who have submitted
their own ratings to similar agents and finds groups of users whose
profiles of likes and dislikes are similar to those of the
local user. Having found such groups, it compiles a list of their
average ratings for a variety of items the local user may not
have seen and thus provides a group evaluation of items that
should be correlated with the local user's own tastes. Such an
agent leverages the power of the network to automatically and
conveniently deliver to people advice that is personally tuned and
would not be easily obtainable any other way, as discussed by Loren
Terveen at the workshop.
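The idea of finding like-minded users and averaging their ratings can be illustrated with a short Python sketch. It is not the algorithm of the agent cited above: the similarity measure (mean absolute difference over shared items), the 1-to-5 rating scale, the 0.75 threshold, and the function names similarity and recommend are all assumptions chosen to keep the example small.

```python
def similarity(a, b):
    """Agreement between two rating profiles over items both have rated.

    Returns a value in [0, 1]; 0 if the profiles share no items.
    Ratings are assumed to lie on a 1-5 scale.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[i] - b[i]) for i in shared) / len(shared)
    return 1.0 - mean_diff / 4.0


def recommend(local, others, min_similarity=0.75):
    """Average the ratings of like-minded users for items the local
    user has not rated, and return them sorted from best to worst."""
    totals, counts = {}, {}
    for profile in others:
        if similarity(local, profile) < min_similarity:
            continue
        for item, rating in profile.items():
            if item not in local:
                totals[item] = totals.get(item, 0) + rating
                counts[item] = counts.get(item, 0) + 1
    averages = {item: totals[item] / counts[item] for item in totals}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)
```

Users whose profiles disagree strongly with the local user's are left out of the average, so the resulting list reflects only the group with similar tastes.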
Lieberman's (1995) Letizia is an agent that recommends Web pages
as a user is browsing the Web. It operates in conjunction with a
standard browser such as Netscape, tracking the user's behavior and
using simple heuristics to form a profile of the user's interests
(e.g., if the user saves a bookmark to a document, that document is
assumed to be of high interest; if a user follows a link and then
returns immediately, that document is assumed to not be of
interest). While the user is browsing, Letizia tries to locate
other information that may be of interest to the user by performing
a resource-bounded best-first search of Web pages starting at the
current page. At any time the user may request a list of
recommendations and the system will display a page containing its
current recommendations, which the user can then follow or ignore.
This is another example of a current technology agent in that its
main mechanism depends on keyword frequency measurements to
represent document content for information retrieval (Salton,
1989).
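A resource-bounded best-first search of the general kind mentioned above can be sketched as follows. This is not Letizia's code: the link graph is a hypothetical dictionary, the interest score is supplied by the caller rather than computed from keyword frequencies, and the resource bound is simply a cap on the number of pages visited.

```python
import heapq


def best_first_search(start, links, score, budget):
    """Visit at most `budget` pages, always expanding the highest-scoring
    page discovered so far, and return the visited pages best-first.

    links: dict mapping a page to the pages it links to (hypothetical graph)
    score: function giving an estimated interest score for a page
    """
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-score(start), start)]
    seen = {start}
    visited = []
    while frontier and len(visited) < budget:
        neg_score, page = heapq.heappop(frontier)
        visited.append((page, -neg_score))
        for neighbor in links.get(page, []):
            if neighbor not in seen:
                seen.add(neighbor)
                heapq.heappush(frontier, (-score(neighbor), neighbor))
    # Recommend everything except the page the user is already on.
    return sorted(
        (entry for entry in visited if entry[0] != start),
        key=lambda entry: entry[1],
        reverse=True,
    )
```

Because the search stops when the budget is exhausted, a highly relevant page that lies behind a low-scoring link may never be reached; this is the price of bounding the resources spent while the user continues to browse.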
An example of what can be done with more detailed models of user
tasks and more sophisticated processing is Horvitz's Lumiere
(http://www.research.microsoft.com/%7Ehorvitz/lum.htm), which
monitors a user's actions and determines when the user may need
assistance. Lumiere continuously follows a user's goals and tasks
as the user works with a suite of software tools (e.g., Microsoft
Office), and performs a type of task recognition; individually
observed events are combined into higher-level modeled events,
which are variables in a Bayesian model. To develop the models,
studies were performed to determine how experts in specific
software applications came to understand problems that users might
be having with software from the user's behaviors and the
evidential distinctions that experts used in their reasoning about
the best way to assist a user. As a user works, Lumiere generates a
probability distribution over topic areas that the user may need
assistance with, along with a probability that the user would not
mind being bothered with assistance. This research has led to the
development of a product, the Office Assistant in Office 97, which
monitors user behaviors and assists the user based on underlying
models of each of the Office 97 applications.
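The step of turning observed events into a probability distribution over topics can be illustrated with a much simpler model than the one described above. The sketch below assumes a naive Bayes model, in which events are treated as independent given the topic; that is a simplification and not a description of Lumiere's Bayesian model. The function name posterior, the event and topic names, the default probability of 0.01, and all numbers are invented for illustration.

```python
def posterior(priors, likelihoods, observed):
    """Return P(topic | observed events) under a naive Bayes assumption.

    priors: dict topic -> P(topic)
    likelihoods: dict topic -> dict event -> P(event | topic)
    observed: iterable of observed event names
    Events missing from a topic's table get a small default probability.
    """
    unnormalized = {}
    for topic, prior in priors.items():
        p = prior
        for event in observed:
            p *= likelihoods[topic].get(event, 0.01)
        unnormalized[topic] = p
    total = sum(unnormalized.values())
    return {topic: p / total for topic, p in unnormalized.items()}


# Invented example values, for illustration only.
priors = {"printing": 0.5, "formatting": 0.5}
likelihoods = {
    "printing": {"opened_print_dialog": 0.8, "undo": 0.1},
    "formatting": {"opened_print_dialog": 0.2, "undo": 0.4},
}
help_topics = posterior(priors, likelihoods, ["opened_print_dialog"])
```

With these invented numbers, observing that the print dialog was opened shifts most of the probability toward the printing topic. A system could offer help only when the leading topic's probability, together with a separate estimate of the user's willingness to be interrupted, exceeds some threshold.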
Another example of an agent that monitors a person's daily
behaviors and then can offer help is the Calendar Apprentice
(Mitchell et al., 1994). This system enables a user to manage his
or her calendar by hand and then it builds rules that attempt to
capture the user's typical behaviors. Maes (1994) has described
agents that help a user classify and process e-mail
and classify news articles based on collections of keywords and
nominal information related to author, publication source, and
other items. In each case the user's behavior in carrying out daily
activities becomes the training data for mechanisms that can later
aid the user.
Etzioni and Weld (1994) created the Internet Softbot, which
receives a user's request and then employs a goal-oriented
controller to seek a solution. It handles network-based problems,
such as sending a message to an individual whose address may not be
immediately available. The system uses its goal-oriented mechanisms
to determine the tasks necessary to satisfy the request; this may
involve accessing remote databases and assembling the necessary
information to achieve the goal. The Internet Softbot provides a
shell in which implementers can embed knowledge to handle a variety
of network tasks automatically that can then be used by anyone.
Concerns About Agents
The term "agent" conjures in some critics' minds the picture of
a humanoid facade and possibly simulated human-like responses.
Individuals with this image may object to agents on the grounds
that such an interface is condescending to the user and distracting
to the goal of doing the work. While there may be populations or
situations where such an interface is desirable, such
anthropomorphic interfaces are certainly not a part of the concept
of an agent as presented here. The choice of facade and
interaction style is a design decision and is independent of the
decision to build an agent. An agent may exist with menus, keyboard
command line, voice, or other input modes and still carry out its
function. In fact, one of the goals listed above is to provide the
user with the choice of a variety of input/output techniques, any
of which will work.
Another concern of some observers is the loss of control implied
by having a system that carries out its own "agenda." Will a person
"trust" a software system (as discussed in Chapter 5)? The apparent
preferred mode for many users is the so-called direct manipulation
paradigm that presents the user with a world of objects and methods
to control them. Of course, there is a long tradition of
automatically handling low-level details for users, as done, for
example, by compilers and operating systems. Having a system agenda
and automatic processing at such levels is accepted practice, and
what is new occurs when the machine offers to carry out tasks
automatically that the user might ordinarily do. Here the important
thing is to keep the user in charge, either to accept the automatic
actions of the system or to reject them and maintain complete
step-by-step control.
Another worry relates to the robustness and reliability of a
knowledge-based
reasoning or learning system that may be a part of an agent. A
reasonable policy on this issue is simply that conservative
decision making should predominate if such a system is to be used.
One can set objective criteria for measuring performance and use
experimental or theoretical means to guarantee that standards are
met.
Many of these concerns can be addressed by following the
guidelines for agents presented by Pattie Maes at the 1997
International Conference on Intelligent User Interfaces:
• Make the user model available (inspectable, modifiable) to the
  user.
• The agent's method of operation should be understandable to the
  user.
• The agent should be able to explain its behavior to the user.
• The agent should have the ability to give continuous feedback to
  the user about its state, actions, and learning.
• The agent should allow variable degrees of autonomy, and the user
  should decide how much and what type of tasks to delegate to the
  agent. The user should be able to "program" the agent (e.g., teach
  it things, make it forget things).
• The user should not have to learn a new language to deal with the
  agent. A goal is to use the application to communicate between the
  agent and the user.
Suggested Research: Near and Far Term
Conservative versions of agents exist and are being used today,
as described above. The input mode can be restricted to current
technology, menus, or a keyboard-oriented command-line language.
The computation undertaken for the user can be any traditional
computation, mail handling, browsing, information retrieval, and so
forth. The output facilities can employ any current technology. An
example is the agent mentioned above for providing a person with
personally tuned ratings for entertainment events. Such a system is
well within the capabilities of current technologies and provides
an example of a currently realizable agent. There are many kinds of
research projects that begin at this level. For example, one can
study human problem solving in the absence of technology and try to
determine its characteristics and needs. Or one can study the
dynamics of human group behaviors. In each case the goal can be to
determine whether and how technology, specifically software agents
and networking, might be used to augment or improve what occurs
naturally. As a part of this, it is necessary to develop measures
of effectiveness
in problem solving, for humans or human-machine collaborations,
so that comparative studies can be done.
Other important research projects could study or create new
architectures for agents. For example, can architectures be found
that provide the capabilities called for by Maes and described in
the above section? Another area of interest is the development of
agent shells that can be created once and then specialized
repeatedly to provide a variety of agents that users may need. A
part of this research will be the development of common languages
for agents to communicate with each other, so that they can pass
around requests and answers to requests without regard to which
particular agent is being accessed.
More ambitious projects with substantially greater payoff, but
with greater risk and longer time periods, involve pushing the
state of the art in many areas. An example is the problem of
accessing and utilizing diverse sources of information. Data may
exist on the net in standard database formats, free
text, tables, or many other forms. Data may be multimodal, time
varying, and quality graded. The need of the network user is to
input a request in a convenient language and then receive a
response regardless of how varied the storage techniques may be.
Research is needed into the modeling of knowledge sources and their
use and the integration of knowledge from diverse sources. Research
is also needed into methods for summarizing data (DARPA, 1992,
1993, 1995a) or enabling a person to peruse large amounts of data
in an efficient manner. Clearly, enough progress can be made in
these areas to achieve useful results within a 10-year time frame.
But there are hard enough problems to keep researchers busy long
beyond that.
Another area where progress will come slowly but surely is in
the utilization of the input/output technologies described earlier
in this report. For example, three-dimensional imaging
methodologies will be useful for presenting the "shopping mall"
paradigm, where an agent might enable a user to "fly" to a place of
interest and make choices among alternatives, as noted by Thomas
DeFanti at the workshop. Progress in speech recognition will make
spoken input practical, broadening the variety of input techniques
available.
The multimedia/multimodal input/output methodologies described
above are possible now in primitive forms (Maybury, 1993; Feiner
and McKeown, 1991; Wahlster et al., 1993). But advances are
required to make progress in fundamental issues such as cross-modal
integrated referring expressions, and temporal/spatial synchrony of
dynamic media realization. The kind of research that has gone into
formal and natural language theory over several decades is needed
again for multimedia/multimodal communication. This work is essential if the
important goal of media-mode independence is to be approached for
ordinary citizens.
Related to these studies is the need for dialogue theory (Grosz
and Sidner, 1986, 1990; Lochbaum, 1994; Moore, 1995), which relates
individual input/output transactions to an interacter's overall
goal. It maintains a model of what has happened and what needs to
happen to achieve ultimate success and provides the fundamental
mechanisms for successful collaboration. Rich and Sidner (1996,
1997a,b) have demonstrated that there is much to be gained from
applying this theory to the design of collaborative agents. Smith
et al. (1995) have demonstrated a speech interactive system that
utilizes dialogue theory as developed by Grosz and Sidner (1986) to
process equipment repair dialogues.
Another area with a less-than-10-year payoff but with more
distant ultimate goals is reasoning or inferential systems. Users
performing standard tasks such as the scholarship search mentioned
above will be helped by inferential capabilities that guide them
down logically valid and well-worn paths rather than the plethora
that knowledge-free search can find. Expert systems successes of
the 1980s provide examples of inference systems that have found use
in the real world and similar techniques can be applied immediately
in ordinary-citizen applications. More ambitious
knowledge-intensive systems will ultimately be needed to provide
citizens help on more difficult problems. A reasonable project
would be to build a large database of common-sense knowledge for a
particular domain that could be used by any system designed to work
in that domain. An example of an important and broadly applicable
knowledge source is the WordNet system (Miller et al., 1990). Such
systems are, in many cases, beyond the 10-year time frame.
A final area of great importance that also will extend beyond
the 10-year time frame is user models (Kobsa and Wahlster, 1989)
and their utilization. The goal is to enable a system to specialize
its actions to account for what the user knows and needs. If a user
is entering a system for the first time, he or she should possibly
be treated differently from an experienced user. If a user has
unsuccessfully followed a given path, perhaps he or she should be
encouraged in a new direction. If the user can reasonably be
expected to know something, he or she should not be told it again
unless specific evidence arises indicating that it has been
forgotten. A long-term and very difficult problem is information
presentation that accounts for a user model. Thus, if one gives
instructions on how to go from point A to point B in a city, the
instructions will be very different in nature depending on whether
the user is new to town or is a long-time resident. Either
individual may be irritated to receive instructions intended for
the other.
Recommendations
With respect to agents and systems intelligence, the steering
committee makes two recommendations on the basis of its
discussions:
Recommendation 1. Agents and intelligent systems technologies
should be a priority for research. A major goal of computer
science from its beginning has been to build systems that enable
people to interact in languages and with paradigms that are
comfortable to them and to apply the best current technologies
available to deal with the array of details related to the
computation. The field of agent technologies is aimed at
accomplishing this for users of the NII.
Recommendation 2. A major emphasis should be on the
development of technologies for translating between machine
internal representations and any of a range of external media or
combination of external media for both input and output. Mode
and media independence is an important goal for the benefit of
ordinary citizens, differently abled and otherwise. This
recommendation addresses the technologies that will make such
independence possible.