Page 180

6—
Agents And Systems Intelligence

Some Problems Associated With The Delivery Of Function Today

Current interfaces to the national information infrastructure (NII) require that the user form a detailed plan to accomplish the tasks he or she desires. Many in both industry and research argue that if ordinary citizens are to use the NII effectively, interfaces must be developed that allow users to specify their needs at a higher level, in terms of their goals. As Maes (1994) has written, "The currently dominant interaction metaphor of direct manipulation ... requires the user to initiate all tasks explicitly and to monitor all events. This metaphor will have to change if untrained users are to make effective use of the computers and networks of tomorrow." The argument is that we must move away from interfaces that require the user to "micromanage" a system's actions and toward interfaces that allow users to delegate actions to digital proxies (often called software "agents" or "softbots") that use information about users' goals and interests to act on their behalf. Some of these proxies will simply make the networked world more manageable by hiding technical details, much as operating systems and high-level programming languages hide details from users.

An example of the kind of interaction that happens too often with current technologies appears in Box 6.1, which describes a prospective student's attempts to obtain information about scholarships. We can follow



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Page 181

Box 6.1 A Student Looks for Scholarship Information on the Internet

A reasonable place to start is with one of the well-known indexes. Our user might look for the heading "Education" and do a search on "Scholarship Information." This yields two items: "Loan and Scholarship Programs" and "Science: Mathematics: Organizations: Professional: American Mathematical Society." The latter is not of interest to this student, but the former returns 291 sites where the student can seek further information. Because this flood of information is overwhelming, a reasonable response is to go back to "Education" and follow a link to "Financial Aid." Here the categories are "College Aid Offices" (144), "Companies" (14), "Grants" (35), "Loan and Scholarship Programs" (34), and "Regional Resources" (10). Several of these look attractive, particularly "Loan and Scholarship Programs and Grants." The student does not know where he or she wants to go to college, so the 144 individual offices do not seem to be a good place to look.

The prospective student follows a link and finds a site advertising "180,000 scholarships, grants, fellowships, and loans representing billions of dollars." Wow, this is getting interesting! The student is asked to enter a major but does not want to commit to one. Hitting "go" gives an error message. Trying "undecided," "none," and "science" leads to frustration. There is a button labeled "more." Here the student is asked to enter name, address, and more information. But he or she may not want to provide such information.

Following a previously discovered link, the user can find a list of special loan and scholarship programs, but they all turn out to be narrowly aimed at such groups as beauty contest winners, specialists in cardiac electrophysiology, and so forth. Following yet another idea, the student looks for military-based scholarship programs, but the maze of paths is similarly extensive and unrewarding.

the student's search on the current Internet and obtain a good idea of the state of existing facilities. The student follows a number of reasonable paths, conscientiously reads the entries, and makes selections. However, the search requires a troublesome number of difficult decisions, takes considerable time, and often results in frustration. The student must enter multiple databases that may be formatted in different ways, must interact with each on its own terms, and may have to restate his or her special interests and constraints again and again in each new environment. Eventually, the student will, in all likelihood, become frustrated and decide to ask a high school teacher or guidance counselor for help. Searches of this kind and with this level of success are more the rule than the exception with present-day facilities. If a person wants to access government services, look for merchandise, or report a downed power line, a multiplicity of choices, an inordinate amount of time, and a lack of satisfaction are common experiences. The main problems are as follows:

Page 182

1. Information overload for the user. The amounts of information available are astronomical, and any attempt to read the information can lead to an avalanche response. There is no mechanism that prunes the sources for quality or for applicability to the user's need.
2. System, application, and task complexity. The functions available on the network may require special command syntax and may have complex facilities not easily accessed by the user. Furthermore, the solution of any given user request could involve calls to many such systems.
3. Rigidity. The system may have only very specific ways of receiving input, finding solutions, and returning them to the user.

How Agents And System Intelligence Can Help

A solution to the above problems is to have a software system between the user and the network that deals with the user in a convenient manner and that interacts with the network and its many facilities in the languages that the network requires. This is analogous to the task of an operating system that may receive a command to "print file1" (either typed or via a direct manipulation command) and that may issue an array of commands to machine facilities to find file1, format it for printing, allocate space for its transfer, open communication to a printer, manage a file transfer to it, receive messages back from the printer as it does its job, and so forth. The user is only aware of the simple command and the fact that the desired file was printed. Yet the complexities involved in servicing the request can be tremendous.

In the current situation, however, the job of the intermediary may be considerably more complicated. The facilities on the network may have greater diversity, it may be necessary to decide how many resources can reasonably be invested in a task, there may be a need for common-sense reasoning to decide the relevance of one facility or set of information versus another, and so forth.

The tasks of the agent are (1) to interact with the user to determine the nature of the request, (2) to interact with the network to obtain the best-possible solution, and (3) to present to the user the response that has been obtained in the most useful form. Figure 6.1 shows these functions in a diagram. Following the flow arrows around the loop, the first step is interaction with the user. This could involve any of the media and/or modalities described in previous chapters: speech, graphic inputs, typing, or others. It could also involve a full dialogue because the user may have a request that is complex. Then the system must translate the results of this interaction to an internal form, which could be an extensive data structure. Next it executes a variety of computations to obtain a response. These may result in a series of additional interactions with the user. Finally, the

Page 183

FIGURE 6.1 The intelligent system mediates between the user and the network.

internal form from the computation is translated into user-appropriate terms. Again, one or several of the media and/or modalities could be used, and this could involve more user interactions.

The type of system that delivers these kinds of behaviors is called an agent. Such a system has the properties that it can undertake goals on behalf of its sponsor (presumably the user), and it can act autonomously and initiate actions according to its own agenda. Such a system usually has an ability to undertake responsibility over time, persistently seeking its assigned goals and accounting for significant historical events. It may be designed to handle information searches, communication jobs, educational or recreational functions, commercial buying or selling tasks, or any other function that may be available on the network. It may guide the user through complex information spaces (for example, the way a travel agent guides a customer through the maze of possible itineraries or a research librarian guides a library user through the various reference search facilities). Agents may monitor information sources and inform the user of events that match the user's interest profile or may actively search for new information to call to the user's attention. Agents may look for opportunities to assist the user or to teach the user new things.

The definition of an "agent" is, in fact, a controversial issue. Various sources emphasize its ability to perceive and act (Russell and Norvig, 1995), its

Page 184

role in doing a specialized task (Minsky and Riecken, 1994), its network interactive capabilities (Genesereth and Ketchpel, 1994), and its ability to carry out an agenda (Maes, 1994). The technologies used to deliver these functions may come from traditional computer science with standard programming languages and methodologies or from more contemporary technologies, such as neural networks or rule-based deduction.

Diagnosing the User's Needs

The first convenience a system can offer to a user is some flexibility in the form of the input. The user may wish to make a request by speaking into a telephone, by pointing to items on a displayed menu, by typing a command, by some combination of all of these, or by some other method. It may also be true that the hardware device that is locally available will not allow all media or modes of input. A very attractive feature of an intelligent system is that it may be able to function properly regardless of the input mode or device. Whatever the means of input, the task of such a system is to convert it into an internal form that can be used by the machine.

It is common that a user's input will be inadequate from some point of view. Perhaps the user's syntax cannot be meaningfully parsed, or there may be ambiguity in the request. The system may need to enter into a clarifying dialogue. It must generate a proper internal message to be returned to the user and then translate it into the appropriate media modalities for presentation. Corresponding to the given input, there may be a particularly appropriate output: a menu clarification of some kind for a menu input, a spoken language output for a spoken input, and so forth. The clarification dialogue may continue for several iterations to address various aspects of the request. Some clarifications may come as the task goes forth and problems are encountered.

The agent may have some information about the user.
This is called a user model, and it may contain a history of the user's typical requests, any special input/output preferences of the user, and a list of information gathered in the current interaction. The user model may be constructed explicitly by asking the user questions or implicitly by monitoring keystrokes or selections made. All of this information can be used by the system in the interaction to reduce redundancy and to efficiently move toward the goal. The system may be adaptive to improve efficiency. If the user often makes the same request, the interaction might be able to jump over the repetitive parts and move directly to the desired result. If particular interactions with the network prove to be unsatisfactory, the system may vary its behavior to avoid them.
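
The user model described above can be made concrete with a small sketch. The following Python is only an illustration under stated assumptions: the class name `UserModel`, its fields, the `handle` function, and the threshold of three repeated requests are all hypothetical and are not taken from any system discussed in this chapter.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Hypothetical user model: request history, I/O preferences, session facts."""
    request_history: Counter = field(default_factory=Counter)
    io_preferences: dict = field(default_factory=dict)
    session_facts: dict = field(default_factory=dict)

    def record_explicit(self, key: str, value: str) -> None:
        """Explicit construction: store an answer the user gave to a direct question."""
        self.io_preferences[key] = value

    def record_request(self, request: str) -> None:
        """Implicit construction: count a request observed during normal use."""
        self.request_history[request] += 1

    def is_routine(self, request: str, threshold: int = 3) -> bool:
        """Treat a request as routine once it has been seen `threshold` times (assumed value)."""
        return self.request_history[request] >= threshold


def handle(model: UserModel, request: str) -> str:
    """Adaptive shortcut: skip the clarifying dialogue for routine requests."""
    routine = model.is_routine(request)
    model.record_request(request)
    return "run directly" if routine else "ask clarifying questions"
```

In this sketch a request is answered with a clarifying dialogue the first few times it is seen and is run directly once the history shows it to be routine, which is one simple way to "jump over the repetitive parts" of an interaction.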

Page 185

Servicing the User's Request

Upon receiving the user's request, the agent must then undertake a variety of actions such as those described at the beginning of this chapter. This requires an ability to search diverse databases at distant places and to assemble discovered information into a useful form. It could require extensive calculations, calling other agents, and/or resorting to contacting other people and asking them for help. It could also involve making some kind of judgments concerning the reliability of the data being acquired. For example, are data provided by a generic source of scholarship information as reliable as the data presented by the institution making the offer?

Simultaneously, the system should be able to inform the user about the nature of the search taking place. The user may want to know what sources are being accessed and where successes and failures have occurred. The user should be informed on the extent of the resources being accessed and the time likely to be required. If the undertaking is not of the nature desired, the user must be able to modify it appropriately.

Presenting the User with a Response

The system response could be a list of 10,000 documents, a complex diagram with extensive annotations, an audio signal, or some other collection of complex objects. Its task on response is to present this in a comprehensible form. The system must undertake content selection and decide which information will best satisfy the user's needs. Then it must do presentation design to structure and format the information in a comprehensible form. This includes media allocation, the decision as to which media will best display which information. Finally, it must realize individual media, ensuring coherency across media (e.g., cross-modal referring expressions, temporal coordination) and a layout that is consistent with the content and intent of the communication.

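
The three agent tasks covered in this section (diagnosing the request, servicing it, and presenting a response) can be sketched as a small pipeline. This is a minimal sketch, not an implementation of any system cited in this chapter: the function names, the `Request` structure, the in-memory `sources` dictionary, and the numeric reliability scores are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Internal form of a user request (hypothetical structure)."""
    topic: str
    max_results: int = 3


def diagnose(raw_input: str) -> Request:
    """Step 1: convert whatever the user supplied into the internal form."""
    return Request(topic=raw_input.strip().lower())


def service(request: Request, sources: dict) -> list:
    """Step 2: query every source and tag each hit with that source's reliability."""
    hits = []
    for name, (reliability, items) in sources.items():
        for item in items:
            if request.topic in item.lower():
                hits.append((reliability, name, item))
    return hits


def present(request: Request, hits: list) -> str:
    """Step 3: content selection (most reliable first, truncated) and plain-text layout."""
    chosen = sorted(hits, reverse=True)[: request.max_results]
    if not chosen:
        return f"No results for '{request.topic}'."
    return "\n".join(f"{item} (from {name})" for _, name, item in chosen)


def run_agent(raw_input: str, sources: dict) -> str:
    """Run the three steps in order for a single request."""
    request = diagnose(raw_input)
    return present(request, service(request, sources))


# Example data: an institution's own office is assumed more reliable than a generic index.
sources = {
    "generic index": (0.4, ["Scholarship list A", "Loan programs"]),
    "university office": (0.9, ["Scholarship for science majors"]),
}
print(run_agent("  Scholarship ", sources))
```

Ranking by an assumed per-source reliability score is one simple way to reflect the judgment, raised above, that data from the institution making an offer may deserve more weight than data from a generic source.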
Current Agents

Some Examples

Many types of agents have been developed and tested in recent years. An example is the entertainment ratings agent by Maes (1994), which requires the user to rate a series of items on which he or she has an opinion. The system then compares the local user's responses with those of many other users who have submitted their own ratings to similar agents and finds groups of users with profiles of likes and dislikes that are similar to those of the local user. Having found such groups, it compiles a list of their average ratings for a variety of items the local user may not

Page 186

have seen and thus provides a group evaluation of items that should be correlated with the local user's own tastes. Such an agent leverages the power of the network to automatically and conveniently deliver to people advice that is personally tuned and would not be easily obtainable any other way, as discussed by Loren Terveen at the workshop.

Lieberman's (1995) Letizia is an agent that recommends Web pages as a user is browsing the Web. It operates in conjunction with a standard browser such as Netscape, tracking the user's behavior and using simple heuristics to form a profile of the user's interests (e.g., if the user saves a bookmark to a document, that document is assumed to be of high interest; if a user follows a link and then returns immediately, that document is assumed to not be of interest). While the user is browsing, Letizia tries to locate other information that may be of interest to the user by performing a resource-bounded best-first search of Web pages starting at the current page. At any time the user may request a list of recommendations and the system will display a page containing its current recommendations, which the user can then follow or ignore. This is another example of a current technology agent in that its main mechanism depends on keyword frequency measurements to represent document content for information retrieval (Salton, 1989).

An example of what can be done with more detailed models of user tasks and more sophisticated processing is Horvitz's Lumiere (http://www.research.microsoft.com/%7Ehorvitz/lum.htm), which monitors a user's actions and determines when the user may need assistance. Lumiere continuously follows a user's goals and tasks as the user works with a suite of software tools (e.g., Microsoft Office), and performs a type of task recognition; individually observed events are combined into higher-level modeled events, which are variables in a Bayesian model.
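
As a rough illustration of how observed events might feed a Bayesian model of user needs, the sketch below computes a posterior over help topics with a naive Bayes assumption (events treated as independent given the topic). It is not Lumiere's actual model: the function name, topics, event names, probabilities, and the 0.01 floor for unseen events are all invented for the example.

```python
def posterior(prior: dict, likelihood: dict, observed: list) -> dict:
    """Return P(topic | observed events) under a naive Bayes assumption."""
    scores = {}
    for topic, p in prior.items():
        for event in observed:
            # Assumed small floor so an unlisted event does not zero out a topic.
            p *= likelihood[topic].get(event, 0.01)
        scores[topic] = p
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()}


# Hypothetical numbers: P(topic) and P(event | topic).
prior = {"formatting": 0.5, "printing": 0.5}
likelihood = {
    "formatting": {"menu_scan": 0.6, "undo": 0.5},
    "printing": {"menu_scan": 0.3, "undo": 0.1},
}
beliefs = posterior(prior, likelihood, ["menu_scan", "undo"])
```

With these made-up numbers, seeing a menu scan followed by an undo shifts most of the probability toward the "formatting" topic; an assistant could offer help only when such a belief exceeds a chosen threshold.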
To develop the models, studies were performed to determine how experts in specific software applications came to understand problems that users might be having with software from the user's behaviors and the evidential distinctions that experts used in their reasoning about the best way to assist a user. As a user works, Lumiere generates a probability distribution over topic areas that the user may need assistance with, along with a probability that the user would not mind being bothered with assistance. This research has led to the development of a product, the Office Assistant in Office 97, which monitors user behaviors and assists the user based on underlying models of each of the Office 97 applications. Another example of an agent that monitors a person's daily behaviors and then can offer help is the Calendar Apprentice (Mitchell et al., 1994). This system enables a user to manage his or her calendar by hand and then it builds rules that attempt to capture the user's typical behaviors. Maes (1994) has described agents that help a user classify and process e-mail

Page 187

and classify news articles based on collections of keywords and nominal information related to author, publication source, and other items. In each case the user's behavior in carrying out daily activities becomes the training data for mechanisms that can later aid the user.

Etzioni and Weld (1994) created the Internet Softbot, which receives a user's request and then employs a goal-oriented controller to seek a solution. It handles network-based problems, such as sending a message to an individual whose address may not be immediately available. The system uses its goal-oriented mechanisms to determine the tasks necessary to satisfy the request; this may involve accessing remote databases and assembling the necessary information to achieve the goal. The Internet Softbot provides a shell in which implementers can embed knowledge to handle a variety of network tasks automatically that can then be used by anyone.

Concerns About Agents

The term "agent" conjures in some critics' minds the picture of a humanoid facade and possibly simulated human-like responses. Individuals with this image may object to agents on the grounds that such an interface is condescending to the user and distracting to the goal of doing the work. While there may be populations or situations where such an interface is desirable, such anthropomorphic interfaces are certainly not a part of the concept of an agent as presented here. The choice of the facade and the interaction style are a design decision and are independent of the decision to build an agent. An agent may exist with menus, keyboard command line, voice, or other input modes and still carry out its function. In fact, one of the goals listed above is to provide the user with the choice of a variety of input/output techniques, any of which will work.

Another concern of some observers is the loss of control implied by having a system that carries out its own "agenda."
Will a person "trust" a software system (as discussed in Chapter 5)? The apparent preferred mode for many users is the so-called direct manipulation paradigm that presents the user with a world of objects and methods to control them. Of course, there is a long tradition of automatically handling low-level details for users, as done, for example, by compilers and operating systems. Having a system agenda and automatic processing at such levels is accepted practice, and what is new occurs when the machine offers to carry out tasks automatically that the user might ordinarily do. Here the important thing is to keep the user in charge, either to accept the automatic actions of the system or to reject them and maintain complete step-by-step control. Another worry relates to the robustness and reliability of a knowledge-based

Page 188

reasoning or learning system that may be a part of an agent. A reasonable policy on this issue is simply that conservative decision making should predominate if such a system is to be used. One can set objective criteria for measuring performance and use experimental or theoretical means to guarantee that standards are met.

Many of these concerns can be addressed by following the guidelines for agents presented by Pattie Maes at the 1997 International Conference on Intelligent User Interfaces:

• Make the user model available (inspectable, modifiable) to the user.
• The agent's method of operation should be understandable to the user.
• The agent should be able to explain its behavior to the user.
• The agent should have the ability to give continuous feedback to the user about its state, actions, and learning.
• The agent should allow variable degrees of autonomy, and the user should decide how much and what type of tasks to delegate to the agent. The user should be able to "program" the agent (e.g., teach it things, make it forget things).
• The user should not have to learn a new language to deal with the agent. A goal is to use the application to communicate between the agent and the user.

Suggested Research: Near And Far Term

Conservative versions of agents exist and are being used today, as described above. The input mode can be restricted to current technology, menus, or a keyboard-oriented command-line language. The computation undertaken for the user can be any traditional computation, mail handling, browsing, information retrieval, and so forth. The output facilities can employ any current technology. An example is the agent mentioned above for providing a person with personally tuned ratings for entertainment events. Such a system is well within the capabilities of current technologies and provides an example of a currently realizable agent. There are many kinds of research projects that begin at this level.
For example, one can study human problem solving in the absence of technology and try to determine its characteristics and needs. Or one can study the dynamics of human group behaviors. In each case the goal can be to determine whether and how technology, specifically software agents and networking, might be used to augment or improve what occurs naturally. As a part of this, it is necessary to develop measures of effectiveness

Page 189

in problem solving, for humans or human-machine collaborations, so that comparative studies can be done.

Other important research projects could study or create new architectures for agents. For example, can architectures be found that provide the capabilities called for by Maes and described in the above section? Another area of interest is the development of agent shells that can be created once and then specialized repeatedly to provide a variety of agents that users may need. A part of this research will be the development of common languages for agents to communicate with each other, so that they can pass around requests and answers to requests without regard to which particular agent is being accessed.

More ambitious projects with substantially greater payoff, but with greater risk and longer time periods, involve pushing the state of the art in many areas. An example is the problem of accessing and utilizing diverse sources of information. Data may exist on the net in a variety of standard database formats: free text, tables, or many other forms. Data may be multimodal, time varying, and quality graded. The need of the network user is to input a request in a convenient language and then receive a response regardless of how varied the storage techniques may be. Research is needed into the modeling of knowledge sources and their use and the integration of knowledge from diverse sources. Research is also needed into methods for summarizing data (DARPA, 1992, 1993, 1995a) or enabling a person to peruse large amounts of data in an efficient manner. Clearly, enough progress can be made in these areas to achieve useful results within a 10-year time frame. But there are hard enough problems to keep researchers busy long beyond that.

Another area where progress will come slowly but surely is in the utilization of the input/output technologies described earlier in this report.
For example, three-dimensional imaging methodologies will be useful for presenting the "shopping mall" paradigm, where an agent might enable a user to "fly" to a place of interest and make choices among alternatives, as noted by Thomas DeFanti at the workshop. Progress in speech recognition will make it usable, broadening the variety of input techniques available. The multimedia/multimodal input/output methodologies described above are possible now in primitive forms (Maybury, 1993; Feiner and McKeown, 1991; Wahlster et al., 1993). But advances are required to make progress in fundamental issues such as cross-modal integrated referring expressions, and temporal/spatial synchrony of dynamic media realization. The several decades of language research into formal and natural language theory is needed again for multimedia/multimodal communication. This work is essential if the important goal of media-mode independence is to be approached for ordinary citizens.

Page 190

Related to these studies is the need for dialogue theory (Grosz and Sidner, 1986, 1990; Lochbaum, 1994; Moore, 1995), which relates individual input/output transactions to an interacter's overall goal. It maintains a model of what has happened and what needs to happen to achieve ultimate success and provides the fundamental mechanisms for successful collaboration. Rich and Sidner (1996, 1997a,b) have demonstrated that there is much to be gained from applying this theory to the design of collaborative agents. Smith et al. (1995) have demonstrated a speech interactive system that utilizes dialogue theory as developed by Grosz and Sidner (1986) to process equipment repair dialogues.

Another area with a less-than-10-year payoff but with more distant ultimate goals is reasoning or inferential systems. Users performing standard tasks such as the scholarship search mentioned above will be helped by inferential capabilities that guide them down logically valid and well-worn paths rather than the plethora that knowledge-free search can find. Expert systems successes of the 1980s provide examples of inference systems that have found use in the real world and similar techniques can be applied immediately in ordinary-citizen applications. More ambitious knowledge-intensive systems will ultimately be needed to provide citizens help on more difficult problems. A reasonable project would be to build a large database of common-sense knowledge for a particular domain that could be used by any system designed to work in that domain. An example of an important and broadly applicable knowledge source is the WordNet system (Miller et al., 1990). Such systems are, in many cases, beyond the 10-year time frame.

A final area of great importance that also will extend beyond the 10-year time frame is user models (Kobsa and Wahlster, 1989) and their utilization. The goal is to enable a system to specialize its actions to account for what the user knows and needs.
If a user is entering a system for the first time, he or she should possibly be treated differently from an experienced user. If a user has unsuccessfully followed a given path, perhaps he or she should be encouraged in a new direction. If the user can reasonably be expected to know something, he or she should not be told it again unless specific evidence arises indicating that it has been forgotten. A long-term and very difficult problem is information presentation that accounts for a user model. Thus, if one gives instructions on how to go from point A to point B in a city, the instructions will be very different in nature depending on whether the user is new to town or is a long-time resident. Either individual may be irritated to receive instructions intended for the other.

Page 191

Recommendations

With respect to agents and systems intelligence, the steering committee makes two recommendations on the basis of its discussions:

Recommendation 1. Agents and intelligent systems technologies should be a priority for research. A major goal of computer science from its beginning has been to build systems that enable people to interact in languages and with paradigms that are comfortable to them and to apply the best current technologies available to deal with the array of details related to the computation. The field of agent technologies is aimed at accomplishing this for users of the NII.

Recommendation 2. A major emphasis should be on the development of technologies for translating between machine internal representations and any of a range of external media or combination of external media for both input and output. Mode and media independence is an important goal for the benefit of ordinary citizens, differently abled and otherwise. This recommendation addresses the technologies that will make such independence possible.