COMPUTER-MEDIATED COLLABORATION
Loren Terveen
AT&T Research
INTERFACE . . . INTERACTION . . . COLLABORATION
A narrow view of the human-computer interface focusing on superficial "look-and-feel" issues is unproductive. It offers neither deep understanding nor practical design guidance. Even simple interface decisions may require significant knowledge about people's interaction with a system. Three interfaces provide practical examples: the popcorn button on microwave ovens, the VCR+ system for video cassette recorders (VCRs), and the ATM (automated teller machine) fast cash withdrawal button. Each of these interfaces was added years into the product cycle in response to people's actual use of the products. At a theoretical level, Hutchins et al. (1986) couched their seminal analysis of direct manipulation interfaces in terms of users' cognitive situation and resources, a general model of tasks, and the coupling between user goals and interface features. Their analysis shows why interface design decisions cannot be made on the basis of look and feel alone.
Indeed, we also begin to see that people may require even more from systems, namely help with tasks they don't know enough about to do on their own. Norman (1986) observed that the Pinball Construction Set makes it easy to design computerized pinball games but not good games; this takes knowledge about pinball design. More generally, Schoen (1983) discussed how skilled professionals can interpret the state of their work objects to make good decisions; they act and the situation "talks back." The problem is that less skilled people may not be able to understand what the situation is "saying." Fischer and Reeves (1992) studied interactions between customers and sales agents in a large hardware store. They identified crucial knowledge that only the sales agents possessed and that they used to help customers. The knowledge included knowing that a tool existed, how to find a tool, the conditions under which a particular tool should be used, and how to combine tools for a specific situation.
People often work together on tasks. Thus, in addition to collaborating with users, another appropriate role for systems is to support human collaboration. The field of computer-supported cooperative work (CSCW) seeks to understand the nature of joint work and design technologies to support it. Important technologies include shared editors, group discussion support tools, and awareness systems.
Even when people do not work together explicitly, they still can benefit from the prior experience and opinions of others. Computational techniques for mining such information and turning it into a reusable asset raise the potential for a form of "virtual collaboration," with some of the benefits of collaboration without the costs of communication or personal involvement.
To summarize, there are three fundamental motivations for collaborative systems and a research approach built on each one:
Next I discuss the prospects for collaboration in common tasks
supported by the national information infrastructure (NII).
THE NII: WHAT PEOPLE USE IT FOR, WHERE COLLABORATION IS NEEDED
The change from stand-alone to networked computers is transforming computers from desktop tools into windows on the world, from information containers and processors into communication devices. The World Wide Web is the primary innovation ushering ordinary citizens into this new world, so much of my discussion focuses on the Web.
The World Wide Web was designed expressly to support communication and collaboration among geographically distributed colleagues (Berners-Lee et al., 1994). Specifically, it supports information sharing, with the dual aspects of publishing and finding information. As the Web has expanded to embrace a diverse population of users and a broad range of uses, more activities have become important:
COMPUTER-MEDIATED COLLABORATION:
A UNIFYING PERSPECTIVE
A unified research framework offers two main benefits: (1) it advances communication and understanding among researchers by helping them to share and compare methods and results, and (2) it makes it easier to explore designs that integrate different types of collaboration. I propose a perspective of "computer-mediated collaboration": people collaborating with people, mediated by computation.
A given instance of computer-mediated collaboration can be characterized by using the following dimensions:
This framework is adequate for describing CSCW and virtual collaboration; both of these explore computational techniques for mediating human collaboration. As applied to collaborative agents, it highlights the involvement of the people who create the agents, both domain experts whose knowledge is modeled in the agents and knowledge engineers (or artificial intelligence researchers) who work with the experts to articulate the knowledge and develop representations and algorithms for using it. It also reminds us of the time and resource costs of the design process.
More deeply, the framework guides us to consider combinations
of various types of collaboration. For example, users of a
computational agent may not think about its designers when things work;
however, when the user-agent interaction breaks down, an effective remedy may be
to provide the user access to a knowledgeable human expert, such as
the domain expert involved in designing the agent (Terveen et al., 1995).
Or when an agent has inadequate knowledge to perform a task on behalf
of its user, it might be able to obtain assistance from other agents
(Lashkari et al., 1994).
RESEARCH ISSUES
Dividing Responsibility Among People and Computational Agents
People and computers have fundamentally different abilities. Thus,
a basic issue is creating divisions of responsibility that maximize
the strengths and minimize the weaknesses of each (Terveen, 1995).
"Critics" (Fischer et al., 1993) represent a well-known approach that responds
to this issue. Critics are agents who observe users as they work in a
computational environment and offer assistance from time to time. Users
are responsible for the overall course of the work, while critics use
domain expertise to help users solve problems and evolve their conception of
the problem. While much interesting work has been done in this area,
most of it still consists of proof-of-concept explorations. The next step is
to develop robust generalizations that can be embedded in toolkits.
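
The division of responsibility described above can be illustrated with a minimal sketch. It is not taken from Fischer et al. (1993): the kitchen-layout domain, the two rules, and all names and thresholds below are hypothetical, chosen only to show the shape of the idea. Each critic inspects the current state of the user's work and may return advice; nothing in the sketch changes the design, so the user stays in charge of the overall course of the work.

```python
from typing import Callable, Dict, List, Optional

# A design is modeled (hypothetically) as a flat mapping of measurements.
Design = Dict[str, float]
# A critic looks at a design and returns advice, or None if it has nothing to say.
Critic = Callable[[Design], Optional[str]]


def stove_window_critic(design: Design) -> Optional[str]:
    """Hypothetical rule: warn when the stove sits too close to a window."""
    distance = design.get("stove_to_window_cm")
    if distance is not None and distance < 30:
        return "The stove is within 30 cm of a window; consider moving it."
    return None


def work_triangle_critic(design: Design) -> Optional[str]:
    """Hypothetical rule: warn when stove, sink, and refrigerator are far apart."""
    perimeter = design.get("work_triangle_cm")
    if perimeter is not None and perimeter > 700:
        return "The stove-sink-refrigerator path exceeds 700 cm; consider a tighter layout."
    return None


def critique(design: Design, critics: List[Critic]) -> List[str]:
    """Run every critic on the current design and collect advice.

    The design itself is never modified: critics only advise.
    """
    advice = []
    for critic in critics:
        message = critic(design)
        if message is not None:
            advice.append(message)
    return advice
```

A toolkit in the sense suggested above would presumably generalize the `Critic` signature and the way advice is delivered; this sketch fixes both only for illustration.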
Collecting and Evaluating Data Necessary for Virtual Collaboration
Two major approaches to virtual collaboration have been explored. Systems like the Bellcore Recommender (Hill et al., 1995) and Firefly (http://www.firefly.com) ask users to rate objects of interest, such as movies or music. The systems maintain a database of raters and their ratings, compute similarities among raters, and recommend objects to people that were rated highly by other people with similar tastes. Data-mining approaches (Hill and Hollan, 1994; Hill and Terveen, 1996) attempt to extract useful information automatically from people's normal activities, such as reading and editing documents or discussing topics on netnews. (One goal is to require little or no extra data entry from users.) Abstracted versions of this information are then made available to other people engaged in the same activity.
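
The ratings-based approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm used by the Bellcore Recommender or Firefly: the similarity measure (Pearson correlation over co-rated items), the weighting scheme, and the toy data are all choices made here for the example.

```python
from math import sqrt

# Hypothetical toy database: rater -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Casablanca": 2, "Diner": 5},
    "cat": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Diner": 1, "Eraserhead": 5},
}


def similarity(a, b):
    """Pearson correlation over items both raters scored (0.0 if undefined)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], theirs)
        if sim <= 0:  # skip raters whose tastes are dissimilar or unknown
            continue
        for item, rating in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
```

With the toy data, `recommend("ann", RATINGS)` draws only on "bob", whose ratings correlate positively with Ann's; "cat" is ignored because her ratings correlate negatively.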
One of the major issues for both types of approaches is obtaining the necessary information. For ratings systems the question is: Will enough people rate? For data-mining systems, the questions include: Can useful information be extracted automatically? Can it be extracted efficiently (important since quality often comes from aggregating over large amounts of data)? Can it be extracted and reused without violating the privacy of the people who produced it?
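
As one illustration of automatic extraction, the following sketch counts how many distinct posters mention each Web address in a set of messages, in the spirit of the frequency-of-mention idea named in the title of Hill and Terveen (1996). It is an assumption-laden example, not a description of that system: the message format (poster, body), the URL pattern, and the decision to count distinct posters rather than raw mentions are all choices made here.

```python
import re
from collections import defaultdict

# Simplified URL pattern (an assumption; real messages need more careful parsing).
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def mention_counts(messages):
    """Map each URL to the number of distinct posters who mentioned it.

    `messages` is an iterable of (poster, body) pairs.
    """
    posters_by_url = defaultdict(set)
    for poster, body in messages:
        for url in URL_PATTERN.findall(body):
            # Strip sentence punctuation that the pattern may have swallowed.
            posters_by_url[url.rstrip(".,;")].add(poster)
    return {url: len(posters) for url, posters in posters_by_url.items()}


def top_mentions(messages, n=3):
    """Return the n most widely mentioned URLs, ties broken alphabetically."""
    counts = mention_counts(messages)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Only aggregate counts leave `mention_counts`; the poster identities are used internally to avoid counting one person's repeated mentions more than once. Whether even that use is acceptable is part of the privacy question raised above.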
Once data (recommendations or ratings) are available, the problem is to evaluate them. One good way to do this is to consider the source; some people are more credible for any given topic. Therefore, computing a person's credibility from available data is a second major problem. One complication is that most interaction on the World Wide Web is anonymous; if one cannot even attribute particular actions or opinions to a person, it is hard to compute his or her credibility. This again raises a potential conflict in values between the privacy of on-line interaction and the attempt to mine information that could be used to enhance interaction.
The credibility problem can be further refined into that of
determining good sources (raters/recommenders) for a specific person.
Developing effective algorithms for this is precisely what the ratings
approach does. However, the problem is harder for data-mining approaches:
they operate only on already-available data, and existing data may not
always be an adequate source for computing similarities among people.
Introducing Computational Agents into On-line Communities
When an agent participates in an on-line community, such as a newsgroup or text-based virtual reality (e.g., a MUD or MOO), interesting issues arise beyond those faced in single-user human-computer collaboration. I illustrate these issues using PHOAKS (Hill and Terveen, 1996), which serves as a group memory agent that maintains recommended Web pages for a group.
CONCLUSION
I would like to conclude with two claims. First, if we take the
argument of this paper seriously, we need not one but many
every-citizen interfaces to the NII. It is specific, appropriate types of
computer-mediated collaborations that have the potential to increase the access
and power of ordinary citizens, not a standard look-and-feel. Second,
research must move into the real world. Many of the PHOAKS
issues discussed here are ones not anticipated, but discovered only by
wading into the uncontrolled, unpredictable, messy World Wide Web. We
have been able to formulate issues, hone our tools, and evaluate our results
in ways that we could not have done if we had stayed in our laboratories.
At some stage, all promising new research ideas will have to take the
same plunge to prove their benefits to the ordinary citizen engaging in life
on the NII.
ACKNOWLEDGMENTS
I thank Will Hill for our collaboration on PHOAKS and for our
many conversations developing and exploring the issues mentioned here.
REFERENCES
Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H.F., and Secret, A. (1994) The World-Wide Web. Communications of the ACM, 37(8), 76-82.
Fischer, G., and Reeves, B. (1992) Beyond Intelligent Interfaces: Exploring, Analyzing, and Creating Success Models of Cooperative Problem Solving. Applied Intelligence, 1, 311-332.
Fischer, G., Nakakoji, K., Ostwald, J., Stahl, G., and Sumner, T. (1993) Embedding Critics in Design Environments. The Knowledge Engineering Review Journal, 8(4), 285-307.
Hill, W.C., and Hollan, J.D. (1994) History-Enriched Digital Objects: Prototypes and Policy Issues. The Information Society, 10, 139-145.
Hill, W.C., Stead, L., Rosenstein, M., and Furnas, G. (1995) Recommending and Evaluating Choices in a Virtual Community of Use. Pp. 194-201 in CHI'95. ACM Press, New York.
Hill, W.C., and Terveen, L.G. (1996) Using Frequency-of-Mention in Public Conversations for Social Filtering. CSCW'96. ACM Press, New York. (See also http://www.phoaks.com/phoaks)
Hutchins, E.L., Hollan, J.D., and Norman, D.A. (1986) Direct Manipulation Interfaces. Pp. 87-124 in Norman, D.A., and Draper, S.W., Eds., User Centered System Design. Erlbaum, Hillsdale, N.J.
Lashkari, Y., Metral, M., and Maes, P. (1994) Collaborative Interface Agents. In AAAI'94. AAAI Press, Seattle, Wash.
Maes, P. (1995) Artificial Life Meets Entertainment: Interacting with Lifelike Autonomous Agents. Communications of the ACM, 38(11), 108-114.
Norman, D.A. (1986) Cognitive Engineering. Pp. 31-61 in Norman, D.A. and Draper, S. W., Eds., User Centered System Design. Erlbaum, Hillsdale, N.J.
Schoen, D. (1983) The Reflective Practitioner. Basic Books, New York.
Terveen, L.G. (1995) An Overview of Human-Computer Collaboration. Knowledge-Based Systems, 8(2-3), 67-81.
Terveen, L.G., Selfridge, P.G., and Long, M.D. (1995) Living Design Memory: Framework, System, and Lessons Learned. Human-Computer Interaction, 10(1), 1-37.
CREATING INTERFACES
FOUNDED ON PRINCIPLES OF DISCOURSE
COMMUNICATION AND COLLABORATION
Candace Sidner
Lotus Development Corporation
Today's user interfaces are just too hard to use: they are too complex even for the narrow range of users for whom they were designed. At the same time, they also are impoverished in the range of modalities which they provide to users. While new modalities are becoming available, they could make interfaces even harder to use. What's the solution to this dilemma? Principles of human discourse communication and of human-to-human collaboration are two critically overlooked sources for simplifying interfaces. They offer a means of integrating various modalities and of extending the range of computer users.
Until recently, user interface technology has not made use of what is now understood about the principles of discourse that govern human communication or the principles of collaboration that model joint work. This may seem surprising because interfaces are "communication engines" to the functionality of software applications; interfaces are how we get our work done. While the field of computer-supported cooperative work has directed the majority of its concerns at understanding how computers can be used to help people work together, the computer has not been seen as a full-fledged partner in the human collaboration. Interfaces are designed to make collaboration between people better, and to some extent they succeed, but the computer is not a collaborator with any of the people.
The current model of communication in interfaces is rudimentary at best. It is the "interaction" model, which is to say the user invokes a command and gets some, perhaps expected, performance by the computer, rather like when one's dog does a trick on the basis of a command such as "roll over." To communicate, users must choose one- or two-word commands from a menu with a mouse or incant a line of mumbo-jumbo that is meant to command the computer to run a program. Any clarification with the user results from the user responding to "dialogue boxes."
This interaction model of communication is, in the weakest sense, a dialogue: some information flows between two agents who are capable of acting on that information. While an interface to a given application may have hundreds of so-called dialogue boxes, dialogue in the human sense does not take place. There is no structure to the overall dialogue between user and interface from one dialogue box to the next and no memory of past dialogues or commands. Each command and action pairing is taken as completely independent of the next, so that there is no overall organization around the purposive intent of the overall set of "interactions" between the user and the machine.
Just as a dog doesn't always do what you tell it to, computers don't either. The interface is meant to inform users about what the computer can do, but, as we all know, short phrases are especially ambiguous in human language. Users have little means to resolve this ambiguity. If the meaning of a command is not obvious to them, they can at best try it out and hope that it does what they want, or they can make their way through a help system to determine if they are on the right track. All the while, they are required to be very explicit about every reference to objects, such as files, that they make. While the user bears the burden of being explicit, the interface often communicates with ambiguity back to the user. For example, what should the user conclude is the meaning of the "ok" button in a dialogue box? "Yes, that's fine with me, I agree," or "I understood the words," or "well, I read the words" are possible, though in human discourse, these uses of "ok" convey very different responses to the content of utterances in the dialogue. Because users cannot communicate these distinctions, it becomes clear to them that the computer interface does not really know what it is doing. It's just a dumb machine.
Being able to be a collaborator is three steps up on the ladder of communication and work. The first is minimal interaction. Today's interfaces do not pass the "minimality test" because they do not know enough to do so. Only the user does any modeling or remembering of the interaction and its parts. Whatever role the machine has played in the interaction it completely forgets when it completes the action requested. It also is completely unaware of any difficulty the user may have had in determining the meaning of a command. Capturing this level of interaction provides a bare minimum of interactive understanding: the interface would have a more complete model of the back-and-forth nature of the communication than it does now even if it did not know why the user wanted to communicate in the first place.
The second step on the ladder of communication and collaboration is slave-like interaction. To perform this way, the interface must have a model of what the user wants to do. Current interfaces do not have such a model. The user's goals and tasks lie completely outside of the interface, and there is no means to say anything about them. No part of the user's goals and tasks is recorded or even recognized (see note 1). Instead, all of this information must reside only in the head of the user. None of it can be found in the application and its interface.
Some interfaces seem to be useful and quite satisfying to users. The metaphors on which they are based are highly predictive for users in determining what to do next. One such example is the interface for checkbook activities. I believe this is because the metaphor has been used to build in a model of the task the user is doing and to represent aspects of the task. The metaphor has also been used to keep the user narrowly focused on the task at hand: to balance a checkbook, write checks, or produce reports based on the checkbook information. As a result, the interface metaphor helps users work and also helps them predict what the interface is likely to do. While the interface is not aware of the task the user is doing, it is designed to do that task and to keep the user highly focused.
While it would be wise to continue to design interfaces carefully using well-thought-out metaphors, it will not solve the larger problems concerning interactivity, communication, and collaboration. No one metaphor is powerful enough for all work. Furthermore, lots of smaller applications each with an interface for performing one set of tasks leaves the user with lots of tasks to juggle. We still need an interface that communicates and collaborates, one that's at step three on the ladder. How do we get there from here? There is a great deal more known about human discourse communication that could be used in interfaces today. Recent work in linguistics, natural language processing, and psychology offers principles of communication that can be embodied in interfaces, even when they do not speak full human language. All discourse, of which dialogue is an example, is purposive behavior, and the structure of the discourse is organized and segmented according to purpose. The focus of attention of the discourse is used, among other things, to provide context, which means creating locality in the segments of the discourse for interpreting recent references and to help discourse participants assure that each of them is paying attention to the same items in the discourse (Grosz and Sidner, 1986). Grounding of utterances in the human-computer dialogue (see note 2) makes conversation more efficient by allowing people to leave out what is truly obvious to both participants, as well as to slow the conversation down in order to reestablish focus, correct for unwanted ambiguity, and determine the next participant who has the floor.
It is possible to build interfaces that make use of these principles (and associated algorithms, which I will not discuss here) while at the same time simplifying the interface itself. We are doing that in our current work on collaborative interface agents (Rich and Sidner, 1996). To do so, designers will need to think in terms of user purposes (not just what actions the interface permits), the structure of purposes, and the relationship between what the user must communicate and the purposes of the communication. Maintenance of focus of attention will provide users with a local context in which to complete their subpurposes and may even make it possible to introduce implicit means of referring to the objects of the application.
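
To make the idea of purpose-structured focus concrete, here is a minimal sketch of a focus stack: each discourse segment carries a purpose and the objects mentioned within it, and an implicit reference is resolved by searching the most recent segment first. This is only an illustration loosely inspired by the attentional state of Grosz and Sidner (1986); it is not the algorithm used in the work of Rich and Sidner (1996), and the class names, the (kind, name) representation of objects, and the resolution rule are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Entity = Tuple[str, str]  # (kind, name), e.g. ("file", "report.doc")


@dataclass
class Segment:
    """One discourse segment: the purpose it serves and what was mentioned in it."""
    purpose: str
    entities: List[Entity] = field(default_factory=list)  # most recent last


class FocusStack:
    """Tracks open segments so references can be interpreted in local context."""

    def __init__(self) -> None:
        self._stack: List[Segment] = []

    def open_segment(self, purpose: str) -> None:
        self._stack.append(Segment(purpose))

    def close_segment(self) -> Segment:
        return self._stack.pop()

    def mention(self, entity: Entity) -> None:
        self._stack[-1].entities.append(entity)

    def resolve(self, kind: str) -> Optional[Entity]:
        """Resolve an implicit reference ("the file", "it") to the most
        recently mentioned entity of that kind, innermost segment first."""
        for segment in reversed(self._stack):
            for entity in reversed(segment.entities):
                if entity[0] == kind:
                    return entity
        return None

    def purposes(self) -> List[str]:
        """The chain of purposes currently open, outermost first."""
        return [segment.purpose for segment in self._stack]
```

For example, after opening a segment with the purpose "prepare report" and mentioning `("file", "report.doc")`, then opening a subsegment "fix chart" and mentioning `("chart", "sales")`, a later request about "the file" resolves to `report.doc` without the user naming it again; once the subsegment is closed, "the chart" no longer resolves.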
Far more research is needed. Most linguistics and natural language processing work is directed at progress in natural language/speech understanding and generation, in machine translation, or at more applied concerns such as language-based information retrieval. Uncovering the principles of human communication requires considerably more effort than has been undertaken so far. Applying those principles to interfaces is a largely untouched area of research. Little of this work is likely to occur in industrial research settings, as it is not near enough term for the needs of applied research now typical in industrial labs.
Recent work in understanding human collaboration and user modeling offers two sources of value to the interface: (1) the near-term ability to ground the interface in the users' goals and tasks and (2) the more futuristic ability to make the machine a collaborator with the user once those goals and tasks are available. What is known about collaboration makes it clear that collaborators come to share mutual beliefs about their ways of doing something (called the recipe), about their ability and intentions to do things, and about their commitment to completing the goal. The models of collaboration can also be linked to discourse not simply by saying that collaborators must communicate their beliefs to others, but through much more detailed models of the relationship of belief and intention as purposes for segments of discourse (cf. Lochbaum, 1995).
Recent work in modeling collaboration and discourse in interfaces (Biermann et al., 1993; Stein and Maier, 1995; Rich and Sidner, 1996; cf. Terveen, 1995, for an overview of human-computer collaboration) indicates this is a promising direction for research. While some industrial and university work has been directed at these problems, only the first steps have been taken in modeling interfaces after human collaboration. Extending this work to applications that real users would use on an everyday basis would require further research on human collaboration and more system experimentation in building interfaces. Two critical issues in human collaboration are more exploration of the means by which humans negotiate in collaboration (cf. Guinn and Biermann, 1993; Sidner, 1994a; Chu-Carroll and Carberry, 1995) and human collaboration in task domains that are richer than the simple ones (e.g., building simple physical equipment, gathering simple information) considered so far.
A new technology is, after long delay, about to splash onto the scene: speech. While I will confine my comments to speech input (recognition and understanding), similar comments apply to speech output. Speech will force issues about communication and collaboration. At the same time, if applied wisely, speech offers a valuable modality to many users who do not fit the profile of use for current interfaces. To use speech well in interfaces, researchers must continue to address a number of technical problems. Some are related to speech recognition per se; others concern how interfaces are designed to use speech as the communication medium. Concerning the first set of problems, one must consider modeling the speech of small dialect populations. This is possible to do but may be overlooked because the cost may seem too high relative to the market return for industrial labs to concern themselves with such populations. Yet small dialect groups make up part of the citizenship of our nation.
Having good speech recognition/production and understanding/generation technology is only the tip of the interface iceberg. A great deal more research is needed in understanding users and their interaction needs in the presence of speech.
Speech technology for desktop and telephony interfaces offers the potential of using computers in the ways that users interact with other people. It also opens up a host of metaphorical uses (see note 3) that could enhance computer use or exacerbate our current use of metaphors in interfaces. Careful studies of users and applications with speech (such as the SpeechActs work of Martin et al., 1996) will provide speech interfaces that make use of communication principles. Special attention must be paid to the needs of users who communicate with special limitations. Research on the use of speech interfaces by visually or motor-impaired users and linguistically limited users (see note 4) is not likely to come from industry (as is evident from the current problems with recent operating systems providing interfaces for the blind) and will require industry/university/government collaboration to be feasible. Finally, user populations never considered before, such as the multiple millions of semiliterate and illiterate Americans, will require careful study in speech interfaces; this research is also not likely to occur in industry and will require joint research between industry and universities under government funding.
Speech as a modality naturally suggests speaking to someone. The speaking face is compelling not only because it is so imprinted on us from birth, but because it appears quite valuable to users in communicating. Recent research on faces, human or otherwise, has now captured the imagination of some interface researchers (e.g., Ball et al., 1996; Nagao and Takeuchi, 1994; Waters and Levergood, 1995). While much of the associated research concerns believable agents, that is, research on representing agents visually that are generally designed to have some effect on users (e.g., being persuasive, friendly, or entertaining), faces have inherent value for communication. While little is understood at the computational level about these matters, faces provide a locus for spoken communication and a means of introducing efficiency in the grounding aspects of dialogue. However, these issues are poorly understood, as are the means by which we find faces natural in terms of their micro-level changes (but see work by Thorisson, 1994, and Walker et al., 1994, for some directions to pursue). Research in these areas will require the combined efforts of researchers in a number of disciplines, including psychology, computational linguistics, linguistics, computer science, and media arts. Those aspects that apply to media and believable agents will probably be heavily funded because of their potential payoff in the new entertainment/computer industry. Communication-related matters will require some government funding to keep industrial applied research focused on this matter.
While current interfaces are hard to use and give few choices of
modality, we are on the brink of having available many new
technologies that can change the nature of interfaces. We must bring to bear
our knowledge of human collaboration and discourse communication
on these interfaces so that they serve a wider range of users. We must
extend our knowledge of collaboration and communication so that our
interfaces can grow into better collaborative partners as our work needs change.
NOTES
1. Although interface product groups are now aware of many of the micro-actions that users perform in a given software application, the only method they have come up with to help users is bottom-up recognition of micro-actions. They will never be able to do more because extending this solution to "higher-level" actions is computationally too hard.
2. By this I mean the process by which dialogue participants establish that the message was understood and determine who speaks next in the conversation and when (cf. Clark and Shaefer, 1987; Sidner, 1994a,b; Traum and Heeman, 1996).
3. A near-term example is name-dialing, which is the ability to call a person on the phone by simply saying the name to a telephone prompt.
4. By "linguistically limited," I mean people who have less-than-perfect knowledge or use of the majority culture language because they are nonnative speakers, have some cognitive/physical handicap, or have not yet been trained in the full range of the language owing to age or economic circumstances.
REFERENCES
Ball, Gene, Dan Ling, David Kurlander, John Miller, David Pugh, Tim Skelly, Andy Stankosky, David Thiel, Maarten Van Dantzich, and Trace Wax. 1996. Lifelike Computer Characters: The Persona Project at Microsoft Research. In Software Agents, Jeffrey M. Bradshaw (Ed.). AAAI/MIT Press, Cambridge, Mass.
Biermann, Alan W., Curry I. Guinn, D. Richard Hipp, and Ronnie W. Smith. 1993. Efficient Collaborative Discourse: A Theory and Its Implementation. Proceedings of the ARPA Human Language Technology Workshop. March. Princeton, NJ.
Chu-Carroll, Jennifer, and Sandra Carberry. 1995. Response Generation in Collaborative Negotiation. Pp. 136-143 in Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. ACL, Somerset, N.J.
Clark, H.H., and E.F. Shaefer. 1987. Collaborating on Contributions to Conversations. Language and Cognitive Processes, 2(1):19-41.
Grosz, B., and C.L. Sidner. 1986. Attention, Intention and the Structure of Discourse. Computational Linguistics, 12(3).
Guinn, Curry, and Alan W. Biermann. 1993. Conflict Resolution in Collaborative Discourse. Proceedings of the 1993 International Joint Conference on Artificial Intelligence Workshop: Computational Models of Conflict Management in Cooperative Problem Solving, August. Chambery, France.
Lochbaum, Karen E. 1994. Using Collaborative Plans to Model the Intentional Structure of Discourse. Technical Report, Harvard University. Available on http://liinwww.ira.uka.de/searchbib/index.
Lochbaum, Karen E. 1995. "The Use of Knowledge Preconditions in Language Processing," Proceedings of the 14th International Joint Conference on Artificial Intelligence, Morgan-Kaufmann, San Mateo, CA, pp. 1260-1266.
Martin, P., F. Crabbe, S. Adams, E. Baatz, and N. Yankelovich. 1996. SpeechActs: A Spoken Language Framework. Computer, 29(7):33-40.
Nagao, K., and A. Takeuchi. 1994. Speech Dialogue with Facial Displays: Multimodal Human-Computer Conversation. Pp. 102-109 in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics. Morgan-Kaufmann, San Francisco.
Rich, C., and C.L. Sidner. 1996. "Adding a Collaborative Agent to Direct-Manipulation Interfaces," Proceedings of UIST.
Sidner, C. 1994a. An Artificial Discourse Language for Collaborative Negotiation. Pp. 814-819 in Proceedings of the National Conference on Artificial Intelligence-94, Seattle. MIT Press, Cambridge, Mass.
Sidner, C. 1994b. Negotiation in Collaborative Activity: A Discourse Analysis. Knowledge-Based Systems, 7(4): 265-267.
Stein, A., and E. Maier. 1995. Structuring Collaborative Information-Seeking Dialogues. Knowledge-Based Systems, 8(2-3):82-93.
Terveen, L.G. 1995. An Overview of Human-Computer Collaboration. Knowledge-Based Systems, 8(2-3).
Thorisson, K.R. 1994. Face-to-Face Communication with Computer Agents. Pp. 86-90 in AAAI Spring Symposium on Believable Agents, March 19-20, Stanford University, Palo Alto, Calif.
Traum, D., and P. Heeman. 1996. Utterance Units and Grounding in Spoken Dialogue. ICSLP, October.
Walker, J., L. Sproull, and R. Subramani. 1994. Using a Human Face in an Interface. Pp. 85-91 in Proceedings of the Human Factors in Computing Systems Conference. ACM Press, New York.
Waters, K., and T. Levergood. 1995. DECface: A System for Synthetic Face Applications. Journal of Multimedia Tools and Applications, 1(4):349-366.
Lance McKee and Louis Hecht
Open GIS Consortium Inc.
GOAL: INTEGRATE THE NATIONAL SPATIAL DATA INFRASTRUCTURE AND THE NATIONAL INFORMATION INFRASTRUCTURE
The National Spatial Data Infrastructure (NSDI), when opened up through geoprocessing interoperability interfaces based on the Open GIS Consortium's (OGC) OpenGIS™ specification, will expand out of the domain of geographic information system (GIS) experts into the day-to-day lives of the general population. OGC's research and development goal is the development of the OpenGIS specification.
One goal of others in the national information infrastructure (NII) research and development community ought to be to examine the ways in which digital spatial data (geodata) can be most effectively used by citizens in their everyday way finding and transportation, electronic consumer purchasing, education, and interactive entertainment and also in the many existing and future jobs that will involve geodata and geoprocessing. Another research goal ought to be to seek new ways in which designers of virtual environments and visualization tools can make use of humans' spatial visualization abilities, including our almost innate ability to understand maps and aerial views.
Taking a longer psychological, social, and historical view of
every citizen, we should also research the various "media effects" of
digital maps. Maps of all kinds powerfully condition our thinking about
the world beyond our immediate viewspace. GISs, which enable
interactive viewing and intersection of multiple spatially coincident maps
representing diverse cultural and natural themes, promote holistic,
cross-disciplinary thinking. Widespread viewing and use of geographic
information potentially promote broad public global awareness in the same way
that views from orbiting spacecraft expand the world views of astronauts,
as reported by astronauts. If we assume that human-machine interfaces
and interactions affect consciousness, and if we care about the evolution
of consciousness, we ought to study and characterize these effects with
an eye toward developing high-level design principles that support the
development of interfaces and uses that nudge us toward greater
awareness of our relationships with each other and our planet.
MARKET AND TECHNOLOGY DRIVERS
Various market and technology drivers are converging to make geodata and geoprocessing a more important part of the NII.
Current producers of geoprocessing software have long looked for an expansion of their markets commensurate with the benefits their technology has to offer in many segments of society. That expansion has been inhibited by noninteroperability and difficulties in sharing data held in diverse proprietary formats. Open GIS interfaces will remove those barriers.
Society has a growing need for geoprocessing owing to growing population and worsening environmental problems; geographically distributed government and business activities; rapid globalization of many markets and activities; and increasing pressure on businesses, governments, and individuals to operate more efficiently.
There is a growing realization that much data (70 percent to 85 percent of data in all databases) has a spatial component that can be exploited in a variety of ways for more effective analysis and display.
Faster CPUs (central processing units) and high-performance image processing and graphics processing finally provide a base capable of supporting distributed geoprocessing, which often involves intense computation and large data files. Wider-bandwidth networks and distributed computing infrastructure (OLE/COM, CORBA, Java, etc.) and middleware and componentware architectures are important because so many geoprocessing applications benefit from transparent access to remote geodata stores and remote specialized geoprocessing functions and from integration of geoprocessing functions into other workflow. "gIS" with a lowercase g expresses the potential for open systems architectures and object technology to enable integration of geoprocessing as one (increasingly cost-effective) subordinated component of applications and decision support systems. Growth in the use of geoprocessing will occur as middleware and componentware approaches release geoprocessing from the confines of large, expensive, complex monolithic software systems.
Geoprocessing technology is advancing as rapidly as the
general computing and telecommunications technologies and not only in the
area of geoprocessing interoperability interfaces. All of the following
support the wider use of geodata and geoprocessing by every citizen:
powerful spatial database technologies introduced by major database
vendors; smaller and cheaper global positioning system (GPS) receivers;
sophisticated, inexpensive, and abundant commercial earth imaging data
products; advances in digital orthophotogrammetry for satellite earth
imaging and aerial still and video imaging; continuing specialization and
product differentiation in the areas of GIS, CAD (computer-aided design),
and digital cartography; distributed interactive simulation; and
three-dimensional spatial data visualization techniques (including interactive
virtual reality approaches). These technologies hybridize in many ways.
For example, high-resolution satellite images and digital
orthophotogrammetry permit quite precise automatic generation of
three-dimensional views of the earth's surface.
GEODATA AND GEOGRAPHIC INTERFACES
Simple geodata accumulation is also a driver. There is only one
earth, and the set of all geodata is referenced to this one finite spherical
volume, like a rapidly growing onion of thematic maps of cultural and
natural phenomena. As network-accessible geodata accumulate in tens of
thousands of archives around the world, they become an ever-richer,
ever-more-significant basis for an ever-growing number of local and global
activities. They become one of the foundations of the new world culture of
the information age.
NETWORK-BASED GEOSPATIAL INFORMATION
Below are some examples of how network-resident geodata and geoprocessing resources will be used by every citizen. Most will involve simple, specialized, stylized interactive map displays. A set of research issues can be derived by examining the user interface requirements of categories of applications, such as simplicity, information density, and interactivity modes.
Citizens will use the NII to help them get from A to B. GPS receivers in cars and cell phones will provide the coordinates of A, and a car's map display and the cell phone's multimedia yellow pages will show the way to B. The necessary geodata will be stored remotely and downloaded on demand, transparently to the user.
Geoprocessing middleware and componentware will compare the distances to multiple possible destinations. The multimedia yellow pages, for example, will show driving time or walking time to a selected set of nearby restaurants. The software need not be stored permanently in the information appliance.
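The distance comparison described above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses straight-line (great-circle) distance and a single assumed travel speed, whereas a real yellow-pages service would route over a street network. The function names, the sample coordinates, and the speed value are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_destinations(origin, destinations, speed_kmh):
    """Return (name, distance_km, minutes) tuples, nearest destination first.

    Travel time assumes a constant speed along a straight line, which is
    an illustrative simplification, not a routing result.
    """
    ranked = []
    for name, (lat, lon) in destinations.items():
        d = haversine_km(origin[0], origin[1], lat, lon)
        ranked.append((name, d, 60.0 * d / speed_kmh))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical position (from a GPS receiver) and hypothetical restaurants.
origin = (0.0, 0.0)
restaurants = {"Cafe A": (0.0, 0.02), "Diner B": (0.01, 0.0)}
for name, km, minutes in rank_destinations(origin, restaurants, speed_kmh=5.0):
    print(f"{name}: {km:.2f} km, about {minutes:.0f} min on foot")
```

Swapping the assumed walking speed for a driving speed gives the other figure mentioned in the text; the ranking step itself is unchanged.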
Not just car drivers, but hikers, boaters, and visitors to a city will see on a little screen where they are and how to get to where they want to go. A numbered package en route from A to B will show up on a digital map display, showing where it is now on its route. (Some shippers already provide this service.) People waiting for buses and airplanes will see where the bus or airplane is, on a digital map, with estimated minutes till arrival.
More than 70 percent of database records contain spatial information. Every database and spreadsheet, and the compound documents and work environments in which these functions are embedded, will be able to make maps based on spatial information (usually street addresses) in data records. Spatial display and analysis will be important in many workflow scenarios.
Listed below are other geographic applications used by every citizen during daily life. Each has particular user interface requirements:
The number of applications for geodata is growing rapidly and
will continue to grow as the national and global spatial data
infrastructures develop.
MAPS
Maps are a part of most cultures because spatial thinking is an essential part of people's relationship with their physical and cultural environments. Even in simpler cultures that do not pass down written records, individuals make temporary maps to remind themselves or to show others how to find their way in unfamiliar territory. All birds and mammals form mental maps, and, as cooperative hunter-gatherers, humans developed sophisticated spatial awareness and spatial communications abilities that came to support other cultural activities besides physical way finding. For example, we say in a figurative sense that "we are on our way" to making the NSDI an integral part of the NII. User interfaces are collections of symbols and metaphors, and the map metaphor is inherently important in cyberspace. Basic research in spatial reasoning, spatial memory, and spatial communication would support development of better user interfaces that employ spatial display and manipulation.
Virtual reality will also help geodata users evaluate data sources.
Because so many geodata are available, and because geodata are
often complex, we will often be concerned about geodata quality, content,
and lineage. A system of geometric shapes could be used to represent
certain content parameters, and their shape, color, and motion could
represent quality parameters. A human-computer interface could be a map or
an image. Once you center on a spot you can call up various basic icons
that represent data objects. It is easy on the Internet to find lots of data
but hard to sift through all the data. The interface ought to be able to
tell every citizen easily and intuitively about the "goodness" of the data.
For example: How does software communicate to a skier who wants to
see the snow pack at eight Rocky Mountain ski resorts? The skier
finds imagery, but it is summer data, not winter data, so an error signal
intervenes. Through user configuration of simple preference files, the
computer system knows that skiing requires winter data.
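One way to picture the skier example is a check that compares a data set's metadata against a user's preference file. This is a minimal sketch, assuming the preference file has already been loaded into a dictionary and that each data set carries a `season` field; both the structure and the field names are assumptions made for illustration.

```python
# Hypothetical preference "file", already parsed: activity -> required metadata values.
PREFERENCES = {"skiing": {"season": "winter"}}


def fitness_warnings(activity, metadata, preferences=PREFERENCES):
    """Return human-readable warnings where a data set does not fit the activity."""
    required = preferences.get(activity, {})
    warnings = []
    for key, wanted in required.items():
        actual = metadata.get(key)
        if actual != wanted:
            warnings.append(f"{key}: wanted {wanted!r}, dataset has {actual!r}")
    return warnings


# A summer image offered to a skier triggers the "error signal".
summer_image = {"season": "summer", "region": "Rocky Mountains"}
for warning in fitness_warnings("skiing", summer_image):
    print("Warning:", warning)
```

An interface could map a non-empty warning list to the shape, color, or motion cues suggested above rather than printing text.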
DIGITAL MAPS
Paper maps are a special form of printed communication, important to motorists, subway riders, explorers, scientists in many disciplines, historians, municipal service agencies, shippers, travelers, property owners and managers, and marketers. The utility of maps is amplified in several ways by computers and networks. A GIS, for example, is like multiple same-region overlaid thematic maps drawn on clear film, a visual-interface spatial database. You can query a GIS to meld thematic maps into a new map showing, for example, all the areas 3,000 feet or higher in elevation, within 50 meters of a standing body of water, within 1,000 meters of a road, where most of the trees are pines, where the slope of the ground is less than 10 percent, and where the population density is less than 1 person per square mile. (More spatial temporal reasoning research needs to be done on how to articulate the conditions of a spatial search.) Digital technology allows storage of (and network access to) huge quantities of geodata; zooming, panning, and other kinds of interactive manipulation that overcome the limitations of paper space and human visual acuity; real-time tracking; input from GPS and earth observation satellites; and instant display of nonspatial data (text, pictures, graphs, etc.) associated with selected map features or locations.
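The multi-condition query in the example above amounts to intersecting several thematic layers cell by cell. The following sketch assumes, purely for illustration, that the layers have already been resampled onto a common grid and that each cell carries one value per theme; the `Cell` fields and the sample values are hypothetical and do not reflect any particular GIS product.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One grid cell with a value from each (hypothetical) thematic layer."""
    row: int
    col: int
    elevation_ft: float
    dist_to_water_m: float
    dist_to_road_m: float
    dominant_tree: str
    slope_pct: float
    pop_per_sq_mile: float


def matches(cell):
    """True if the cell satisfies every condition of the example query."""
    return (
        cell.elevation_ft >= 3000          # 3,000 feet or higher
        and cell.dist_to_water_m <= 50     # within 50 meters of standing water
        and cell.dist_to_road_m <= 1000    # within 1,000 meters of a road
        and cell.dominant_tree == "pine"   # most of the trees are pines
        and cell.slope_pct < 10            # slope less than 10 percent
        and cell.pop_per_sq_mile < 1       # fewer than 1 person per square mile
    )


def overlay_query(cells):
    """Return the (row, col) positions that would be shaded on the new map."""
    return [(c.row, c.col) for c in cells if matches(c)]


# Made-up sample cells.
grid = [
    Cell(0, 0, 3200, 30, 400, "pine", 5, 0.2),
    Cell(0, 1, 2800, 30, 400, "pine", 5, 0.2),  # too low
    Cell(1, 0, 3500, 30, 400, "oak", 5, 0.2),   # wrong tree cover
]
print(overlay_query(grid))
```

Expressing each condition as a separate, named test is one possible answer to the question of how to articulate a spatial search; a user interface could expose the same conditions as toggles on the map.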
Through paper maps every citizen is familiar with graphic abstraction of large terrestrial spaces. Digital maps apply this helpful information presentation convention to vastly greater information domains. Digital maps and three-dimensional virtual fly-overs and fly-throughs will be an important part of many graphical user interfaces because everyone intuitively understands maps and aerial views and many kinds of information have a spatial component that makes spatial representation and visualization appropriate.
The new media are "massaging," in Marshall McLuhan's term,
our individual minds and collective culture away from text-induced
linear sequential thinking toward nonlinear thinking characterized by
multiple simultaneous modalities. Spatial display and analysis offer a visual,
intuitive, and effective means for solving a wide range of complex
problems. Visualization of geographic information, or visualization of
information geographically, helps people cope with information glut.
Virtual reality applications will employ spatial representations of real spatial
phenomena, but they will also employ spatial representations of
nonspatial phenomena simply because our brains are hardwired for solving
problems in three-dimensional space. Important parts of the software
and data for configuring and populating cyberspace will be borrowed
from geoprocessing applications and geodata archives and data feeds.
Similarly, research into spatial thinking will ultimately benefit both
"real space" and cyberspace applications.
RESEARCH ISSUES
Several research issues are identified in the text above. OGC's research and development in the area of geoprocessing interoperability is primary in the sense that spatial data will have a much greater role in the NII when diverse systems can exchange diverse kinds of data and access other systems' geoprocessing resources. Many applications will then be using geodata, and application developers will be looking for ideas and guidance concerning geoprocessing user interface development. Useful research will draw inspiration from traditional cartography and from general ideas about user interfaces.
Over the next 20 years we will learn more about how people function while immersed or partially immersed in virtual environments. We will learn what problem simulation schemes work best and what kinds of problems are most fruitfully addressed by these schemes. Many of these environments, certainly, will include extended landscapes representing real or imaginary spaces, and the role of spatial reasoning, spatial memory, and maps will be of interest.
Everyone views the world differently. This is an issue for Open GIS specification developers because different geodata producers and users give the same geographic feature different names and sets of descriptive parameters and different metadata. Part of the specification proposes semantic translators that domain experts from two different domains will configure to enable semiautomatic translation and integration of geodata. The problem is a difficult one and is much broader than geodata integration because computer users involved with other computer users need common interfaces to enable effective communication and collaboration. Map interface developers as well as other kinds of interface developers need to address the issue of standard symbology and usage.
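A semantic translator of the kind described above can be pictured as a pair of lookup tables that domain experts configure, with anything the tables do not cover handed back for human review. The sketch below is an assumption-laden illustration, not the mechanism proposed in the Open GIS specification; the function name, the table contents, and the feature layout are hypothetical.

```python
def translate_feature(feature, class_map, attr_map):
    """Translate one feature from a source vocabulary to a target vocabulary.

    Returns (translated_feature, unresolved), where unresolved lists the
    class or attribute names the expert-configured tables do not cover.
    That list is what makes the translation semiautomatic.
    """
    unresolved = []
    src_class = feature["class"]
    if src_class in class_map:
        target_class = class_map[src_class]
    else:
        target_class = None
        unresolved.append(("class", src_class))

    attrs = {}
    for name, value in feature["attributes"].items():
        if name in attr_map:
            attrs[attr_map[name]] = value
        else:
            unresolved.append(("attribute", name))
    return {"class": target_class, "attributes": attrs}, unresolved


# Hypothetical tables agreed on by experts from two domains.
class_map = {"stream": "watercourse"}
attr_map = {"len_m": "length_m"}

feature = {"class": "stream", "attributes": {"len_m": 120, "owner": "county"}}
translated, needs_review = translate_feature(feature, class_map, attr_map)
print(translated)
print(needs_review)
```

The unresolved list is where the difficulty noted in the text shows up in practice: names with no agreed counterpart cannot be translated mechanically.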
Undoubtedly, commercial research and development projects
and market activity will generate many of the dominant productizable
ideas and standard graphical and conceptual approaches. Academia
should take a longer view; it should (1) address the cognitive and broader
social effects of developments in the spatial subdomain of the multimedia
world and (2) look in very basic ways at how user interface design can
layer most elegantly on our legacy wetware and cultural firmware and
leverage most powerfully a positive vision for the future. The government
has a role in cataloging and tracking evolving research topics of all kinds
and supporting those that best serve the nation and the world community.
By participating as technology users in industry consortia (such as OGC)
that include users in technology planning and specification efforts,
government agencies can ensure that the technology provider community
meets agency needs and can influence the direction of technology that will
become part of the larger economy and culture.
GATHERING AND INTEGRATING INFORMATION IN THE
NATIONAL INFORMATION INFRASTRUCTURE
Craig A. Knoblock
University of Southern California
GOALS AND ISSUES
A critical and unsolved problem for the World Wide Web is how to gather and integrate the huge amount of available information in an automated fashion. Web browsers and search engines are an enormous step forward in providing access to the available information. Yet they rely heavily on hypertext links, which require a human to navigate. I believe an important goal for the national information infrastructure (NII) is to develop the infrastructure, technology, and tools to provide automated gathering and integration of data. Consider an example from the financial domain, where the ability to integrate the large amount of available data would make that information much more accessible and useful. Someone might be interested in investing in the airline industry but is undecided about which airline stock to buy and wants some additional information. Some of the information they might like to have includes the annual report, one-year price history, and current trading price for all U.S. airline stocks, and they might want them presented in order of increasing PE (price/earnings) ratios. All of this information is publicly available, but knowing where to look and integrating the information is not a simple task.
Assuming that users knew where to look (and that is a strong assumption), they could put together this information using the following steps. First, they must determine what all of the airline stocks are. This could be done by examining the prospectus of each publicly traded stock to see if it falls into the SIC (Standard Industrial Classification) category of Scheduled Air Transportation, but that would be very time consuming. Another approach would be to find an information source that contains a list of airlines, such as the Eaasy Sabre page that includes all of the airlines for which it provides reservations. This information could then be used on another page to input the name of a company and get back the ticker symbol (the symbol by which the stock is listed in one of the stock exchanges) for that company. Using that information, they can then go to another page and enter the ticker symbol to get the current trade price and PE ratio. They would go to yet another page to find the one-year price history in a graphical form. And they could go to the Securities and Exchange Commission (SEC) Edgar Archives to find the annual report filed with the SEC but in this case accessed using the company name. Even within the Edgar Archives, they would need to know that the code for an annual report is 10K in order to extract it from the 20 or 30 reports available for each company. This entire process would have to be repeated for each airline, and they would not know how to order them until they had the data for each company.
Ideally, instead of going through this tedious process, a user
could simply issue the query to a software agent for the financial domain
and that agent would know where to retrieve the relevant information
and how to process it to produce the data requested by the user. The goal
then is to develop the infrastructure and tools required to easily construct
and maintain software agents for querying and integrating information in
any domain of interest. One could laboriously construct and maintain
such software agents by writing specialized software applications. In fact,
that is the current state of the art. The limitation of this approach is that
such specialized applications are difficult to maintain and extend. If a
new source becomes available, a programmer must modify the program
in order to exploit that information. Also, each new application
domain requires constructing a whole new program.
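The manual steps in the airline example can be restated as a small pipeline, which also shows why a hand-written program is brittle: every source is wired in by name. In this sketch the three "sources" are in-memory dictionaries standing in for Web pages, and the company names, ticker symbols, prices, and PE ratios are invented placeholders, not real market data. The function name is hypothetical.

```python
# Stand-ins for three separate Web sources (all values are invented placeholders).
AIRLINES = ["Example Air", "Sample Airways", "Demo Airlines"]
TICKERS = {"Example Air": "EXA", "Sample Airways": "SMP", "Demo Airlines": "DMO"}
QUOTES = {
    "EXA": {"price": 20.0, "pe": 15.0},
    "SMP": {"price": 35.5, "pe": 9.5},
    "DMO": {"price": 12.25, "pe": 22.0},
}


def airline_report(airlines, tickers, quotes):
    """Join the sources and return rows ordered by increasing PE ratio.

    Companies missing from any source are skipped, mirroring the gaps a
    user would hit when a page has no entry for a company.
    """
    rows = []
    for name in airlines:
        ticker = tickers.get(name)
        quote = quotes.get(ticker)
        if ticker is None or quote is None:
            continue
        rows.append(
            {"company": name, "ticker": ticker, "price": quote["price"], "pe": quote["pe"]}
        )
    return sorted(rows, key=lambda row: row["pe"])


for row in airline_report(AIRLINES, TICKERS, QUOTES):
    print(row["ticker"], row["company"], row["price"], row["pe"])
```

Adding a new source here means editing the function itself, which is exactly the maintenance burden the text attributes to specialized applications.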
RESEARCH PROBLEMS TO BE ADDRESSED
There are a number of research issues that must be addressed in order to realize the goal described in the previous section:
PROPOSED RESEARCH PROJECTS
The various issues described above can be broken down into
four possible research projects. The first two projects would provide the
core work, and the other two projects would be important in making the
resulting research useful to everyday users.
Representation and Ontologies
The problem of representing both the syntax and semantics of
sources is central to addressing the entire problem. The goal of the first project
is to develop approaches to describing the syntax and semantics of
Web pages. Given the size of the task, I would expect that this would be
a fairly large, long-term problem. The hope is that this would
eventually lead to standards for marking up pages with semantic information and
to the development of domain-specific ontologies that can be used for
information integration.
Planning Information Gathering and Integration
Given a description of the available sources, the problem still
remains as to how to select and integrate the most relevant information. There
are a number of challenging aspects to this problem, such as handling
semantic discrepancies, resolving syntactic differences, and evaluating
reliability and recency. In addition, another critical aspect to this problem
is performing the processing efficiently, since access to many
Web-based sources can be very slow, and plans that require access to many
sources can require hours or days.
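One simple way to reduce the cost of slow sources is to query independent sources concurrently rather than one after another. The sketch below uses Python's standard `concurrent.futures` thread pool; the source functions are hypothetical stand-ins for network requests, and a real planner would also have to order dependent requests and handle failures.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(sources, query, max_workers=8):
    """Send the same query to every source concurrently.

    `sources` maps a source name to a callable that takes the query and
    returns a result. Returns {source_name: result}. Total elapsed time is
    roughly that of the slowest source, not the sum over all sources.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fetch, query) for name, fetch in sources.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical sources; in practice each would perform a slow network request.
sources = {
    "ticker_lookup": lambda q: q.upper(),
    "name_length": lambda q: len(q),
}
print(gather(sources, "exa"))
```

Concurrency helps only when requests are independent; in the airline example the quote lookup depends on the ticker lookup, so a plan must still respect that ordering.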
Machine Learning and Data Mining of Sources
Because of the need for large amounts of information about
sources and how they relate, machine learning and data mining will play a
critical role in simplifying the task of the first two projects. Machine learning
can be used to help build grammars for parsing sources as well as for
building models that describe the contents of sources. Data mining can be used
to find relationships in individual sources and between sources, which
can then be used to optimize the query processing.
Natural Language Processing and User Interfaces
Natural language querying and/or graphical interfaces are needed
to provide user-friendly access to information. In addition, user
interfaces are important for displaying various types of multimedia data and
for providing tools to help build models of sources.
POTENTIAL IMPACT
The overall project has the potential to make a dramatic impact. If
the work is done right, it could change the nature of the NII. Instead of
using browsers and search engines, users would have access to a wide range
of specialized agents that could quickly locate and integrate the huge
amount of available data. Instead of being a data repository, the NII could be
a knowledge repository.
INTEGRATING AUDIENCES AND USERS
John Richards
Turner Le@rning Inc.
LIFELONG LEARNING
The national information infrastructure (NII) is designed, from
a technological perspective, to integrate a wide range of
communication and information systems including video delivery, telephony,
computer networks, and on-line services. These are, for the most part, known
technologies, and the technological integration is a matter of time and money.
In contrast, we have not yet begun to think about the integration of
diverse media from a human perspective. People relate to even a
single medium in very different ways, determined by the context of use and
by the individual's understanding of the situation. The development of
functional multiple media learning environments is not simply the result
of combining different media types (an additive process) but consists
of creating a brand new kind of media, a transforming process. The
development of interfaces for these new media types depends on coming
to understand the ways in which people will come to use the media
for learning. As real video and truly interactive networking are
integrated, television audiences do not simply become network users or vice versa.
Instead, there are qualitatively different experiences in store. In this
paper I consider some issues that we should try to anticipate in the
construction of a voice, video, and data infrastructure that provides the
opportunity for just-in-time learning throughout life.
ANALOG VS. DIGITAL
"Our analog modes of communication by voice, print, and video are
gradually being replaced by digital modes. Ultimately, most of
human knowledge will be stored in a common digital library" (Kaufman and Smarr [1993], p. ix).
This transformation from analog to digital will have deep implications for human knowledge, and, in my judgment, even deeper implications for human communication and relationships. Kaufman and Smarr argue that the fundamental idea of replacing the continuous world of nature with a model of that world, formed of discrete units, has transformed the pursuit of science. They argue that the digital perspective provides a controllable, imaginable representation that, ultimately, provides the key to a more comprehensible world.
How, then, does our understanding of the content of media differ as we move from analog to digital? This is not an arcane question pertaining to the relevance of simulations on supercomputers; rather, it affects all of our interactions with voice, video, and data technologies. For example, in the most developed of these digitizations, consider how writing e-mail messages differs from writing letters. It certainly seems that only yesterday we were seeing paeans to letter-writing as a dying medium. The commonly accepted explanation was that writing was the problem: culture was deteriorating, and the literacy required for letters was a lost art. In retrospect, given the success of e-mail, it appears that the delays in letter delivery were simply not tolerable, especially when contrasted with the immediacy of telephones. But e-mail did not change the nature of letter writing; it replaced letter writing with a full panoply of alternatives. Not only did a new, more informal genre evolve, but entirely new forms of written communication also evolved with entirely new rules for participation. E-mail evolved into bulletin boards and "listservs" that do not have straightforward analogs in the letter-writing or, more generally, precomputer culture. And even more distinct forms of communication are only now appearing. As argued by Sherry Turkle [1995], chats, MUDs, and MUSEs are developing unique, and unprecedented, participation structures. How will these conversations change with the easy availability of voice on the Internet? How will putting telephone (or video-telephone) on the Internet change the nature of a phone call? What new forms will evolve?
In my judgment, though, the most profound differences will
occur with video. The control added to video through the digitization
process changes the nature of the video; more importantly, the digitization
inherent in the NII brings together television and computers, two
technologies that have been distinct in development and production. More
significantly, these technologies are culturally quite distinct. As we talk
about an infrastructure that integrates voice, video, and data, we must
consider the power of the cultural differences of these technologies and their
complex contexts of use.
CABLE MODEMS VS. CABLE BOXES
Prior to coming to Turner, I lived in Newton, Massachusetts, and had cable television installed. The cable came into the basement of my home and was routed up to my bedroom, into a cable box, through my VCR and into my television: my "entertainment center." In March of 1996, I became part of an early trial of cable modems by Continental Cablevision and BBN. The cable into the basement was split and part of it routed into my home office, into a cable modem, and then into my computer, providing ethernet-speed access to the Internet: my "work center." The two setups were separated by a thin wall, two different boxes, and two different monitors.
What separates these worlds? Why aren't they going to the same box or the same monitor? When will I see a picture within a picture? While watching TV the 2" square picture in the corner is the Internet. When surfing the Web the 2" square picture is the television signal. What separates these worlds are the viewers/users and the industry standards and expectations.
This is a temporary division. The cable industry has promised a
free cable modem to every U.S. school they pass as they provide this service
to the community. These will be 5 to 10 feet from television cable boxes.
What will the interface be when these are a single box connected to
a single monitor? How are we to think of television as an inset in
software, and conversely? Are the two streams of data to be integrated
for interactive television or video-enhanced software? Once all the
technologies are digitized, there is no functional delivery difference between
television, e-mail, phone, radio, movies, or even the networked alarm
systems in people's homes. From a human perspective, this
convergence will represent remarkable transformations in the nature of the media.
PARTICIPATION VS. DELIVERY
Networking has been dominated by a philosophy of participation and user constructibility. From the beginning of the ARPANET, for national security reasons networking has been distributed with no central locus of control.1 The removal of any node would have no effect on the rest of the system. Moreover, the wild success of the World Wide Web is precisely because it so adeptly fits the underlying participatory philosophy of networking.
Television and cinema, in contrast, have been dominated by centralized delivery models. Beginning with Hollywood domination of movie-making, and continuing with the "big three" U.S. broadcast companies, television and cinema content has been tightly controlled, produced, and distributed. Even as television audiences are being analyzed as "active meaning producers of texts and technologies . . ." (Ang [1996], p. 8), this is seen as a postmodern development that is only now being taken into account. In particular, as the plethora of programs provides choice, the audience is seen as being freer to construct meaning through participating in these choices. Television itself is evolving as the surrounding technologies change:
"The VCR disrupted the modern entanglement between centralized
transmission and privatized reception because it displaced the locus of
control over the circulation of cultural texts to more local contexts." (Ang [1996], p. 12)
Thus, there is a different experience when the same movie is shown
in a cinema, on TV, or as a tape played at home on your VCR. In fact,
the audience is different, with different expectations regarding
interruptions: a movie in a theater is continuous and commercials are resented,
television presentations are "geared" toward the interruptions, and only
with a rented tape can the bathroom break be at the discretion of the audience.
Replays are possible with the VCR. In short, the nature of the medium
is changing because of the role of the active audience.
AUDIENCE VS. USERS
Each of the media carries with it different relationships with its users/audiences. And these relationships are not dependent solely on the media in isolation. Consider the distinctions John Ellis [1982] draws between the audience of cinema and the audience of television. The cinema spectator is a voyeur. The television viewers are "uninvolved in the events portrayed" and ". . . are able to see 'life's parade at their fingertips,' but at the cost of exempting themselves from that parade for the duration of their TV viewing" (Ellis [1982], pp. 169-170). The spectator pays for the cinema, and resents any commercial intrusion during the showing. The viewer accepts television commercials as a part of the basic structure of watching.
How does this compare with the audience for video in software? Or video on the Web? As software developers we have naively assumed that the introduction of the new media types fits in with the nature of the software: videos, pictures, and sounds are included for motivational purposes, or as illustrations of some concept, and have little or no fundamental effect on the user.
Moreover, the deep distinctions between viewers and spectators suggest that the computer/user relationship may, and probably does, change with the introduction of the Web. Typically the computer/user relationship is one-to-one (or two or three at best), essentially an individual participation structure. The Web is somewhat different without many precursors. It is essentially a social structure. The underlying metaphor is that when we are connecting with other individuals there is a dynamic, changing, unstructured, cluttered world.
There are different educational philosophies that have grown
up around the technologies. There are two cultures in technology and
education; perhaps this is parallel to the C.P. Snow observation. These
two cultures, digital (computers/networking and education) and
analog (TV/cable video), know little of each other.
VIDEO IN SOFTWARE VS. SOFTWARE IN VIDEO
There is a small stream of research and development that has evolved in the intersection of video and computing. Kristina Hooper Woolsey and Bob Mohl produced the Aspen project. The user drove through the town of Aspen by manipulating a touch-sensitive screen. The branching structures themselves are sufficiently constrained that it is possible to anticipate all choices (at an intersection in the road you can stop or go forward, backward, right, or left). It is possible to film alternatives. The mapping metaphor provided the basis for the more successful commercial product, Palenque. Other early attempts to tie TV and computing were Sam Gibbon's and Bank Street's The Voyage of the Mimi, and John Bransford's Cognition and Technology Group at Vanderbilt's The Jason Project (Cognition [in press]). Another interesting attempt by Woolsey and the Apple Multimedia Lab is the Watson and Crick DNA story, based on a BBC production.
More recently, at Turner, we have experimented with several qualitatively different attempts to integrate the two cultures. CNN Newsroom is broadcast by CNN at 4:30 a.m. for taping by teachers. Traditionally, teachers would receive teachers' guides by fax or through a centralized distribution within a state. More recently, the teachers' guides may be downloaded from cnn.com/newsroom on the World Wide Web. We are working together with researchers at the Center for Educational Computing Initiatives at MIT who have set up a system to automatically digitize the broadcast, separate it into meaningful segments, and make it available through streaming video on the Web. The teachers' guides are also separately available for each segment. This qualitatively changes what teachers can access. They can choose particular segments from the day's broadcast, and they have access to an indexed history.
Turner Le@rning has also been experimenting with electronic field trips. Students participate in two to four weeks of curriculum activities involving videotapes, data disks, electronic chat groups, and print materials. The midpoint of the curriculum is marked by two live broadcasts in which experts at the field trip location respond to questions that students submit to an 800 number or over the Web. The student's act of asking a question changes the presumption of the broadcast. Moreover, the variety of media is specifically designed to foster active participation on the part of the teacher and student in the construction of their learning.
As the broadcast and cable media become more involved in
the Internet, the nature of television is also changing. CNN networks
(CNN, CNN Headline News, CNN Airport News, CNN fn) download feeds
from CNN bureaus worldwide. In each network a team of editors and
writers produce stories that are then televised. CNN
Interactive is a Web site that is produced and distributed in much the same way. It is this
unique television-oriented model, with constant news-based updates, that
accounts for its immense popularity (over 5 million page-views per day).
How will television-based concepts translate onto the Web?
Currently, Web sites change weekly, if you are fortunate, and daily at the very
best sites. CNN's timely updates are very much an
anachronism, requiring over 140 programmers/writers. At what point will we be
programming the Web as we program television, with sites changing according to
the time of day? And how will this be modified by the Web's ability to
adjust for your interests and history?
CONCLUSION
The rise of image in communication is more than a matter of educating ourselves to analyze and interpret visual experiences. Rather, as argued by Taylor and Saarinen [1994], the incorporation of images in presentations has changed the very nature of communication. Text by its very nature is linear, sequential. A picture or video allows for an infinite series of branches.
This may not be a new stage of meaning but a return to an old one. McLuhan [1964] argues that prior to Gutenberg, story telling relied on images and metaphors that were much more generative, taking into account the multiplicity of the audience and the individual construction of meaning.
What I am suggesting in this paper is that the integration of
video media with computer technology is not a quantitative difference but
a qualitative difference that requires that we begin to rethink learning
in this digital world.
BIBLIOGRAPHY
Ang, Ien [1996]. Living room wars: Rethinking media audiences for a postmodern world. London: Routledge.
Cognition and Technology Group at Vanderbilt (in press). The Jason series: A design experiment in complex, mathematical problem solving. In J. Hawkins and A. Collins (Eds.), Design experiments: Integrating technologies into schools. New York: Cambridge University Press.
Ellis, John [1982]. Visible Fictions. London: Routledge and Kegan Paul.
Kaufman, William J. and Smarr, Larry L. [1993]. Supercomputing and the transformation of science. New York: Scientific American Library.
McLuhan, Marshall [1964]. Understanding media: The extensions of man. New York: New American Library.
Taylor, Mark C. and Saarinen, Esa [1994]. Imagologies: Media philosophy. London: Routledge.
Turkle, Sherry [1995]. Life on the screen. New York: Simon and Schuster.
NOTE
1. To understand that this is not necessarily true of the technology but has arisen as a deliberate design decision, notice the "star-nets" that MIS departments always try to establish. This design emphasizes the collection of data (attendance and grade information, or inventories, or bank accounts) in one central location, and the distribution of centralized resources or information (paychecks, reports, decisions).
INTELLIGENT AGENTS FOR INFORMATION
Katia P. Sycara
Carnegie Mellon University
GOALS AND ISSUES
The overall goal of the every-citizen interface research program is
to provide fundamental research and enabling technologies for the
development of computer interfaces that allow easy access to the national
information infrastructure (NII) by many citizens, ranging from software
experts to physically or mentally handicapped persons. Given the
current nature of computer technology and current characteristics of the NII,
there are many issues that must be addressed.
CURRENT PROBLEMS
Effective use of the Internet by humans or decision support machine systems has been hampered by some dominant characteristics of the infosphere. First, information available from the Net is unorganized, multimodal, and distributed on server sites all over the world. Second, the number and variety of data sources and services are increasing dramatically every day. Furthermore, the availability, type, and reliability of information services are constantly changing. Third, information is ambiguous and possibly erroneous owing to the dynamic nature of the information sources and potential information updating and maintenance problems. Therefore, information is becoming increasingly difficult for a person or machine system to collect, filter, evaluate, and use. Current interface technology, dominated by the Web browser paradigm, is slow and leaves users to do the access, filtering, and interpretation of raw data themselves through point and click and text/graphic cognitive processing. Current NII technology has a number of limitations, including the following:
Addressing the above set of current interface technology
limitations can constitute a list of requirements for short-term (2 to 3 years)
and medium-term (3 to 5 years) research. Additional
longer-term requirements include supporting the flexible physical realization
of "action-at-a-distance" capabilities and making the computer a real
collaborative partner of the user.
RESEARCH ISSUES
To address the above-mentioned goals and requirements, researchers from different disciplines, such as computer science, cognitive science, human factors, physiology, psychology, design, and engineering, will need to collaborate.
The paradigm of intelligent agents has shown initial promise for handling some of the above problems, especially information location, filtering, and integration. Although a precise definition of an intelligent agent is still forthcoming, the current working notion is that intelligent software agents are programs that act on behalf of their human users in order to perform laborious information-gathering tasks, such as locating and accessing information from various on-line information sources, resolving inconsistencies in the retrieved information, filtering away irrelevant or unwanted information, integrating information from heterogeneous information sources, and adapting over time to their human users' information needs and the shape of the infosphere.
To make intelligent agents a reliable part of interface technology, some important research issues must be addressed. They include the following:
Additional research issues include:
PROJECTS
To make progress on these issues, a variety of projects should
be instituted: small projects to investigate circumscribed issues (e.g., how
to structure agents, interaction protocols, how to determine
information credibility) and larger projects that involve collaboration
among academic institutions, government, and industry. For the moment,
industry seems to lead in the NII. The smaller projects should involve
only academia and should investigate fundamental longer-term issues, so
that the longer-term goals (e.g., computer systems becoming real
collaborators and partners of humans) can be addressed.
INTELLIGENT INFORMATION AGENTS
Johanna D. Moore
University of Pittsburgh
INTRODUCTION
The evolving national information infrastructure (NII) has made
available a vast array of on-line services and
networked information resources in a variety of forms (text, speech,
graphics, images, video). At the same time, advances in computing
and telecommunications technology have made it possible for an
increasing number of households to own (or lease or use) powerful personal
computers that are connected to this resource. Accompanying this progress
is the expectation that people will be able to more effectively solve
problems because of this vast information resource. Unfortunately, development
of interfaces that help users identify the information that is relevant to
their current needs and present this information in ways that are most
appropriate given the information content and the needs of particular users
lags far behind development of the infrastructure for storing, transferring,
and displaying information. As Grosz and Davis (1994) put it, "the
good news is that all of the world's electronic libraries are now at your
disposal; the bad news is that you're on your own: there's no one at the
information desk." In this paper I provide desiderata for an interface that
would enable ordinary people to properly access the capabilities of the NII.
I identify some of the technologies that will be needed to achieve
these desiderata and discuss current and future research directions that
could lead to the development of such technologies. In particular, I focus
on ideas related to agents and system intelligence and ways in which
advances in these areas could enhance eventual interfaces to the NII.
DESIDERATA FOR AN EVERY-CITIZEN INTERFACE TO THE NII
As I envision it, an every-citizen interface would consist of intelligent information agents (IIAs) that can:
INTELLIGENT INTERACTIVE QUERY SPECIFICATION
Database query languages allow users to form complex queries that request information involving data entities and relationships among them. Using a database system, users can typically find the information they require or determine that the database does not contain such information. However, to use a database system, users must know which data resource(s) to access and must be able to specify a query in the appropriate language. That is, the users must essentially form a plan to identify and access the information they require to achieve their information-seeking goals. In contrast, keyword-based search engines for the World Wide Web allow users to search many information resources at once by specifying their queries using combinations of keywords (and indications of whether or not the keywords are required to occur in the document, whether they must occur in sequence, etc.). Such search engines do not require users to form a detailed plan, but they often turn up many irrelevant documents and users typically do not know what data resources have been examined. Moreover, keyword-based search engines provide users with a very crude language for expressing their information-seeking goals. To provide the kind of interface I envision, IIAs must be able to work with users to help them express their information-seeking goals in terms that the system understands and can act on. The IIA should then form a plan to find information that may help users achieve their goals. That is, we would like to provide technology that would allow users to tell their systems what information-related tasks they wish to perform, not exactly what information they need, and where and how to find it. For example, as an associate editor for a journal, I often need to find reviewers for papers on topics outside my area of expertise. I know that information is out there in the NII that could help me identify appropriate reviewers, but finding it is a difficult task. 
What I'd like is an IIA that could accept a goal such as "find me highly qualified, reliable reviewers for a paper on parsing using information compression and word alignment techniques" and perhaps a preference on the ranking of solutions, such as "and disprefer reviewers who have recently written a review for me." An interactive agent that did not know how to determine whether a researcher is "highly qualified" could engage in a dialogue with its user to determine how to assess this. For example, the user may tell the agent to assess this by counting articles in well-respected journals or by counting citations in the citation index. Again, if the agent did not know how to determine what the user considered well-respected journals for this particular situation, it would work with the user to define this term and so on.
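The reviewer-finding goal above can be illustrated with a minimal sketch. It assumes that the dialogue with the user has already settled on citation counts as the measure of "highly qualified"; the Reviewer record, the min_citations threshold, and the recent_penalty weight are hypothetical names introduced only for illustration, and the candidate data are invented.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    name: str
    citations: int           # assumed proxy for "highly qualified"
    reviewed_recently: bool  # has recently written a review for this editor


def rank_reviewers(candidates, min_citations, recent_penalty=0.5):
    """Keep candidates that meet the user-defined threshold, then order them
    so that recent reviewers are dispreferred rather than excluded."""
    qualified = [r for r in candidates if r.citations >= min_citations]

    def score(r):
        s = float(r.citations)
        if r.reviewed_recently:
            s *= recent_penalty  # soft preference, not a hard filter
        return s

    return sorted(qualified, key=score, reverse=True)


# Invented candidates, for illustration only.
candidates = [
    Reviewer("A", citations=120, reviewed_recently=True),
    Reviewer("B", citations=90, reviewed_recently=False),
    Reviewer("C", citations=10, reviewed_recently=False),
]
ranking = [r.name for r in rank_reviewers(candidates, min_citations=50)]
print(ranking)
```

The hard threshold stands for the goal itself, and the penalty stands for the stated preference on the ranking of solutions; a real agent would negotiate both with the user rather than fix them in advance.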
As a more "every-citizen" example, imagine a patient who has
just been prescribed a drug and then catches the tail end of a news
story suggesting that several people have become critically ill after taking
the drug. This user would likely have a goal such as: "tell me about the
side effects of Wonderdrug" and "show me the serious side effects first." If
no information on "serious side effects" were found, the agent should
work with the user to define the term more precisely. For example, the
agent could provide the user with a list of the types of side effects it
encountered and ask the user which type(s) he or she considers serious.
PLANNING FOR INFORMATION ACCESS
Once the agent has worked with the user to identify his or her goals, it must be able to form a plan to acquire the information that will aid the user in achieving those goals. IIAs must be equipped with strategies that tell them how to form such plans and must also be able to trade off the urgency of the request against the cost of accessing different information sources and the likelihood that a particular plan will be successful. In the journal editor example I gave earlier, the agent may need to be capable of determining which information sources would be most likely to help find an appropriate reviewer before the end of the day. In the drug example the agent may need to take into account the cost of accessing databases put out by pharmaceutical companies. Agents must also reason about how much advance planning to do before beginning to act and how much information they should acquire before planning or acting in order to reduce uncertainty.
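One simple way to picture the trade-off described above is as a choice among information sources by expected net benefit, subject to a deadline. The sketch below is only an illustration under strong assumptions: the Source fields, the scoring rule (success probability times the value of an answer, minus access cost), and all of the numbers are hypothetical and are not drawn from any of the planning systems cited in this paper.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    cost: float       # access fee, in arbitrary units
    hours: float      # expected time needed to obtain an answer
    p_success: float  # estimated chance that the source holds the answer


def choose_source(sources, deadline_hours, value_of_answer):
    """Return the source with the highest expected net benefit among those
    that can answer before the deadline, or None if none is fast enough."""
    feasible = [s for s in sources if s.hours <= deadline_hours]
    if not feasible:
        return None
    return max(feasible, key=lambda s: s.p_success * value_of_answer - s.cost)


# Invented sources, for illustration only.
sources = [
    Source("free web search", cost=0.0, hours=1.0, p_success=0.3),
    Source("pharma database", cost=20.0, hours=0.5, p_success=0.9),
    Source("library request", cost=5.0, hours=48.0, p_success=0.95),
]
urgent = choose_source(sources, deadline_hours=8, value_of_answer=100)
relaxed = choose_source(sources, deadline_hours=72, value_of_answer=100)
print(urgent.name, "|", relaxed.name)
```

With a tight deadline the slow but reliable source is ruled out and the costly database becomes worth its fee; with a relaxed deadline the cheaper, slower source is preferred. A single-step choice of this kind ignores contingencies and plan revision, which the work cited in the following paragraph addresses.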
Making progress on these issues will require integrating several ideas coming out of the planning community, including planning under uncertainty (Kushmerick et al., 1995); reasoning about the trade-off between reactive and deliberative behavior (Bratman et al., 1988; Boddy and Dean, 1994); planning for contingencies (Pryor and Collins, 1996); and techniques that integrate planning, information gathering, execution, and plan revision (Draper et al., 1994; Zilberstein and Russell, 1993).
To support agents in forming such plans, new types of
automatic indexing schemes must be devised. Data may need to be indexed
in multiple ways, for example, reflecting different purposes the data
may serve or different levels of detail. In the World Wide Web, links
going into and out of a document characterize that document and may be
useful in forming indexes to it (as is done in citation search systems). In
addition, automatic indexing schemes that work across modalities are needed.
INTELLIGENT MULTIMEDIA PRESENTATION OF INFORMATION
IIAs will be able to acquire information from many different information sources in a variety of media. These systems will need to be able to plan multimedia presentations that most effectively communicate this information in ways that support users in achieving their goals and performing their tasks. For example, an IIA helping a visitor to the Washington, D.C., area identify good Thai restaurants may provide a Consumer Reports-like chart rating the 10 best restaurants on a variety of features, a city map showing where the restaurants are located relative to the user's hotel, and spoken excerpts from restaurant reviews that are coordinated with highlighting of the row in the chart and dots on the map that correspond to the restaurants being described. We would also like such multimedia presentations to be tailored to the user's background and preferences, the task at hand, and prior information displays the user has viewed. In the restaurant example, if the system can determine that the user is not familiar with the D.C. area, specific directions to the various restaurants may be given, whereas for a D.C. native an address may be sufficient. If the user has previously requested detailed directions to one restaurant and then requests directions to another restaurant nearby, the system may describe the location of the second restaurant relative to the location of the first.
Owing to the vast information resources that are now available, improved networking infrastructure for high-speed information transfer, and higher-quality audio, video, and graphics display capabilities, intelligent multimedia presentation is an active area of research. As Roth and Hefley (1993) define them, intelligent multimedia presentation systems (IMMPSs) take as input a collection of information to be communicated and a set of communicative goals (i.e., purposes for communicating information or the tasks to be performed by the user requesting the information). An IMMPS typically has a knowledge base of communicative strategies that enable it to design a presentation that expresses the information using a combination of the available media and presentation techniques in a way that achieves the communicative purposes and supports users in performing their tasks. Roth and Hefley argue that IMMPSs will be most effective in situations where it is not possible for system developers to design presentation software because they cannot anticipate all possible combinations of information that will be requested for display. This is clearly the case for an every-citizen interface to the NII.
IMMPSs must perform several complex tasks. They typically consist of a presentation planner, a number of media generators, and a media coordinator. The presentation planner uses presentation design knowledge to select content to be included in a display intended to achieve a set of goals for a particular user in a given context. It uses its knowledge of techniques available to the various media generators to apportion content to media and generate a sequence of directives for individual media generators. Media generators (e.g., for natural language text, speech, and graphics) must determine how to convey the content given the directives they receive from the planner and then report back their results to the presentation planner and media coordinator. The coordinator must manage interactions among individual media generators, resolve conflicts, and maintain presentation consistency.
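The division of labor among planner, generators, and coordinator can be sketched as a toy pipeline. Everything in the sketch is an assumption made for illustration: the function names, the rule that spatial content goes to graphics, and the use of duplicate removal as a stand-in for conflict resolution are not taken from any actual IMMPS, and the restaurant data are invented.

```python
def plan_presentation(content, goals):
    """Presentation planner: select the content relevant to the goals and
    apportion each item to a medium, yielding directives for the generators."""
    directives = []
    for item in content:
        if item["topic"] not in goals:
            continue  # content selection
        medium = "graphics" if item["kind"] == "spatial" else "text"
        directives.append({"medium": medium, "item": item})
    return directives


def generate_text(item):
    """Media generator for natural language text (trivial stand-in)."""
    return f"Text: {item['value']}"


def generate_graphics(item):
    """Media generator for graphics (trivial stand-in)."""
    return f"Map marker: {item['value']}"


GENERATORS = {"text": generate_text, "graphics": generate_graphics}


def coordinate(directives):
    """Media coordinator: run each generator and drop duplicate outputs,
    a crude stand-in for maintaining presentation consistency."""
    seen, presentation = set(), []
    for d in directives:
        output = GENERATORS[d["medium"]](d["item"])
        if output not in seen:
            seen.add(output)
            presentation.append(output)
    return presentation


# Invented content, for illustration only.
content = [
    {"topic": "restaurants", "kind": "spatial", "value": "Thai Palace, 12 K St"},
    {"topic": "restaurants", "kind": "rating", "value": "Thai Palace: 4 stars"},
    {"topic": "weather", "kind": "rating", "value": "Sunny"},
]
presentation = coordinate(plan_presentation(content, goals={"restaurants"}))
print(presentation)
```

In a real system the generators would report their results back to the planner as well, so that the plan could be revised when a medium cannot convey its assigned content; the sketch omits that feedback loop.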
Considerable progress has been made toward systems that perform these tasks for limited domains, user tasks, data, and presentation types. For example, extant prototype systems such as Comet (Feiner and McKeown, 1991) and WIP (Wahlster et al., 1993) can coordinate text and graphical depictions of devices for generating instructions about their repair or proper use. These systems generate multimedia presentations from a representation of intended presentation content and represent progress toward some of the functionality desired in an every-citizen interface. For example, these systems can effectively coordinate media when generating references to objects (e.g., "the highlighted knob"; McKeown et al., 1992; André and Rist, 1994) and can tailor their presentations to the target audience and situation (McKeown, 1993; Wahlster et al., 1993). In addition, WIP generates its presentation in an incremental fashion. This allows it to begin producing the presentation before all of the input is received and to react more promptly if the goals or inputs to the generator are changed. These are important features for an IMMPS that will be used in an interface that is presenting information from the NII. Another important area of recent research is in coordinating temporal media (e.g., speech and animation), where information is presented over time and may need to be synchronized with other portions of the presentation in other media (Feiner et al., 1993; André and Rist, 1996).
Ideally, an IMMPS would have the capability to flexibly construct presentations that honor constraints imposed by media techniques and that are sensitive not only to characteristics of the information being presented but also to user preferences and goals and the context created by prior presentations. Researchers working in text generation have developed systems that are capable of using information in a discourse history to point out similarities and differences between material currently being described and material presented in earlier explanation(s), to omit previously explained material, to explicitly mark repeated material so as to distinguish it from new material (e.g., "as I said before, . . ."), and to use alternative strategies to elaborate or clarify previously explained material (Carenini and Moore, 1993; Moore, 1995; Moore et al., 1996).
This prior research requires rich representations of the information that is presented, as well as models of the user's goals, tasks, and preferences. Extending this work for an interface to the NII will require research on standardized data modeling languages and/or translation kits and reusable models of common tasks. In addition, IMMPSs capable of operating with shallower representations must be developed.
Finally, we cannot expect and may not even want IMMPSs to
be monolithic systems that completely design presentations according
to their own criteria. Thus, systems must be devised that can provide
many levels of assistance to users in the presentation design process.
Users cannot be expected to fully specify presentation design choices; it is
more natural for them to learn a language for expressing their tasks and
goals than to learn a language for describing presentation techniques. In
some cases, users will have preferences about presentation design in advance
of display generation. In other cases they will want the ability to alter
the way information is presented once they have seen an initial presentation.
Research is needed to develop natural, flexible interfaces to support
interactive design, such as those described by Roth et al. (1994, 1995).
USER INTERFACE ENVIRONMENTS FOR INFORMATION EXPLORATION
Even if IIAs can be provided that accept the type of queries I envision, users will want the capability to browse or explore the NII. This may be because they could not articulate a query (even interactively) until they saw some of what was available or because the information they received led them to want further information. In addition, users may want to see some of the information in more detail or see it presented in a different manner. For example, a user who is relocating to a new area might request a visualization that shows several attributes of a set of available houses and relationships between them (e.g., number of rooms, lot size, neighborhood, and asking price). Once this display is presented, the user may then want to select some subset of the particular houses contained in the original display, pick up this set, and drag-and-drop it on a map tool to see more precisely where the houses in the set are located.
To provide these kinds of capabilities, software environments need to be developed for exploring and visualizing large amounts of diverse information. As Lucas et al. (1996) argue, this requires moving from an application-centric architecture to an information-centric approach. The distinction hinges on differences in the basic currency through which users interact with a system. In application-centric architectures the basic currency is the file, and users must rely on applications to fetch and display information from files. Each application has its own user interface that defines the types of files people can manipulate and what they can do with them. With the introduction of graphical user interfaces and the desktop metaphor, files became concrete visual objects, directly manipulated by the user, stored on the desktop or in folders, and, to a limited extent, arranged by users and software in semantically meaningful ways. But the contents of those files are still out of direct reach of the user.
In their Visage system, Lucas et al. (1996) take an
information-centric approach in which the basic currency is the data element. Rather
than limiting the user to files (or documents) as targets of direct
manipulation, Visage permits direct drag-and-drop manipulation of data at any level
of granularity. A numerical entry in a table, selected bars from a bar
chart, and a complex presentation graphic are all first-class candidates for
user manipulations, and all follow the same "physics" of the interface.
Users can merge individual data items into aggregates and summarize
their attributes or break down aggregated data along different dimensions
to create a larger number of smaller aggregates. These capabilities form
the foundation for a powerful interface for data navigation and visualization.
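The two operations just described, merging data elements into an aggregate and breaking an aggregate down along a dimension, can be sketched in a few lines. This is not Visage's implementation; the dictionary representation of data elements, the function names, and the house records (borrowed from the relocation example earlier in this section, with invented values) are assumptions made for illustration.

```python
from collections import defaultdict

# Invented data elements, for illustration only (prices in thousands).
houses = [
    {"neighborhood": "North", "rooms": 3, "price": 200},
    {"neighborhood": "North", "rooms": 4, "price": 300},
    {"neighborhood": "South", "rooms": 3, "price": 150},
]


def summarize(elements, attribute):
    """Merge data elements into one aggregate and summarize one attribute."""
    values = [e[attribute] for e in elements]
    return {
        "count": len(values),
        "total": sum(values),
        "mean": sum(values) / len(values),
    }


def break_down(elements, dimension):
    """Split an aggregate into smaller aggregates along one dimension."""
    groups = defaultdict(list)
    for e in elements:
        groups[e[dimension]].append(e)
    return dict(groups)


overall = summarize(houses, "price")
by_area = {
    area: summarize(group, "price")
    for area, group in break_down(houses, "neighborhood").items()
}
print(overall)
print(by_area)
```

Because both operations take and return collections of the same kind of element, they can be applied repeatedly and in any order, which is one way to read the claim that all objects follow the same "physics" of the interface.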
ADAPTIVE INTERFACES
Although the Visage approach has proven successful for the simple graphics implemented in the Visage prototype (i.e., text in tables, bars in charts, symbols in maps), continued research is needed to handle the wide range of data and presentation types that populate the NII. In particular, new approaches that allow richer analysis of the information contained in hypertext documents are needed. One area that is developing technology relevant to this need is research on adaptive hypertext and hypermedia systems, which exploit information about a particular user (typically represented in the user model) to adapt both the hypermedia displays and the links presented to the user. Adaptive hypermedia is useful in situations where the hyperspace is large or the system is expected to be used by people with different knowledge and goals. This is clearly the case for the NII.
Researchers in text generation (Moore and Mittal, 1996) are
working on interfaces in which system-generated texts are structured objects.
During the generation process, the system applies abstract rules to
determine which text objects should be selectable in the final presentation
(i.e., which text objects will have "hyperlinks" associated with them). To
pose questions, the user moves the mouse over the generated text, and
those portions that can be asked about become highlighted. When the
user selects a text object, a menu of questions that may be asked about this
text appears. Question menus are generated on the fly using a set of rules
that reason about the underlying concepts and relationships mentioned in
the selected text (as represented in a knowledge base). Because the
system has a record of the plan that produced the text as well as a user model,
it can reason about the context in which the selected text occurs and
provide a menu of follow-up questions that are sensitive to both the
discourse context and the individual user. In this system, texts are
synthesized from underlying knowledge sources by the system in response to
the user's question or the system's need to communicate with the user.
Because the text is generated dynamically, the system cannot in
advance identify the particular text objects that should have associated links
or links to other texts. Indeed, in this framework, traversing a link
corresponds to asking the system to generate another text. Moreover,
the follow-up questions, which correspond to the links in traditional
hypertext systems, cannot be precoded and fixed in advance but are
generated dynamically using heuristics that are sensitive to domain
knowledge, the user model, and the discourse context. As with many
other artificial intelligence approaches, this technology depends on the
system having a rich underlying representation of the domain content
described in the generated text as well as a model of the textual structure. But
we can easily imagine adapting this technology for use with the NII.
Techniques exist for automatically generating indexes from unrestricted
text for information retrieval (Evans and Zhai, 1996), so we can expect
that such indexes will (or could) be available for many, if not all,
documents on the NII. In addition, parsers and part-of-speech taggers can
robustly identify the constituents of sentences (Brill, 1993). Building on these
existing technologies would allow an interface in which, say, all noun
phrases in a document become mouse sensitive, and the hyperlinks to other
documents are determined on demand by using the noun phrase (and its
synonyms, etc.) as an index to find related documents. Techniques developed in
the area of adaptive hypermedia may also be employed to allow the
selection of links to be sensitive to the user's knowledge and goals.
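The on-demand linking idea can be sketched as a lookup in an inverted index from noun phrases to documents. The sketch assumes that a noun-phrase extractor has already been run; its output is represented here by a hand-written table, and the document identifiers, the synonym table, and the function names are all hypothetical.

```python
from collections import defaultdict

# Assumed output of a noun-phrase extractor: document id -> noun phrases.
doc_phrases = {
    "doc1": ["side effects", "clinical trial"],
    "doc2": ["side effects", "dosage"],
    "doc3": ["clinical trial"],
}

# Assumed synonym table mapping variant phrases to an indexed phrase.
synonyms = {"adverse reactions": "side effects"}


def build_index(doc_phrases):
    """Build an inverted index from each noun phrase to the documents using it."""
    index = defaultdict(set)
    for doc, phrases in doc_phrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc)
    return index


def links_for(phrase, current_doc, index):
    """Resolve hyperlinks at the moment the user selects a noun phrase,
    excluding the document the user is already reading."""
    key = phrase.lower()
    key = synonyms.get(key, key)
    return sorted(index.get(key, set()) - {current_doc})


index = build_index(doc_phrases)
links = links_for("Side Effects", "doc1", index)
print(links)
```

No link is stored in any document; each is computed when the phrase is selected. Making the selection sensitive to the user's knowledge and goals would amount to filtering or reordering the returned list with a user model, which the sketch leaves out.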
REFERENCES
André, E., and T. Rist. 1994. Referring to World Objects with Text and Pictures. Pp. 530-534 in Proceedings of the 15th International Conference on Computational Linguistics.
André, E., and T. Rist. 1996. Coping with Temporal Constraints in Multimedia Presentation Planning. Proceedings of the National Conference on Artificial Intelligence. Menlo Park, Calif.: AAAI Press.
Boddy, M., and T.L. Dean. 1994. Deliberation Scheduling for Problem Solving in Time-Constrained Environments. Artificial Intelligence, 67(2):345-385.
Bratman, M.E., D.J. Israel, and M.E. Pollack. 1988. Plans and Resource-Bounded Practical Reasoning. Computational Intelligence, 4(4):349-355.
Brill, E. 1993. Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach. Pp. 259-265 in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics.
Carenini, G., and J.D. Moore. 1993. Generating Explanations in Context. Pp. 175-182 in W.D. Gray, W.E. Hefley, and D. Murray, Eds., Proceedings of the International Workshop on Intelligent User Interfaces. New York: ACM Press.
Draper, D., S. Hanks, and D. Weld. 1994. Probabilistic Planning with Information Gathering and Contingent Execution. Pp. 31-36 in K. Hammond, Ed., Proceedings of the 2nd International Conference on Artificial Intelligence and Planning Systems. Menlo Park, Calif.: AAAI Press.
Evans, D.A., and C. Zhai. 1996. Noun-Phrase Analysis in Unrestricted Text for Information Retrieval. Pp. 17-24 in Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics. Somerset, N.J.: ACL.
Feiner, S.K., and K.R. McKeown. 1991. Automating the Generation of Coordinated Multimedia Explanations. IEEE Computer, 24(10):33-41.
Feiner, S.K., D.J. Litman, K.R. McKeown, and R.J. Passonneau. 1993. Towards Coordinated Temporal Multimedia Presentation. Pp. 139-147 in M.T. Maybury, Ed., Intelligent Multimedia Interfaces. Menlo Park, Calif.: AAAI Press.
Grosz, B., and R. Davis. 1994. A Report to ARPA on Twenty-First Century Intelligent Systems. AI Magazine, 15(3):10-20.
Kushmerick, N., S. Hanks, and D. Weld. 1995. An Algorithm for Probabilistic Least-Commitment Planning. Artificial Intelligence, 76(1-2):239-286.
Lucas, P., S.F. Roth, and C.C. Gomberg. 1996. Visage: Dynamic Information Exploration. In Proceedings of the Conference on Human Factors in Computing Systems. New York: ACM Press.
McKeown, K.R. 1993. Tailoring Lexical Choice to the User's Vocabulary in Multimedia Explanation Generation. Pp. 226-233 in Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. Somerset, N.J.: ACL.
McKeown, K.R., S.K. Feiner, J. Robin, D. Seligmann, and M. Tanenblatt. 1992. Generating Cross-References for Multimedia Explanation. Pp. 9-16 in Proceedings of the National Conference on Artificial Intelligence. Menlo Park, Calif.: AAAI Press.
Moore, J.D. 1995. Participating in Explanatory Dialogues: Interpreting and Responding to Questions in Context. Cambridge, Mass.: MIT Press.
Moore, J.D., and V.O. Mittal. 1996. Dynamically Generated Follow-Up Questions. IEEE Computer, 29(7):75-86.
Moore, J.D., B. Lemaire, and J.A. Rosenblum. 1996. Discourse Generation for Instructional Applications: Identifying and Exploiting Relevant Prior Explanations. Journal of the Learning Sciences, 5(1):49-94.
Pryor, L., and G. Collins. 1996. Planning for Contingencies: A Decision-Based Approach. Journal of Artificial Intelligence Research, 4:287-339.
Roth, S.F., and W.E. Hefley. 1993. Intelligent Multimedia Presentation Systems: Research and Principles. Pp. 13-58 in M.T. Maybury, Ed., Intelligent Multimedia Interfaces. Menlo Park, Calif.: AAAI Press.
Roth, S.F., J. Kolojejchick, J. Mattis, and J. Goldstein. 1994. Interactive Graphic Design Using Automatic Presentation Knowledge. Pp. 112-117 in Proceedings of the Conference on Human Factors in Computing Systems. New York: ACM Press.
Roth, S.F., J. Kolojejchick, J. Mattis, and M. Chuah. 1995. Sagetools: An Intelligent Environment for Sketching, Browsing, and Customizing Data Graphics. Pp. 409-410 in Proceedings of the Conference on Human Factors in Computing Systems. New York: ACM Press.
Wahlster, W., E. André, W. Finkler, J.J. Profitlich, and T. Rist. 1993. Plan-Based Integration of Natural Language and Graphics Generation. Artificial Intelligence, 63(1-2):387-428.
Zilberstein, S., and S.J. Russell. 1993. Anytime Sensing, Planning and Action: A Practical Model for Robot Control. Pp. 1402-1407 in Proceedings of the 13th International Joint Conference on Artificial Intelligence. Chambéry, France.
RESOURCE DISCOVERY AND RESOURCE DELIVERY
Kent Wittenburg
Bellcore
The promise of a national information infrastructure (NII) includes the ability for every citizen to access certain fundamental information and services. I would assume that examples of fundamental information and services would include directory information (analogous to the present white and yellow pages); local, state, and federal government information, ranging from tax help to local library and social services to environmental regulations; and, lastly, a panoply of commercial services, such as audio/video broadcasting and n-way conferencing, virtual storefronts, and banking services that will emerge on some form of an infrastructure that combines elements from present telephony, broadcasting, and data networks.
Today's World Wide Web is the closest approximation we have to the NII of the future. One way to proceed in revealing the pertinent research issues is to ask where the current World Wide Web falls short in light of the goal of universal access. Besides the problem of physical access via networks and user premise hardware, I see two major problem areas: (1) the resource discovery problem, which I take to be a product of the chaotic, fundamentally democratic nature of the Web, and (2) the resource delivery problem, which has many dimensions, including complications brought about by differing bandwidth capacities, differing user interactive devices, and differing user cognitive (dis)abilities.
The goals that I see in these two areas can be stated simply:
RESOURCE DISCOVERY RESEARCH ISSUES
The resource discovery problem is one that is patently obvious to anyone trying to use the World Wide Web today. Currently, the strategies for finding resources are:
The first two strategies actually may be more promising as a basis for future research activities and tools than we may think at first blush. I would hazard a guess that most of today's knowledge workers whose responsibilities include keeping pace with developments in their areas of expertise have evolved a strategy of managing a personal view of the World Wide Web that is populated largely by resources found through serendipitous contacts with colleagues and friends as well as through netnews, mailing lists, and print or electronic publications. One of the research challenges I would pose is to create tools and methods that let larger communities leverage the millions of individual efforts at organizing information that are already taking place. Two exemplars of such community-based efforts at resource discovery are Bellcore's Group Asynchronous Browsing project (http://www.w3.org/Journal/1/wittenburg.098/paper/098.html) and AT&T's PHOAKS work (http://weblab.research.att.com/phoaks/).
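One simple way to pool individual efforts is to count how many different people have independently pointed to the same resource. The sketch below illustrates that idea only. It is not the method of either project cited above, and the function name, the threshold parameter, and the sample data are hypothetical.

```python
from collections import defaultdict


def rank_by_recommenders(mentions, min_people=2):
    """Rank URLs by the number of distinct people who mention them.

    `mentions` is an iterable of (person, url) pairs, for example drawn
    from personal bookmark files or newsgroup postings (assumed input).
    Repeat mentions by the same person count once.
    """
    people_by_url = defaultdict(set)
    for person, url in mentions:
        people_by_url[url].add(person)
    ranked = [
        (url, len(people))
        for url, people in people_by_url.items()
        if len(people) >= min_people
    ]
    # Most widely recommended first; ties broken alphabetically by URL.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical sample data for illustration.
MENTIONS = [
    ("ann", "http://a.example/"),
    ("bob", "http://a.example/"),
    ("ann", "http://a.example/"),
    ("cy", "http://b.example/"),
    ("bob", "http://b.example/"),
    ("ann", "http://b.example/"),
    ("cy", "http://c.example/"),
]
```

Counting distinct people rather than raw mentions keeps one enthusiastic individual from dominating the ranking, and the min_people threshold drops resources that only a single person has noted.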
Search-based services are faced with the problems of how to manage and structure the astoundingly huge hit sets returned by their queries, how to include some form of quality control, and how to surmount, or shall we say circumvent, the inevitable precision/recall trade-offs. Further, there is the problem of combining and manipulating results from different search services and other relevant information broker sources. Efforts to achieve some standardized distributed object-like protocols so that different search services can be integrated are a step in the right direction (http://www-db.stanford.edu/~gravano/standards). Another needed direction is the integration of search with structural browsing and, in fact, with community-based sources of information as described above. In general, there needs to be much more work on how to integrate filters and views over a domain, so that, for instance, a user does not have to deal with the results from a general query whose domain is the world when all he or she is looking for is the library down the street. For an every-citizen interface there is also the fundamental difficulty that effective use of today's general-purpose search engines requires a degree of sophistication beyond the reach of a substantial part of the population.
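Combining results from several services and restricting them to a view over a domain can be sketched as follows. The merging rule used here is reciprocal rank fusion, chosen only because it is simple to state. The text does not prescribe any particular method, and the record fields, the town name, and the sample hits are hypothetical.

```python
from collections import defaultdict


def merge_hits(result_lists, keep=lambda hit: True, k=60):
    """Merge ranked hit lists from several services into one ranking.

    Each hit is a dict with at least a "url" key (assumed record shape).
    `keep` is a filter expressing the user's view over the domain.
    Scores follow reciprocal rank fusion: each list contributes
    1 / (k + rank) for every hit it returns.
    """
    scores = defaultdict(float)
    first_seen = {}
    for results in result_lists:
        for rank, hit in enumerate(results, start=1):
            if keep(hit):
                scores[hit["url"]] += 1.0 / (k + rank)
                first_seen.setdefault(hit["url"], hit)
    ordered = sorted(scores, key=lambda url: (-scores[url], url))
    return [first_seen[url] for url in ordered]


# Hypothetical results from two search services.
SERVICE_A = [
    {"url": "http://world.example/library", "town": "elsewhere"},
    {"url": "http://local.example/library", "town": "springfield"},
    {"url": "http://local.example/hours", "town": "springfield"},
]
SERVICE_B = [
    {"url": "http://local.example/library", "town": "springfield"},
    {"url": "http://local.example/hours", "town": "springfield"},
]

# A "library down the street" view: keep only hits in the user's town.
LOCAL = merge_hits(
    [SERVICE_A, SERVICE_B], keep=lambda hit: hit["town"] == "springfield"
)
```

Because the filter is passed in as a function, the same merged ranking can serve different views (by locality, by topic, or by community recommendation) without changing how the services themselves are queried.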
As for subject cataloging efforts, the major problems are the magnitude of manual labor required to keep up with the rapidly changing Web and the self-evident truth (which not everyone agrees with) that a single universal hierarchical classification of every piece of information on the Web, even if it existed, would not be very useful. The private customized subject catalogs one fin