
On Functions

Computer-Mediated Collaboration

Loren Terveen

AT&T Research

Interface...Interaction...Collaboration

A narrow view of the human-computer interface focusing on superficial "look-and-feel" issues is unproductive. It offers neither deep understanding nor practical design guidance. Even simple interface decisions may require significant knowledge about people's interaction with a system. Three interfaces provide practical examples: the popcorn button on microwave ovens, the VCR+ (video cassette recorder) system, and the ATM (automated teller machine) fast cash withdrawal button. Each of these interfaces was added years into the product cycle in response to people's actual use of the products. At a theoretical level, Hutchins et al. (1986) couched their seminal analysis of direct manipulation interfaces in terms of users' cognitive situation and resources, a general model of tasks, and the coupling between user goals and interface features. Their analysis shows why interface design decisions cannot be made on the basis of look and feel alone.

Indeed, we also begin to see that people may require even more from systems, namely help with tasks they don't know enough about to do on their own. Norman (1986) observed that the Pinball Construction Set makes it easy to design computerized pinball games but not good games; this takes knowledge about pinball design. More generally, Schoen (1983) discussed how skilled professionals can interpret the state of their work objects to make good decisions; they act and the situation "talks back." The problem is that less skilled people may not be able to understand what the situation is "saying." Fischer and Reeves (1992) studied interactions between customers and sales agents in a large hardware store. They identified crucial knowledge only the sales agents possessed, which they used to help customers. The knowledge included knowing that a tool existed, how to find a tool, the conditions under which a particular tool should be used, and how to combine tools for a specific situation.

People often work together on tasks. Thus, in addition to collaborating with users, another appropriate role for systems is to support human collaboration. The field of computer-supported cooperative work (CSCW) seeks to understand the nature of joint work and design technologies to support it. Important technologies include shared editors, group discussion support tools, and awareness systems.

Even when people do not work together explicitly, they still can benefit from the prior experience and opinions of others. Computational techniques for mining such information and turning it into a reusable asset raise the potential for a form of "virtual collaboration," with some of the benefits of collaboration without the costs of communication or personal involvement.

To summarize, there are three fundamental motivations for collaborative systems and a research approach built on each one:

• Tasks require specialized skills and knowledge -> intelligent collaborative agents
• Work is inherently social -> computer-supported cooperative work
• People can reuse the experience of others -> virtual collaboration

Next I discuss the prospects for collaboration in common tasks supported by the national information infrastructure (NII).

The NII: What People Use It For, Where Collaboration Is Needed

The change from stand-alone to networked computers is transforming computers from desktop tools into windows on the world, from information containers and processors into communication devices.

The World Wide Web is the primary innovation ushering ordinary citizens into this new world, so much of my discussion focuses on the Web. The World Wide Web was designed expressly to support communication and collaboration among geographically distributed colleagues (Berners-Lee et al., 1994). Specifically, it supports information sharing, with the dual aspects of publishing and finding information. As the Web has expanded to embrace a diverse population of users and a broad range of uses, more activities have become important:

• Person-person communication (e.g., through e-mail or "newsgroups").
• Entertainment, arts, and advertising: from Web sites for the latest movies to high-quality on-line magazines to serious (or not-so-serious) artistic sites.
• Commerce: offering items for sale, finding items that match one's interests, and brokering between buyers and sellers.
• Education: for example, on-line course materials, interactive tutorials, and distributed science experiments.

Let us next consider the role of collaboration in these activities:

• Information sharing. Information seekers need assistance in finding high-quality, relevant information in the vast, ever-changing sea of Web sites. Information publishers need assistance in designing functional and attractive interactive applications.
• Person-person communication. All the major CSCW issues arise here (e.g., shared document access, discussion support, awareness aids).
• Entertainment, arts, and advertising. There is great potential for computational agents in interactive fiction, social role-playing environments, and games (Lifelike Computer Characters Conference, http://www.research.microsoft.com/lcc.htm; Maes, 1995).
• Commerce. Computational match-making agents can bring together buyers and sellers. Support for communication protocols such as auctions also is important.
• Education. Computational agents can engage learners. Teachers and students need support for communicating and working together (e.g., to complete assignments and carry out experiments).

Computer-Mediated Collaboration: A Unifying Perspective

A unified research framework offers two main benefits: (1) it advances communication and understanding among researchers by helping them to share and compare methods and results, and (2) it makes it easier to explore designs that integrate different types of collaboration. I propose a perspective of "computer-mediated collaboration": people collaborating with people, mediated by computation. A given instance of computer-mediated collaboration can be characterized by using the following dimensions:

• roles and responsibilities of the human participants;
• nature of the computational mediation, including:
  - how information is acquired, processed, and distributed;
  - whether the information evolves during (and in response to) system usage;
  - temporal properties of the mediation (e.g., synchronous vs. asynchronous, time delays); and
  - nature of the human-computer interaction.

This framework is adequate for describing CSCW and virtual collaboration; both of these explore computational techniques for mediating human collaboration. As applied to collaborative agents, it highlights the involvement of the people who create the agents, both domain experts whose knowledge is modeled in the agents and knowledge engineers (or artificial intelligence researchers) who work with the experts to articulate the knowledge and develop representations and algorithms for using it. It also reminds us of the time and resource costs of the design process.

More deeply, the framework guides us to consider combinations of various types of collaboration. For example, users of a computational agent may not think about its designers when things work; however, when the user-agent interaction breaks down, an effective remedy may be to provide the user access to a knowledgeable human expert, such as the domain expert involved in designing the agent (Terveen et al., 1995). Or when an agent has inadequate knowledge to perform a task on behalf of its user, it might be able to obtain assistance from other agents (Lashkari et al., 1994).

Research Issues

Dividing Responsibility Among People and Computational Agents

People and computers have fundamentally different abilities. Thus, a basic issue is creating divisions of responsibility that maximize the strengths and minimize the weaknesses of each (Terveen, 1995). "Critics" (Fischer et al., 1993) represent a well-known approach that responds to this issue. Critics are agents who observe users as they work in a computational environment and offer assistance from time to time. Users are responsible for the overall course of the work, while critics use domain expertise to help users solve problems and evolve their conception of the problem. While much interesting work has been done in this area, most of it still consists of proof-of-concept explorations. The next step is to develop robust generalizations that can be embedded in toolkits.

Collecting and Evaluating Data Necessary for Virtual Collaboration

Two major approaches to virtual collaboration have been explored. Systems like the Bellcore Recommender (Hill et al., 1995) and Firefly (http://www.firefly.com) ask users to rate objects of interest, such as movies or music. The systems maintain a database of raters and their ratings, compute similarities among raters, and recommend objects to people that were rated highly by other people with similar tastes.

Data-mining approaches (Hill and Hollan, 1994; Hill and Terveen, 1996) attempt to extract useful information automatically from people's normal activities, such as reading and editing documents or discussing topics on netnews. (One goal is to require little or no extra data entry from users.) Abstracted versions of this information are then made available to other people engaged in the same activity.

One of the major issues for both types of approaches is obtaining the necessary information. For ratings systems the question is: Will enough people rate? For data-mining systems, the questions include: Can useful information be extracted automatically? Can it be extracted efficiently (important since quality often comes from aggregating over large amounts of data)? Can it be extracted and reused without violating the privacy of the people who produced it?

Once data (recommendations or ratings) are available, the problem is to evaluate them. One good way to do this is to consider the source; some people are more credible for any given topic. Therefore, computing a person's credibility from available data is a second major problem. One complication is that most interaction on the World Wide Web is anonymous; if one cannot even attribute particular actions or opinions to a person, it is hard to compute his or her credibility. This again raises a potential conflict in values between the privacy of on-line interaction and the attempt to mine information that could be used to enhance interaction.

The credibility problem can be further refined into that of determining good sources (raters/recommenders) for a specific person. Developing effective algorithms for this is precisely what the ratings approach does. However, the problem is harder for data-mining approaches: they operate only on already-available data, and existing data may not always be an adequate source for computing similarities among people.

Introducing Computational Agents into On-line Communities

When an agent participates in an on-line community, such as a newsgroup or text-based virtual reality (e.g., a MUD or MOO), interesting issues arise beyond those faced in single-user human-computer collaboration. I illustrate these issues using PHOAKS (Hill and Terveen, 1996), which serves as a group memory agent that maintains recommended Web pages for a group.

• Will the community accept the agent's participation? Every community has behavioral norms. An agent ought not violate these norms (the norms for an agent may well be different than those for people). Some concern has been expressed that the PHOAKS ranking of Web resources by frequency of mention might distort community behavior (e.g., inducing people to post many spurious messages recommending their favorite resources). Thus, one must consider not only whether an agent respects community norms but also whether its participation may cause others to violate the norms.

• Does the agent make a useful contribution to the community? In Foner's discussion of the interesting social characteristics of the "Julia" agent (http://foner.www.media.mit.edu/people/foner/Julia/Julia.html), he points out that "she" serves useful functions, including taking and delivering messages, giving navigation advice, and sharing gossip. We also should consider to whom an agent is to be useful (e.g., community insiders, newcomers, or outsiders), especially since their interests may be different. For example, PHOAKS could make it easy to contact community participants by e-mail (or even by telephone). While outsiders might find this an attractive way to get information, presumably the community participants would be displeased.

• Will the community help make the agent smarter? An agent begins its participation in a community with some knowledge. If the agent has the capability to learn and the community will offer the necessary input, the agent can improve over time. For example, PHOAKS maintains a ranked set of Web pages for each newsgroup based on its categorization of URL mentions in messages. However, it will miscategorize some URLs, and some important URLs may not be mentioned or may be mentioned infrequently, perhaps because they are so well known (e.g., they may be in the FAQ). Therefore, PHOAKS contains forms that allow people to give opinions on the Web pages it links to and to add additional links. The general problem is to create techniques that let the system obtain performance-enhancing feedback and that people are willing and able to use. Or machine learning techniques may be used that let agents learn on their own.

• Who stands behind the agent? Sometimes community members want to talk to the people behind the agent. Maybe they want more information, or maybe the agent has done something that makes them angry. We have seen both in PHOAKS. People ask questions about the topic of a newsgroup, like where to find a bagpiper. People complain about the way PHOAKS has categorized Web pages; for example, in rare cases a condemnation of a Web page (e.g., from a hate group) is categorized as a recommendation. Sometimes we have manually changed the PHOAKS databases, and less frequently we have modified the categorization algorithms or interface. The general problem is how to provide needed human backup for agents who may be participating in many (e.g., thousands of) different communities at once.

Conclusion

I would like to conclude with two claims. First, if we take the argument of this paper seriously, we need not one but many every-citizen interfaces to the NII. It is specific appropriate types of computer-mediated collaborations that have the potential to increase the access and power of ordinary citizens, not a standard look-and-feel. Second, research must move into the real world. Many of the PHOAKS issues discussed here are ones we did not anticipate but discovered only by wading into the uncontrolled, unpredictable, messy World Wide Web. We have been able to formulate issues, hone our tools, and evaluate our results in ways that we could not have done if we had stayed in our laboratories. At some stage, all promising new research ideas will have to take the same plunge to prove their benefits to the ordinary citizen engaging in life on the NII.

Acknowledgments

I thank Will Hill for our collaboration on PHOAKS and for our many conversations developing and exploring the issues mentioned here.

References

Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H.F., and Secret, A. (1994) The World-Wide Web. Communications of the ACM, 37(8), 76-82.

Fischer, G., and Reeves, B. (1992) Beyond Intelligent Interfaces: Exploring, Analyzing, and Creating Success Models of Cooperative Problem Solving. Applied Intelligence, 1, 311-332.

Fischer, G., Nakakoji, K., Ostwald, J., Stahl, G., and Sumner, T. (1993) Embedding Critics in Design Environments. The Knowledge Engineering Review, 8(4), 285-307.

Hill, W.C., and Hollan, J.D. (1994) History-Enriched Digital Objects: Prototypes and Policy Issues. The Information Society, 10, 139-145.

Hill, W.C., Stead, L., Rosenstein, M., and Furnas, G. (1995) Recommending and Evaluating Choices in a Virtual Community of Use. Pp. 194-201 in CHI'95. ACM Press, New York.

Hill, W.C., and Terveen, L.G. (1996) Using Frequency-of-Mention in Public Conversations for Social Filtering. In CSCW'96. ACM Press, New York. (See also http://www.phoaks.com/phoaks/)

Hutchins, E.L., Hollan, J.D., and Norman, D.A. (1986) Direct Manipulation Interfaces. Pp. 87-124 in Norman, D.A., and Draper, S.W., Eds., User Centered System Design. Erlbaum, Hillsdale, N.J.

Lashkari, Y., Metral, M., and Maes, P. (1994) Collaborative Interface Agents. In AAAI'94. AAAI Press, Seattle, Wash.

Maes, P. (1995) Artificial Life Meets Entertainment: Interacting with Lifelike Autonomous Agents. Communications of the ACM, 38(11), 108-114.

Norman, D.A. (1986) Cognitive Engineering. Pp. 31-61 in Norman, D.A., and Draper, S.W., Eds., User Centered System Design. Erlbaum, Hillsdale, N.J.

Schoen, D. (1983) The Reflective Practitioner. Basic Books, New York.

Terveen, L.G. (1995) An Overview of Human-Computer Collaboration. Knowledge-Based Systems, 8(2-3), 67-81.

Terveen, L.G., Selfridge, P.G., and Long, M.D. (1995) Living Design Memory: Framework, System, and Lessons Learned. Human-Computer Interaction, 10(1), 1-37.

Creating Interfaces Founded on Principles of Discourse Communication and Collaboration

Candace Sidner

Lotus Development Corporation

Today's user interfaces are just too hard to use: they are too complex even for the narrow range of users for whom they were designed. At the same time, they also are impoverished in the range of modalities that they provide to users. While new modalities are becoming available, they could make interfaces even harder to use. What's the solution to this dilemma? Principles of human discourse communication and of human-to-human collaboration are two critically overlooked sources for simplifying interfaces. They offer a means of integrating various modalities and of extending the range of computer users.

Until recently user interface technology has not made use of what is now understood about the principles of discourse that govern human communication or the principles of collaboration that model joint work. This may seem surprising because interfaces are "communication engines" to the functionality of software applications; interfaces are how we get our work done. While the field of computer-supported cooperative work has directed the majority of its concerns at understanding how computers can be used to help people work together, the computer has not been seen as a full-fledged partner in the human collaboration. Interfaces are designed to make collaboration between people better, and to some extent they succeed, but the computer is not a collaborator with any of the people.

The current model of communication in interfaces is rudimentary at best. It is the "interaction" model, which is to say the user invokes a command and gets some, perhaps expected, performance by the computer, rather like when one's dog does a trick on the basis of a command such as "roll over." To communicate, users must choose one- or two-word commands from a menu with a mouse or incant a line of mumbo-jumbo that is meant to command the computer to run a program. Any clarification with the user results from the user responding to "dialogue boxes." This interaction model of communication is, in the weakest sense, a dialogue: some information flows between two agents who are capable of acting on that information. While an interface to a given application may have hundreds of so-called dialogue boxes, dialogue in the human sense does not take place.

There is no structure to the overall dialogue between user and interface from one dialogue box to the next and no memory of past dialogues or commands. Each command and action pairing is taken as completely independent of the next, so that there is no overall organization around the purposive intent of the overall set of "interactions" between the user and the machine.

Just as a dog doesn't always do what you tell it to, computers don't either. The interface is meant to inform users about what the computer can do, but, as we all know, short phrases are especially ambiguous in human language. Users have little means to resolve this ambiguity. If the meaning of a command is not obvious to them, they can at best try it out and hope that it does what they want, or they can make their way through a help system to determine if they are on the right track. All the while, they are required to be very explicit about every reference to objects, such as files, that they make.

While the user bears the burden of being explicit, the interface often communicates with ambiguity back to the user. For example, what should the user conclude is the meaning of the "ok" button in a dialogue box? "Yes, that's fine with me, I agree," or "I understood the words," or "well, I read the words" are possible, though in human discourse these uses of "ok" convey very different responses to the content of utterances in the dialogue. Because users cannot communicate these distinctions, it becomes clear to them that the computer interface does not really know what it is doing. It's just a dumb machine.

Being able to be a collaborator is three steps up on the ladder of communication and work. The first is minimal interaction. Today's interfaces do not pass the "minimality test" because they do not know enough to do so. Only the user does any modeling or remembering of the interaction and its parts. Whatever role the machine has played in the interaction, it completely forgets when it completes the action requested. It also is completely unaware of any difficulty the user may have had in determining the meaning of a command. Capturing this level of interaction provides a bare minimum of interactive understanding: the interface would have a more complete model of the back-and-forth nature of the communication than it does now, even if it did not know why the user wanted to communicate in the first place.

The second step on the ladder of communication and collaboration is slave-like interaction. To perform this way, the interface must have a model of what the user wants to do. Current interfaces do not have such a model. The user's goals and tasks lie completely outside of the interface, and there is no means to say anything about them. No part of the user's goals and tasks is recorded or even recognized.1 Instead, all of this information must reside only in the head of the user. None of it can be found in the application and its interface.

Some interfaces seem to be useful and quite satisfying to users. The metaphors on which they are based are highly predictive for users in determining what to do next. One such example is the interface for checkbook activities. I believe this is because the metaphor has been used to build in a model of the task the user is doing and to represent aspects of the task. The metaphor has also been used to keep the user narrowly focused on the task at hand: to balance a checkbook, write checks, or produce reports based on the checkbook information. As a result, the interface metaphor helps users work and also helps them predict what the interface is likely to do. While the interface is not aware of the task the user is doing, it is designed to do that task and to keep the user highly focused.

While it would be wise to continue to design interfaces carefully using well-thought-out metaphors, doing so will not solve the larger problems concerning interactivity, communication, and collaboration. No one metaphor is powerful enough for all work. Furthermore, lots of smaller applications, each with an interface for performing one set of tasks, leaves the user with lots of tasks to juggle. We still need an interface that communicates and collaborates, one that's at step three on the ladder. How do we get there from here?

There is a great deal more known about human discourse communication that could be used in interfaces today. Recent work in linguistics, natural language processing, and psychology offers principles of communication that can be embodied in interfaces, even when they do not speak full human language. All discourse, of which dialogue is an example, is purposive behavior, and the structure of the discourse is organized and segmented according to purpose. The focus of attention of the discourse is used, among other things, to provide context, which means creating locality in the segments of the discourse for interpreting recent references and to help discourse participants assure that each of them is paying attention to the same items in the discourse (Grosz and Sidner, 1986). Grounding of utterances in the human-computer dialogue2 makes conversation more efficient by allowing people to leave out what is truly obvious to both participants, as well as to slow the conversation down in order to reestablish focus, correct for unwanted ambiguity, and determine the next participant who has the floor.

It is possible to build interfaces that make use of these principles (and associated algorithms, which I will not discuss here) while at the same time simplifying the interface itself. We are doing that in our current work on collaborative interface agents (Rich and Sidner, 1996). To do so, designers will need to think in terms of user purposes (not just what actions the interface permits), the structure of purposes, and the relationship between what the user must communicate and the purposes of the communication. Maintenance of focus of attention will provide users

Citizen interaction with organizations, both businesses and government agencies, used to require face-to-face meetings, filing of handwritten forms, or telephone calls. Automated touch-tone response systems, tied to databases and enabled with synthesized voice response technology, have greatly increased the range of information that citizens can access and have expanded the times at which such data can be accessed. Credit card balances, frequent flier account information, and tax refund status can all be checked through calls to toll-free numbers. Stock and mutual fund trades and bank fund transfers can be initiated via similar means, all without direct interaction with a human being and on a 24-hour basis.

Today, many of the systems that have provided automated telephone access capabilities are moving to Internet-enabled access. This provides a much more powerful and convenient interface, enabling a wider range of data access with faster response and interaction characteristics. Businesses and government agencies are moving rapidly to make information available over the Internet, via the World Wide Web. Massachusetts now supports payment of traffic tickets over the Web, as a first step toward making government more responsive and accessible to citizens. Businesses of every stripe, from financial institutions to mail-order catalog merchants, are providing client access via the Web, in addition to the telephone.

A Vision of Near- and Intermediate-Term NII Interactions

Within the next 5 to 10 years significant improvements in many NII element interfaces will be implemented and widely deployed. The use of technologies such as cable modems, ADSL, and ISDN will significantly improve Internet access speeds for residential users. Continuing improvements in computer technology will increase local processing speed, enabling more sophisticated user interface software. The advent of very low-cost computers, designed specifically for Web browsing and using a television as a video interface, promises to expand the subscriber base into many more households.

Using this model, one can imagine an NII in which citizens' interactions with government agencies, businesses, and one another make substantial use of this environment. Requests for generic data from a vast array of government databases can be made and instantaneously satisfied via Web browsers interacting with servers coupled to massive databases. Interactions for a variety of personal transactions with government agencies also will be enabled (e.g., filing tax forms, making tax payments, or checking one's Social Security account status).

Many catalogs and periodicals now delivered in hard-copy form can be delivered via the Web, as some already are.

After browsing on-line catalogs, clients will place orders for items to be shipped via the postal system or express delivery services, all via the Web. All forms of financial transactions (e.g., credit and debit card purchases, checks, cash exchanges, stock and mutual fund transactions) will be available via the Internet, and many will make use of these facilities instead of hard-copy instruments.

Security, Interfaces, and the NII

As elements of the NII evolve, as described above, they offer increased functionality and improved performance, usually at lower prices. However, security and privacy concerns often are overlooked in this rush to enhance the NII. Cellular phone calls are not only easy to intercept, but the account information used for billing is even easier to acquire via automated means. It has been estimated that the lack of attention to security, if only for this billing authorization information, has cost the cellular phone industry hundreds of millions of dollars in lost revenue. Digital paging systems are highly vulnerable to interception, raising privacy concerns. Theft of service for cable and satellite TV delivery systems is often decried as depriving those industries of significant amounts of revenue. The Caller ID feature for telephone systems is both a blessing and a curse, from a privacy perspective.

As companies have connected corporate computer and network systems to the Internet to facilitate user access, the overall security of these systems has often been degraded. Most of these Internet connections are secured by firewalls, a technology that usually constrains the Internet so as to reduce its capability (even for authorized users) and which ultimately fails to provide high-quality security for the computers being protected. The ease with which electronic mail can be intercepted or forged is appalling. As the first tentative steps are taken toward consumer-level Internet electronic commerce, the on-line literature is replete with examples of technical opportunities for fraud.

For many of these NII elements the technology for improving security and privacy has been available for some time, but often it has not been implemented. Sometimes the reasons are purely economic (e.g., the cost of adding security technology is perceived to make the resulting product noncompetitive). Sometimes time-to-market concerns prevent incorporation of security features (i.e., the delay imposed by adding security features would allow a competitor to offer a product or service sooner and thereby capture market share). However, in some cases the difficulty of providing a good user interface for security technology has been a major impediment.

As underlying communications and computing systems become more complex, there is a natural tendency for the user interface to become more complex, though that need not always be the case. For example, WIMP (windows, icons, menus, pointers)-based operating system interfaces can mask substantial underlying complexity, as illustrated by the contrast between the Apple Macintosh and DOS interfaces at corresponding points in time. However, within the context of a paradigm such as WIMP, increased functionality often results in increased complexity for users, as both Windows 95 and Mac users can attest.

Computer systems have become more complex, and network interactions have become commonplace in the desktop and laptop systems that users employ in home environments. Providing security for such systems has become increasingly difficult. In the 1970s and 1980s much research was devoted to the development of secure operating systems, primarily for use with multiuser systems (e.g., time-sharing systems and servers). However, all of this research into secure operating systems yielded very little that has been commercially successful or widely deployed. Today, operating systems for the desktop computers most commonly used in home environments (e.g., Windows 95 and Mac) have very few security features. Yet these may be the models for the systems that citizens will most commonly employ in their interactions with the NII.

An alternative model, suggested by the "network computer" paradigm promoted by companies such as Oracle and Sun, is a Java interpreter and a Web browser as the operating system replacement. Given the many security problems that have been discovered in Web browsers such as the Netscape Navigator and the rash of Java-based security problems that have been described in the literature, this is hardly an encouraging alternative paradigm. In either case, networked computers of some sort will provide an every-citizen interface to many NII elements. There are fundamental and difficult problems associated with developing highly functional and secure networked computer systems; these problems are exacerbated when there is a requirement to make the systems easy to use by all citizens.

What Is the Hard Problem?

The fundamentally hard problem, as alluded to above, is one of trying to make an increasingly complex system, operated by untrained users, secure in the face of attacks by sophisticated adversaries. Various aspects of this problem are examined below.

As noted above, firewalls are typically used in corporate environments to provide "secure" connectivity to the Internet. One of the major reasons for adopting this strategy is that those responsible for corporate computer security find themselves unable to effectively manage security for individual desktop computers. Instead, by inserting a firewall at the perimeter of the corporate network, the site security administrator can focus his or her attention on managing a single computer (or a small number of computers) devoted to a well-defined and limited task (i.e., controlling the flow of Internet traffic across the security perimeter). In contrast, security management of individual desktop systems is hard because these systems are often directly under the control of users, executing a wide range of software, and based on operating systems that are insecure out of the box.

The control afforded by firewalls is in direct opposition to the Internet goal of facilitating the flow of information between clients and servers. Security administrators are constantly fighting a battle to protect desktop systems and servers by controlling the flow of data (using fairly crude tools), while users clamor for unbridled access to Internet resources. The best a security administrator can hope for is to implement a packet-filtering policy that satisfies most user demands while minimizing attack opportunities.

This tug of war has become worse with the advent of the Web and Java. The Java model calls for loading software from servers into users' computers for local execution, rather than transmitting data for display by a browser and so-called helper applications. In a home environment, if the typical citizen makes use of the same operating system and many of the same applications and is assumed to be even less technically sophisticated than his or her office counterpart, there is even less likelihood that he or she will be able to manage the system in a secure fashion. Moreover, since the system may connect directly to a wide range of Internet servers without the benefit of an intervening firewall managed by a security administrator, the opportunities for successfully attacking such computers are almost boundless.

The network computer (NC) model transforms the problem but does not solve it. Proponents of low-cost NCs describe simple systems without local disk storage and with a minimal operating system (e.g., similar to a Web browser). Applications are downloaded onto the NC over the net, for local execution, via high-speed connections. Historically, one of the most difficult security problems to address is one in which potentially hostile software is imported into a target machine and executed. The "confinement problem" refers to this situation, where the imported software is supposed to be constrained in its access to user data, being granted only the access necessary to perform its advertised task. A Trojan horse is malicious imported software that performs some apparently useful function but also executes some sort of attack on the target system (e.g., destroying data or acquiring data for the attacker). In conventional systems the first challenge for an attacker using a Trojan horse was the problem of introducing his or her software into the target. If stealing data is the goal of the attack, the second problem faced by the attacker is one of exfiltration (i.e., sending the data back to the attacker). In an Internet environment, especially in the context of Web use, this second challenge essentially disappears. Confinement, if successfully implemented, addresses Trojan horse attacks by limiting the data and system resources available to the imported software.

In the past a security-savvy user would never import software into his or her system from other than well-known sources (e.g., major vendors). The introduction of "shareware" and "freeware" into a desktop computer, distributed over the Internet or downloaded from an on-line service such as AOL or CompuServe, flies in the face of this traditional security convention. Yet software distributed in this fashion has become quite popular and is widely used in corporate as well as home computer environments. Functionality has won out over prudent security practice, even though examples of Trojan horse attacks via these software sources are not unknown.

The Java model takes the imported software notion to its ultimate conclusion; it creates a legitimate path for infiltration of software (Java "applets") into user computers, typically via the Internet and the Web. To make this potentially dangerous situation less so, Java applets are supposed to be constrained in terms of the operations they can perform in the client computer. For example, applets are not supposed to have access to the local file system, to read or write user files. Unfortunately, vulnerabilities in the initial implementations of Java interpreters have not successfully confined applets, as promised.

Even if these Java security problems are fixed, it is not clear that this simple model of highly constrained applet behavior will persist. Historically, useful applications have required access to user files, both for reading and writing. If applets are to become powerful tools performing increasingly sophisticated tasks for users, it seems unlikely that this stringent constraint will remain. Thus, one should assume that applets will, in the future, be granted access to user data, whether the data are locally resident on a full-fledged computer or stored on some network file server. Functionality almost always wins out over security.

An even more serious concern is that the user of a Java-enabled Web browser (or of a network computer in the future) may not even know when applets are being loaded into his or her computer. Today, many corporate security administrators urge users to disable Java support in their Web browsers to minimize the potential for this sort of security problem. While use of Java is still rather minimal on the Internet, in time many Web pages may become Java applets, and disabling Java may prevent access to so many sites that users are forced to permit Java execution. So even if the user interface were to alert the user when an applet is downloaded, how would a user know whether that event posed a danger? In principle, if Java environments evolve to a point where the known vulnerabilities are successfully addressed, the user could control which applets were loaded and what operations they were authorized to perform. But is this a realistic expectation? This represents the critical security user interface issue for the NII.

Previous research on computer security showed that it was quite difficult to establish a constrained execution environment for imported software (i.e., to address the confinement problem). Few operating systems were successfully developed to meet this challenge, and very few are deployed today. Java does help address this problem by providing an interpreted environment, and thus it should be possible to remove many means by which imported software might try to circumvent the constraints imposed by the user. However, so far the Java environment has proven to be vulnerable to circumvention by interpreted code, just as security controls in traditional operating systems have proven vulnerable to circumvention by compiled code.

In those operating systems that have attempted to solve the confinement problem, a user interface capable of administering the fine-grained access control required for confinement has been very complex. Compartmented mode workstations represent the most widely deployed systems that offer some form of operating-system-enforced confinement. These Unix-based systems are exceedingly difficult to administer, and the granularity of confinement offered is relatively crude compared to what a corporate or home user might require for controlled execution of applets. To date, there is no indication of how to structure a user interface to make confinement of imported software generally understandable even to fairly sophisticated computer users.

Related Problems and Research Directions

As noted above, confinement is a hard computer security problem that has been studied for almost 20 years. To protect users against malicious imported software (e.g., applets), it will be necessary to offer fine-grained access control to confine the execution of this software. This problem may be easier to solve in a more restricted environment such as that envisioned for network computers, but it is still a hard problem. Thus, the first research problem is the development of an interface that is intelligible to users and empowers them to manage confinement for imported software.

Many of the forms of interaction alluded to above require authentication in order to provide security. The user must be authenticated to the server, and the server must be authenticated to the user. The current proposals for how to accomplish this requirement center around the use of public key cryptography and certificates (e.g., for validating digital signatures and for exchanging keys).

However, we have no experience with public certification systems of the scale required to support all of the citizens of the United States and all of the commercial and government service providers. In small-scale trials of certification systems for applications such as e-mail, one of the problems that quickly becomes apparent is the difficulty of presenting the right amount of authentication information to the user. If one displays full certification path data, the user will almost certainly be overwhelmed. If the only data displayed are from the final certificate in a path, suitable constraints must be imposed on the certification path validation algorithm to prevent "surprises." However, configuring and managing certification path validation parameters appears to be a fairly complex task in the general case. Noting that the average citizen cannot program a video cassette recorder successfully, it is hard to imagine how this individual could manage a more complex certification validation system.

This problem shows up in many ways. The Java architecture calls for applets to be digitally signed to verify their provenance (e.g., to detect modification of the applet after it was released by the developer or vendor). However, if it is hard to display the right level of detail to a user once the signature is validated, the signatures may not really address the fundamental problem of provenance.

Authenticating the identity of a server to which the user has connected poses a similar problem. Small variations in the spelling of a server's name embedded in a public key certificate could easily lead a user to believe that he or she was connected to one (legitimate) server when another had actually been contacted. This analysis suggests that a critical area requiring additional research is how to provide a user interface to manage certification graphs, so that users are truly aware of the identities of the people and organizations with whom they are dealing in cyberspace.

Many proposals for securing transactions generated by a user rely on the user employing a private digital signature key to sign the transaction. However, the user never directly sees the data being signed; he or she relies on software on his or her computer to indicate what is being signed. Thus, a user's signature may be applied to transactions or messages other than the ones he or she intends if malicious software manipulates the user interface.
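This what-you-see-is-what-you-sign problem can be demonstrated in a few lines, here using the modern Python cryptography package as a stand-in for a personal signing token (an anachronism relative to this paper, but the point is unchanged): the key signs whatever bytes the software supplies, with no way to tell whether they match what the user saw.

    # pip install cryptography  (stand-in for a hardware signing token)
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()  # would live inside the user's token
    public_key = private_key.public_key()

    displayed = b"pay bookstore $20"    # what the user approved on screen
    submitted = b"pay attacker $2000"   # what compromised UI code actually sends

    signature = private_key.sign(submitted)  # the key signs blindly
    public_key.verify(signature, submitted)  # verifier is satisfied: no exception

    # The signature is mathematically valid, yet the user authorized something
    # else entirely. Only a trustworthy display path between user and token,
    # not more cryptography, can close this gap.
    print("bytes actually signed:", submitted.decode())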
Several proposals call for citizens to make use of kiosks for some transactions, both with government and commercial entities. This seems especially attractive as a means to empower low-income households that might not otherwise have access to services via a home computer. However, not long ago criminals managed to place a fake ATM (automated teller machine) in a shopping mall as a means of acquiring user bank account numbers and PINs (personal identification numbers). A higher-tech version of this scam could be effected with kiosks and might be much harder to detect. The fake kiosks could provide access to legitimate servers on behalf of the user, completing valid transactions on his or her behalf. However, without the user's permission, kiosks could also effect unauthorized transactions at the same time (e.g., applying the user's signature to unauthorized transactions for money transfers).

A final research problem area is how to provide a user interface for personal cryptographic tokens so that users will be protected from malicious software that will attempt to misapply a user's digital signature capability.

Research to Support Widespread Access to Digital Libraries and Government Information and Services

Ben Shneiderman

University of Maryland

The rapid growth of the World Wide Web provides compelling testimony to the impact of improved user interfaces. Although FTP (file transfer protocol), Gopher, WAIS (wide area information service), and other services produced active usage, it was the appearance of easy-to-use embedded menu items and appealing graphics that produced the current intensity of use. Public interest continues to grow dramatically, and national policy is being effected in terms of providing access to government information and services.

Early adopters, who are typically technologically sophisticated, are highly motivated to overcome poor designs and push beyond the difficulties to achieve their objectives. However, the much larger number of middle and late adopters are less likely to tolerate chaotic screens; unnecessarily lengthy paths; slow response times; inconsistent terminology; awkward instructions; inadequate help facilities; and missing, wrong, or out-of-date information. A proactive approach can ensure that the emerging technology will provide accessible, comprehensible, predictable interfaces that serve the needs of the majority. A prompt and moderate level of research effort can shape the evolution of user interfaces to match the skills, needs, and orientation of the broadest range of users. Topics might include the following:

• Cognitive design strategies for information-abundant Web sites, including metaphor choice (library, shopping mall, television channels, etc.), navigation design, and visual overviews;
• Recognition and support for the distinct needs of diverse user communities, such as elderly, young, handicapped, lower-income, minority, and rural users, plus those with poor reading skills;
• Control panels to allow user tailoring to individual abilities, limitations, and technology;
• Strategies to cope with efficient construction and maintenance of text and graphic versions, multiple browser support, varied user display devices, and voice output;

• Empirical studies of high- versus low-fanout strategies (shallow versus deep trees), compact vertical design to reduce scrolling, benefits of reduced/increased graphical treatments, and impact of slow response time;
• Web site construction languages and templates, software tools to verify visual and textual consistency, Web site management and terminology control, and thesaurus construction;
• Sequencing, clustering, and emphasis of information items according to designer goals;
• Web-oriented user interface design to support browsing directories, searching for key phrases in document databases, and performing database searches;
• Design strategies to support evolutionary learning of complex sites and services;
• Easy-to-use facilities to permit user construction of informational Web sites, community services, and entrepreneurial initiatives;
• Low-cost computing devices and low-cost network access (the "Web-top computer");
• Refined feedback and evaluation methods to guide designers, including usability testing, expert reviews, field trials, interviewing users, focus groups, e-mail surveys, and e-mail suggestion boxes;
• Simple privacy protection and secure transmission of financial, medical, or other data;
• Image compression methods to reduce file sizes while best preserving image detail, texture, and color richness; and
• Logging and monitoring software, visualization of usage patterns for individuals and aggregates, and cost-benefit analyses.

Coordination with relevant groups can avoid redundant efforts and support common goals. Current activities include the following:

• Library of Congress National Digital Library Program, in cooperation with the University of Maryland (ben@cs.umd.edu);
• National Research Council project on ordinary-citizen interfaces (Alan Biermann, Chair, Duke University, awb@cs.duke.edu) [the project reported on in this volume];
• Stanford University effort to coordinate database services (contact: Hector Garcia-Molina, hector@cs.stanford.edu);
• U.S. government efforts such as GILS (Government Information Locator Service);
• USACM project, The Interface Between Policy and Technology in Providing Public Access to Government Data (contact: Randy Bush, randy@psg.com);

• Joint effort on digital libraries by the National Science Foundation, National Aeronautics and Space Administration, and Defense Advanced Research Projects Agency; and
• International efforts (e.g., Canada, Singapore, Italy).