Read "More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure" at NAP.edu

Suggested Citation:"Position Papers: On Interface Specifics." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.

On Interface Specifics

An Embedded, Invisible, Every-Citizen Interface

Mark Weiser

Xerox Palo Alto Research Center

The nation's information infrastructure is a vast, loosely connected network of informing resources found mostly in people's everyday lives. When considering interfaces to new electronic information sources, and especially when replacing old information sources with new electronic sources, it is crucial to consider how the existing infrastructure really works. Two examples will help.

Consider how you would find a grocery store in a new town. How do you solve this problem on first driving in? Most likely, by looking around, watching the people and the streets, and making a couple of guesses, you could find one in no time. The information infrastructure is everyday knowledge of towns (including economic and practical constraints on layout, walking distances, etc., that are embedded in that knowledge), and physical clues that map that general knowledge into this particular town. Information infrastructure can be physical (see www.itp.tsoa.nyu.edu/~review/current/focus2/open00.html).

More conventionally, our national information infrastructure today includes tens of thousands of public and school libraries all across the country. These libraries are in nearly every elementary, junior high, and high school; and they are in nearly every community, even the very small. Of course, many of these libraries are connected to the Internet. But it is very important to consider the other resources provided by these libraries. Thirty-five percent of all library visitors never use the catalog, and 12 percent use no library materials at all, but bring in their own materials. Clearly, libraries do something more than just supply data that could be gotten over the Web (see www.ubiq.com/weiser/SituationalAspectsofElectronicLibraries.html).

As the above examples illustrate, the existing information infrastructure often functions without calling itself to our attention. It stays out of sight, effectively not even noticed. So the first challenge for the every-citizen interface is to be invisible (what I have called "calm technology" elsewhere; see www.ubiq.com/weiser/calmtech/calmtech.htm).

As the above examples also illustrate, the existing information infrastructure is extremely widespread, found in every nook and cranny of our lives. The second challenge for the every-citizen interface is to be ubiquitous (see www.ubiq.com/ubicomp).

Finally, not addressed by the above examples but presumably clear to everyone, the current Internet is just the beginning. I like to think of it by analogy to television channels. Once upon a time we fretted about how we would manage a TV with 500 channels. How could we ever view them all? The Internet will give us 5 billion channels, one for every person on the planet (only about 30 million so far, but more are coming). And soon these channels will be multimedia, multiway video and sound using the Mbone. This kind of interconnection is a deep technical challenge to the current Web infrastructure, which cannot begin to support even a few multiway Mbone connections, much less 5 billion. I consider this to be a user interface issue because it is just this infrastructure that opens up the Web to use by anyone who can point a camera or talk on the phone. The third challenge for the every-citizen interface is to support billions of multiway real-time interactive connections.

Of these three challenges I believe that the first is currently the most promising for progress, the one most susceptible to interdisciplinary attack, and the one least well addressed by existing projects. How does a technology become invisible? To some degree, anything can, given enough practice. Invisibility is a property of people, technology, and the situation in which we find ourselves (a tiger invisible in the tall grass stands out at the zoo).

Some suggested challenges for developing a "science of invisibility" for an every-citizen interface are as follows:

•	Human information processing includes operations at many different levels, the vast majority of them invisible to our conscious thought at any given moment. As we learn a skill, operations that formerly required attention ("turn the steering wheel") become automatic and are replaced by higher level directions ("turn left," "drive to Cincinnati"). Invisible interfaces are those that connect "below" the conscious level, through both learning and natural affinity. What computer interfaces are most appropriate for coupling into a large amount of unconscious information processing? Which ones take a long time to learn but are worth the effort (analogous perhaps to piano playing)? Which ones fit our brain's affinity for information (information browsing as a walk in the woods)?
•	The difference between something being effectively invisible because it is being processed below conscious thought and something being managed for us (e.g., by a computerized agent) is profound. A key advantage of effective invisibility is the quick refocus from peripheral inattention to center attention. For instance, while ordinarily unaware of engine noises in our car, we suddenly become very aware if the noise should change unexpectedly. We can then focus our attention on the noise and make decisions about its danger, the distance to the nearest expressway exit, what lane to be in, and so on. (A silent car with an intelligent agent monitoring engine condition would keep us from any knowledge at all.) Which computer interfaces do well at keeping something invisible most of the time, but allowing quick recentering when appropriate? Which interfaces let the same information be either in the center of our attention or in the periphery without even clicking a button but simply changing our attention?
•	The concept of an intelligent agent can be a very powerful one if it does not take over the function of human judgment and our ability to control the focus of our attention. Can we design intelligent agents in our computers that preserve our ability to refocus? If something has been taken over for me, is there a presentation of what has been taken over that I can bring to the fore whenever I like, including retroactively? Can I have agents that filter for me without losing all of the context of the information after the filter? For instance, if I use a computerized newspaper clipping service, can it show me one or two lines of articles that were physically near the ones it clipped for me in the physical newspaper? What kind of context helps, and what doesn't help, when dealing with a computerized agent?
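The clipping-service question in the last item can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a system described in this paper: the `Article` and `Clipping` types, the `clip_with_context` function, the keyword match, and the idea that "physically near" means an adjacent position on the same page are all hypothetical choices made here.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    page: int        # page of the physical newspaper (hypothetical layout model)
    position: int    # order of the article on that page
    headline: str
    body: str


@dataclass
class Clipping:
    article: Article
    # Headlines of articles that sat next to the clipped one on the page.
    neighbors: list = field(default_factory=list)


def clip_with_context(articles, keyword, radius=1):
    """Return articles matching keyword, each with its same-page neighbors."""
    clippings = []
    for a in articles:
        text = (a.headline + " " + a.body).lower()
        if keyword.lower() not in text:
            continue  # the filter: drop articles that do not match
        # The context: keep a one-line trace of what the filter left nearby.
        nearby = [
            b.headline
            for b in articles
            if b is not a
            and b.page == a.page
            and abs(b.position - a.position) <= radius
        ]
        clippings.append(Clipping(article=a, neighbors=nearby))
    return clippings
```

The filter still removes most of the paper, but each clipping carries a trace of what surrounded it, so the reader can notice what was left out and bring it back into focus.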

Intelligent Multimedia Interfaces For "Each" Citizen

Mark T. Maybury

Mitre Corporation

Future interfaces will take advantage of knowledge of the user, task, and discourse and exploit multiple input/output media and associated human sensory modalities to enable intuitive access to information, tools, and other people (see Figure 1). The more effectively computers can process heterogeneous information and automatically acquire knowledge about users, the more efficient they will become at supporting users' frequently changing tasks, particularly information-seeking ones. Information-seeking tasks range from directed search to casual browsing, from looking up facts to predicting trends. Each of these goals can be achieved more or less effectively by the content, form (i.e., media), and environment that support the user. Our emphasis at Mitre Corporation has been on investigating technologies and tools that enable more effective and efficient information interfaces for a variety of application areas, including command and control, intelligence analysis, and education and training. As a consequence of our experience, we believe we should aim not to build a one-of-a-kind interface for every citizen, but rather common interfaces that can be tailored to each citizen in accordance with his or her goals, knowledge, skills, and preferences.

FIGURE 1 Intelligent interfaces.

Challenges

Achieving an intelligent interface vision requires addressing some fundamental technology limitations, including:

•	Lack of a scientific approach to device and user interface design, development, and evaluation.
•	Lack of interface standards that make it easy to pull out one device and plug in a similar one.
•	Lack of general mechanisms for (1) interpreting input from multiple devices (e.g., mouse gesture and speech, as in "put that there ‹click›") and (2) automatic generation of coordinated multimedia output.
•	Lack of general mechanisms for constructing dialogue-based interaction that supports error detection and correction, user models, and discourse models to ensure tailored and robust communication.
•	Few tools or procedures that facilitate porting language-enabled interfaces to new domains (and/or languages); it remains a time-consuming and knowledge-intensive task.
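The multiple-device interpretation problem in the third item ("put that there ‹click›") can be illustrated with a minimal sketch. It assumes that a speech recognizer and a pointing device each deliver time-stamped events; the `Word` and `Click` types, the `fuse` function, the list of deictic words, and the one-second window are hypothetical and stand in for whatever a real system would provide.

```python
from dataclasses import dataclass

# Words that point at something outside the utterance (illustrative list).
DEICTIC_WORDS = {"this", "that", "here", "there"}


@dataclass
class Word:
    text: str
    time: float   # seconds since the start of the utterance


@dataclass
class Click:
    x: int
    y: int
    time: float   # same clock as the speech events (an assumption)


def fuse(words, clicks, max_gap=1.0):
    """Bind each deictic word to the closest unused click within max_gap seconds."""
    unused = list(clicks)
    bindings = []
    for w in words:
        if w.text.lower() not in DEICTIC_WORDS:
            continue
        candidates = [c for c in unused if abs(c.time - w.time) <= max_gap]
        if not candidates:
            # Unresolved reference: a dialogue component would have to ask.
            bindings.append((w.text, None))
            continue
        best = min(candidates, key=lambda c: abs(c.time - w.time))
        unused.remove(best)
        bindings.append((w.text, (best.x, best.y)))
    return bindings
```

Matching on nearest time stamp is the simplest possible policy; the unresolved case is where the error detection and correction called for in the next item would take over.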

We believe there exist fundamental tasks associated with communication that underlie all interface activities. These can be viewed as a hierarchy of communicative acts, which can be formalized and computationally implemented and by their nature can be realized in multiple modalities (e.g., linguistic/auditory, gestural, visual). The choice among modalities itself is an important, knowledge-based task, as is the broader task of presentation planning.

In supporting information access, our efforts have focused on multimedia analysis (in particular, message understanding and video analysis), including its segmentation into atomic units, extraction of objects, facts, and relationships, and summarization into compact form. New requirements have arisen as a result (e.g., multimedia query mechanisms, which raise issues such as how to integrate visual and linguistic queries). Multimedia information-seeking tasks remain perhaps the most important but least well understood area. We believe that careful experimental design, use of instrumented environments, and task-based evaluation (e.g., measuring at least time and accuracy (false positives, false negatives)) will yield new insights.
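As a rough illustration of the task-based measures mentioned above, the sketch below scores a single information-seeking trial by completion time and by accuracy expressed through false positives and false negatives. The `Trial` record and the `score` function are hypothetical names introduced here, and precision and recall are one common way of summarizing the two error counts, not a metric prescribed by this paper.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    seconds: float   # time the participant took to finish the task
    retrieved: set   # items the participant returned as answers
    relevant: set    # items a judge marked as correct for the task


def score(trial):
    """Summarize one trial by time, error counts, precision, and recall."""
    true_pos = len(trial.retrieved & trial.relevant)
    false_pos = len(trial.retrieved - trial.relevant)   # returned but wrong
    false_neg = len(trial.relevant - trial.retrieved)   # correct but missed
    precision = true_pos / len(trial.retrieved) if trial.retrieved else 0.0
    recall = true_pos / len(trial.relevant) if trial.relevant else 0.0
    return {
        "seconds": trial.seconds,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }
```

Logging such records from an instrumented environment would let two interfaces be compared on the same task with the same measures.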

TABLE 1 Research Recommendations

Area	State of the Art	Near-Term Research	Long-Term Research
Text processing	Commercial named entity extraction (SRA, BBN); many hand-crafted, domain-specific systems for event extraction; large cost to port to new domains; incremental sentence generation, limited document generation.	Demonstrate portability of TIPSTER technology to support multilingual information extraction and spoken language; incremental text generation; text summarization; topic detection and tracking.	Scalable, trainable, portable algorithms; document-length text generation.
Speech processing	Commercial small-vocabulary recognizers (Corona, HARK); large-vocabulary (40,000+ words) recognizers exist in research labs (BBN, SRI, Cambridge University).	Speaker, language, and topic identification; prosodic analysis; natural-sounding synthesis.	Large-vocabulary, speaker-independent systems for speech-enabled interfaces; large-vocabulary systems for video and radio transcription, for example.
Graphics processing	Graphical User Interface Toolkits (e.g., object-oriented, reusable window elements such as menus, dialogue boxes).	Tools for automated creation of graphical user interface elements; limited research prototypes of automated graphics design.	Automated, model-based creation and tailoring of graphical user interfaces.
Image/video processing	Color, shape, texture-based indexing and query of imagery.	Motion-based indexing of imagery and video; video segmentation.	Visual information indexing and extraction (e.g., human behavior from video).
Gesture processing	Two-dimensional mice; eyetrackers; tethered body-motion tracking.	Tetherless, three-dimensional gesture, including hand, head, eye, and body-motion tracking.	Intentional understanding of gesture; cross-media correlation (with text and speech processing); facial and body gesture recognition.
Multimedia integration	Limited prototypes in research and government.	Content selection, media allocation, media coordination, media realization for multimedia generation.	Multimedia and multimodal analysis; multimedia and multimodal generation; investigation of less-examined senses (e.g., taction, olfaction).
Discourse modeling	Limited prototypes in research and government.	Error handling (ill-formed and incomplete input/output), two-party conversational model, discourse annotation schemes, discourse data collection and annotation, conversation tracking.	Context tracking/dialogue management; multiuser conversation tracking, annotation standards; model-based conversational interaction.
User modeling	Fragile research prototypes available from academia; one-user modeling shell (BGP-MS).	Track user focus and skill level to interact at appropriate level; empirical studies in broad range of tasks in multiple media.	Hybrid stereotypical/personalized and symbolic/statistical user models.
Visualization	Some commercial tools (e.g., NetMap), text-based, limited semantics; computationally intensive, often difficult to use.	Improve information access interfaces; visualization generation from extracted (semantic) information; automated graphical encoding of information properties.	Multidimensional visualization; multimedia (e.g., text, audio, video) visualization.
Collaboration tools	Multipoint video, audio, imagery; e-mail-based routing of tasks.	Instrument environments for data collection and experimentation; multiparty collaborative communication; investigate asynchronous, distant collaboration (e.g., virtual learning spaces).	Field experiments to predict impact of collaborative technology on current work processes; tools for automated analysis of video session recordings; flexible, workflow automation.
Intelligent agents	Agent communication (e.g., KQML) and exchange languages (e.g., KIF).	Mediation tools for heterogeneous distributed access.	Shared ontologies; agent integration architectures and/or control languages; agent negotiation.

A Plan Of Action

There are several important recent developments that promise to enable new common facilities to be shared to create more powerful interfaces. A strategy to move forward should include:

•	Creating architectures and services of an advanced interface server that are defined in the short term using open standard distributed object middleware, namely the Object Management Group's Common Object Request Broker Architecture (CORBA), and that investigate higher-risk architectures, such as agent-based communication and coordination.
•	Fostering interdisciplinary focused science that investigates the nature of multiple modalities, with an aim to understanding the principles of multiple modalities in order to provide insight into such tasks as multimedia interpretation and the generation of coordinated multimedia output.
•	Utilizing, refining, integrating, and extending (to additional media) existing architectures for single media, including (1) the TIPSTER architecture (for document detection and information extraction) and associated tag standards (e.g., Penn Treebank part-of-speech tags, proper name tags, coreference annotations) for language source markup and (2) evolving application programming interface (API) standards in the spoken language industry (e.g., SRAPI, SAPI).
•	Via an interdisciplinary process, defining common interface tasks and associated evaluation metrics and methods; creating multimedia corpora and associated markup standards; and fostering interdisciplinary algorithm design, implementation, and evaluation.
•	Fostering emerging user modeling shells (e.g., BGP-MS) and standards.
•	Focusing on creation of theoretically neutral discourse modeling shells.
•	Applying these facilities in an evolutionary fashion to improve existing interfaces, supporting a migration from "dumb" to "smart" interfaces (S. Bayer, personal communication, 1996).
•	Performing task-based, community-wide evaluation to guide subsequent research, measuring functional improvements (e.g., task completion time, accuracy, quality).

Because they affect all who interact with computers, user interfaces are perhaps the single area of computing that can most radically alter the ease, efficiency, and effectiveness of human-computer interactions.

Recommended Research

Table 1 indicates several functional requirements and associated key technologies that need to be investigated to enable a new generation of human-computer interaction, indicating near-term and far-term research investment recommendations. Key areas for research include:

•	processing and integrating various input/output media (e.g., text, speech and nonspeech audio, imagery, video, graphics, gesture);
•	methods to acquire, represent, maintain, and exploit models of user, discourse, and media; and
•	mechanisms that can provide information visualization, multiuser collaboration support, and intelligent agents.

Reference

Maybury, M.T. (Ed.) 1993. Intelligent Multimedia Interfaces. Menlo Park, Calif: AAAI/MIT Press.

Interfaces For Understanding

Nathan Shedroff

vivid studios

Over the next 15 years the issues facing interface designers, engineers, programmers, and researchers will become increasingly complex and push farther into currently abstract and, perhaps, esoteric realms. However, we are not without guidance and direction to follow. Our experiences as humans and what little history we have with machines can lead us toward our goals.

Computers and related devices in the future will need to exhibit many of the following qualities:

•	Be more aware of themselves (who and what they are, who they "belong" to, their relationships to other systems, their autonomy, and their capabilities).
•	Be more aware of their surroundings and audiences (who is there; how many people are present or around; who to "listen" to; where and how to find and contact people for help or to follow directions; who is a "regular"; how to adapt to different people's preferences, needs, goals, skills, interests, etc.).
•	Offer more help and guidance when needed.
•	Be more autonomous when needed.
•	Be more able to help build knowledge as opposed to merely process data.
•	Be more capable of displaying information in richer forms, both visually and auditorially.
•	Be more integrated into a participant's work flow or entertainment process.
•	Be more integrated with other media, especially traditional media like broadcast and print.

Funding for research and development, therefore, should concentrate on these issues and their related hardware, software, and understandings. These include research into the following:

•	Display and visualization systems (high-resolution, portable, and low-power displays; HDTV (high-definition television) and standards in related display industries; integration with input/output devices such as scanners, pointing devices, and printers; fast processing systems for n-dimensional data models; standards for these models, display hardware, and software; software capable of easily configuring and experimenting with visualizations and simulations; etc.).
•	Perceptual systems (proximity, sounds, motion, and electronic "eyes" for identification and awareness; standards for formatting instructions and specifications to help systems "understand" what they are, what is around them, what and who they can communicate with, and what they are capable of; facilities for obtaining help when necessary; ways of identifying participants by their behavior, gestures, or other attributes; etc.).
•	Communications systems (standards, hardware, and software to help participants communicate better with each other, as well as with computers; natural-language interfaces, spoken and written, and translation systems to widen the opportunities of involvement to more people; hardware and software solutions for increasing bandwidth and improving the reliability, security, privacy, and scalability of existing communications infrastructure; etc.).
•	Understanding of understanding (information and knowledge-building applications; understandings about how people create context and meaning, transform data into information, create knowledge for themselves, and build wisdom; software to help facilitate these processes; standards to help transmit and share information and knowledge with connections intact; etc.).
•	Understanding of interaction (a wider definition of interaction used in the "industry"; how participants define and perceive "interactivity"; what they expect and need in interactivity; historical examples of interaction; lessons from theater, storytelling, conversation, improvisation, the performing arts, and the entertainment industry; etc.).
•	Increased education (of both participants and audiences, as well as professionals, and the industry).
•	Better resources for understanding cultural diversity (in terms of gestures, languages, perceptions, and needs of different age, gender, cultural, and nationality groups).

In addition, there are some procedural approaches to these undertakings that can help the overall outcomes to be more valuable:

•	Reduced duplication of research and development by government-sponsored grants and institutions (requiring the disclosure, sharing, and reporting of research efforts, problems, and solutions).
•	More means of coordination and knowledge sharing of research and development scholars and professionals (whether government-sponsored or not).
•	More adventurous spending in "esoteric" research (such as the nature of understanding and the meaning of "interactivity").
•	Grant proposal templates and procedures that are easier to complete (simplified and clarified requirements, less paperwork, less process-specific "inside" information that requires professional grant writers so that more adventurous people can apply more easily).

Where Did The "User" Go?

Although the word "user" is, admittedly, easy to say and use and has some history, it is important that our understanding of those using computers is broadened to emphasize the growing participatory aspects of computer use. Whereas historically people input data and information, managed its processing, and output the results, the building of knowledge requires more participation and interaction, of the type most closely experienced with other people. People are becoming active audiences and participants instead of merely users. They are increasingly communicating with others and creating meaningful things rather than merely "viewing" and watching.

The next 100 million computer users (who may begin using computers over the next 3 years) have different needs and understandings than current users. Their needs are different not because of their capabilities (all of these people are capable of learning to use existing systems) but mostly because of differences in their perceptions, interests, and understandings of computers. One important reason why these people are not now buying or making use of current computers is that, in their minds, computers don't do much that they are interested in doing. Existing computers are not capable of or equipped for helping these people enjoy, expand, or make meaning of their lives. This is the reason why home computer sales have traditionally been dismal and are currently confined to home-office purchases and purchases for children's education. Fifteen years ago the best use that computer marketers could come up with for people to buy their own computers was to balance their checkbooks and store recipes. Today, while computers have evolved significantly, many people's perceptions have not, and they understand precious few reasons why computers might enhance the lives of this next "user group." Part of this is an education issue (and, perhaps, a marketing one), but mostly it is a failure of computer systems (hardware and software) to respond to the needs and interests of the general public.

The interface starts long before a computer is turned on. Consider an analogy to shopping. The shopping experience does not start the moment a transaction is made (perhaps an item is bought or ordered). It doesn't even start when someone walks into a store or browses a catalog. The shopping experience starts, at the latest, when people perceive the need for something, and often earlier: when they encounter others shopping, products and services they do not currently need, and even celebrity athletes sporting brand names on their outfits. Likewise, the interface to a computer begins at the fulfillment of life needs and interests and the education of the participants about the capabilities and possibilities of computers and interfaces. Automobiles are often held up as examples of easy-to-learn, universal interfaces, but in reality they are neither. They take months, sometimes years, to master, are not standardized, and are sometimes never learned sufficiently well. Yet our understanding of driving a car and its fulfillment of our needs make us persevere.

What Is A Computer?

What I mean when I use the word "computer" is a specific device for processing, storing, and transmitting information, aiding the building of knowledge, and/or facilitating communications more sophisticated than a current telephone. To be sure, many objects around us will evolve to be more sophisticated and many already include computers whether this is evident to their users or not. To what extent computers will disappear as distinct devices is not a question that can yet be answered. However, it does not really matter either. The needs and interests of people have changed very little over the past 100 years and will likely change only slightly over the next 15 to 20. Most people will still need to work, create, love, interact, communicate, and be entertained (as well as entertain each other). Interfaces should concentrate on the activities, not the technologies, nor should they be immediately concerned with the nature of a device itself (is it distinct or embedded?). These interfaces may show up in computers, televisions, telephones, door knobs, or devices not yet invented. What will remain fairly constant, however, are the needs themselves.

What Is Interactivity?

When I use the word "interactive" I do not mean what has become the standard industry definition of dynamic media or the ability to make choices when using computer programs. Most "interactive media" is nothing more than multimedia presentations (usually with video and animation) with the ability to click to the next screen of material in a nonlinear way. In this sense interactivity has become bad television where the audience must click for more in order to keep the stream coming. To me interactivity is much richer and includes the abilities to create, share, and communicate rather than merely watch. Interactive experiences should change over time and between different people. Sadly, few products or experiences do this now, which is the main reason why the CD-ROM industry fell apart over the past few years (the products offered little to do that was interesting).

However, this is merely the starting point. As an industry (academics and professionals alike), we understand this word "interactivity" very little and need to explore greatly what it means to people, what it can be, and how to create it. This is one of the points where grants and funding can be applied. Unfortunately, the commercial end of the interactive media industry offers little chance of exploring and experimenting with the whole notion of interactivity, as the demands of an overhyped market, skyrocketing costs, too much publicity, and too many expectations prevent most companies from asking these questions. Likewise, on the academic end of the spectrum, demands to produce work-ready students, lack of interdisciplinary programs, and the history of computer science studies (emphasis on software, programming, engineering, and computer languages) prevent students and professors from asking these questions because they seem esoteric and "light" in the face of other research.

About the only people who are explicitly trained in the skills of interaction are those in the performing arts: dancers, actors, singers, comedians, improvisational actors, and musicians. However, these fields are hardly seen as complementary or valid courses of study in computer science, multimedia, and even design programs. Yet the experience and knowledge that performers can bring to these disciplines are exactly the answers to the questions that should be asked. Grants for programs that try to explore these issues with the help of many different disciplines would help speed the development of answers badly needed in this industry.

Computers That Are Aware

Interfaces need to become more aware of themselves and those around them. This is true in both a physical sense (where am I, where are you, and who else is here?) and a cognitive sense (who am I, what can I do, and how can I communicate with others?). While computers won't have truly "cognitive" capabilities for a long time-if ever-they already have a few elements of these capabilities and information and need even more. What features these capabilities will eventually create are mostly unpredictable right now, but we can count on the facilities to respond to people in a more adaptable and individual manner to make a major improvement in interfaces. Developing processes, standards, and technologies to build these capabilities upon will prove mandatory.

Other technologies that will be needed to develop these more intuitive and adaptive interfaces include perceptual technologies to support computer perception in sound, vision, touch, gesture, environment, temperature, airborne particles, and so forth.

Interfaces To Knowledge

Most interfaces and applications today have sped the transmission, storage, and processing of data but have hardly changed the accumulation, creation, or quantity of either information or knowledge. Certainly, we cannot say that computers have made us more "wise," but the interactions computers offer do give us more chances to communicate our thoughts and build wisdom if we only knew better how to. Our understanding of knowledge and wisdom and its processes is inadequate but also critical to our continued development as a culture and a species. Research into the components of these processes, of our minds, and of our thoughts is needed to advance not only our tools-of which the computer is one of our best-but ourselves. This research is needed not only in terms of software and hardware (perhaps finding form in file formats, applications, operating systems, and products), but in the underlying processes and understandings of how we think-upon which all of the aforementioned are based. This must be coordinated with the fields of education, psychology, and communications, as well as computer science. It may even be helpful-and necessary-to include those in philosophy.

These are the most esoteric and unpredictable of questions-indeed, they have kept us busy for our entire histories-but this should not deter us from seeking their answers. Even if we will never truly answer these questions completely, each part of the answer gives us new insights into building more valuable interfaces that meet more of our needs.

Another aspect of interfaces that facilitate knowledge is the set of technologies involved with representing and displaying data and information. Present tools commonly available on the market such as spreadsheets, word processors, databases, and graphics programs are hardly adequate for representing or visualizing complex relationships and informing communications. The hardware required for better-performing visualization systems includes displays that are high resolution, portable, and low power so that they are more easily used where needed. Standards for evolved displays will need to be established, adopted, and made prevalent so that engineers, programmers, and audiences can come to count on their capabilities and availability. Integration with input/output devices such as scanners, pointing devices, and printers will need to be advanced as well. Devices that enable more direct interaction between display and control are more learnable and more evident to use-in essence, an evolution of what is commonly understood as direct manipulation today. Systems for processing and working with these devices will need to rely on more powerful and faster CPUs (central processing units) and hardware as well.

Software for representing and manipulating more complex data visualizations will need to build on new understandings of how people think and work with data. These applications will need to explore new paradigms in representation and manipulation in order to offer the kinds of flexibility and understandability required by more complex processing and less "professional" audiences.

Interfaces Across Media

Where possible, new interfaces should translate well across media and devices, whether portable, stationary, shared, public, private, networked, or personal, and should strive to encompass and integrate traditional media, such as broadcast and print media, where possible-not merely electronic and on-line media. This is not to say that interfaces shouldn't take advantage of their unique capabilities-indeed, they need to do so more than currently-but they also need to relate to each other where possible and it should be recognized that printed interfaces such as newspapers, classifieds, catalogs, documentation, and directories are in just as dire need of evolution as technological ones. I am certainly not calling for computer screens to look like little notebooks of paper with spiral binds nor print paper to look like current computer interfaces with pull-down menus. However, our interfaces to printed information and knowledge have evolved little (outside of stylistic appearances) in the past 100 years, and, since this still represents a huge proportion of information dissemination and interaction (and will likely continue to be), some funding for the evolution of these critical interfaces should be allocated.

Interfaces Across Cultures

As interfaces become more complex and deal with more abstract issues, how they address people from different backgrounds and cultures will become more critical. We have been able to achieve a certain amount of standardization and utilization so far with present interfaces, but this is due mainly to the nature of tasks currently completed with computers. As computers become more involved with knowledge building, communications, and community and as interfaces facilitate these more social purposes, they will need to address how differences between people change their understanding of how to use these devices and of the processes themselves. These differences may be based on age, gender, culture, language, or nationality. Interfaces in the future will not have the luxury of requiring the same amount of capitulation on the part of the audience since the next level of computer users (the next 100 million users) will not be as willing to change their approach to problems and their interaction with devices as the enthusiasts and professionals who comprise the present base of computer users. Issues of language, gesture, understanding, privacy, approach, civility, and "life" are not consistent throughout the world-and wonderfully so-and must be discovered and documented. Also, systems, standards, and interfaces must be developed that are sensitive to these differences. Lastly, the knowledge of these differences must be made available to researchers and developers.

Automatic language translation is one of the most critical-and most difficult-problems to solve. It is such a complex problem that it is probably not solvable by conventional programming means. Efforts to "grow" or "evolve" complex software for pattern recognition and processing are probably the best hope for tackling problems of this complexity, and it will probably require several efforts in coordination.

Lastly, greater education is needed to inform researchers, professionals, participants, and the industry of these issues, their importance, the state of their progress, and their details. We cannot merely rely on the media to inform people about computers or their capabilities since the messages usually get dissolved to the lowest common denominator-but one of cynical expectations and far lower than actual capabilities to understand. Movies, news, books, and other instruments of culture often create unrealistic understandings, expectations, and often fears of computers and their uses. We must address these and reverse them ourselves-as no one else will-if we expect future interfaces for everybody to be more effective.

Interspace And An Every-Citizen Interface To The National Information Infrastructure

Terry Winograd

Stanford University

With the sudden emergence of widely used Internet-centered applications, it has become glaringly obvious that the computer is not a machine whose main purpose is for a person to pursue a task. The computer (with its attendant peripherals and networks) is a machine for communicating all kinds of information in all kinds of media, with layers of structuring and interaction that could not be provided by traditional print, graphic, or broadcast media.

The traditional idea of "interface" implies that we are focusing on two entities-the person and the machine-and on the space that lies between them. But a more realistic view recognizes the centrality of an "interspace" that is inhabited by multiple people, workstations, servers, and other devices in a complex web of interactions. The hardest task in creating an every-citizen interface will be the design of appropriate theories, models, and representations to do justice to the potential richness of this interaction space.

As a simple example, consider the Web. Today we talk of surfing from place to place on the Web, touching home pages, and following links. These metaphors of spatial locomotion are engaging. They opened up new ways of thinking and doing that had not been explored in the predecessor desktop metaphor. But the Web is not the answer to the future of interfaces any more than the desktop was in its day. Each metaphor is another stepping stone, which in turn creates a way of thinking that creates a blindness to new possibilities.

New research and development activities need to enhance our ability to understand, analyze, and create interaction spaces. The work will be rooted in disciplines that focus on people and communication-such as psychology, communications, graphic design, and linguistics-as well as in disciplines that support computing and communications technologies. The work will start from an assumption that a computer system provides a shared space for multiple people, each in a personal and organizational background that shapes and guides interaction with others. The systems we can build based on new research will support communications structures at all levels-from the generic document structuring of the Web to highly task-specific interactions, like those that go on in an airport control tower.

Some sample areas for research are discussed below.

Collaboration Structures

There is a body of research on the structure of collaborative work, sponsored under previous National Science Foundation (NSF) initiatives on collaboration and developed by commercial software developers under labels such as "workflow." The current state of the art can be described as having a large "hole in the middle." At the highest level there are very general (and hence very abstract) theories of how people get work done through communication. At the low level, there are thousands, even millions, of specialized applications-from the order system at a fast food restaurant to the NSF proposal application process-that support organized group activities. But we have not yet developed the conceptual and computational tools to make it easy to bring collaboration into the mainstream of applications. When I work with my research group on a joint paper, we use sophisticated word processors, graphics programs, and the like, but our coordination is based on generic e-mail and calendars and often fails to help us at the places where breakdowns occur.

Semantic Alignment

Whenever two people talk, they have only an approximate understanding of each other. When they speak the same language, share intellectual assumptions, and have common backgrounds and training, the alignment may be quite close. As these diverge, there is an increasing need to put effort into constant calibration and readjusting of interpretations. Ordinary language freezes meanings into words and phrases, which then can be "misinterpreted" (or at least differently interpreted).

This problem shows up at every level of computer systems, whenever information is being represented in a well-specified syntax and vocabulary. Even simple databases have this problem. If information is being shared between two company databases that have a table for "employee," they are apparently in alignment. But if one was created for facilities planning and the other for tax accounting, they may not agree on the status of part-time, off-site, on-site contract, or other such "employees." This difference may be nowhere explicit in what is stored on the computer but is a matter of background and context.
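The "employee" example can be made concrete with a small sketch. Everything in it is hypothetical: the `Person` record, its two fields, and the two membership rules are illustrative assumptions standing in for the unstated background of a facilities database and a tax database, not a description of any real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    on_site: bool      # occupies space in a company building
    on_payroll: bool   # paid directly through company payroll


def is_employee_for_facilities(p: Person) -> bool:
    # Assumed rule: facilities planning counts whoever needs space on site.
    return p.on_site


def is_employee_for_tax(p: Person) -> bool:
    # Assumed rule: tax accounting counts whoever is paid through payroll.
    return p.on_payroll


def misaligned(people):
    """Names on which the two meanings of "employee" disagree."""
    return [
        p.name
        for p in people
        if is_employee_for_facilities(p) != is_employee_for_tax(p)
    ]


people = [
    Person("Avery", on_site=True, on_payroll=True),    # full-time, on-site
    Person("Blake", on_site=False, on_payroll=True),   # off-site staff
    Person("Casey", on_site=True, on_payroll=False),   # on-site contractor
]

print(misaligned(people))  # ['Blake', 'Casey']
```

Both tables would carry the same column label, yet nothing stored in either one records which rule produced its rows, which is the sense in which the difference is a matter of background and context.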

Ubiquitous networking is leading us to the point where every computer system supports communication and where every term we use will be seen and hence interpreted by others. There are traditional philosophical and linguistic approaches to making sure we have "common understanding," but these tend to be based on highly abstract or idealized examples and settings. We need to develop new theoretical foundations for talking about the kinds of "semantic approximations" that are needed for something as apparently simple as sharing data between two databases and as ubiquitous as the nodes of the Internet.

Building New "Virtualities"

In designing new systems and applications, we are not simply providing better tools for working with the objects in the previously existing world. Computer systems and software are a medium for the creation of virtualities-the worlds in which users of the software perceive, act, and respond to experiences. Software is not just a device with which the user interacts; it is also the generator of a space in which the user lives. Software design is like architecture: when an architect designs a home or an office building, a structure is being specified. More significantly, though, the patterns of life for its inhabitants are being shaped. People are thought of as inhabitants rather than as users of buildings.

The creation of a virtual world is immediately evident in computer games, which dramatically engage the player in exploring the vast reaches of space, fighting off the villains, finding the treasures-actively living in whatever worlds the game designer can imagine and portray. But the creation of worlds is not limited to game designers. There is also a virtual world in a desktop interface, in a spreadsheet, and in a use of the World Wide Web. Researchers in human-computer interfaces have used other terms, such as conceptual model, cognitive model, user's model, interface metaphor, user illusion, virtuality, and ontology, all carrying the connotation that a space of existence, rather than a set of devices or images, is being designed. The term virtuality highlights the perspective that the world is virtual, in a space that is neither a mental construct of the user nor a mental construct of the designer.

Today, we are all familiar with the virtuality of the standard graphical user interface, with its windows, icons, folders, and the like. Although these virtual objects are loosely grounded in analogies with the physical world, they exist in a unique world of their own, with its special logic and potentials for action by the user. The underlying programs manipulate disk sectors, network addresses, data caches, and program segments. These underpinnings do not appear-at least when things are working normally-in the desktop virtuality in which the user works.

There is little theoretical grounding today on which to base the design of new virtualities. Obviously, there are considerations from psychology about how people perceive new kinds of objects and activities. There are philosophical discourses about how we divide the world up into constituent things and properties and how we can formulate our interactions with them. There is also a more common-sense level of understanding how people think about familiar domains and how their expectations from their experiences in life will shape their interactions with computer systems.

As a simple example, consider the three primary modes of interacting with a virtuality that are learned by every normal person in infancy:

•	Manipulation. Perceiving, grasping, modifying, and controlling objects that are in front of the person. This is a "hand-eye" modality that fills much of our daily life (from the kitchen to the cash register). It is at the heart of the current desktop metaphor and most of what has been done with graphic user interfaces.
•	Locomotion. Observing location and moving from place to place. This is an "eye-foot" modality in which the world is primarily stable and the user moves within it. This is the basis for the Web's place-and-link-following metaphor, as well as many popular computer games (where the manipulative aspects are reduced to "shoot weapon").
•	Conversation. Using a language to communicate with another person in a two-sided discourse. This is an "ear-mouth" modality that comes from our experience in encountering other people. It is the basis for the traditional command-line interface as well as speech interaction with machines.

What can be said at a theoretical level about the nature of these modalities and the problems that arise in using-and especially in mixing-them? Are there other conceptual modalities that are fundamentally different from these three that will be understandable and practical for people to use? How does the finer-grained analysis of interaction structure fit into this larger picture? And, finally, what is the difference in the nature of multiperson activity in these different modalities, and how does that map onto the kinds of multiperson collaboration we want to support in an every-citizen interaction space? (See ‹http://www-pcd.stanford.edu/winograd› and ‹http://www-pcd.stanford.edu/winograd/book.html›.)

Bibliography

On Coordination

Denning, Peter and Pamela Dargan, Action-Centered Design, in Terry Winograd (ed.), Bringing Design to Software, Reading, MA: Addison-Wesley, 1996, pp. 105-120.

Holt, Anatol, Diplans, A Formalism for Action, ACM Transactions on Office Information Systems 6:2 (April 1988).

Malone, Thomas W. and Kevin Crowston, What Is Coordination Theory and How Can It Help Design Cooperative Work Systems?, MIT CCS report 112, 3402/3183, Cambridge, MA: Massachusetts Institute of Technology, April 1990.

Medina-Mora, Raul, Terry Winograd, Rodrigo Flores, and Fernando Flores, "The Action Workflow Approach to Workflow Management Technology" in The Information Society, Volume 9, Number 4, October-December 1993, p. 391.

Verharen, Egon, Nardo van der Rijst, and Jan Dietz (eds.), Proceedings of the Language/Action Perspective: International Workshop on Communication Modeling, Economisch Institut Tilburg, Tilburg, the Netherlands, 1996. ‹http://infolabwww.kub.nl:2080/infolab/lap96/›

Winograd, Terry (1988), Introduction to the Language/Action Perspective, ACM Transactions on Office Information Systems 6:2 (April 1988), pp. 83-86.

On Semantic Alignment

There are papers on this topic from the point of view of artificial intelligence ("ontology" matching), databases ("schema" matching), and information retrieval ("attribute set" matching). I have not put together a good list or found an integrative article that cuts across them. Some general considerations are presented in the following:

Winograd, Terry and Fernando Flores, Understanding Computers and Cognition: A New Foundation for Design, Norwood, NJ: Ablex, 1986, 220 pp. Paperback issued by Addison-Wesley, 1987.

On the Design of "Virtualities"

Hutchins, Edwin, James Hollan, and Donald Norman, Direct Manipulation Interfaces, in D. Norman and S. Draper (eds.), User-Centered System Design, Hillsdale, NJ: Erlbaum, 1986, pp. 87-124.

Lakoff, George, and Mark Johnson, Metaphors We Live By, Chicago: University of Chicago Press, 1980.

Winograd, Terry, with John Bennett, Laura De Young, and Bradley Hartfield (Eds.), Bringing Design to Software, Reading, MA: Addison-Wesley, 1996. Available on-line at ‹http://www-pcd.stanford.edu/winograd/book.html›. See, especially, Introduction, Chapter 2 (David Liddle, "Design of the Conceptual Model"), and Chapter 4 (John Rheinfrank and Shelley Evenson, "Design Languages").

Mobile Access To The Nation's Information Infrastructure

Daniel P. Siewiorek

Carnegie Mellon University

Introduction

The focus of this position paper is mobile access to the nation's information infrastructure (NII). The goal should be to provide "the right information to the right person at the right place at the right time." In order for the NII to reach its potential, the average person should be able to take advantage of the information on or off the job. Even while at work, many people do not have desks or spend large portions of time away from their desks. Thus, mobile access is the gating technology required to make the NII available at any place at any time.

The next section describes the time rate of change of computer technology, indicating what might be expected in the form of technology from the computer industry as well as defining a new class of computers-the wearable computer. The third section describes the importance of a variety of modalities of interaction with wearable computers. The paper concludes with some research challenges.

Time Rate Of Change Of Computer Technology

Computer systems are typically compared using two classes of metrics: capacity and performance. Capacity is how large a component may be or how much information it may store. Performance is measured in functions per unit of time (often referred to as bandwidth or throughput) or, conversely, the time needed to complete a specific function (referred to as latency). Recently, ease of use has become a major differentiating characteristic between computer systems and hence represents a third class of metrics.

Because they directly reflect the state of technology, hardware capacity and performance metrics are the easiest to determine or derive. These measures are usually associated with individual components in a computer system. There are six basic functions in a computer system. In addition, attributes such as energy consumption and physical size gain increasing importance as computers become more mobile. Table 1 summarizes eight metrics for a computer system, including their units of measurement. Capacity is usually measured in the number of information units such as bytes or pixels. Bandwidth/throughput is measured in operations per second for the processor and bits per second for communications. Energy is measured as the reciprocal of kilowatts, while physical size is summarized as the reciprocal of the product of the weight times the volume of space occupied. Notice that for all these metrics the larger the number the better. The three columns in Table 1 include a contemporary workstation (an anchored, unmovable system), a contemporary laptop computer (a luggable system), and a palmtop/personal digital assistant (a portable, pocketable system).

TABLE 1 Examples of Computer System Capacity and Performance

Components	Units	Workstation	Laptop	Palmtop/Personal Digital Assistant
Processor	Instructions/second	100M	10M	1M
Random access memory	Bytes	64M	8M	1M
Disk memory	Bytes	600M	100M	-
Display	Pixels	1M	0.18M	0.045M
Network communications	Bits/second	10M	2M	0.02M
Distance	Meters	-	100	10
Energy	1/kW	5	100	500
Physical size (weight × volume)	1/(kg × m³)	1	16	2,000

Since ease of use is so closely associated with human reaction, it is much more difficult to quantify. There are at least three basic functions related to ease of use: input, output, and information representation. Box 1 summarizes several points for each of these basic functions. Note that, unlike the continuous variables for capacity and performance, the ease-of-use metrics are discrete.

Siewiorek et al. (1982) considered the concept of the computer class. A computer class attempts to integrate many computer system details into an overall evaluation, grouping similarly evaluated systems together. Thus, the workstation, laptop, and palmtop in Table 1 can each be considered representative of a computer class. These researchers also observed that computer classes differ in physical dimensions and price by roughly 1.5 orders of magnitude (i.e., approximately a factor of 30). In addition, it was observed that, as each computer class evolves, new members of the class are expected to have increased capacity and functionality. The increases in technology serve to increase the capacity and functionality of a class. Thus, the boundary of various attributes can be considered to be increasing with time.

Box 1 Ease-of-Use Metrics
Input	Output	Information representation
Keyboard	Alphanumerical display	Textual
Handwriting recognition	Graphical display	Iconic desktop
Speech recognition	Speech synthesis	Multimedia
Gesturing
Position sensing

On the other hand, technological changes can be used to initiate new computer classes with the same functionality offered by the next higher class several years before. It is extremely important to remember that all classes of computers have followed approximately the same evolutionary paths as their capacity and functionality have increased. The newer computer classes benefit from the evolutionary process of older classes, adapting to proven concepts quickly, where the older classes required a trial-and-error process. Siewiorek and co-workers also observed that computer classes tend to lag each other by approximately 5 years. Thus, the palmtop computer of today could be considered to have approximately the functionality of a laptop 5 years ago or a workstation 10 years ago. Thus, we can expect the palmtop of the year 2006 to have the attributes of today's workstation.

One can speculate on the emergence of a new class of computers called "wearable computers." Wearable computers will weigh less than a few ounces, operate for months or years on a single battery, and have esthetically pleasing shapes that can adorn various parts of the body. Pagers and electronic watches (complete with calculator and memory to store phone numbers/memos) represent the first examples of the wearable class of computers. Thus, the wearable computer of the year 2006 will have at least the functionality of today's laptop (as depicted in Table 1).
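The lag arithmetic behind these projections can be written out as a short sketch. It assumes that the classes form a simple ordered ladder, that the wearable sits one rung below the palmtop, and that "today" is 1996; the reference year is inferred from the 2006 projections and the 5-year lag rather than stated in the text, and the function name `parity_year` is purely illustrative.

```python
# Computer classes ordered from most to least capable. The first three come
# from Table 1; "wearable" is the speculative class described in the text.
CLASSES = ["workstation", "laptop", "palmtop", "wearable"]

# Approximate lag between adjacent classes, as observed in the text.
LAG_YEARS = 5

# Assumption: "today" in the text, inferred from the 2006 projections.
REFERENCE_YEAR = 1996


def parity_year(newer_class: str, older_class: str, reference_year: int) -> int:
    """Year in which newer_class should match what older_class offered
    in reference_year, under a fixed lag per class step."""
    gap = CLASSES.index(newer_class) - CLASSES.index(older_class)
    if gap < 0:
        raise ValueError("newer_class must not sit above older_class")
    return reference_year + gap * LAG_YEARS


print(parity_year("palmtop", "workstation", REFERENCE_YEAR))  # 2006
print(parity_year("wearable", "laptop", REFERENCE_YEAR))      # 2006
```

The fixed lag is only a rule of thumb; the sketch reproduces the two projections above and nothing more.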

As with the capacity and performance metrics in Table 1, the ease-of-use metrics in Box 1 are also moving out with time. For example, the keyboard with an alphanumerical display using textual information is representative of time-sharing systems of the early 1970s. The keyboard and mouse, graphical output, and iconic desktop are representative of personal computers of the early 1980s. The addition of handwriting recognition input, speech synthesis output, and multimedia information is emerging in the early 1990s. It takes approximately one decade to completely assimilate new input, output, and informational representations. By the early part of the next decade, speech recognition, position sensing, and eye tracking should be common inputs. Heads-up projection displays should allow superposition of information onto the user's environment.

Modalities Of Interaction With Wearable Computers

The objective of wearable computer designs is to merge the user's information space with his or her work space. The wearable computer should offer seamless integration of information-processing tools with the existing work environment. To accomplish this, the wearable system must offer functionality in a natural and unobtrusive manner, allowing the user to dedicate all of his or her attention to the task at hand with no distraction provided by the system itself. Conventional methods of interaction, including the keyboard, mouse, joystick, and monitor, all require some fixed physical relationship between user and device, which can considerably reduce the efficiency of the wearable system. Among the most challenging questions facing mobile system designers is that of human interface design. As computing devices move from the desktop to more mobile environments, many conventions of human interfacing must be reconsidered for their effectiveness. How does the mobile system user supply input while performing tasks that preclude use of a keyboard? What layout of visual information most effectively describes system-state or task-related data? To maximize the effectiveness of wearable systems in mobile computing environments, interface design must be carefully matched with user tasks. By constructing mental models of user actions, interface elements may be chosen and tuned to meet the software and hardware requirements of specific procedures.

The efficiency of the human-computer interface is determined by the simplicity and clarity of the mental model suggested by the system. By modeling the actual task as well as the human interface, a linkage can be constructed between user and machine that can be examined to improve the overall efficiency of the wearable system. We begin with the assertion that for wearable systems to be efficient the mental model of the interface design must closely parallel that of the user task; there must be minimal interference or obstruction posed by the computer in completing jobs. Although the number of quantifiable metrics suited for interface evaluation is small, a series of basic observations provides a means for comparison. One characteristic of an application interface is the number of user actions required to perform a given subtask. We define a subtask as an operation, possibly consisting of multiple inputs, that a user completes in the process of performing a larger coherent task. For example, in the course of performing an inspection, a user might wish to return from his or her present location in an application to the main menu. This subtask may require a single input (perhaps a voice command or an on-screen button) or multiple inputs (backing out through a hierarchy of categories to reach the top or main level). We assert that an application requiring few inputs will allow a user to dedicate more attention to the job at hand, while a larger number of inputs will require more concentration on the computing system. A comparison of equivalent subtasks in two wearable computers (Smailagic and Siewiorek, 1996) is shown in Table 2. The speech recognition engine accepts complex commands that allow some subtasks requiring a series of manual inputs to be executed with a single phrase. However, the response time to a spoken input is longer and the accuracy is lower. For these reasons the quantitative aspect of system latency and accuracy must be factored into evaluations of usability.

TABLE 2 Comparison of Number of Steps to Retrieve Information Using Selection Buttons and Speech

Subtask	Buttons/Menu Selection	Speech
Get information	4	1
Get photograph	5	1
Navigate to location	3	2
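
One way to see why step counts alone are not enough is to fold latency and accuracy into a rough expected-time estimate. The sketch below is only an illustration: the step counts are those of Table 2, but the per-input latencies, the accuracies, and the simple "retry until recognized" model are assumptions introduced here, not measurements from the systems compared.

```python
# Step counts are taken from Table 2.
STEPS = {
    "Get information": {"buttons": 4, "speech": 1},
    "Get photograph": {"buttons": 5, "speech": 1},
    "Navigate to location": {"buttons": 3, "speech": 2},
}

# Hypothetical values: response time per input (seconds) and the
# probability that a single input is accepted correctly.
ASSUMED = {
    "buttons": {"latency_s": 0.5, "accuracy": 0.99},
    "speech": {"latency_s": 2.0, "accuracy": 0.90},
}


def expected_time(steps, latency_s, accuracy):
    """Expected seconds for a subtask when every failed input is retried.

    Assuming independent retries, each input needs 1 / accuracy
    attempts on average.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return steps * latency_s / accuracy


def compare():
    """Return the expected time per subtask and modality."""
    return {
        subtask: {
            modality: expected_time(count, **ASSUMED[modality])
            for modality, count in counts.items()
        }
        for subtask, counts in STEPS.items()
    }


if __name__ == "__main__":
    for subtask, times in compare().items():
        print(f"{subtask}: buttons {times['buttons']:.2f} s, "
              f"speech {times['speech']:.2f} s")
```

Under these assumed numbers, speech is quicker for the five-step photograph subtask but slower for navigation, where it saves only one step. The point of the sketch is the shape of the trade-off, not the particular values.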

Research Challenges

There are several challenges that research must address to make mobile access to the NII effective. Following is a partial list of those challenges:

•	User interface models. What is the appropriate set of metaphors for providing mobile access to information (e.g., the next "desktop" or "spreadsheet")? These metaphors typically take over a decade to develop (e.g., the desktop metaphor started in the early 1970s at Xerox PARC and required over a decade before it was widely available to consumers). Extensive experimentation working with end-user applications will be required. Furthermore, there may be a set of metaphors each tailored to a specific application or a specific information type.
•	Input/output modalities. While several modalities mimicking the input/output capabilities of the human brain have been the subject of computer science research for decades, their accuracy and ease of use (e.g., many current modalities require extensive training periods) are not yet acceptable. Inaccuracies produce user frustrations. In addition, most of these modalities require extensive computing resources, which will not be available in low-weight, low-energy wearable computers. There is room for new, easy-to-use input devices, such as the dial developed at Carnegie Mellon University for list-oriented applications.
•	Quick interface evaluation methodology. Current approaches to evaluating a human-computer interface require elaborate procedures and scores of subjects. Such an evaluation may take months and is not appropriate for use during interface design. New evaluation techniques should focus especially on decreasing human errors and frustration.
•	Matched capability with applications. The current thought is that technology should provide the highest performance capability. However, this capability is often unnecessary to complete an application, and fancy enhancements such as full-color graphics require substantial resources and may actually decrease ease of use by generating information overload for the user. For example, one informal survey of display requirements for military planning estimated that 85 percent of the applications could be performed with an alphanumerical display and 10 percent with simple graphics and that only 5 percent required full bitmap graphics. Interface design and evaluation should focus on the most effective means for information access and resist the temptation to provide extra capabilities simply because they are available.

References

Siewiorek, D., C. G. Bell, and A. Newell. 1982. Computer Structures: Principles and Examples, McGraw-Hill, Inc., New York.

Smailagic, Asim, and D.P. Siewiorek. 1996. Modalities of Interaction with CMU Wearable Computers. IEEE Personal Communications, Vol. 3, No. 1, February.

Ordinary Citizens And The National Information Infrastructure

Bruce Tognazzini

Healtheon Corporation

The original working title for this workshop was "Toward an Ordinary-Citizen Interface to the National Information Infrastructure." It was then altered to "Every-Citizen" to be inclusive of all of our citizens. While I support that change, we did lose something quite important in the transition, for of all the people whose lives have been affected by the computer revolution, perhaps none has received as scant attention as our ordinary citizens.

Today, we face the prospect of millions of have-nots shut out of cyberspace, a threat that has little to do with economic status, country of origin, race, creed, color, or physical ability. Instead, it has everything to do with age, gender, education, culture, aptitude, and attitude. If cyberspace today were to have a dead-honest advertising slogan, it would read: Built by Boys, for Boys!

As Margie Wylie (1995) says: "Far from offering a millennial new world of democracy and equal opportunity, the coming Web of information systems could turn the clock back 50 years for women." The 18- to 39-year-old males with technological talent and above-average intelligence and education who built today's cyberspace built it for themselves. Large parts of it reflect the delicate ambiance of an automobile junkyard. We must make fundamental changes in the direction of computer design if the true have-nots of cyberspace are not to be those rare individuals who do not feel instantly comfortable clattering over mounds of twisted metallic wreckage: in other words, ordinary people.

Somewhere along the line, many technology designers lost track of the real goal: empowering users. From video cassette recorders to clock radios, designers are adding every button, switch, and other power-user doodad they can in the mistaken belief that the true power of technology is to be measured in the number of features and controls rather than the impact on people's lives. Our computer software has tracked this trend. Systems and applications today are festooned with every "wangdoodle" imaginable, offering users plenty of power to blow themselves up while at the same time inhibiting them from accomplishing their task.

If the desktop computer is a dark and mysterious closet, the Internet is a positively terrifying, sucking black hole. The advent of the World Wide Web is helping to address part of the problem by making at least the waystations on the Internet visible, but just the sheer immensity of today's cyberspace is frightening to all but a small group of people. Sure, the kinds of tasks users attempt on their computers have become more complex, but something else is leading to the increased difficulty of using our machines, something we need to address: we are designing our systems for power users, to the exclusion of everyone else.

Power Users Versus Expert Users

Most people want to be seen as power users, but then we have the real thing. Power users typically consist of bipedal, testosterone-soaked life forms between the ages of 18 and 39. Yes, I said testosterone-soaked life forms. At the risk of offending certain politically correct parties, there does appear to be a difference, however minor, between boys and girls. And the overwhelming majority of power users I've come across are definitely male.

Let me explain what I mean by power user. A "power user" is a person driven by hormones to want complete and utter control of every function of his or her computer, even if having such control seriously degrades efficiency and productivity. Tim Allen's character on "Home Improvement," the ABC comedy series, is the prototypical power user. He's the only guy in the neighborhood with a 120-horsepower lawn mower that will do 0 to 60 in less than 7 seconds. It's not much use on his suburban lawn, but it makes a really neat noise when he starts it up.

I knew several guys at Apple who had so many weird public-domain extensions in their system folder that virtually none of their applications ran properly. Accomplishing the smallest task was like walking through a mine field. So what? As far as they were concerned, it merely increased the challenge! They wouldn't have thought of paring down their systems.

Most women see their machines as serious productivity tools, there for the express purpose of helping them accomplish their task. Women want to do their work, not "play computer" (Bulkeley, 1994). They are not alone: A high percentage of men don't want to "play computer" either; they just don't dare complain about it.

Many people across the board become expert users. Expert users understand their craft and are competent at using the tools that will help them succeed. They may have no interest in tearing apart their tools, either to understand them or to "improve" them. It's the difference between someone who is an expert at driving a car and someone who looks forward to Saturday morning because that is when he can tear the car apart and perhaps get it back together. The Saturday-morning power user may very well not be particularly expert at driving the car (although he will claim to be).

Changing Times

Thirty years ago computer users consisted of two classes: young male programmers and operators and powerless, minimum-wage females who endlessly keypunched 80-column cards. (Of course, not all keypunch operators were female. I was one of the few powerless male keypunch operators in those days. My cohorts and I quickly escaped, but the women were generally not so lucky.)

Today, two-thirds of personal computer users are women, according to a Logitec Inc. poll (1992), and millions of those female users are now in higher technical and management positions. Those who are not wandering the labyrinths of cyberspace today will be in the very near future.

The majority of users, according to the same poll, are now more than 36 years old. Most of us above the age of 36, male or female, have abandoned changing our own motor oil. And we are no longer quite as amused by the prospect of spending 10 hours tracking down the reason why our World Wide Web connection has become unresponsive ever since we installed our new tax planner.

The Economic Penalty For "Boytoyism"

A 1992 survey by Nolan, Norton and Company pegged the cost of ownership for a standard business personal computer at as much as $21,500 per year. That's a lot of money for a $5,000 computer. Where's the money going? A disproportionate percentage can be traced to direct and indirect training costs. The study found that the known visible costs per computer ranged from $2,000 to $6,000 annually. These direct costs cover hardware, software, initial installation, scheduled maintenance, and people taking time from their regular jobs to attend training classes.

The indirect costs are more complex: users waste time pressing buttons and flailing through manuals trying to figure out what went wrong with their machine, when the problem is that they inadvertently triggered some unknown and less-than-obvious system state. They waste time wandering around looking for a warm body in another office to ask for help. Finally, when they find someone who can help, the other party ends up wasting their own time, too. This peer-to-peer training is expensive. The study pegged the cost at $6,000 to $15,000 per year.

According to Bulkeley (1992):

"We all just about fell out of our chairs when we saw the amount of mutual support," says David J. Baker, a process consultant for Sprint who participated in organizing the study. "Everyone knew [peer-to-peer training] was taking place, but when we guessed what the amount would be beforehand, we missed by a factor of 65."

Supportive Family And Friends

The effect of interface complexity can be felt beyond the workplace. Ordinary people's access to cyberspace is a direct function of their access to a family member or friend who can carry out informed peer-to-peer activities. Should the knowledgeable family member die, divorce, or grow up and move out, the other family members may lose their access to cyberspace with the first disk problem or mandatory upgrade. Even if their hardware and software systems continue to function, they may well be unable to gain the full benefit of services offered, just because they do not have the technical skills to access them.

Ordinary citizens who do not live in a high-tech area of the country, who had no one to lean on from the beginning, are just as effectively disenfranchised from the cyber revolution as those who are economically disadvantaged.

Research Issues

While the solution to the problem of today's excessive complexity will involve applied technology, the problem has not arisen from a lack of technology, and it will not be solved by blindly throwing more technology at it. The solutions, I believe, lie more in the areas of sociology and psychology.

Macintosh used to have the slogan, "the computer for the rest of us." Macintosh was not. From the beginning, the Macintosh was designed to be "the computer for the rest of them." The Macintosh team, like the Lisa team, Alan Kay's Xerox PARC Altos team, and Doug Engelbart's original SRI team before them, was keenly aware that they were designing not for themselves but for others. All these teams held a common understanding of who their users were and chief in that understanding was a rock-solid belief that users were not like themselves.

Ten years later we are expecting ordinary citizens traveling on the World Wide Web to follow a naming convention so foreign to human experience as to be completely incomprehensible: http://www.goliath.com/~grandma.

We need research projects that will enable us to form a bridge between the needs of ordinary people and the inventiveness of our young technological minds. Several studies have shown startling differences between software engineers and ordinary citizens on Jungian psychological-type tests (Sitton and Chmelir, 1984; Tognazzini, 1992). These tests need to be repeated and expanded on, using larger populations and varied and more exacting instruments, answering the question: How are engineers different from ordinary citizens? Those engineers who do want to make technology accessible to ordinary people have little to go by since we still know remarkably little about our ordinary citizens. What are the capabilities, needs, and wants of ordinary citizens?

Today, software engineers master systems of amazing complexity during the course of their education and often graduate with the attitude that others can and should experience the same complexity they have. In what ways can we improve the education of our engineers so they are better able to understand and provide for the needs and wants of ordinary citizens?

Much of the complexity our computer science students face is necessary: they are doing complex things. However, much of it is just bad design, bad design they often end up emulating in their own products. How prevalent is bad design in the systems that computer science students use? What can be done to improve those systems? What can be done to sensitize our students to bad design and its consequences, so they will cease emulating it?

We need case studies of projects that resulted in approachable products or services versus those that only an engineer could love. What makes a project result in a system that people can use? What caused that project to succeed? What changes could have been made early on in projects that resulted in difficult-to-use systems that would have made them more approachable?

While many organizations have embraced the idea of human interaction design as a profession, many still see human interaction design as something to be done by engineers in the normal course of their work. What are the comparative outcomes of projects done in conjunction with human interaction designers versus engineers acting alone? Is the investment in human interface specialists worth it? Does that investment result in designs that are more approachable by ordinary citizens?

Our interfaces to the national information infrastructure must be accessible to ordinary people. The Star-Lisa-Macintosh interface made a fundamental shift in design away from the earlier "black cave" interfaces. People had previously been expected to navigate blindly through cyberspace, leaping from menu to menu, building in their mind's eye an image of what their cyberspace looked like. The new interfaces swept all of that aside. The "lights were turned on," with cyberspace objects and actions represented by icons, menu selections, and other visible objects in the interface. People no longer navigated at all: everything was brought to the user, with the user staying always in one place, seated before the desktop.

The Web represents a step backward to the old black cave metaphor. True, people can see one home page at a time, but they are back to navigating their way around cyberspace and, once again, can see no visual evidence of their movement. It's like the tunnel of love: you see a lot of objects jumping out at you, but you don't really know where you are.

In the early days of the personal computer, we were attempting to sell an unproven technology to a skeptical world. We could not depend on people investing weeks or months of self-education in a system they did not yet know would improve their lives. We had to make things easy. However, sometimes, what was easy in that first 20 minutes was not necessarily the right solution for maximum efficiency over the long haul.

People now recognize the value of the personal computer or workstation. They are willing to make a reasonable commitment toward learning. The interface of today does not necessarily need the training wheels that the Star, Lisa, and Macintosh provided, but we need research to find out how much we can increase the difficulty of the learning experience in an effort to further empower users. We need to establish the relevance today of the principles that drove the design of the original graphical user interfaces, as embodied in such lists as the Principles of Macintosh Design (Apple Computer, Inc., 1986). Which of the design principles for early graphical user interfaces represented "training wheels," and which represented needs, wants, and limitations of ordinary citizens that are just as important today as they were then?

Since the advent of those early graphical user interfaces, users have faced increasing complexity. What are the areas of today's technology that act as a barrier to ordinary citizens? What seeming complexities are not acting as barriers but are in fact embraced by ordinary citizens?

It takes 16 years to learn how to drive, with a lot of formal and informal education along the way. How much education should our ordinary-citizen children be receiving in school? What form should this education take? Based on an understanding that our children would be formally educated in information retrieval and other complex computer tasks, how far can we increase the learning burden for such tasks in our aim to improve overall efficiency and productivity?

Finally, we need to explore what, if anything, government or industry needs to do to bring simple power to our systems. Will competition by itself eventually result in approachable systems, or will we need an "Underwriters' Laboratory" type of institution that can certify our technological efforts? Will we need fast-moving standards organizations that can stay abreast of developments? If so, could there ever be any such thing as a fast-moving standards organization?

We have recently seen the result of the computer industry "putting its foot down" on the issue of the digital versatile disk (DVD, nee digital video disk). Instead of two competing systems being thrown on the market, there will be only one, and that one is better than either of the two that would have arrived. The DVD was also designed, rather than just kind of evolving. It had input from marketers, as well as engineers, marketers who actually went out and spoke to clients. Industry cooperation can work.

On the other hand, industry has had a miserable overall record of cooperation and consistency, from the VHS versus Beta wars to the Windows versus Macintosh wars, with ordinary citizens not only paying the price along the way but often ending up with an inferior alternative in the end. We know governmental supervision can work. We've seen it with the standards for NTSC video, for "compatible color," and, more recently, for HDTV (high-definition television) standards. The question always is whether any of us will live long enough to see the results. How can we achieve the certainty of governmental supervision with the mercurial speed of industry cooperation?

Summary

Many of the above explorations could be carried out by a variety of agencies: industry, government, or academia. Who does what study is probably relatively unimportant. What is critical is that it become a matter of public policy that we make our national information infrastructure accessible to ordinary citizens as well as the technologically gifted. What is critical is that people drop the belief that the realm of cyberspace should rightly be the exclusive province of those boys who worked on cars, that cyberspace is by nature, not by design, a dark, dangerous, and forbidding place.

Today, every aspect of computers, from the out-of-the-box experience to surfing the Internet, is a joy to "technoguys" and an unpleasant challenge to ordinary citizens. We have the technology to make the national information infrastructure accessible and attractive to the vast majority of our citizens. The time has come to make the investment in research and education that will enable all of our citizens to participate in the future.

References

Apple Computer, Inc. (1986). The Apple Human Interface Guidelines, Addison-Wesley, Reading, Mass.

Bulkeley, William M. (1992). Study Finds Hidden Costs of Computing, Wall Street Journal, November 2.

Bulkeley, William M. (1994). A Tool for Women, A Toy for Men, Wall Street Journal, March 16.

Logitec Inc. (1992). PC's and People Poll, A National Compatibility Study of the Human Experience with Hardware, sponsored by Logitec Inc., 6505 Kaiser Drive, Fremont, CA 94555, (510) 795-8500.

Nolan, Norton and Company (1992). Managing End-User Computing, Nolan, Norton and Company, Boston.

Sitton, Sarah, and Chmelir, Gerard (1984). The Intuitive Computer Programmer, Datamation, October 15.

Tognazzini, Bruce (1992). Tog on Interface, Addison-Wesley, Reading, Mass.

Wylie, Margie (1995). "No Place for Women: Internet Is a Flawed Model for the Infobahn," Digital Media: A Seybold Report, Vol. 4, No. 8, pp. 3-6, January 2.

Spoken-Language Technology

Ronald A. Cole

Oregon Graduate Institute of Science and Technology

Spoken-language systems allow people to communicate with machines by using speech to accomplish some task. The development of spoken-language systems is a true multidisciplinary endeavor, requiring expertise in areas of electrical engineering, computer science, statistics, linguistics, and psychology. The technologies involved in spoken-language systems include speech coding, speech recognition, natural language understanding, and dialogue modeling and may optionally include speaker recognition, language identification, and speech-to-speech language translation.

Advances in speech technology are of critical importance to the goal of an every-citizen interface to the national information infrastructure (NII). To be sure, speech technology cannot by itself achieve this goal. Many people are unable to speak or hear. Some information (e.g., paintings) is not in a form that can be appreciated using speech. Nevertheless, the vast majority of people in the United States speak and understand a language, and speech is an obvious means for them to access information. As spoken-language technologies mature, we can imagine spoken-language systems performing as cooperative agents, not unlike helpful human operators, to support a wide variety of transactions. For this to take place, significant advances in the technology must occur through fundamental research.

A significant advantage of using speech as an interface modality is that it can be transmitted by existing communications networks using common and inexpensive devices such as telephones and televisions. Today, use of the Internet is limited to people with access to computers and the skills to use them. These requirements exclude a great many Americans: computers are too expensive for many of us to own, and about one-third of our citizens are functionally illiterate (National Center for Education Statistics, 1993). In the future, computers are unlikely to be the major appliance for accessing the NII; telephones, cellular phones, televisions connected to cable networks, and inexpensive information appliances are likely to become the preferred means of access.

The state of the art in human language technology was summarized recently in an international survey entitled State of the Art of Human Language Technology, sponsored by the National Science Foundation (NSF) and the European Community (Cole et al., 1996). Each of the 92 authors who contributed to the survey was asked to define a specific area of human language technology, review the state of the art in that area, and identify key research challenges. The survey is available on the World Wide Web at http://cse.ogi.edu/CSLU/HLTsurvey/HLTsurvey.html. A second source of information on the state of the art of speech technology is the report of a workshop sponsored by the NSF in 1992 and published subsequently as a journal article (Cole et al., 1995). The workshop participants identified eight areas in which research advances are essential to the development of spoken-language systems and the infrastructure needed to support research in those areas: (1) robust speech recognition, (2) automatic training and adaptation, (3) spontaneous speech, (4) dialogue models, (5) natural language response generation, (6) speech synthesis and speech generation, (7) multilingual systems, and (8) multimodal systems.

Given the importance of speech technology to an every-citizen interface and to U.S. economic competitiveness, it is important to ask if the activities of the research community will produce the desired technology in the shortest period of time. In the remainder of this note, I offer my opinions about the major stumbling blocks to the development of spoken-dialogue systems for an every-citizen interface. These are (1) an insufficient focus on interactive systems by speech researchers, (2) limitations of statistical modeling approaches, and (3) lack of tools for research and technology transfer.

Insufficient Focus On Interactive Systems

The defining feature of a spoken-dialogue system is the interaction between human and machine. It follows that progress in developing these systems requires the continued study of how people interact with machines using speech. Such studies will highlight the limitations of speech recognition technology in the context of system use and focus research efforts on ways to overcome these limitations.

Today, the primary focus of speech recognition research in the United States is not interactive systems but the transcription of words in continuous speech. Large-vocabulary continuous speech recognition (LVCSR) is a priority of the defense establishment, which plays a major role in defining the priorities of the speech research community. For the past 12 years, progress in speech recognition research has been measured by recognition performance on benchmark tasks in annual competitions. Current benchmark tasks include recognition of articles read from newspapers, recognition of speech in news broadcasts, and recognition of speech in telephone conversations.

Transcription of words in continuous speech is both important and challenging, but its challenges differ from those that must be met to produce spoken-dialogue systems. For example, research in LVCSR does not focus on such issues as how to phrase a system prompt, how to determine if a recognition error has occurred, or how to engage in conversational repair if such a determination is made.

Limitations Of Current Technology

There is growing evidence that current statistical modeling approaches to speech recognition, which treat speech as a sequence of independent time frames, will not scale to acceptable levels of performance on difficult tasks. For example, current systems are able to recognize about 50 percent of words in telephone conversations. This level of performance is achieved by gathering statistics on the frequency of occurrence of word sequences; performance drops to below 20 percent when word sequence constraints are disabled and word recognition is based solely on acoustic information. Significant effort has been devoted to this task in recent years, with only minor improvements in performance. The ability of statistical modeling techniques to recognize words in natural conversations is not encouraging.
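As a rough illustration of what "gathering statistics on the frequency of occurrence of word sequences" can mean, the sketch below counts adjacent word pairs (bigrams) in a tiny made-up corpus and uses them to choose between two hypotheses that are assumed to sound alike. The corpus, the acoustic scores, and the function names are all hypothetical; systems of the kind discussed here use far larger models and different smoothing.

```python
import math
from collections import Counter

# Toy training text (hypothetical); real systems train on far more data.
CORPUS = ("i want to fly to boston . i want to go to denver . "
          "we want two tickets .").split()

unigrams = Counter(CORPUS)
bigrams = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB = len(unigrams)

def bigram_logprob(words):
    """Add-one smoothed log probability of a word sequence."""
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + VOCAB))
    return total

def rescore(hypotheses):
    """Pick the hypothesis with the best acoustic + word-sequence score."""
    return max(hypotheses, key=lambda h: h[1] + bigram_logprob(h[0]))

# Two hypotheses with identical (made-up) acoustic log scores:
# acoustics alone cannot separate "two" from "to".
hyps = [("i want two fly to boston".split(), -10.0),
        ("i want to fly to boston".split(), -10.0)]
best_words, _ = rescore(hyps)
print(" ".join(best_words))
```

With the acoustic scores tied, only the word-sequence statistics separate the two candidates, which is the dependence on sequence constraints that the paragraph above describes.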

A serious limitation of frame-based statistical modeling techniques is the difficulty of incorporating linguistic knowledge into the recognition paradigm. The IBM speech group, one of the pioneers of speech recognition using hidden Markov models, worked with linguists for several years to incorporate syntactic and semantic knowledge into IBM's systems, always with the same result: an increase in word recognition error rates. This led Bob Mercer, then of the IBM speech group, to assert in a keynote address to a speech recognition workshop that the most effective technique IBM has found for decreasing error rates is to fire a linguist.

The difficulty of incorporating linguistic knowledge into the dominant research paradigm stands as a major stumbling block to progress. Accurate speech recognition requires the integration of diverse acoustic cues, such as stop bursts, formant movements, changes in pitch, and comparison of acoustic features across segments. Similarly, speech understanding requires the integration of these acoustic cues with syntactic, semantic, pragmatic, and situational knowledge. No paradigm exists today that allows these information sources to be combined in a principled way that improves system performance. The result is that those with the most knowledge about human communication and spoken language are largely excluded from the research process. New paradigms are needed that enable psychologists and linguists to become vital contributors to the development of human language technology.

Lack Of Tools For Research And Technology Transfer

A final obstacle to progress in spoken-dialogue systems is the lack of available tools to support research and technology transfer. The development of spoken-language systems is a complex activity, requiring significant computer resources, integration of sophisticated signal processing, training and recognition algorithms, and language resources such as speech corpora and pronunciation dictionaries. Because of the resources and expertise required, spoken-language systems research is localized in a few specialized laboratories, which produce only five or six Ph.D. students each year. The result is that all but a few of the most fortunate students are denied the opportunity to participate in this exciting area of research, and we are not training enough researchers in an area of great strategic importance.

Without tools to create and manipulate spoken-dialogue systems and to support technology transfer, progress will be limited to the efforts of relatively few researchers at elite laboratories. For progress in spoken-language systems to occur, researchers need tools to rapidly design working systems and manipulate system parameters to test experimental hypotheses.

Despite these obstacles, I see great hope for the future. This workshop recognizes the importance of interface technologies, as do an increasing number of NSF initiatives in human language technology sponsored jointly by the Defense Advanced Research Projects Agency and other defense agencies. The growing support for interface research is bringing new researchers and new ideas into the field. Some of these researchers will focus their efforts on spoken-dialogue systems, and some will produce more powerful recognition techniques that will limit the amount of engineering required for each new task. There are also efforts under way to develop and distribute tools to support research and development of spoken-language systems. One such toolkit has been released by the Center for Spoken-language Understanding at the Oregon Graduate Institute (Sutton et al., 1996).

References

Cole, R.A., L. Hirschman, et al. 1995. The Challenge of Spoken-language Systems: Research Directions for the Nineties, IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, pp. 1-21.

Cole, R.A., J. Mariani, H. Uszkoreit, A. Zaenen, and V. Zue (Eds.). 1996. Survey of the State of the Art in Human Language Technology, Cambridge University Press, Stanford University, Stanford, Calif.

National Center for Education Statistics. 1993. Adult Literacy in America, U.S. Department of Education, technical report no. GPO 065-000-00588-3, U.S. Government Printing Office, Washington, DC, September.

Sutton, S., D.G. Novick, R. Cole, P. Vermeulen, J. de Villiers, J. Schalkwyk, and M. Fanty. 1996. Building 10,000 Spoken-Dialogue Systems, to appear in Proceedings of the 1996 International Conference on Spoken-Language Processing, Philadelphia, Pa. (Information on the availability of the toolkit is provided on-line at http://www.cse.ogi.edu/CSLU/toolkit/toolkit.html.)

Toward An Every-Citizen Interface

Steven K. Feiner

Columbia University

Introduction

Building user interfaces to the national information infrastructure (NII) that can fulfill the needs of all users, rather than just a privileged subset, will be a difficult task. In this position paper, I state my understanding of what the NII will be, lay out a set of goals for future NII user interfaces, and describe some research issues and projects associated with these goals.

I take the NII to be the public medium supporting all forms of interaction between people and machines that do not require the transport of physical matter. A user's interactions with the NII are accomplished through displays, interaction devices, and controlling software, which together comprise a user interface. I expect that an interface's displays and interaction devices would in most cases be the property of an individual or private company, as would the facilities needed to communicate and store information within a home, office, car, or pocket. The NII would include public networks that carry information between these private facilities and public sources of information and computation. Moreover, it would also include public displays and interaction facilities (e.g., the global positioning system (GPS) position tracking infrastructure), and software and standards needed to make communication and interaction possible.

Goals

I have tried to capture the properties that I believe user interfaces to the NII should ideally have in the following list of high-level characteristics.

Multimedia

Interactions should be multimedia and multimodal, taking maximal advantage of all our senses to communicate information effectively. I intend this to go beyond the combination of graphics, video, text, sound, and voice typically implied by popular usage of the term multimedia to encompass the goals of research in virtual environments and visualization.

Adaptive

User interfaces should adapt to the needs and abilities of the individual user and situation, interactively tailoring both the form and content of the material being presented and providing customized help when necessary. An adaptive user interface would take into account factors as diverse as the user's education, skills, previous experience, and physical capabilities or disabilities. Recognizing that many activities are long lived, it should accommodate fluid and frequent changes in all aspects of the environment: who, what, where, when, why, and how.

Integrated

Interaction through the NII should be integrated smoothly and naturally into our daily activities, rather than being, as it is now, a compartmentalized special-purpose activity accomplished only when sitting in front of a workstation running special software. That is, the goal is not just to get the NII into our homes but rather to get it into our lives. In part this means mobility and wearability but without the compromises that are built into current PDAs.

Collaborative

Many of the tasks we perform are group activities, not solitary endeavors. NII user interfaces should support collaborative work and play, regardless of whether users are collaborating in the same place or at the same time.

Instructable

A user should be able to describe tasks that are to be carried out through the NII. Assuming that the lowest-level steps are within the capability of the resources available, the tasks should ideally be as rich and complex as those that the user could describe to other people. I hesitate to use the word "programmable" here to avoid the implication that this should involve a conventional programming language, such as Java, or even the simpler languages provided by current systems for end-user programming.

Responsive

Large enough quantitative differences in performance can make for qualitative differences in how a user interface feels and how it is used. Sufficient resources must be available to all users to allow certain baseline tasks to be accomplished comfortably.

Empowering

Independent of a system's style (e.g., "invisible," "intelligent agent," or "direct manipulation tool"), its users should be able to accomplish more with it than without it and should feel a sense of satisfaction in doing so.

Research Issues And Projects

Each subsection below provides a background overview, followed by a selected set of issues and projects, keyed to the list of characteristics presented above.

Multimedia

Background

Interpreting "multimedia" broadly, I see two major research subgoals here: developing user interfaces that support real-time interaction with true three-dimensional (3D) input/output devices (i.e., virtual environments or virtual worlds) and learning how to use these devices to present information effectively, a task known in the graphics community as visualization. I am partial to the term information visualization (Card et al., 1991), which is rapidly gaining currency (e.g., see Gershon and Eick, 1995) and which stresses the diversity of domains and users that can benefit beyond those targeted by research on scientific visualization. While visualization research embraces work that appeals to senses other than the visual, the term sonification has been used to refer explicitly to the ways in which information can be presented through sound (Kramer, 1994).

Most state-of-the-art commercial user interfaces emphasize the use of 2D windows, with which users interact using 2D devices such as mice. Increasing CPU (central processing unit) power, combined with the popularization of VRML (Virtual Reality Modeling Language; VRML, 1996) and the introduction of low-priced sound, video, and 3D graphics cards, is transforming personal computers into 3D multimedia workstations. The results thus far are evolutionary: 3D graphics appear in 2D windows and are manipulated under mouse control. Research in 3D user interfaces extends beyond this to address the use of interactive 3D graphics, audio, and haptics, presented with true 3D stereo displays and 3D interaction devices that monitor the user's actions in three-space. The goal is to harness the physiological capabilities and training that enable us to perform physical tasks effectively in 3D, and apply them to develop effective user interfaces for visualizing and accomplishing computer-based tasks.

Issues

We must develop real-time operating systems support for highly parallel asynchronous input (from large numbers of 3D trackers) and output (to multiple-display modalities). We need to build effective "augmented realities" (Caudell and Mizell, 1992; Bajura et al., 1992; Feiner et al., 1993b) that enrich the user's existing environment with additional information, merging synthesized material with what the user normally sees, hears, and feels, overlaying or replacing it as appropriate. We need to develop display and interaction device hardware that matches our abilities better than the current offerings do, including high-quality, high-resolution, wide-field displays (e.g., graphics, sound, force, temperature) and tracking (e.g., hand, body, eye). For example, there is a need for lightweight, comfortable, high-quality, "see-through" displays for use in augmented realities. A general-purpose see-through display technology would allow differential visual accommodation, corresponding to real and virtual objects at different distances in the same image. It would also perform full visible-surface determination with all objects, real and virtual: virtual objects should be able to occlude real objects, and real objects should be able to occlude virtual ones.

How can we map abstract task domains effectively to a 3D environment in which we can visualize and manipulate objects in the domain? How can we take advantage of the richness of 3D gesture to reduce our reliance on icons to express actions in current user interfaces? For example, rather than moving an item to the trash can, could we dispose of it by using an appropriate gesture?

In a world of whole-body computer interaction, there may no longer be any distinction between human factors in general and the human factors of computer interfaces. The existing hardware that limits our capabilities (and that also limits our mistakes) will be gone, making it possible to create user interfaces that are both far better and far worse than anything we can create now. How can we ensure that 3D user interfaces are usable, especially in an environment that supports end-user programming and customization?

Projects

Much of this work is and should be multidisciplinary. For example, the design of display and interaction device hardware and software should be informed by research in human psychophysics. The design of user interface software should draw on disciplines that have long explored the design and use of 3D space, such as architecture, industrial design, theater (Laurel, 1993), and dance.

Adaptive

Background

This goal is to develop approaches that make it possible for user interfaces to adapt interactively to the needs of the current user, situation, and hardware. Adaptive multimedia user interfaces should be able to design and present information to people through multiple output media and understand user input provided through multiple input media. They should be able to adapt to the user's work mode, be it direct manipulation and exploration or passive observation.

Issues

To design high-quality adaptive multimedia user interfaces, we must first be able to design ones that function well in a single medium. We must be able to perform high-quality automated generation and understanding of individual media, ranging from those that have long been explored by artificial intelligence researchers (e.g., written text and speech) to less well-charted terrain (e.g., graphics, audio, haptics).

How can we predict and evaluate presentation quality? A system should be able to predict the quality of a presentation in the course of designing it. Based on these predictions, it should be able to refine the presentation until it is adequate. This requires the ability to evaluate the presentation (estimating how it will affect the user) and to evaluate the user's response (estimating how it has affected the user). The ability to evaluate the presentation makes possible time-quality tradeoffs. For example, if our time is limited, we might prefer a "rush job" now over a higher-quality presentation later.
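The refine-until-adequate loop and the time-quality tradeoff can be sketched as follows. This is only a minimal sketch under stated assumptions: the quality predictor, the refinement step, the adequacy threshold, and the time budget are all hypothetical placeholders, and nothing here says how a real system would estimate a presentation's effect on the user.

```python
import time

def design_presentation(refine, predict_quality, draft,
                        good_enough=0.9, time_budget_s=0.05):
    """Refine a draft until its predicted quality is adequate or time runs out."""
    deadline = time.monotonic() + time_budget_s
    quality = predict_quality(draft)
    while quality < good_enough and time.monotonic() < deadline:
        candidate = refine(draft)
        candidate_quality = predict_quality(candidate)
        if candidate_quality <= quality:
            break  # no further improvement is predicted
        draft, quality = candidate, candidate_quality
    return draft, quality

# Hypothetical stand-ins: a draft is a list of items, and predicted
# quality simply grows with the amount of detail included.
def add_detail(draft):
    return draft + ["detail %d" % len(draft)]

def predicted_quality(draft):
    return min(1.0, len(draft) / 5)

final, final_q = design_presentation(add_detail, predicted_quality,
                                     ["title"], time_budget_s=1.0)
rush, rush_q = design_presentation(add_detail, predicted_quality,
                                   ["title"], time_budget_s=0.0)
```

With a generous budget the loop stops once the predicted quality crosses the threshold; with no budget at all it returns the unrefined draft, the "rush job" of the example above.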

Temporal media are those in which information content is presented over time in a way that is controlled explicitly by the producer (Feiner et al., 1993a), such as animation, speech, and audio. We must develop generation and understanding capabilities for temporal media. Issues include how to "phrase" information (e.g., for maximal comprehension). For example, we must develop the ability to generate output and understand input that communicates complex temporal relationships. If a system can convey the relative order of actions, information need not be provided in chronological order (e.g., presenting the most important information first, as in a newscast).

We must develop facilities for coordinated generation and understanding of multiple media. The key challenge is to ensure that materials in different media reinforce, rather than interfere with, each other. Multimedia presentations must be temporally coordinated (especially when using temporal media such as animation and speech), so that information presented in all media is synchronized.

Given the ever-increasing amount of information bombarding us, automated multimedia generation offers the potential for automated summarization, selecting the material most relevant to a user's needs and presenting it in a way that meets their time constraints.

We must develop models that can be used as a basis for customizing the interaction between users and systems so that information is presented and obtained as effectively as possible. These models must represent:

•	Users. Including general human cognitive and physical abilities, individual abilities and preferences, and individual users' knowledge and skills.
•	Dialogue history. Track and maintain a history of the interaction between users and systems. This information makes possible references to things that happened in the recent or distant past.
•	Resources. Model the generation and input resources available to the system, making it possible for the system to choose between different ways of providing or obtaining information, based on what displays and interaction devices are or will be available.
•	Activities. The application knowledge per se, both general and specific to what the users are doing.
•	Situations. Model current situations (e.g., routine versus crisis, individual work versus multiuser interaction).
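One way to picture how the five kinds of models listed above might sit together is as a small set of record types consulted when choosing an output medium. The sketch below is purely illustrative: every class name, field, and rule is an assumption made for the example, not a proposed design.

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    skills: set = field(default_factory=set)
    preferences: dict = field(default_factory=dict)  # e.g. {"audio": False}

@dataclass
class ResourceModel:
    displays: list = field(default_factory=list)     # e.g. ["handheld"]
    audio_out: bool = False

@dataclass
class SituationModel:
    crisis: bool = False
    multiuser: bool = False

@dataclass
class InteractionContext:
    user: UserModel
    resources: ResourceModel
    situation: SituationModel
    activity: str = ""
    history: list = field(default_factory=list)      # dialogue history

    def choose_medium(self, message):
        """Pick an output medium using a few hypothetical rules."""
        audio_ok = (self.resources.audio_out
                    and self.user.preferences.get("audio", True))
        if self.situation.crisis and audio_ok:
            medium = "speech"        # urgent: do not rely on the user looking
        elif self.resources.displays:
            medium = "graphics"
        elif audio_ok:
            medium = "speech"
        else:
            medium = "text"
        self.history.append((message, medium))  # record for later reference
        return medium

ctx = InteractionContext(
    user=UserModel(preferences={"audio": True}),
    resources=ResourceModel(displays=["handheld"], audio_out=True),
    situation=SituationModel(crisis=False),
    activity="route planning",
)
routine_choice = ctx.choose_medium("turn left ahead")
ctx.situation.crisis = True          # models can be updated on the fly
crisis_choice = ctx.choose_medium("road closed")
```

The dialogue history recorded here is what would later allow references to things presented earlier in the interaction.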

These models should not be static; it should be possible to update them on the fly. Difficult problems here include being able to determine how a user is affected by a presentation. For example, can the system determine whether a user has actually learned the material that an explanatory presentation is intended to communicate? Ideally, it should be able to do this based on the user's normal interactions with the system, without requiring explicit testing.

We need to develop the facilities to model the rhetorical structure of multimedia dialogues for real, complex multiuser tasks. This includes what a user tells the system, what the system tells the user, and what users tell other users, in addition to what the user(s) and system(s) each believes the others have communicated. By studying current multimedia interactions and developing cognitive models that account for how information is being communicated among participants, we can lay the groundwork for developing rules for generating and understanding multimedia.

Projects

This research would center in the artificial intelligence and human-computer interface communities, especially in the fields of multimedia generation and understanding (Sullivan and Tyler, 1991; Maybury, 1993) and modeling of users (Kobsa and Wahlster, 1989) and how they perform tasks (Card et al., 1983).

Integrated

Background

Integration of the NII into our lives will mean, in part, accommodating users who are mobile, and who use the NII as they move about. As displays of all sizes proliferate, this will also spell the end of the one-user, one-display metaphor that underlies so many current systems. We need to support interaction in a world in which there are many displays and interaction devices: handheld, head-worn, desktop, and wall-mounted. Some will be private, others public (or at least shared). As users walk about, they will move into and out of the presence of some of these peripherals and of other users. We need to build user interfaces that exploit this rich and constantly changing combination of peripherals.

Issues

Drawing an analogy to window management, the term environment management has been used (MacIntyre and Feiner, 1996) to describe the idea of managing large numbers of objects on large numbers of displays. This is a difficult task: unlike the one or two displays that most window managers typically control, the environment may be continually changing as users and resources move and may include displays and devices that are shared, such as a wall-mounted hallway display. From the user's standpoint, however, environment management should ideally be easier than the current task of window management. This could be possible if environment management were to be carried out through systems that used knowledge of the user's needs and effective information presentation approaches to determine how to structure the surrounding information environment.
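A very small sketch of the placement decision at the heart of environment management is given below. It assumes, hypothetically, that each display reports whether it is shared, whether the user is near it, and a rough size, and that each information object declares whether it is private and how much space it needs. The names and the "largest suitable display" rule are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Display:
    name: str
    shared: bool      # True for public displays such as a hallway wall
    in_range: bool    # whether the user is currently near it
    area: int         # rough size, arbitrary units

@dataclass
class Item:
    name: str
    private: bool
    min_area: int

def place(items, displays):
    """Map each item to the largest suitable display in range, or None."""
    placement = {}
    for item in items:
        candidates = [d for d in displays
                      if d.in_range and d.area >= item.min_area
                      and not (item.private and d.shared)]
        best = max(candidates, key=lambda d: d.area, default=None)
        placement[item.name] = best.name if best else None
    return placement

handheld = Display("handheld", shared=False, in_range=True, area=1)
wall = Display("hallway wall", shared=True, in_range=True, area=20)
items = [Item("email", private=True, min_area=1),
         Item("floor plan", private=False, min_area=10)]

near_wall = place(items, [handheld, wall])
wall.in_range = False                 # the user walks away from the wall
away_from_wall = place(items, [handheld, wall])
```

Re-running the placement as displays come into and out of range is what distinguishes this from ordinary window management, where the set of displays is fixed.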

Projects

Research projects should build on ongoing research in mobile computing, wearable computing, ubiquitous computing (Weiser, 1991), and augmented reality.

Collaborative

Background

User interfaces should support collaborative problem solving and interaction among multiple people and computers cooperating in the same task or in coordinated tasks.

Issues

We must design systems that account for the personal presentation needs of individual users while allowing for communication among users based on material they have been presented in common. An important problem here is how to accommodate users who have been presented with different information and who would like to refer to the presentation as they interact with each other. The system might serve as an intelligent "go-between" that mediates between users so that references made by one user to what she has been presented are translated into references to what another user has been presented.
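The "go-between" idea can be sketched under one strong simplifying assumption: the system remembers, for each user, which underlying object each presented element stands for. The class and method names below are hypothetical, and real references would of course need language understanding rather than exact label lookup.

```python
class GoBetween:
    """Translate references between users who saw different presentations."""

    def __init__(self):
        # user -> {label shown to that user: shared underlying object id}
        self._views = {}

    def present(self, user, label, object_id):
        """Record that `user` was shown `object_id` under `label`."""
        self._views.setdefault(user, {})[label] = object_id

    def translate(self, speaker, label, listener):
        """Return the listener's label for what the speaker referred to."""
        object_id = self._views.get(speaker, {}).get(label)
        if object_id is None:
            return None  # the speaker was never shown anything by that name
        for other_label, other_id in self._views.get(listener, {}).items():
            if other_id == object_id:
                return other_label
        return None      # the listener has not been shown this object

mediator = GoBetween()
mediator.present("ann", "the red bar in chart 2", "q3_sales")
mediator.present("bob", "row 3 of the table", "q3_sales")
for_bob = mediator.translate("ann", "the red bar in chart 2", "bob")
```

Here the same figure was shown to one user as a chart and to another as a table, and the mediator rewrites the first user's reference in terms the second user has actually seen.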

The NII has the potential to help create a strong sense of national (and global) community. Consider the information infrastructure provided by a residential street, town square, or college dorm hallway. By encouraging citizens to interact with others across the country and providing information about our country's workings on the NII, we could foster a better understanding of how people depend on each other and ultimately provide more opportunity for an informed populace to participate in government. Imagine, for example, a multimedia SimCity-like virtual environment that modeled the country's economy and supported collaborative attempts to see how it responded to different situations and assumptions.

Projects

There are separate communities of researchers in computer-supported cooperative work (many of whom concentrate on the design of multimedia systems in the popular sense of the term) and in distributed multiuser virtual environments. Joint research projects could be especially fruitful here.

Instructable

Two key research areas for the creation of instructable systems are programming by demonstration and agents.

Programming by Demonstration

Background. Research in end-user programming attempts to develop ways for end users to "program" an application's behavior without the overhead of learning or using a conventional programming language. One promising line of research is "programming by demonstration," in which users demonstrate the tasks to be performed using the application's interface (Cypher, 1993). A simple example is the keyboard macro facility in Emacs: the user can specify that a series of keystrokes issued in the course of editing should be saved (and optionally named and bound to a key), so that it can be applied again, typically at another place in the document being edited. Since the demonstration is a specific example, if it is to be applied to other situations, it must be generalized. In the case of the Emacs keyboard macro, generalization is usually achieved solely by using keystroke commands that operate relative to the current position in the file.
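The point about relative commands can be made concrete with a toy editor. The sketch below is not how Emacs implements keyboard macros; it is a hypothetical miniature in which every recorded command acts relative to the current line, so the same recording can be replayed elsewhere in the text.

```python
class TinyEditor:
    """A toy line editor whose commands act relative to the cursor."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.row = 0
        self._recording = None

    def start_macro(self):
        self._recording = []

    def end_macro(self):
        macro, self._recording = self._recording, None
        return macro

    def do(self, command, *args):
        """Run a command, saving it if a macro is being recorded."""
        if self._recording is not None:
            self._recording.append((command, args))
        getattr(self, "_" + command)(*args)

    def replay(self, macro, times=1):
        for _ in range(times):
            for command, args in macro:
                getattr(self, "_" + command)(*args)

    # Commands are relative: none of them names an absolute line number.
    def _insert_at_line_start(self, text):
        self.lines[self.row] = text + self.lines[self.row]

    def _next_line(self):
        self.row = min(self.row + 1, len(self.lines) - 1)

ed = TinyEditor(["alpha", "beta", "gamma"])
ed.start_macro()
ed.do("insert_at_line_start", "> ")   # demonstrated once, on the first line
ed.do("next_line")
quote_line = ed.end_macro()
ed.replay(quote_line, times=2)        # applied to the following lines
```

Had the demonstration recorded "edit line 1" instead of "edit the current line," replaying it would only repeat the edit on the first line, which is why generalization matters.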

An allied notion is that of having the system learn patterns in the user's behavior and volunteer to complete some recognized pattern when it guesses that the user has begun to perform it. Existing research systems monitor the user's interactions during a session, can present "graphical histories" of a session, allow the creation of macros using about-to-be-executed (or previously executed) commands, and can perform primitive inferencing to support simple generalization.
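A minimal sketch of "volunteering to complete a recognized pattern" follows. It assumes the session is available as a flat list of command names and simply looks for the longest run of recent commands that has occurred before, then proposes whatever followed it last time. The function name and the matching rule are assumptions; the research systems mentioned above do considerably more.

```python
def suggest_next(history, min_context=2):
    """Suggest the command that followed an earlier occurrence of the
    longest recent command sequence, or None if nothing matches."""
    n = len(history)
    for size in range(n - 1, min_context - 1, -1):   # longest context first
        suffix = history[n - size:]
        # Scan earlier positions, most recent first.
        for start in range(n - size - 1, -1, -1):
            if history[start:start + size] == suffix:
                return history[start + size]
    return None

session = ["select", "bold", "next", "select", "bold"]
guess = suggest_next(session)
```

In this session the user has again issued "select" then "bold," so the sketch proposes the command that followed that pair earlier; the system would offer it rather than execute it.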

Issues. Most existing end-user programming facilities rely on simple straight-line flow of control (or escape into conventional programming syntax to perform all but the simplest conditionals and loops). How can end-user programs allow complex flow of control without looking and feeling like conventional programming? How can they incorporate multimodal interaction into the programming user interface itself?

How can we generalize demonstrational programs in a way that minimizes the amount of end-user involvement while maximizing the places where the system guesses right? When should generalization be performed: at program creation time, at execution time? What sources of information can be used?

If large numbers of user-developed programs exist, how can the user find the ones that are relevant to some specified task? How can the user determine what each does (without necessarily having to execute it)? Note that this is a particularly difficult example of on-line search: the user isn't looking for a match on a text string but rather on a set of capabilities, which may be implicit in the program.

How can a user-developed program be modified? How can one develop an end-user programming capability that intrinsically supports cooperation among multiple users and systems?

Projects. Most work in this domain has concentrated on 2D user interfaces. I think there is much to be gained in trying to take advantage of interactive multimedia and 3D in the design of the language itself.

Agents

Background. One kind of instructable interface is based on the metaphor of an "agent" that carries out a task on the user's behalf, often using knowledge and abilities that the user may not have herself. There has been a fair amount of heated debate in the human-computer interface community, pitting proponents of agent-based user interfaces against those who favor direct-manipulation user interfaces. Among the arguments against agents are claims that people may prefer interfaces over which they feel they have direct control and that agent-based interfaces are being unfairly touted as having some responsibility for their actions beyond that of their programmer or user.

Issues. I believe that much of the controversy is due to the popular conception of the agent as a busybody anthropomorphic assistant, in the manner of the nattering bow-tied helper in Apple's "Knowledge Navigator" videotape. While the argument has been made that users will not want to sacrifice control to such agent-based systems, people willingly give up control in other matters that do not involve computers. For example, although it is common to compare the relative ease of using cars and computers, consider instead the car's predecessors: horses, mules, and donkeys. Environmental issues aside, would you really send even an experienced driver hurtling down the Grand Canyon's trails on a motorcycle? Yet each year thousands of folks with no previous riding experience travel those same trails safely on mules. A mule's rider exerts only discretionary high-level control with regard to general speed and direction, especially so for inexperienced riders. Riders are even told that, if acrophobia sets in, they should just close their eyes, hold on, and let the mule find its way: the original intelligent user interface. Mules are hardly anthropomorphic (although the reverse is sometimes true), yet they are possessed of skills and abilities that we don't have. While we may be amazed at how much more surefooted they are than us, we find this reassuring, not intimidating. Instead of asking why computers can't be more like cars, perhaps we should ask why computers can't be more like mules.

Projects. There is already overlap between the programming-by-demonstration and agent-based systems communities, particularly in addressing how agents can be instructed to perform tasks. Coordinated projects could address how users would determine what these systems can do (including what they can learn and what they already know). There is also potential for joint research with the multimedia generation community.

Responsive

Background

The goal is to build systems that can utilize the power available throughout the NII in a way that doesn't compromise the responsiveness of the user interface.

Issues

Resources needed to make a responsive system include not only network bandwidth and computational power but also appropriately sized and sited storage. While we can assume that users will have personal storage space at home, permanent or temporary mirroring of material at additional sites throughout the NII might be able to significantly decrease network load and response time. For example, we might have a system of large public storage caches located throughout the country to provide users with relatively local copies of frequently referenced material. This could include both conventional "mirror" sites and caches of currently accessed material controlled by some automated paging strategy. This could be the next tier in a caching system that would include the individual local memory and disk caches of current Web browsers.
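As a rough illustration of the tiering described above, the following Python sketch consults a chain of caches from nearest to farthest and falls back to the original server only when every tier misses. The class and function names are hypothetical, and least-recently-used eviction is only a stand-in for the unspecified "automated paging strategy."

```python
from collections import OrderedDict


class CacheTier:
    """One tier, e.g., browser memory, local disk, or a regional public cache."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


def fetch(key, tiers, origin):
    """Return (value, source_name), copying the value into nearer tiers."""
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            for nearer in tiers[:i]:
                nearer.put(key, value)
            return value, tier.name
    value = origin(key)  # slow path: every tier missed
    for tier in tiers:
        tier.put(key, value)
    return value, "origin"
```

With tiers ordered as memory, disk, and regional cache, a request is answered by the nearest tier that holds the material, so frequently referenced items rarely travel the full network path.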

Projects

Many of the issues here build on research being done in the systems (OS and distributed systems) and multimedia storage/transport communities.

Empowering

Background

Plainly put, we need to study the kinds of things that people do and determine how the NII can best assist in doing them. In part, this will involve building the models of activities mentioned previously.

Issues

I trust that falling prices will ultimately put any technology that has the potential to be popular within the reach of all. This has happened with television, microwave ovens, Walkman-style tape players, digital watches, and compact-disc changers. It is about to happen with computers, be they net-tops, set-tops, palm-tops, or something else. Unlike fixed-function devices, however, computers (in particular, computer programs) have an essentially unlimited potential to confuse and intimidate. While much of this potential can be mitigated through better user interface design, there is no substitute for users having the right skills and mindset. Even if we can build powerful systems that are truly "self-teaching," users will still need time to learn how to use them effectively. We need to ensure not only that computer skills (whatever that might mean in the future) are taught in school but also that there is ample opportunity and time for people who are not in school to acquire them.

Projects

Experimental studies and model building by academic and industrial researchers address only one part of the problem. Enlightened social and governmental policies also will be key.

References

Bajura, M., Fuchs, H., and Ohbuchi, R. 1992. Merging Virtual Objects with the Real World: Seeing Ultrasound Imagery Within the Patient. Computer Graphics 26(2):203-210.

Card, S., Moran, T., and Newell, A. 1983. The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, N.J.

Card, S.K., Robertson, G.G., and Mackinlay, J.D. 1991. Proceedings of the Computer Human Interactions: Human Factors in Computing Systems, pp. 181-188. The Information Visualizer, An Information Workspace. New Orleans, La., April 28-May 2.

Caudell, T., and Mizell, D. 1992. Augmented Reality: An Application of Heads-Up Display

Page 296 Cite

Suggested Citation:"Position Papers: On Interface Specifics." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.

×

Page 296

Technology to Manual Manufacturing Processes, Proceedings of the Hawaii International Conference on System Science, January.

Cypher, A. (Ed.). 1993. Watch What I Do: Programming by Demonstration, MIT Press, Cambridge, Mass.

Feiner, S., Litman, D., McKeown, K., and Passonneau, R. 1993a. Towards Coordinated Temporal Multimedia Presentations. Intelligent Multimedia Interfaces, M. Maybury (Ed.), pp. 139-147. AAAI/MIT Press, Menlo Park, Calif.

Feiner, S., MacIntyre, B., and Seligmann, D. 1993b. Knowledge-Based Augmented Reality. Communications of the ACM 36(7):52-2.

Gershon, N., and Eick, S. (Eds.). 1995. Proc. Information Visualization '95. IEEE Computer Society Press, Los Alamitos, Calif.

Kobsa, A., and Wahlster, W. (Eds.). 1989. User Models in Dialogue Systems. Springer-Verlag, Berlin.

Kramer, G. (Ed.). 1994. Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley, Reading, Mass.

Laurel, B. 1993. Computers as Theatre. Addison-Wesley, Reading, Mass.

MacIntyre, B., and Feiner, S. 1996. Future Multimedia User Interfaces. Multimedia Systems.

Maybury, M. (Ed). 1993. Intelligent Multimedia Interfaces. AAAI/MIT Press, Menlo Park, Calif.

Sullivan, J., and Tyler, S. (Eds.). 1991. Intelligent User Interfaces. Addison-Wesley, Reading, Mass.

VRML (Virtual Reality Modeling Language). 1996. The VRML Forum (available on-line at http://vrml.wired.com/).

Weiser, M. 1991. The Computer for the 21st Century. Scientific American 265(3):94-104.

Nomadicity, Disability Access, And The Every-Citizen Interface

Gregg C. Vanderheiden

University of Wisconsin-Madison

The Challenge

With the rapid evolution of the national information infrastructure (NII) and the global information infrastructure (GII), attention has turned to the issue of information equality and universal access. Basically, if information systems become as integral to our future life-styles as electricity is today, access to these systems will be essential for people to have equal access to education, employment, and even daily entertainment or enrichment activities.

Although the goal of equal access seems noble, it can seem somewhat less achievable when one considers the full range of abilities or disabilities which must be dealt with to achieve an every-citizen interface. It must be usable even if people

•	cannot see very well, or at all;
•	cannot hear very well, or at all;
•	cannot read very well, or at all;
•	cannot move their heads or arms very well, or at all;
•	cannot speak very well, or at all;
•	cannot feel with their fingers very well, or at all;
•	are short, are tall, use a wheelchair, and so forth;
•	cannot remember well;
•	have difficulty learning or figuring things out;
•	have little or no technological inclination or ability; and/or
•	have any combination of these difficulties (e.g., are deaf-blind; have reduced visual, hearing, physical, or cognitive abilities, which occurs in many older individuals).

In addition, the products and their interfaces must remain equally efficient and easy to use and understand for those who (1) have no problems seeing, hearing, moving, remembering, and so forth; and (2) are power users.

Is It Possible

A list like this can bring a designer up short. At first blush, it appears that even if such an interface were possible, it would be impractical or inefficient to use for people with all of their abilities intact. Packages such as the EZ Access approach developed for kiosks (http://trace.wisc.edu/world/kiosk), PDAs (personal digital assistants), and other touchscreen devices, however, demonstrate how close we can come to such an ideal, at least for some types of devices or systems. Using a combination of Talking Fingertip and Speed List technologies, the EZ Access package (for information, see http://trace.wisc.edu/TEXT/KIOSK/MINIMUM.HTM) provides efficient access for individuals with low vision, blindness, and poor or no reading skills. A ShowSounds/caption feature provides access for individuals with hearing impairments or deafness, as well as access for all users in very noisy locations. An infrared link allows the system to be used easily with alternate displays and controllers, so that even individuals who are deaf-blind or paralyzed can access and use the system. Thus, with a relatively modest set of interface variations, almost all the needs listed above can be addressed.

Is It Practical

Practicality is a complex issue which involves cost, complexity, impact on overall marketability, support, and so forth. To use the EZ Access approach as an example, the hardware cost to provide all of these dimensions of accessibility to a standard multimedia kiosk is less than 1 percent of the cost of the kiosk. Addition of this technique does not affect the standard or traditional mode of operation of the kiosk at all. At the same time, it makes the system usable by many visitors as well as new citizens whose native language is not English, and who may have some difficulty with words. Implementing cross-disability interface strategies can take only a few days with the proper tools. EZ Access techniques are currently used on commercial kiosks in the Mall of America and other locations. Other examples of built-in accessibility are the access features that are built into every Macintosh- and Windows 95-based computer.

Thus, if done properly, interfaces that are flexible or adjustable enough to address a wide range of individuals can be very practical. There are, however, approaches to provide additional access or access for additional populations that are not currently practical (e.g., building $2,000 dynamic braille displays into every terminal or kiosk). In these cases, the most practical approach may be to make the information and control necessary for operation of the device available on a standard connector so that a person who is deaf and blind can connect a braille display and keyboard. Practicality also is a function of the way the access features relate to and reinforce the overall interface goals of the product.

How Does An Every-Citizen Interface Relate To Nomadic Systems

The devices of tomorrow, which might be referred to as TeleTransInfoCom (tele-transaction/information/communication) devices, will operate in a wide range of environments. Miniaturization, advances in wireless communication, and thin-client architectures are rapidly eliminating the need to be tied to a workstation or carry a large device in order to have access to computing, communication, and information services and functions.

As a result, we will need interfaces for use while driving a car, sitting in an easy chair, sitting in a library, participating in a meeting, walking down the street, sitting on the beach, walking through a noisy shopping mall, taking a shower, or relaxing in a bathtub, as well as sitting at a desk. The interfaces also will have to be usable in hostile environments: when camping or hiking, in factories, or in shopping malls at Christmas time.

Many of us will also need to access our information appliance (or appliances) in very different environments on the same day, perhaps even during the same communication or interaction activity. These different environments will place constraints on the type of physical and sensory input and output techniques that work (e.g., it is difficult to use a keyboard when walking; it is difficult and dangerous to use visual displays when driving a car; speech input and output, which work fine in a car, may not be usable in a shared office environment, a noisy mall, a meeting, or a library). Systems designed to work across these environments will therefore require flexible input/output options. The interface variations, however, must operate in essentially the same way, even though they may be quite different (e.g., visual versus aural). Users will not want to master three or four interface paradigms in order to operate their devices in different environments. The metaphor(s) and the "look and feel" must be continuous even though the devices operate entirely visually at one point (e.g., in a meeting) or entirely aurally at another (e.g., while driving a car). Many users will also want to be able to move from one environment to another, one device to another (e.g., workstation to hand-held), and one mode to another (e.g., visual to voice) in the midst of a task.
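The constraints listed in the paragraph above can be written down as a small table, which is one way a device might decide which input/output channels to offer at a given moment. This Python sketch is only illustrative: the environment and channel names are hypothetical labels, and only the constraints named in the text are encoded.

```python
ALL_CHANNELS = {"keyboard", "visual_display", "speech_input", "speech_output"}

# Channels each environment rules out, taken from the examples in the text.
BLOCKED = {
    "walking": {"keyboard"},
    "driving": {"visual_display"},
    "shared_office": {"speech_input", "speech_output"},
    "noisy_mall": {"speech_input", "speech_output"},
    "meeting": {"speech_input", "speech_output"},
    "library": {"speech_input", "speech_output"},
}


def usable_channels(environment):
    """Return the channels still usable in `environment`, sorted by name."""
    return sorted(ALL_CHANNELS - BLOCKED.get(environment, set()))


print(usable_channels("driving"))
print(usable_channels("library"))
```

The point of the table is that the task and its metaphor stay fixed while only the channel set changes, which is what lets a user carry one interface paradigm across environments.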

Does Nomadicity Equal Disability Accessible

It is interesting to note that most of the issues regarding access for people with disabilities will be addressed if we simply address the issues raised by the range of environments described above:

•	When we create interfaces that work well in noisy environments such as airplanes, construction sites, or shopping malls at Christmas, and for people who must listen to something else while they use their device, we will have created interfaces that work well for people who cannot hear well or at all.
•	When we create interfaces that work well for people who are driving a car or doing something that makes it unsafe to look at the device they are operating, we will have created interfaces that can be used by people who cannot see.
•	As we develop very small pocket and wearable devices for circumstances in which it is difficult to use a full-sized keyboard or even a large number of keys, we will have developed techniques that can be used by individuals with some types of physical disabilities.
•	When we create interfaces that can be used by someone whose hands are occupied, we will have systems that are accessible to people who cannot use their hands.
•	When we create interfaces for individuals who are tired, under stress, under the influence of drugs (legal or illegal), or simply in the midst of a traumatic event or emergency (and who may have little ability to concentrate or deal with complexity), we will have interfaces that can be used by people with naturally reduced abilities to concentrate or deal with complexity.

Thus, although there may be residual specifics concerning disability access that must be covered, the bulk of the issues involved are addressed automatically through the process of developing environment/situation-independent (modality-independent) interfaces.

What Is Needed

Interfaces that are independent of the environment or the individual must have the following attributes:

•	Wide variability in order to meet the diversity of tasks that will be addressed. Some interfaces will have to deal only with text capture, transmission, and display. Others will have to deal with display, editing, and manipulation of audiovisual materials. Some may involve VR (virtual reality), but basically be shop-and-select strategies. Others may require full immersion, such as data visualization and tele-presence.
•	Modality independence. Interfaces have to allow the user to choose sensory modalities appropriate to the environment, situation, or user. Text-based systems will allow users to display information visually at some times and aurally at others, on high-resolution displays when available and on smaller low-resolution displays when necessary.
•	Flexibility/adaptability. Interfaces will be required that can take advantage of fine motor movements and three-dimensional gestures when a user's situation or abilities allow but can also be operated by using speech, keyboard, or other input techniques when this is necessary because of the environment, the user's activities, or any motor constraints.
•	Straightforwardness and ease of use. As much of the population as possible must be able to use these interfaces and to master new functions and capabilities as they are introduced.

Some Components Necessary To Achieve Every-Citizen Interfaces

Although this section does not address all possible interface types, particularly freehand graphic production interfaces (e.g., painting), it does address the majority of command-and-control interfaces.

1.	Modality Independence. For a device or system to be modality independent or alt-modal (i.e., the user can choose between alternate sensory modalities when operating the device), two things are necessary:
	a.	All of the basic information must be stored and available in either modality-independent or modality-redundant form.
		Modality independent refers to information that is stored in a form that is not tied to any particular form of presentation. For example, ASCII text is not inherently visual, auditory, or tactile. It can be presented easily on a visual display or printer (visually), through a voice synthesizer (aurally), or through a dynamic braille display or braille printer (tactually).
		Modality redundant refers to information that is stored in multiple modalities. For example, a movie might include a visual description of the audio track (e.g., caption) and an audio and electronic text description of the video track so that all (or essentially all) information can be presented visually, aurally, or tactually at the user's request based on need, preference, or environmental situation.
	b.	The system must be able to display data or information in different modalities. That is, it should provide a mechanism for displaying information in all-visual, or all-auditory, or mixed audiovisual form as well as in electronic form.
2.	Flexibility/Adjustability. The device must also offer alternate selection techniques that can accommodate varying physical and sensory abilities arising from the personal environment or situation (e.g., walking, wearing heavy gloves), and/or personal abilities. Suggested alternate operating modes follow:
	•	Standard mode. This mode often uses multiple simultaneous senses and fine motor movements. It would offer the most effective device for individuals who have no restrictions on their abilities (due to task, environment, or disability).
	•	A list mode. In this mode, the user can call up a list of all the information and action items and use the list to select items for presentation or action. It would not require vision to operate. It could be operated using an analog transducer to allow the individual to move up and down within a list, or a keyboard or arrow keys combined with a confirm button could be used. This mode can be used by individuals who are unable to see or look at a device.
	•	External list mode. This would make the list available externally through a software or hardware port (e.g., infrared port) and accept selections through the same port. It can be used by individuals who are unable to see and hear the display and therefore must access it from an external auxiliary interface. This would include artificially intelligent agents, which cannot process visual or auditory information unless it is also available in text form.
	•	Select and confirm mode. This allows individuals to obtain information about items without activating them (a separate confirm action is used to activate items after they are selected). It can be used by individuals with reading difficulties, low vision, or physical movement problems, as well as by individuals in unstable environments or whose movements are awkward due to heavy clothing or other factors.
	•	Auto-step scanning mode. This presents the individual items in groups or sequentially for the user to select. It can be used by individuals with severe movement limitations or movement and visual constraints (e.g., driving a car), and when direct selection (e.g., speech input) techniques are not practicable.
	•	Direct text control techniques. These include keyboard or speech input.
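The distinction between modality-independent and modality-redundant storage in item 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any existing system: the `Item` class, the `present` function, and the device names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a sensory modality to a device that renders text.
TEXT_DEVICES = {
    "visual": "screen",
    "aural": "speech synthesizer",
    "tactile": "braille display",
}


@dataclass
class Item:
    text: str = ""                                  # modality-independent form
    renditions: dict = field(default_factory=dict)  # modality-redundant forms


def present(item, modality):
    """Return (device, payload) for the modality the user asked for."""
    if modality not in TEXT_DEVICES:
        raise ValueError(f"unknown modality: {modality}")
    if modality in item.renditions:
        # A rendition prepared for this sense exists (e.g., captions).
        return "native player", item.renditions[modality]
    if item.text:
        # Plain text is not tied to any sense, so route it to a text device.
        return TEXT_DEVICES[modality], item.text
    raise LookupError(f"no {modality} rendition and no text fallback")


notice = Item(text="The library closes at 9 p.m.")
movie = Item(
    text="A mule train descends the canyon trail.",
    renditions={"visual": "video with captions",
                "aural": "audio with description"},
)

print(present(notice, "aural"))
print(present(movie, "tactile"))
```

The text field is what makes the tactile request answerable for the movie even though no tactile rendition was prepared; an item with neither a matching rendition nor text cannot be presented in that modality at all.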

Example: Using A Uni-List-Based Architecture As Part Of The Interface

One approach to device design that would support this type of flexibility is the Uni-List architecture. By maintaining a continually updated listing of all the information items currently available to the user, as well as all the actions or commands available, it is possible to provide a very flexible and adjustable user interface relatively easily. All the techniques listed above are easy to implement with such an architecture, and it can be applied to a range of devices or systems.

Take, for example, a three-dimensional (3D) virtual reality-based shopping mall. In such an application, a database is used to provide the information needed to generate the image seen by the user and the responses to user movements or actions on objects in the view. If properly constructed, this database could also provide a continually updated listing of all objects in view as well as information about any actionable objects presented to the user at any time. By including verbal (e.g., text) information about the various objects and items, this 3D virtual shopping system can be navigated and used in a variety of ways to accommodate a variety of users or situations.

•	Individuals who are unable to see the screen (because they are driving their car, their eyes are otherwise occupied, or they are blind) can have the information and choices presented vocally (or via braille). They can then select items from the list in order to act on them, in much the same way that an individual can pick up or "click on" an object in the environment.
•	Individuals with movement disabilities can have a highlight or "sprite" step around to the objects, or they could indicate the approximate location and have the items in that location highlighted individually (other methods for disambiguating also could be used) to select the desired item.
•	Individuals who are unable to read can touch or select any printed text presented and have it read aloud to them.
•	Individuals with low vision (or who do not have their glasses) can use the system in the same way as a fully sighted individual. When they are unable to see well enough to identify the objects, they can switch into a mode that lets them touch the objects (without activating them) and can thereby have them named or described.
•	Individuals who are deaf-blind could use the device in the same fashion as an individual who is blind. Instead of the information being spoken, however, it could be sent to the individual's dynamic braille display.
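A minimal Python sketch may help show why the operating modes listed earlier fall out of a single continually updated list. It is not the Uni-List implementation, whose details are not given here; the class, its method names, and the mall entries are assumptions made for illustration.

```python
class UniList:
    """A continually updated list of the items and actions now available."""

    def __init__(self):
        self._entries = {}     # name -> (description, action or None)
        self._selected = None

    def update(self, entries):
        """Replace the list, e.g., whenever the 3D view changes."""
        self._entries = dict(entries)
        if self._selected not in self._entries:
            self._selected = None

    def as_text(self):
        """List mode and external list mode: everything as plain text."""
        return [f"{name}: {desc}" for name, (desc, _) in self._entries.items()]

    def select(self, name):
        """Select without activating; return the description to present."""
        desc, _ = self._entries[name]
        self._selected = name
        return desc

    def confirm(self):
        """Activate the selected item and return the action's result."""
        if self._selected is None:
            raise RuntimeError("nothing selected")
        _, action = self._entries[self._selected]
        return action() if action else None

    def scan(self):
        """Auto-step scanning: offer the names one at a time."""
        yield from self._entries


mall = UniList()
mall.update({
    "bookstore door": ("Entrance to the bookstore", lambda: "entered bookstore"),
    "directory sign": ("Mall directory, three floors", None),
})
print(mall.as_text())
print(mall.select("bookstore door"))
print(mall.confirm())
```

The text returned by `as_text` and `select` can be spoken, shown in large print, or sent to a braille display, so each of the users described above works from the same list rather than from a separately built interface.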

Additional Benefits Of Flexible, Modality-Independent Architectures And Data Formats

The two key underlying strategies for providing more universal access are input and display flexibility and the companion availability of information in sensory/modality-independent or parallel form.

Both input and display flexibility and presentation independence have additional benefits beyond the every-citizen interface. These include the following:

•	Nomadicity support (discussed above).
•	Searchability. Graphic and auditory information that contains text streams can be indexed and found by using standard text-based search engines, which not only can locate items but also can jump to particular points within a movie or a sound file.
•	Alternate client support. The same information can be stored and served to different types of telecommunication and information devices. For example, information could be accessed graphically over the Internet, via a telephone by using a verbal form, or even by intelligent (or not so intelligent) agents using electronic text form.
•	Display flexibility. Presentation-independent information also tends to be display size independent, allowing it to be more easily accessed using very small, low-resolution displays. (In fact, some low-resolution displays present exactly the same issues as low vision.)
•	Low bandwidth. The ability to switch to text or verbal presentation can speed access over low-bandwidth connections.
•	Future support. Modality-independent servers will also be better able to handle future serving needs that may involve access to information using different modalities. Creating a legacy system that cannot handle or serve information in different modalities may necessitate a huge rework job in the future as systems evolve and are deployed.
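The searchability benefit can be illustrated with a short Python sketch: if a movie or sound file carries a timed text stream, an ordinary word index is enough to jump to the right point in it. The caption format, a list of (start time in seconds, text) pairs, and the function names are assumptions for this example.

```python
PUNCTUATION = ".,;:!?\"'()"


def build_index(captions):
    """Map each lowercase word to the caption start times where it occurs."""
    index = {}
    for start, text in captions:
        words = {w.strip(PUNCTUATION) for w in text.lower().split()}
        for word in words - {""}:
            index.setdefault(word, []).append(start)
    return index


def find(index, word):
    """Return the times (in seconds) at which `word` appears, earliest first."""
    return sorted(index.get(word.lower(), []))


captions = [
    (0.0, "Welcome to the canyon."),
    (12.5, "Mules walk the canyon trail."),
    (30.0, "The trail ends here."),
]
index = build_index(captions)
print(find(index, "canyon"))
```

A player could seek directly to any of the returned times, which is the "jump to particular points" behavior described in the list above.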

Limitations

Today, most of the universal access strategies are limited to information that can easily be presented verbally (in words). However, although the Grand Canyon could be presented in three dimensions through virtual reality, its full impact cannot be captured in words, nor can a Picasso painting or Mahler symphony easily be made sensory modality independent. Also, although planes could be designed to fly themselves, we do not as yet know how to allow a user who is blind to control directly flight that currently requires eye-hand coordination (or its equivalent). There are also situations in which the underlying task requires greater cognitive skills than an individual may possess, regardless of the cognitive skills required to operate the interface. It may be a while before we resolve some of these limitations to access.

On the other hand, we also have many examples where interfaces that were previously thought to be unusable by individuals with a particular disability were later made easily accessible. The difference was simply the presence or absence of an idea. The challenge, therefore, is to discover and develop strategies and tools that can make next-generation interfaces accessible to and usable by greater numbers of individuals and easier for all to use.

Summary

Through the incorporation of presentation-independent data structures, an available information/command menu, and several easy-to-program selection options, it is possible to create interfaces that begin to approximate the anytime-anywhere-anyone (AAA) interface goal. Some interfaces of this type have been constructed and are now being used in public information kiosks to provide access to individuals with a wide range of abilities. The same strategies can be incorporated into next-generation TeleTransInfoCom devices to provide users with the nomadicity they will require in next-generation Internet appliances.

Before long, individuals will look for systems that allow them to begin an important communication at their desk, continue it as they walk to their car, and finish it while driving to their next appointment. Similarly, users will want the ability to move freely between high- and low-bandwidth systems to meet their needs and circumstances. They will want to access their information databases by using visual displays and perhaps advanced data visualization and navigation strategies while at a desk, but auditory-only systems as they walk to their next appointment. They may even wish to access their personal rolodexes or people databases while engaged in conversations at a social gathering (by using a pocket keypad and an earphone to ask, "What is Mary Jones' husband's name?").

The approaches discussed will also allow these systems to address issues of equity such as providing access to those with disabilities or those with lower-technology and lower-bandwidth devices and providing support for intelligent (or not-so-intelligent) agent software. The AAA strategies discussed here do not provide full cross-environment access to all types of interface or information systems. In particular, as noted above, fully immersive systems that present inherently graphic (e.g., paintings) or auditory (e.g., symphonies) information will not be accessible to anyone who does not employ the primary senses for which this information was prepared (text descriptions are insufficient). However, the majority of today's information and most services can be made available through these approaches, and extensions may provide access to even more.

Finally, it is important to note that not only do environment/situation-independent interfaces and disability-accessible interfaces appear to be closely related, but also one of the best ways to explore environment/situation-independent nomadic interface strategies may be the exploration of past and developing means for providing cross-disability access to computer and information systems.

Challenges And Research Areas

For a system to be more accessible to and usable by every citizen, it must be (1) perceivable, (2) operable, and (3) understandable.

The following areas of research can help to address these needs:

•	Data structures, compression, and transport formats that allow the incorporation of alternate modalities or modality-independent data (e.g., text embedded in sound files or graphic files);
•	Techniques and architectures for partial serving of information, such as the ability to fetch only the visual, the auditory, the text, or any combination of these tracks from a multimedia file, or to fetch one part of a file from one location and another part from a second location (e.g., fetching a movie from one location and the captions from another);
•	Modality substitution strategies (e.g., techniques for restructuring data so that ear-hand coordination can be substituted for eye-hand coordination);
•	Natural language interfaces (e.g., the ability to have information presented conversationally and to control products with conversation, whether via speech or "typed" text);
•	Alternate, substitute, and remote interface communication protocols (e.g., robust communication protocols that allow sensory- and presentation-independent alternate interfaces to be connected to and used with devices having less flexible interfaces);
•	Voice-tolerant speech recognition (ability to deal with dysarthric and deaf speech);
•	Dynamic tactile displays (two- and three-dimensional tactile displays and 3D force feedback);
•	Better random access to information/functions (instead of tree walking); and
•	Speed-List (e.g., EZ Access) equivalent access to structured VR environments.
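Of the research areas above, partial serving is perhaps the easiest to sketch. The Python fragment below assumes each track of a multimedia item has its own source, which may be at a different location; the zero-argument callables stand in for network requests, and all names are hypothetical.

```python
def fetch_tracks(sources, wanted):
    """Fetch only the requested tracks, each from its own source.

    `sources` maps a track name to a zero-argument callable that retrieves
    it (a stand-in for a network request). Tracks that are not requested
    are never fetched.
    """
    missing = [track for track in wanted if track not in sources]
    if missing:
        raise KeyError(f"no source for tracks: {missing}")
    return {track: sources[track]() for track in wanted}


# Hypothetical item: the movie lives at one site, the captions at another.
sources = {
    "video": lambda: "video bytes from site A",
    "audio": lambda: "audio bytes from site A",
    "captions": lambda: "caption text from site B",
}

print(fetch_tracks(sources, ["audio", "captions"]))
```

A user on a low-bandwidth link, or one who cannot look at the display, would request only the audio and caption tracks and never pay for the video.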