
On Interface Specifics

An Embedded, Invisible, Every-Citizen Interface

Mark Weiser

Xerox Palo Alto Research Center

The nation's information infrastructure is a vast, loosely connected network of informing resources found mostly in people's everyday lives. When considering interfaces to new electronic information sources, and especially when replacing old information sources with new electronic sources, it is crucial to consider how the existing infrastructure really works. Two examples will help.

Consider how you would find a grocery store in a new town. How do you solve this problem on first driving in? Most likely, by looking around, watching the people and the streets, and making a couple of guesses, you could find one in no time. The information infrastructure is everyday knowledge of towns (including economic and practical constraints on layout, walking distances, etc., that are embedded in that knowledge), and physical clues that map that general knowledge into this particular town. Information infrastructure can be physical (see www.itp.tsoa.nyu.edu/~review/current/focus2/open00.html).

More conventionally, our national information infrastructure today includes tens of thousands of public and school libraries all across the country. These libraries are in nearly every elementary, junior high, and high school; and they are in nearly every community, even the very small. Of course, many of these libraries are connected to the Internet. But it is very important to consider the other resources provided by these libraries. Thirty-five percent of all library visitors never use the catalog, and 12 percent use no library materials at all but bring in their own materials. Clearly, libraries do something more than just supply data that could be gotten over the Web (see www.ubiq.com/weiser/SituationalAspectsofElectronicLibraries.html).

As the above examples illustrate, the existing information infrastructure often functions without calling itself to our attention. It stays out of sight, effectively not even noticed. So the first challenge for an every-citizen interface is to be invisible (what I have called "calm technology" elsewhere; see www.ubiq.com/weiser/calmtech/calmtech.htm).

As the above examples also illustrate, the existing information infrastructure is extremely widespread, found in every nook and cranny of our lives. The second challenge for the every-citizen interface is to be ubiquitous (see www.ubiq.com/ubicomp).

Finally, not addressed by the above examples but presumably clear to everyone, the current Internet is just the beginning. I like to think of it by analogy to television channels. Once upon a time we fretted about how we would manage a TV with 500 channels. How could we ever view them all? The Internet will give us 5 billion channels, one for every person on the planet (only about 30 million so far, but more are coming). And soon these channels will be multimedia, multiway video and sound using the Mbone. This kind of interconnection is a deep technical challenge to the current Web infrastructure, which cannot begin to support even a few multiway Mbone connections, much less 5 billion. I consider this a user interface issue because it is just this infrastructure that opens up the Web to use by anyone who can point a camera or talk on the phone. The third challenge for the every-citizen interface is to support billions of multiway real-time interactive connections.

Of these three challenges, I believe the first is currently the most promising for progress, the most susceptible to interdisciplinary attack, and the least well addressed by existing projects. How does a technology become invisible? To some degree, anything can, given enough practice. Invisibility is a property of people, technology, and the situation in which we find ourselves (a tiger invisible in the tall grass stands out at the zoo). Some suggested challenges for developing a "science of invisibility" for an every-citizen interface are as follows:

• Human information processing includes operations at many different levels, the vast majority of them invisible to our conscious thought at any given moment. As we learn a skill, operations that formerly required attention ("turn the steering wheel") become automatic and are replaced by higher-level directions ("turn left," "drive to Cincinnati"). Invisible interfaces are those that connect "below" the conscious level, through both learning and natural affinity. What computer interfaces are most appropriate for coupling into a large amount of unconscious information processing? Which ones take a long time to learn but are worth the effort (analogous perhaps to piano playing)? Which ones fit our brain's affinity for information (information browsing as a walk in the woods)?

• The difference between something being effectively invisible because it is processed below conscious thought and something being managed for us (e.g., by a computerized agent) is profound. A key advantage of effective invisibility is the quick refocus from peripheral inattention to center attention. For instance, while ordinarily unaware of engine noises in our car, we suddenly become very aware if the noise changes unexpectedly. We can then focus our attention on the noise and make decisions about its danger, the distance to the nearest expressway exit, what lane to be in, and so on. (A silent car with an intelligent agent monitoring engine condition would keep us from any knowledge at all.) Which computer interfaces do well at keeping something invisible most of the time while allowing quick recentering when appropriate? Which interfaces let the same information be either in the center of our attention or in the periphery, without so much as a button click but simply by a change of attention?

• The concept of an intelligent agent can be a very powerful one if it does not take over the function of human judgment and our ability to control the focus of our attention. Can we design intelligent agents in our computers that preserve our ability to refocus? If something has been taken over for me, is there a presentation of what has been taken over that I can bring to the fore whenever I like, including retroactively? Can I have agents that filter for me without losing all of the context of the information after the filter? For instance, if I use a computerized newspaper clipping service, can it show me one or two lines of the articles that were physically near the ones it clipped for me in the physical newspaper? What kind of context helps, and what doesn't, when dealing with a computerized agent?
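
The second bullet above describes a concrete mechanism: information stays in the periphery until it changes unexpectedly, at which point it moves to the center of attention. As a toy illustration only (the names and thresholds are hypothetical, not a proposal from the text), the following sketch stays silent while a monitored reading behaves as usual and announces it only when it deviates, in the spirit of the engine-noise example.

```python
# Toy sketch of a "calm" peripheral monitor: say nothing while a signal stays
# within its usual range; surface it to the center of attention only when it
# changes unexpectedly. Names and thresholds are illustrative, not prescriptive.
from statistics import mean, pstdev

class PeripheralMonitor:
    def __init__(self, window=20, tolerance=3.0):
        self.history = []          # recent readings: the "periphery"
        self.window = window       # how much recent context to keep
        self.tolerance = tolerance # how surprising a reading must be to surface

    def observe(self, reading):
        if len(self.history) >= self.window:
            baseline = mean(self.history)
            spread = pstdev(self.history) or 1.0
            if abs(reading - baseline) > self.tolerance * spread:
                self.recenter(reading, baseline)   # move to center of attention
        self.history = (self.history + [reading])[-self.window:]

    def recenter(self, reading, baseline):
        print(f"Attention: reading {reading:.1f} departs from the usual {baseline:.1f}")

monitor = PeripheralMonitor()
for r in [50, 51, 49, 50, 52, 50, 49, 51, 50, 50,
          51, 49, 50, 50, 51, 49, 50, 51, 50, 50, 95]:
    monitor.observe(r)   # only the final, anomalous reading is announced
```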

Page 246 Intelligent Multimedia Interfaces For "Each" Citizen Mark T. Maybury Mitre Corporation Future interfaces will take advantage of knowledge of the user, task, and discourse and exploit multiple input/output media and associated human sensory modalities to enable intuitive access to information, tools, and other people (see Figure 1). The more effectively computers can process heterogeneous information and automatically acquire knowledge about users, the more efficient they will become at supporting users' frequently changing tasks, particularly information-seeking ones. Information-seeking tasks range from directed search to casual browsing, from looking up facts to predicting trends. Each of these goals can be achieved more or less effectively by the content, form (i.e., media), and environment that support the user. Our emphasis at Mitre Corporation has been on investigating technologies and tools that enable more effective and efficient information interfaces for a variety of application areas, including command and control, intelligence analysis, and education and training. As a consequence of our experience, we believe we should aim not to FIGURE 1 Intelligent interfaces.

OCR for page 243
Page 247 build a one-of-a-kind interface for every citizen, but rather common interfaces that can be tailored to each citizen in accordance with his or her goals, knowledge, skills, and preferences. Challenges Achieving an intelligent interface vision requires addressing some fundamental technology limitations, including: • Lack of a scientific approach to device and user interface design, development, and evaluation. • Lack of interface standards that make it easy to pull out one device and plug in a similar one. • Lack of general mechanisms for (1) interpreting input from multiple devices (e.g., mouse gesture and speech, as in "put that there ‹click›") and (2) automatic generation of coordinated multimedia output. • Lack of general mechanisms for constructing dialogue-based interaction that supports error detection and correction, user models, and discourse models to ensure tailored and robust communication. • Few tools or procedures that facilitate porting language-enabled interfaces to new domains (and/or languages); it remains a time-consuming and knowledge-intensive task. We believe there exist fundamental tasks associated with communication that underlie all interface activities. These can be viewed as a hierarchy of communicative acts, which can be formalized and computationally implemented and by their nature can be realized in multiple modalities (e.g., linguistic/auditory, gestural, visual). The choice among modalities itself is an important, knowledge-based task, as is the broader task of presentation planning. In supporting information access, our efforts have focused on multimedia analysis (in particular, message understanding and video analysis), including its segmentation into atomic units, extraction of objects, facts and relationships, and summarization into compact form. New requirements have arisen (e.g., resulting in multimedia query mechanisms that raise issues such as how to integrate visual and linguistic queries). Multimedia information-seeking tasks remain perhaps the most important but least well understood area. We believe that careful experimental design, use of instrumented environments, and task-based evaluation (e.g., measuring at least time and accuracy (false positives, false negatives)) will yield new insights.
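
To make the multimodal interpretation challenge above concrete, here is a minimal sketch, with hypothetical names and data, of one simple way to interpret combined speech and pointing input such as "put that there <click>": time-stamped pointing events are paired with the deictic words in the recognized utterance. A real system would add a discourse model, error handling, and far richer timing and semantics.

```python
# Minimal sketch (hypothetical names and data): resolve deictic words in a
# spoken command by pairing them, in order of occurrence, with time-stamped
# pointing events from a second input device.
DEICTICS = {"this", "that", "here", "there"}

def fuse(words, clicks):
    """words: list of (word, time) from speech recognition;
    clicks: list of (object_id, time) from the pointing device."""
    clicks = sorted(clicks, key=lambda c: c[1])
    resolved, next_click = [], 0
    for word, _t in words:
        if word in DEICTICS and next_click < len(clicks):
            # bind the deictic word to the next unused pointing event
            resolved.append((word, clicks[next_click][0]))
            next_click += 1
        else:
            resolved.append((word, None))
    return resolved

utterance = [("put", 0.1), ("that", 0.4), ("there", 1.1)]
pointing = [("document-17", 0.5), ("folder-3", 1.2)]
print(fuse(utterance, pointing))
# -> [('put', None), ('that', 'document-17'), ('there', 'folder-3')]
```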

TABLE 1 Research Recommendations

Text processing
  State of the art: Commercial named entity extraction (SRA, BBN); many hand-crafted, domain-specific systems for event extraction; large cost to port to new domains; incremental sentence generation, limited document generation.
  Near-term research: Demonstrate portability of TIPSTER technology to support multilingual information extraction and spoken language; incremental text generation; text summarization; topic detection and tracking.
  Long-term research: Scaleable, trainable, portable algorithms; document-length text generation.

Speech processing
  State of the art: Commercial small-vocabulary recognizers (Corona, HARK); large-vocabulary (40,000+ words) recognizers exist in research labs (BBN, SRI, Cambridge University).
  Near-term research: Speaker, language, and topic identification; prosodic analysis; natural-sounding synthesis.
  Long-term research: Large-vocabulary, speaker-independent systems for speech-enabled interfaces; large-vocabulary systems for video and radio transcription, for example.

Graphics processing
  State of the art: Graphical user interface toolkits (e.g., object-oriented, reusable window elements such as menus, dialogue boxes).
  Near-term research: Tools for automated creation of graphical user interface elements; limited research prototypes of automated graphics design.
  Long-term research: Automated, model-based creation and tailoring of graphical user interfaces.

Image/video processing
  State of the art: Color, shape, texture-based indexing and query of imagery.
  Near-term research: Motion-based indexing of imagery and video; video segmentation.
  Long-term research: Visual information indexing and extraction (e.g., human behavior from video).

Gesture processing
  State of the art: Two-dimensional mice; eyetrackers; tethered body-motion tracking.
  Near-term research: Tetherless, three-dimensional gesture, including hand, head, eye, and body-motion tracking.
  Long-term research: Intentional understanding of gesture; cross-media correlation (with text and speech processing); facial and body gesture recognition.

Multimedia integration
  State of the art: Limited prototypes in research and government.
  Near-term research: Content selection, media allocation, media coordination, media realization for multimedia generation.
  Long-term research: Multimedia and multimodal analysis; multimedia and multimodal generation; investigation of less-examined senses (e.g., taction, olfaction).

Discourse modeling
  State of the art: Limited prototypes in research and government.
  Near-term research: Error handling (ill-formed and incomplete input/output), two-party conversational model, discourse annotation schemes, discourse data collection and annotation, conversation tracking.
  Long-term research: Context tracking/dialogue management; multiuser conversation tracking; annotation standards; model-based conversational interaction.

User modeling
  State of the art: Fragile research prototypes available from academia; one user modeling shell (BGP-MS).
  Near-term research: Track user focus and skill level to interact at appropriate level; empirical studies in broad range of tasks in multiple media.
  Long-term research: Hybrid stereotypical/personalized and symbolic/statistical user models.

Visualization
  State of the art: Some commercial tools (e.g., NetMap); text-based, limited semantics; computationally intensive, often difficult to use.
  Near-term research: Improve information access interfaces; visualization generation from extracted (semantic) information; automated graphical encoding of information properties.
  Long-term research: Multidimensional visualization; multimedia (e.g., text, audio, video) visualization.

Collaboration tools
  State of the art: Multipoint video, audio, imagery; e-mail-based routing of tasks.
  Near-term research: Instrument environments for data collection and experimentation; multiparty collaborative communication; investigate asynchronous, distant collaboration (e.g., virtual learning spaces).
  Long-term research: Field experiments to predict impact of collaborative technology on current work processes; tools for automated analysis of video session recordings; flexible workflow automation.

Intelligent agents
  State of the art: Agent communication (e.g., KQML) and exchange languages (e.g., KIF).
  Near-term research: Mediation tools for heterogeneous distributed access.
  Long-term research: Shared ontologies; agent integration architectures and/or control languages; agent negotiation.

A Plan of Action

There are several important recent developments that promise to provide new common facilities that can be shared to create more powerful interfaces. A strategy to move forward should include:

• Creating architectures and services of an advanced interface server that are defined in the short term using open standard distributed object middleware, namely the Object Management Group's Common Object Request Broker Architecture (CORBA), and that investigate higher-risk architectures, such as agent-based communication and coordination.
• Fostering interdisciplinary focused science that investigates the nature of multiple modalities, with an aim to understanding the principles of multiple modalities in order to provide insight into such tasks as multimedia interpretation and the generation of coordinated multimedia output.
• Utilizing, refining, integrating, and extending (to additional media) existing architectures for single media, including (1) the TIPSTER architecture (for document detection and information extraction) and associated tag standards (e.g., Penn Treebank part-of-speech tags, proper name tags, coreference annotations) for language source markup and (2) evolving applications programming interface (API) standards in the spoken language industry (e.g., SRAPI, SAPI).
• Via an interdisciplinary process, defining common interface tasks and associated evaluation metrics and methods; creating multimedia corpora and associated markup standards; and fostering interdisciplinary algorithm design, implementation, and evaluation.
• Fostering emerging user modeling shells (e.g., BGP-MS) and standards.
• Focusing on creation of theoretically neutral discourse modeling shells.
• Applying these facilities in an evolutionary fashion to improve existing interfaces, supporting a migration from "dumb" to "smart" interfaces (S. Bayer, personal communication, 1996).
• Performing task-based, community-wide evaluation to guide subsequent research, measuring functional improvements (e.g., task completion time, accuracy, quality).

Because they affect all who interact with computers, user interfaces are perhaps the single area of computing that can most radically alter the ease, efficiency, and effectiveness of human-computer interactions.

Recommended Research

Table 1 identifies several functional requirements and the associated key technologies that need to be investigated to enable a new generation of human-computer interaction, along with near-term and long-term research investment recommendations. Key areas for research include:

• processing and integrating various input/output media (e.g., text, speech and nonspeech audio, imagery, video, graphics, gesture);
• methods to acquire, represent, maintain, and exploit models of user, discourse, and media; and
• mechanisms for information visualization, multiuser collaboration support, and intelligent agents.

Reference

Maybury, M.T. (Ed.). 1993. Intelligent Multimedia Interfaces. Menlo Park, Calif.: AAAI/MIT Press.

Interfaces for Understanding

Nathan Shedroff

vivid studios

Over the next 15 years the issues facing interface designers, engineers, programmers, and researchers will become increasingly complex and will push farther into currently abstract and, perhaps, esoteric realms. However, we are not without guidance and direction to follow. Our experiences as humans and what little history we have with machines can lead us toward our goals. Computers and related devices in the future will need to exhibit many of the following qualities:

• Be more aware of themselves (who and what they are, who they "belong" to, their relationships to other systems, their autonomy, and their capabilities).
• Be more aware of their surroundings and audiences (who is there; how many people are present or around; who to "listen" to; where and how to find and contact people for help or to follow directions; who is a "regular"; how to adapt to different people's preferences, needs, goals, skills, interests, etc.).
• Offer more help and guidance when needed.
• Be more autonomous when needed.
• Be more able to help build knowledge as opposed to merely processing data.
• Be more capable of displaying information in richer forms, both visually and auditorially.
• Be more integrated into a participant's work flow or entertainment process.
• Be more integrated with other media, especially traditional media like broadcast and print.

Funding for research and development, therefore, should concentrate on these issues and their related hardware, software, and understandings. These include research into the following:

• Display and visualization systems (high-resolution, portable, and low-power displays; HDTV (high-definition television) and standards in related display industries; integration with input/output devices such as scanners, pointing devices, and printers; fast processing systems for n-dimensional data models; standards for these models, display hardware, and software; software capable of easily configuring and experimenting with visualizations and simulations; etc.).
• Perceptual systems (proximity, sounds, motion, and electronic "eyes" for identification and awareness; standards for formatting instructions and specifications to help systems "understand" what they are, what is around them, what and who they can communicate with, and what they are capable of; facilities for obtaining help when necessary; ways of identifying participants by their behavior, gestures, or other attributes; etc.).
• Communications systems (standards, hardware, and software to help participants communicate better with each other as well as with computers; natural-language interfaces, spoken and written, and translation systems to widen the opportunities of involvement to more people; hardware and software solutions for increasing bandwidth and improving the reliability, security, privacy, and scalability of existing communications infrastructure; etc.).
• Understanding of understanding (information and knowledge-building applications; understandings about how people create context and meaning, transform data into information, create knowledge for themselves, and build wisdom; software to help facilitate these processes; standards to help transmit and share information and knowledge with connections intact; etc.).
• Understanding of interaction (a wider definition of interaction than is used in the "industry"; how participants define and perceive "interactivity"; what they expect and need in interactivity; historical examples of interaction; lessons from theater, storytelling, conversation, improvisation, the performing arts, and the entertainment industry; etc.).
• Increased education (of participants and audiences, as well as professionals and the industry).
• Better resources for understanding cultural diversity (in terms of gestures, languages, perceptions, and needs of different age, gender, cultural, and nationality groups).

In addition, there are some procedural approaches to these undertakings that can help make the overall outcomes more valuable:

• Reduced duplication of research and development by government-sponsored grants and institutions (requiring the disclosure, sharing, and reporting of research efforts, problems, and solutions).
• More means of coordination and knowledge sharing among research and development scholars and professionals (whether government sponsored or not).

Nomadicity, Disability Access, and the Every-Citizen Interface

Gregg C. Vanderheiden

University of Wisconsin-Madison

The Challenge

With the rapid evolution of the national information infrastructure (NII) and the global information infrastructure (GII), attention has turned to the issue of information equality and universal access. Basically, if information systems become as integral to our future life-styles as electricity is today, access to these systems will be essential for people to have equal access to education, employment, and even daily entertainment or enrichment activities.

Although the goal of equal access seems noble, it can seem somewhat less achievable when one considers the full range of abilities or disabilities that must be dealt with to achieve an every-citizen interface. It must be usable even if people

• cannot see very well, or at all;
• cannot hear very well, or at all;
• cannot read very well, or at all;
• cannot move their heads or arms very well, or at all;
• cannot speak very well, or at all;
• cannot feel with their fingers very well, or at all;
• are short, are tall, use a wheelchair, and so forth;
• cannot remember well;
• have difficulty learning or figuring things out;
• have little or no technological inclination or ability; and/or
• have any combination of these difficulties (e.g., are deaf-blind; have reduced visual, hearing, physical, or cognitive abilities, as occurs in many older individuals).

In addition, the products and their interfaces must remain equally efficient and easy to use and understand for those who (1) have no problems seeing, hearing, moving, remembering, and so forth and (2) are power users.

Is It Possible?

A list like this can bring a designer up short. At first blush, it appears that even if such an interface were possible, it would be impractical or inefficient to use for people with all of their abilities intact. Packages such as the EZ Access approach developed for kiosks (http://trace.wisc.edu/world/kiosk), PDAs (personal digital assistants), and other touchscreen devices, however, demonstrate how close we can come to such an ideal, at least for some types of devices or systems. Using a combination of Talking Fingertip and Speed List technologies, the EZ Access package (for information, see http://trace.wisc.edu/TEXT/KIOSK/MINIMUM.HTM) provides efficient access for individuals with low vision, blindness, and poor or no reading skills. A ShowSounds/caption feature provides access for individuals with hearing impairments or deafness, as well as access for all users in very noisy locations. An infrared link allows the system to be used easily with alternate displays and controllers, so that even individuals who are deaf-blind or paralyzed can access and use the system. Thus, with a relatively modest set of interface variations, almost all the needs listed above can be addressed.

Is It Practical?

Practicality is a complex issue that involves cost, complexity, impact on overall marketability, support, and so forth. To use the EZ Access approach as an example, the hardware cost to provide all of these dimensions of accessibility to a standard multimedia kiosk is less than 1 percent of the cost of the kiosk. Addition of this technique does not affect the standard or traditional mode of operation of the kiosk at all. At the same time, it makes the system usable by many visitors as well as new citizens whose native language is not English and who may have some difficulty with words. Implementing cross-disability interface strategies can take only a few days with the proper tools. EZ Access techniques are currently used on commercial kiosks in the Mall of America and other locations. Other examples of built-in accessibility are the access features that are built into every Macintosh- and Windows 95-based computer. Thus, if done properly, interfaces that are flexible or adjustable enough to address a wide range of individuals can be very practical.

There are, however, approaches to providing additional access, or access for additional populations, that are not currently practical (e.g., building $2,000 dynamic braille displays into every terminal or kiosk). In these cases, the most practical approach may be to make the information and control necessary for operation of the device available on a standard connector so that a person who is deaf and blind can connect a braille display and keyboard. Practicality also is a function of the way the access features relate to and reinforce the overall interface goals of the product.

How Does an Every-Citizen Interface Relate to Nomadic Systems?

The devices of tomorrow, which might be referred to as TeleTransInfoCom (tele-transaction/information/communication) devices, will operate in a wide range of environments. Miniaturization, advances in wireless communication, and thin-client architectures are rapidly eliminating the need to be tied to a workstation or to carry a large device in order to have access to computing, communication, and information services and functions. As a result, we will need interfaces for use while driving a car, sitting in an easy chair, sitting in a library, participating in a meeting, walking down the street, sitting on the beach, walking through a noisy shopping mall, taking a shower, or relaxing in a bathtub, as well as sitting at a desk. The interfaces also will have to be usable in hostile environments: when camping or hiking, in factories, or in shopping malls at Christmas time. Many of us will also need to access our information appliance (or appliances) in very different environments on the same day, perhaps even during the same communication or interaction activity.

These different environments will place constraints on the types of physical and sensory input and output techniques that work (e.g., it is difficult to use a keyboard when walking; it is difficult and dangerous to use visual displays when driving a car; speech input and output, which work fine in a car, may not be usable in a shared office environment, a noisy mall, a meeting, or a library). Systems designed to work across these environments will therefore require flexible input/output options. The interface variations, however, must operate in essentially the same way, even though they may be quite different (visual versus aural). Users will not want to master three or four interface paradigms in order to operate their devices in different environments. The metaphor(s) and the "look and feel" must be continuous even though the devices operate entirely visually at one point (e.g., in a meeting) or entirely aurally at another (e.g., while driving a car). Many users will also want to be able to move from one environment to another, one device to another (e.g., workstation to hand-held), and one mode to another (e.g., visual to voice) in the midst of a task.

Does Nomadicity Equal Disability Access?

It is interesting to note that most of the issues regarding access for people with disabilities will be addressed if we simply address the issues raised by the range of environments described above:

• When we create interfaces that work well in noisy environments such as airplanes, construction sites, or shopping malls at Christmas, and for people who must listen to something else while they use their device, we will have created interfaces that work well for people who cannot hear well or at all.
• When we create interfaces that work well for people who are driving a car or doing something that makes it unsafe to look at the device they are operating, we will have created interfaces that can be used by people who cannot see.
• As we develop very small pocket and wearable devices for circumstances in which it is difficult to use a full-sized keyboard or even a large number of keys, we will have developed techniques that can be used by individuals with some types of physical disabilities.
• When we create interfaces that can be used by someone whose hands are occupied, we will have systems that are accessible to people who cannot use their hands.
• When we create interfaces for individuals who are tired, under stress, under the influence of drugs (legal or illegal), or simply in the midst of a traumatic event or emergency (and who may have little ability to concentrate or deal with complexity), we will have interfaces that can be used by people with naturally reduced abilities to concentrate or deal with complexity.

Thus, although there may be residual specifics concerning disability access that must be covered, the bulk of the issues involved are addressed automatically through the process of developing environment/situation-independent (modality-independent) interfaces.

What Is Needed

Interfaces that are independent of the environment or the individual must have the following attributes:

• Wide variability, in order to meet the diversity of tasks that will be addressed. Some interfaces will have to deal only with text capture, transmission, and display. Others will have to deal with display, editing, and manipulation of audiovisual materials. Some may involve VR (virtual reality) but basically be shop-and-select strategies. Others may require full immersion, such as data visualization and tele-presence.
• Modality independence. Interfaces have to allow the user to choose sensory modalities appropriate to the environment, situation, or user. Text-based systems will allow users to display information visually at some times and aurally at others, on high-resolution displays when available and on smaller low-resolution displays when necessary.
• Flexibility/adaptability. Interfaces will be required that can take advantage of fine motor movements and three-dimensional gestures when a user's situation or abilities allow but that can also be operated using speech, keyboard, or other input techniques when this is necessary because of the environment, the user's activities, or any motor constraints.
• Straightforwardness and ease of use. As much of the population as possible must be able to use these interfaces and to master new functions and capabilities as they are introduced.

Some Components Necessary to Achieve Every-Citizen Interfaces

Although this section does not address all possible interface types, particularly freehand graphic production interfaces (e.g., painting), it does address the majority of command-and-control interfaces.

1. Modality independence. For a device or system to be modality independent or alt-modal (i.e., the user can choose between alternate sensory modalities when operating the device), two things are necessary:

  a. All of the basic information must be stored and available in either modality-independent or modality-redundant form.

  Modality independent refers to information that is stored in a form that is not tied to any particular form of presentation. For example, ASCII text is not inherently visual, auditory, or tactile. It can be presented easily on a visual display or printer (visually), through a voice synthesizer (aurally), or through a dynamic braille display or braille printer (tactually).

  Modality redundant refers to information that is stored in multiple modalities. For example, a movie might include a visual description of the audio track (e.g., captions) and an audio and electronic text description of the video track, so that all (or essentially all) information can be presented visually, aurally, or tactually at the user's request based on need, preference, or environmental situation.

  b. The system must be able to display data or information in different modalities. That is, it should provide a mechanism for displaying information in all-visual, all-auditory, or mixed audiovisual form, as well as in electronic form.

2. Flexibility/adjustability. The device must also offer alternate selection techniques that can accommodate varying physical and sensory abilities arising from the personal environment or situation (e.g., walking, wearing heavy gloves) and/or personal abilities. Suggested alternate operating modes follow:

  • Standard mode. This mode often uses multiple simultaneous senses and fine motor movements. It would offer the most effective device for individuals who have no restrictions on their abilities (due to task, environment, or disability).
  • List mode. In this mode the user can call up a list of all the information and action items and use the list to select items for presentation or action. It would not require vision to operate. It could be operated using an analog transducer to allow the individual to move up and down within a list, or a keyboard or arrow keys combined with a confirm button could be used. This mode can be used by individuals who are unable to see or look at a device.
  • External list mode. This would make the list available externally through a software or hardware port (e.g., an infrared port) and accept selections through the same port. It can be used by individuals who are unable to see and hear the display and therefore must access it from an external auxiliary interface. This would include artificial intelligence agents, which are unable to process visual or auditory information that is unavailable in text form.
  • Select and confirm mode. This allows individuals to obtain information about items without activating them (a separate confirm action is used to activate items after they are selected). It can be used by individuals with reading difficulties, low vision, or physical movement problems, as well as by individuals in unstable environments or whose movements are awkward due to heavy clothing or other factors.
  • Auto-step scanning mode. This presents the individual items in groups or sequentially for the user to select. It can be used by individuals with severe movement limitations or combined movement and visual constraints (e.g., driving a car), and when direct selection (e.g., speech input) techniques are not practicable.
  • Direct text control techniques. These include keyboard or speech input.

Example: Using a Uni-List-Based Architecture as Part of the Interface

One approach to device design that would support this type of flexibility is the Uni-List architecture. By maintaining a continually updated listing of all the information items currently available to the user, as well as all the actions or commands available, it is possible to provide a very flexible and adjustable user interface relatively easily. All the techniques listed above are easy to implement with such an architecture, and it can be applied to a range of devices or systems.

Take, for example, a three-dimensional (3D) virtual reality-based shopping mall. In such an application, a database is used to provide the information needed to generate the image seen by the user and the responses to user movements or actions on objects in the view. If properly constructed, this database could also provide a continually updated listing of all objects in view, as well as information about any actionable objects presented to the user at any time. By including verbal (e.g., text) information about the various objects and items, this 3D virtual shopping system can be navigated and used in a variety of ways to accommodate a variety of users or situations.

• Individuals who are unable to see the screen (because they are driving their car, their eyes are otherwise occupied, or they are blind) can have the information and choices presented vocally (or via braille). They can then select items from the list in order to act on them, in much the same way that an individual can pick up or "click on" an object in the environment.
• Individuals with movement disabilities can have a highlight or "sprite" step around to the objects, or they could indicate the approximate location and have the items in that location highlighted individually (other methods for disambiguating also could be used) to select the desired item.
• Individuals who are unable to read can touch or select any printed text presented and have it read aloud to them.
• Individuals with low vision (or who do not have their glasses) can use the system in the same way as a fully sighted individual. When they are unable to see well enough to identify the objects, they can switch into a mode that lets them touch the objects (without activating them) and thereby have them named or described.
• Individuals who are deaf-blind could use the device in the same fashion as an individual who is blind. Instead of the information being spoken, however, it could be sent to the individual's dynamic braille display.
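
As a purely illustrative reading of the Uni-List idea described above (hypothetical names; this is not the Trace Center's or any shipping implementation), the sketch below maintains one continually updated list of the items and actions currently available, stores each item's label as modality-independent text, and lets different presentation and selection modes (on-screen list, spoken list, external port, select and confirm) all work from the same list.

```python
# Illustrative sketch of a Uni-List-style registry: one continually updated
# list of everything the user can currently perceive or act on, labeled with
# modality-independent text. Hypothetical names; not actual EZ Access code.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Item:
    label: str                                   # modality-independent text
    action: Optional[Callable[[], None]] = None  # None for information-only items

@dataclass
class UniList:
    items: List[Item] = field(default_factory=list)

    def update(self, items):
        # called whenever the view or device state changes
        self.items = list(items)

    def present(self, render: Callable[[str], None]):
        # "list mode": the same labels can go to a screen, a speech
        # synthesizer, a braille device, or an external port
        for i, item in enumerate(self.items):
            render(f"{i}: {item.label}")

    def describe(self, index):
        # "select and confirm": inspect an item without activating it
        return self.items[index].label

    def confirm(self, index):
        # a separate, explicit confirm action activates the item
        if self.items[index].action:
            self.items[index].action()

# The same list can drive very different front ends:
ui = UniList()
ui.update([Item("Storefront: Quick Groceries", lambda: print("entering store")),
           Item("Exit to parking lot", lambda: print("leaving mall"))])
ui.present(print)       # visual or text renderer (swap in speech/braille output)
print(ui.describe(0))   # describe without activating
ui.confirm(0)           # activate only on explicit confirmation
```

In this reading, an external list mode is simply another renderer and selection channel attached to a hardware or software port rather than to the built-in display.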

Additional Benefits of Flexible, Modality-Independent Architectures and Data Formats

The two key underlying strategies for providing more universal access are input and display flexibility and the companion availability of information in sensory/modality-independent or parallel form. Both input and display flexibility and presentation independence have additional benefits beyond the every-citizen interface. These include the following:

• Nomadicity support (discussed above).
• Searchability. Graphic and auditory information that contains text streams can be indexed and found by using standard text-based search engines, which not only can locate items but also can jump to particular points within a movie or a sound file.
• Alternate client support. The same information can be stored and served to different types of telecommunication and information devices. For example, information could be accessed graphically over the Internet, via a telephone by using a verbal form, or even by intelligent (or not-so-intelligent) agents using electronic text form.
• Display flexibility. Presentation-independent information also tends to be display-size independent, allowing it to be more easily accessed using very small, low-resolution displays. (In fact, some low-resolution displays present exactly the same issues as low vision.)
• Low bandwidth. The ability to switch to text or verbal presentation can speed access over low-bandwidth connections.
• Future support. Modality-independent servers will also be better able to handle future serving needs that may involve access to information using different modalities. Creating a legacy system that cannot handle or serve information in different modalities may necessitate a huge rework job in the future as systems evolve and are deployed.

Limitations

Today, most universal access strategies are limited to information that can easily be presented verbally (in words). Although the Grand Canyon could be presented in three dimensions through virtual reality, its full impact cannot be captured in words, nor can a Picasso painting or a Mahler symphony easily be made sensory-modality independent. Also, although planes could be designed to fly themselves, we do not as yet know how to allow a user who is blind to directly control flight that currently requires eye-hand coordination (or its equivalent). There are also situations in which the underlying task requires greater cognitive skills than an individual may possess, regardless of the cognitive skills required to operate the interface. It may be a while before we resolve some of these limitations to access. On the other hand, we also have many examples where interfaces that were previously thought to be unusable by individuals with a particular disability were later made easily accessible. The difference was simply the presence or absence of an idea. The challenge, therefore, is to discover and develop strategies and tools that can make next-generation interfaces accessible to and usable by greater numbers of individuals and easier for all to use.

Summary

Through the incorporation of presentation-independent data structures, an available information/command menu, and several easy-to-program selection options, it is possible to create interfaces that begin to approximate the anytime-anywhere-anyone (AAA) interface goal. Some interfaces of this type have been constructed and are now being used in public information kiosks to provide access to individuals with a wide range of abilities. The same strategies can be incorporated into next-generation TeleTransInfoCom devices to provide users with the nomadicity they will require in next-generation Internet appliances.

Before long, individuals will look for systems that allow them to begin an important communication at their desk, continue it as they walk to their car, and finish it while driving to their next appointment. Similarly, users will want the ability to move freely between high- and low-bandwidth systems to meet their needs and circumstances. They will want to access their information databases by using visual displays, and perhaps advanced data visualization and navigation strategies, while at a desk, but auditory-only systems as they walk to their next appointment. They may even wish to access their personal rolodexes or people databases while engaged in conversations at a social gathering (by using a pocket keypad and an earphone to ask, What is Mary Jones' husband's name?). The approaches discussed will also allow these systems to address issues of equity, such as providing access to those with disabilities or those with lower-technology and lower-bandwidth devices, and providing support for intelligent (or not-so-intelligent) agent software.

The AAA strategies discussed here do not provide full cross-environment access to all types of interface or information systems. In particular, as noted above, fully immersive systems that present inherently graphic (e.g., paintings) or auditory (e.g., symphonies) information will not be accessible to anyone who does not employ the primary senses for which the information was prepared (text descriptions are insufficient). However, the majority of today's information and most services can be made available through these approaches, and extensions may provide access to even more.

Finally, it is important to note that not only do environment/situation-independent interfaces and disability-accessible interfaces appear to be closely related, but also one of the best ways to explore environment/situation-independent nomadic interface strategies may be the exploration of past and developing means for providing cross-disability access to computer and information systems.

Challenges and Research Areas

For a system to be more accessible to and usable by every citizen, it must be (1) perceivable, (2) operable, and (3) understandable. The following areas of research can help to address these needs:

• Data structures, compression, and transport formats that allow the incorporation of alternate modalities or modality-independent data (e.g., text embedded in sound files or graphic files);
• Techniques and architectures for partial serving of information, such as the ability to fetch only the visual, the auditory, the text, or any combination of these tracks from a multimedia file, or to fetch one part of a file from one location and another part from a second location, for example, a movie from one location and the captions from another (a small sketch appears after this list);
• Modality substitution strategies (e.g., techniques for restructuring data so that ear-hand coordination can be substituted for eye-hand coordination);
• Natural-language interfaces (e.g., the ability to have information presented conversationally and to control products with conversation, whether via speech or "typed" text);
• Alternate, substitute, and remote interface communication protocols (e.g., robust communication protocols that allow sensory- and presentation-independent alternate interfaces to be connected to and used with devices having less flexible interfaces);
• Voice-tolerant speech recognition (the ability to deal with dysarthric and deaf speech);
• Dynamic tactile displays (two- and three-dimensional tactile and force-feedback displays);
• Better random access to information/functions (instead of tree walking); and
• Speed List (e.g., EZ Access) equivalent access to structured VR environments.
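
As a small, purely illustrative sketch of the partial-serving idea in the list above (the manifest format, track names, and URLs are hypothetical), the following code selects only the requested tracks of a multimedia item, which may be served from different locations, instead of always fetching the whole file.

```python
# Illustrative sketch of partial serving: a manifest lists each track of a
# multimedia item separately, so a client fetches only what it needs (e.g.,
# captions for a deaf user, audio description for a blind user). The manifest
# format, track names, and URLs are hypothetical.
MANIFEST = {
    "title": "Orientation video",
    "tracks": {
        "video":             {"url": "http://media.example.org/orient.mpg",       "kbps": 1500},
        "audio":             {"url": "http://media.example.org/orient.au",        "kbps": 64},
        "captions":          {"url": "http://captions.example.org/orient.txt",    "kbps": 1},
        "audio_description": {"url": "http://described.example.org/orient-ad.au", "kbps": 64},
    },
}

def plan_fetch(manifest, wanted):
    """Return (name, url, kbps) for just the requested tracks; the parts may
    live on different servers (e.g., movie on one host, captions on another)."""
    plan = []
    for name in wanted:
        track = manifest["tracks"].get(name)
        if track is not None:
            plan.append((name, track["url"], track["kbps"]))
    return plan

# A deaf user on a modest link might request only the video and caption tracks:
for name, url, kbps in plan_fetch(MANIFEST, ["video", "captions"]):
    print(f"fetch {name:9} ({kbps:>4} kbps) from {url}")
```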