Copyright © 2009. National Academy of Sciences. All rights reserved.
Page 243
On Interface Specifics
An Embedded, Invisible, Every-Citizen Interface
Mark Weiser
Xerox Palo Alto Research Center
The nation's information infrastructure is a vast, loosely
connected network of informing resources found mostly in people's
everyday lives. When considering interfaces to new electronic
information sources, and especially when replacing old information
sources with new electronic sources, it is crucial to consider how
the existing infrastructure really works. Two examples will
help.
Consider how you would find a grocery store in a new town. How
do you solve this problem on first driving in? Most likely, by
looking around, watching the people and the streets, and making a
couple of guesses, you could find one in no time. The information
infrastructure is everyday knowledge of towns (including economic
and practical constraints on layout, walking distances, etc., that
are embedded in that knowledge), and physical clues that map that
general knowledge into this particular town. Information
infrastructure can be physical (see
www.itp.tsoa.nyu.edu/~review/current/focus2/open00.html).
More conventionally, our national information infrastructure
today includes tens of thousands of public and school libraries all
across the country. These libraries are in nearly every elementary,
junior high, and
high school; and they are in nearly every community, even the
very small. Of course, many of these libraries are connected to the
Internet. But it is very important to consider the other resources
provided by these libraries. Thirty-five percent of all library
visitors never use the catalog, and 12 percent use no library
materials at all, but bring in their own materials. Clearly,
libraries do something more than just supply data that could be
gotten over the Web (see
www.ubiq.com/weiser/SituationalAspectsofElectronicLibraries.html).
As the above examples illustrate, the existing information
infrastructure often functions without calling itself to our
attention. It stays out of sight, effectively not even noticed. So
the first challenge for every-citizen interface is to be invisible
(what I have called "calm technology" elsewhere; see
www.ubiq.com/weiser/calmtech/calmtech.htm).
As the above examples also illustrate, the existing information
infrastructure is extremely widespread, found in every nook and
cranny of our lives. The second challenge for the every-citizen
interface is to be ubiquitous (see www.ubiq.com/ubicomp).
Finally, not addressed by the above examples but presumably
clear to everyone, the current Internet is just the beginning. I
like to think of it by analogy to television channels. Once upon a
time we fretted about how we would manage a TV with 500 channels.
How could we ever view them all? The Internet will give us 5
billion channels, one for every person on the planet (only about 30
million so far, but more are coming). And soon these channels will
be multimedia, multiway video and sound using the Mbone. This kind
of interconnection is a deep technical challenge to the current Web
infrastructure, which cannot begin to support even a few multiway
Mbone connections, much less 5 billion. I consider this to be a
user interface issue because it is just this infrastructure that
opens up the Web to use by anyone who can point a camera or talk on
the phone. The third challenge for the every-citizen interface is
to support billions of multiway real-time interactive
connections.
Of these three challenges, I believe that the first currently
holds the most promise for progress, is the most susceptible to
interdisciplinary attack, and is the least well addressed by
existing projects. How does a technology become invisible? To some
degree, anything can, given enough practice. Invisibility is a
property of people, technology, and the situation in which we find
ourselves (a tiger invisible in the tall grass stands out at the
zoo).
Some suggested challenges for developing a "science of
invisibility" for an every-citizen interface are as follows:
•
Human information processing includes operations
at many different levels, the vast majority of them invisible to
our conscious thought
at any given moment. As we learn a
skill, operations that formerly required attention ("turn the
steering wheel") become automatic and are replaced by higher level
directions ("turn left," "drive to Cincinnati"). Invisible
interfaces are those that connect "below" the conscious level,
through both learning and natural affinity. What computer
interfaces are most appropriate for coupling into a large amount of
unconscious information processing? Which ones take a long time to
learn but are worth the effort (analogous perhaps to piano
playing)? Which ones fit our brain's affinity for information
(information browsing as a walk in the woods)?
•
The difference between something being effectively
invisible because it is being processed below conscious thought and
something being managed for us (e.g., by a computerized agent) is
profound. A key advantage of effective invisibility is the quick
refocus from peripheral inattention to center attention. For
instance, while ordinarily unaware of engine noises in our car, we
suddenly become very aware if the noise should change unexpectedly.
We can then focus our attention on the noise and make decisions
about its danger, the distance to the nearest expressway exit, what
lane to be in, and so on. (A silent car with an intelligent agent
monitoring engine condition would keep us from any knowledge at
all.) Which computer interfaces do well at keeping something
invisible most of the time, but allowing quick recentering when
appropriate? Which interfaces let the same information be either in
the center of our attention or in the periphery without even
clicking a button but simply changing our attention?
•
The concept of an intelligent agent can be a very
powerful one if it does not take over the function of human
judgment and our ability to control the focus of our attention. Can
we design intelligent agents in our computers that preserve our
ability to refocus? If something has been taken over for me, is
there a presentation of what has been taken over that I can bring
to the fore whenever I like, including retroactively? Can I have
agents that filter for me without losing all of the context of the
information after the filter? For instance, if I use a computerized
newspaper clipping service, can it show me one or two lines of
articles that were physically near the ones it clipped for me in
the physical newspaper? What kind of context helps, and what
doesn't help, when dealing with a computerized agent?
Intelligent Multimedia Interfaces For "Each" Citizen
Mark T. Maybury
Mitre Corporation
Future interfaces will take advantage of knowledge of the user,
task, and discourse and exploit multiple input/output media and
associated human sensory modalities to enable intuitive access to
information, tools, and other people (see Figure 1). The more
effectively computers can process heterogeneous information and
automatically acquire knowledge about users, the more efficient
they will become at supporting users' frequently changing tasks,
particularly information-seeking ones. Information-seeking tasks
range from directed search to casual browsing, from looking up
facts to predicting trends. Each of these goals can be achieved
more or less effectively by the content, form (i.e., media), and
environment that support the user. Our emphasis at Mitre
Corporation has been on investigating technologies and tools that
enable more effective and efficient information interfaces for a
variety of application areas, including command and control,
intelligence analysis, and education and training. As a consequence
of our experience, we believe we should aim not to build a
one-of-a-kind interface for every citizen, but rather common
interfaces that can be tailored to each citizen in accordance with
his or her goals, knowledge, skills, and preferences.
FIGURE 1 Intelligent interfaces.
Challenges
Achieving an intelligent interface vision requires addressing
some fundamental technology limitations, including:
•
Lack of a scientific approach to device and user
interface design, development, and evaluation.
•
Lack of interface standards that make it easy to
pull out one device and plug in a similar one.
•
Lack of general mechanisms for (1) interpreting
input from multiple devices (e.g., mouse gesture and speech, as in
"put that there ‹click›") and (2) automatic
generation of coordinated multimedia output.
•
Lack of general mechanisms for constructing
dialogue-based interaction that supports error detection and
correction, user models, and discourse models to ensure tailored
and robust communication.
•
Few tools or procedures that facilitate porting
language-enabled interfaces to new domains (and/or languages); it
remains a time-consuming and knowledge-intensive task.
We believe there exist fundamental tasks associated with
communication that underlie all interface activities. These can be
viewed as a hierarchy of communicative acts, which can be
formalized and computationally implemented and by their nature can
be realized in multiple modalities (e.g., linguistic/auditory,
gestural, visual). The choice among modalities itself is an
important, knowledge-based task, as is the broader task of
presentation planning.
In supporting information access, our efforts have focused on
multimedia analysis (in particular, message understanding and video
analysis), including its segmentation into atomic units, extraction
of objects, facts and relationships, and summarization into compact
form. New requirements have arisen, for example for multimedia
query mechanisms, which raise issues such as how to integrate visual
and linguistic queries. Multimedia information-seeking tasks
remain perhaps the most important but least well understood area.
We believe that careful experimental design, use of instrumented
environments, and task-based evaluation (e.g., measuring at least
time and accuracy, the latter in terms of false positives and false
negatives) will yield new insights.
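As a rough illustration of the task-based evaluation described above, the following minimal Python sketch aggregates completion time and accuracy over a set of trials. The record layout (`TrialResult`), the function name `summarize`, and the choice of precision and recall as the way to turn false positives and false negatives into accuracy figures are all assumptions made for this example; the text itself names only time, false positives, and false negatives.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    """Outcome of one information-seeking trial (hypothetical record layout)."""
    seconds: float        # time the participant needed to finish the task
    true_positives: int   # relevant items the participant retrieved
    false_positives: int  # irrelevant items the participant retrieved
    false_negatives: int  # relevant items the participant missed


def summarize(trials):
    """Aggregate mean task time, precision, and recall over a list of trials."""
    tp = sum(t.true_positives for t in trials)
    fp = sum(t.false_positives for t in trials)
    fn = sum(t.false_negatives for t in trials)
    # Guard against empty denominators so an empty study yields zeros.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    mean_seconds = sum(t.seconds for t in trials) / len(trials) if trials else 0.0
    return {"mean_seconds": mean_seconds, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Two made-up trials, purely for illustration.
    study = [TrialResult(30.0, 8, 2, 2), TrialResult(50.0, 4, 0, 4)]
    print(summarize(study))
```

Pooling the counts across trials before dividing weights each retrieved item equally; averaging per-trial scores instead would weight each participant equally, and either choice could be defensible depending on the study.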
TABLE 1 Research Recommendations

| Area | State of the Art | Near-Term Research | Long-Term Research |
|---|---|---|---|
| Text processing | Commercial named entity extraction (SRA, BBN); many hand-crafted, domain-specific systems for event extraction; large cost to port to new domains; incremental sentence generation, limited document generation. | Demonstrate portability of TIPSTER technology to support multilingual information extraction and spoken language; incremental text generation; text summarization; topic detection and tracking. | Scaleable, trainable, portable algorithms; document-length text generation. |
| Speech processing | Commercial small-vocabulary recognizers (Corona, HARK); large-vocabulary (40,000+ words) recognizers exist in research labs (BBN, SRI, Cambridge University). | Speaker, language, and topic identification; prosodic analysis; natural-sounding synthesis. | Large-vocabulary, speaker-independent systems for speech-enabled interfaces; large-vocabulary systems for video and radio transcription, for example. |
| Graphics processing | Graphical User Interface Toolkits (e.g., object-oriented, reusable window elements such as menus, dialogue boxes). | Tools for automated creation of graphical user interface elements; limited research prototypes of automated graphics design. | Automated, model-based creation and tailoring of graphical user interfaces. |
| Image/video processing | Color, shape, texture-based indexing and query of imagery. | Motion-based indexing of imagery and video; video segmentation. | Visual information indexing and extraction (e.g., human behavior from video). |
| Gesture processing | Two-dimensional mice; eyetrackers; tethered body-motion tracking. | Tetherless, three-dimensional gesture, including hand, head, eye, and body-motion tracking. | Intentional understanding of gesture; cross-media correlation (with text and speech processing); facial and body gesture recognition. |
| Multimedia integration | Limited prototypes in research and government. | Content selection, media allocation, media coordination, media realization for multimedia generation. | Multimedia and multimodal analysis; multimedia and multimodal generation; investigation of less-examined senses (e.g., taction, olfaction). |
| Discourse modeling | Limited prototypes in research and government. | Error handling (ill-formed and incomplete input/output), two-party conversational model, discourse annotation schemes, discourse data collection and annotation, conversation tracking. | Context tracking/dialogue management; multiuser conversation tracking, annotation standards; model-based conversational interaction. |
| User modeling | Fragile research prototypes available from academia; one-user modeling shell (BGP-MS). | Track user focus and skill level to interact at appropriate level; empirical studies in broad range of tasks in multiple media. | Hybrid stereotypical/personalized and symbolic/statistical user models. |
| Visualization | Some commercial tools (e.g., NetMap), text-based, limited semantics; computationally intensive, often difficult to use. | Improve information access interfaces; visualization generation from extracted (semantic) information; automated graphical encoding of information properties. | Multidimensional visualization; multimedia (e.g., text, audio, video) visualization. |
| Collaboration tools | Multipoint video, audio, imagery; e-mail-based routing of tasks. | Instrument environments for data collection and experimentation; multiparty collaborative communication; investigate asynchronous, distant collaboration (e.g., virtual learning spaces). | Field experiments to predict impact of collaborative technology on current work processes; tools for automated analysis of video session recordings; flexible, workflow automation. |
| Intelligent agents | Agent communication (e.g., KQML) and exchange languages (e.g., KIF). | Mediation tools for heterogeneous distributed access. | Shared ontologies; agent integration architectures and/or control languages; agent negotiation. |
A Plan Of Action
There are several important recent developments that promise to
enable new common facilities to be shared to create more powerful
interfaces. A strategy to move forward should include:
•
Creating architectures and services of an advanced
interface server that are defined in the short term using open
standard distributed object middleware, namely the Object
Management Group's Common Object Request Broker Architecture
(CORBA), and that investigate higher-risk architectures, such as
agent-based communication and coordination.
•
Fostering interdisciplinary focused science that
investigates the nature of multiple modalities, with an aim to
understanding the principles of multiple modalities in order to
provide insight into such tasks as multimedia interpretation and
the generation of coordinated multimedia output.
•
Utilizing, refining, integrating, and extending
(to additional media) existing architectures for single media,
including (1) the TIPSTER architecture (for document detection and
information extraction) and associated tag standards (e.g., Penn
Treebank part-of-speech tags, proper name tags, coreference
annotations) for language source markup and (2) evolving
application programming interface (API) standards in the spoken
language industry (e.g., SRAPI, SAPI).
•
Via an interdisciplinary process, defining common
interface tasks and associated evaluation metrics and methods;
creating multimedia corpora and associated markup standards; and
fostering interdisciplinary algorithm design, implementation, and
evaluation.
•
Fostering emerging user modeling shells (e.g.,
BGP-MS) and standards.
•
Focusing on creation of theoretically neutral
discourse modeling shells.
•
Applying these facilities in an evolutionary
fashion to improve existing interfaces, supporting a migration from
"dumb" to "smart" interfaces (S. Bayer, personal communication,
1996).
•
Performing task-based, community-wide evaluation
to guide subsequent research, measuring functional improvements
(e.g., task completion time, accuracy, quality).
Because they affect all who interact with computers, user
interfaces are perhaps the single area of computing that can most
radically alter the ease, efficiency, and effectiveness of
human-computer interactions.
Recommended Research
Table 1 indicates several functional requirements and associated
key technologies that need to be investigated to enable a new
generation of human-computer interaction, indicating near-term and
far-term research investment recommendations. Key areas for
research include:
•
processing and integrating various input/output
media (e.g., text, speech and nonspeech audio, imagery, video,
graphics, gesture);
•
methods to acquire, represent, maintain, and
exploit models of user, discourse, and media; and
•
mechanisms that support information
visualization, multiuser collaboration, and intelligent
agents.
Reference
Maybury, M.T. (Ed.) 1993. Intelligent Multimedia
Interfaces. Menlo Park, Calif.: AAAI/MIT Press.
Interfaces For Understanding
Nathan Shedroff
vivid studios
Over the next 15 years the issues facing interface designers,
engineers, programmers, and researchers will become increasingly
complex and push farther into currently abstract and, perhaps,
esoteric realms. However, we are not without guidance and direction
to follow. Our experiences as humans and what little history we
have with machines can lead us toward our goals.
Computers and related devices in the future will need to exhibit
many of the following qualities:
•
Be more aware of themselves (who and what they
are, who they "belong" to, their relationships to other systems,
their autonomy, and their capabilities).
•
Be more aware of their surroundings and audiences
(who is there; how many people are present or around; who to
"listen" to; where and how to find and contact people for help or
to follow directions; who is a "regular"; how to adapt to different
people's preferences, needs, goals, skills, interests, etc.).
•
Offer more help and guidance when needed.
•
Be more autonomous when needed.
•
Be more able to help build knowledge as opposed to
merely process data.
•
Be more capable of displaying information in
richer forms, both visually and auditorily.
•
Be more integrated into a participant's work flow
or entertainment process.
•
Be more integrated with other media, especially
traditional media like broadcast and print.
Funding for research and development, therefore, should
concentrate on these issues and their related hardware, software,
and understandings. These include research into the following:
•
Display and visualization systems
(high-resolution, portable, and low-power displays; HDTV
(high-definition television) and standards in related display
industries; integration with input/output devices such as
scanners, pointing devices, and
printers; fast processing systems for
n-dimensional data models; standards for
these models, display hardware, and software; software capable of
easily configuring and experimenting with visualizations and
simulations; etc.).
•
Perceptual systems (proximity, sounds, motion, and
electronic "eyes" for identification and awareness; standards for
formatting instructions and specifications to help systems
"understand" what they are, what is around them, what and who they
can communicate with, and what they are capable of; facilities for
obtaining help when necessary; ways of identifying participants by
their behavior, gestures, or other attributes; etc.).
•
Communications systems (standards, hardware, and
software to help participants communicate better with each other as
well as with computers; natural-language interfaces, spoken and
written, and translation systems to widen the opportunities of
involvement to more people; hardware and software solutions for
increasing bandwidth and improving the reliability, security,
privacy, and scalability of existing communications infrastructure;
etc.).
•
Understanding of understanding (information and
knowledge-building applications; understandings about how people
create context and meaning, transform data into information, create
knowledge for themselves, and build wisdom; software to help
facilitate these processes; standards to help transmit and share
information and knowledge with connections intact; etc.).
•
Understanding of interaction (a wider definition
of interaction than that used in the "industry"; how participants
define and perceive "interactivity"; what they expect and need in
interactivity; historical examples of interaction; lessons from
theater, storytelling, conversation, improvisation, the performing
arts, and the entertainment industry; etc.).
•
Increased education (of both participants and
audiences, as well as professionals, and the industry).
•
Better resources for understanding cultural
diversity (in terms of gestures, languages, perceptions, and needs
of different age, gender, cultural, and nationality groups).
In addition, there are some procedural approaches to these
undertakings that can help the overall outcomes to be more
valuable:
•
Reduced duplication of research and development by
government-sponsored grants and institutions (requiring the
disclosure, sharing, and reporting of research efforts, problems,
and solutions).
•
More means of coordination and knowledge sharing
of research and development scholars and professionals (whether
government-sponsored or not).
Nomadicity, Disability Access, And The Every-Citizen Interface
Gregg C. Vanderheiden
University of Wisconsin-Madison
The Challenge
With the rapid evolution of the national information
infrastructure (NII) and the global information infrastructure
(GII), attention has turned to the issue of information equality
and universal access. Basically, if information systems become as
integral to our future life-styles as electricity is today, access
to these systems will be essential for people to have equal access
to education, employment, and even daily entertainment or
enrichment activities.
Although the goal of equal access seems noble, it can seem
somewhat less achievable when one considers the full range of
abilities or disabilities which must be dealt with to achieve an
every-citizen interface. It must be usable even if people
•
cannot see very well, or at all;
•
cannot hear very well, or at all;
•
cannot read very well, or at all;
•
cannot move their heads or arms very well, or at all;
•
cannot speak very well, or at all;
•
cannot feel with their fingers very well, or at all;
•
are short, are tall, use a wheelchair, and so
forth;
•
cannot remember well;
•
have difficulty learning or figuring things
out;
•
have little or no technological inclination or
ability; and/or
•
have any combination of these difficulties (e.g.,
are deaf-blind; have reduced visual, hearing, physical, or
cognitive abilities, which occurs in many older individuals).
In addition, the products and their interfaces must remain
equally efficient and easy to use and understand for those who (1)
have no problems seeing, hearing, moving, remembering, and so
forth; and (2) are power users.
Is It Possible?
A list like this can bring a designer up short. At first blush,
it appears that even if such an interface were possible it would be
impractical or inefficient to use for people with all of their
abilities intact. Packages such as the EZ Access approach developed
for kiosks (http://trace.wisc.edu/world/kiosk), PDAs (personal
digital assistants), and other touchscreen devices, however,
demonstrate how close we can come to such an ideal, at least for
some types of devices or systems. Using a combination of Talking
Fingertip and Speed List technologies, the EZ Access package (for
information, see http://trace.wisc.edu/TEXT/KIOSK/MINIMUM.HTM)
provides efficient access for individuals with low vision,
blindness, and poor or no reading skills. A ShowSounds/caption
feature provides access for individuals with hearing impairments or
deafness, as well as access for all users in very noisy locations.
An infrared link allows the system to be used easily with alternate
displays and controllers, so that even individuals who are
deaf-blind or paralyzed can access and use the system. Thus, with a
relatively modest set of interface variations, almost all the needs
listed above can be addressed.
Is It Practical?
Practicality is a complex issue which involves cost, complexity,
impact on overall marketability, support, and so forth. To use the
EZ Access approach as an example, the hardware cost to provide all
of these dimensions of accessibility to a standard multimedia kiosk
is less than 1 percent of the cost of the kiosk. Addition of this
technique does not affect the standard or traditional mode of
operation of the kiosk at all. At the same time, it makes the
system usable by many visitors as well as new citizens whose native
language is not English, and who may have some difficulty with
words. Implementing cross-disability interface strategies can take
only a few days with the proper tools. EZ Access techniques are
currently used on commercial kiosks in the Mall of America and
other locations. Other examples of built-in accessibility are the
access features that are built into every Macintosh- and Windows
95-based computer.
Thus, if done properly, interfaces that are flexible or
adjustable enough to address a wide range of individuals can be
very practical. There are, however, approaches to provide
additional access or access for additional populations that are not
currently practical (e.g., building $2,000 dynamic braille displays
into every terminal or kiosk). In these cases, the most practical
approach may be to make the information and control necessary for
operation of the device available on a standard connector so that a
person who is deaf and blind can connect a braille display
and keyboard. Practicality also is a function of the way the
access features relate to and reinforce the overall interface goals
of the product.
How Does An Every-Citizen Interface Relate To Nomadic Systems?
The devices of tomorrow, which might be referred to as
TeleTransInfoCom (tele-transaction/information/communication)
devices, will operate in a wide range of environments.
Miniaturization, advances in wireless communication, and
thin-client architectures are rapidly eliminating the need to be
tied to a workstation or carry a large device in order to have
access to computing, communication, and information services and
functions.
As a result, we will need interfaces for use while driving a
car, sitting in an easy chair, sitting in a library, participating
in a meeting, walking down the street, sitting on the beach,
walking through a noisy shopping mall, taking a shower, or relaxing
in a bathtub, as well as sitting at a desk. The interfaces also
will have to be usable in hostile environments: when camping or
hiking, in factories, or in shopping malls at Christmas time.
Many of us will also need to access our information appliance
(or appliances) in very different environments on the same
day, perhaps even during the same communication or interaction
activity. These different environments will place constraints on
the type of physical and sensory input and output techniques that
work (e.g., it is difficult to use a keyboard when walking; it is
difficult and dangerous to use visual displays when driving a car;
speech input and output, which work fine in a car, may not be
usable in a shared office environment, a noisy mall, a meeting, or
a library). Systems designed to work across these environments will
therefore require flexible input/output options. The interface
variations, however, must
operate in essentially the same way, even though they may be quite
different (visual versus aural). Users will not want to master
three or four interface paradigms in order to operate their devices
in different environments. The metaphor(s) and the "look and feel"
must be continuous even though the devices operate entirely
visually at one point (e.g., in a meeting) or entirely aurally at
another (e.g., while driving a car). Many users will also want to
be able to move from one environment to another, one device to
another (e.g., workstation to hand-held), and one mode to another
(e.g., visual to voice) in the midst of a task.
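The environment constraints listed above can be sketched as a small lookup that decides which input and output channels remain usable in a given setting, while the content delivered stays the same. This is only an illustrative sketch: the environment names, the channel names, and the functions `usable_channels` and `present` are hypothetical, and the table encodes only the constraints named in the text (a real system would need many more, for example limits on manual input while driving).

```python
# Channels considered in this sketch (hypothetical names).
INPUTS = ("keyboard", "speech")
OUTPUTS = ("visual", "speech")

# Channels the text describes as impractical in each environment.
# Only the constraints named in the passage are encoded here.
BLOCKED = {
    "walking": {"input": {"keyboard"}, "output": set()},
    "driving": {"input": set(), "output": {"visual"}},
    "shared_office": {"input": {"speech"}, "output": {"speech"}},
    "noisy_mall": {"input": {"speech"}, "output": {"speech"}},
    "meeting": {"input": {"speech"}, "output": {"speech"}},
    "library": {"input": {"speech"}, "output": {"speech"}},
}


def usable_channels(environment):
    """Return the input and output channels not ruled out in this environment."""
    blocked = BLOCKED.get(environment, {"input": set(), "output": set()})
    return {
        "input": [c for c in INPUTS if c not in blocked["input"]],
        "output": [c for c in OUTPUTS if c not in blocked["output"]],
    }


def present(message, environment):
    """Deliver the same message through the first output channel usable here."""
    outputs = usable_channels(environment)["output"]
    if not outputs:
        raise ValueError(f"no usable output channel in {environment!r}")
    return outputs[0], message


if __name__ == "__main__":
    # The message is identical; only the channel changes with the setting.
    print(present("2 new messages", "driving"))
    print(present("2 new messages", "meeting"))
```

Keeping the message separate from the channel mirrors the requirement that the interface behave in essentially the same way whether it is rendered visually or aurally.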
Does Nomadicity Equal Disability Accessible?
It is interesting to note that most of the issues regarding
access for
people with disabilities will be addressed if we simply address
the issues raised by the range of environments described above:
•
When we create interfaces that work well in noisy
environments such as airplanes, construction sites, or shopping
malls at Christmas, and for people who must listen to something
else while they use their device, we will have created interfaces
that work well for people who cannot hear well or at all.
•
When we create interfaces that work well for
people who are driving a car or doing something that makes it
unsafe to look at the device they are operating, we will have
created interfaces that can be used by people who cannot see.
•
As we develop very small pocket and wearable
devices for circumstances in which it is difficult to use a
full-sized keyboard or even a large number of keys, we will have
developed techniques that can be used by individuals with some
types of physical disabilities.
•
When we create interfaces that can be used by
someone whose hands are occupied, we will have systems that are
accessible to people who cannot use their hands.
•
When we create interfaces for individuals who are
tired, under stress, under the influence of drugs (legal or
illegal), or simply in the midst of a traumatic event or emergency
(and who may have little ability to concentrate or deal with
complexity), we will have interfaces that can be used by people
with naturally reduced abilities to concentrate or deal with
complexity.
Thus, although there may be residual specifics concerning
disability access that must be covered, the bulk of the issues
involved are addressed automatically through the process of
developing environment/situation-independent (modality-independent)
interfaces.
What Is Needed
Interfaces that are independent of the environment or the
individual must have the following attributes:
•
Wide variability in order to
meet the diversity of tasks that will be addressed. Some interfaces will have to deal only with text
capture, transmission, and display. Others will have to deal with
display, editing, and manipulation of audiovisual materials. Some
may involve VR (virtual reality), but basically be shop-and-select
strategies. Others may require full immersion, such as data
visualization and tele-presence.
•
Modality
independence. Interfaces have to allow
the user to choose sensory modalities appropriate to the
environment, situation, or user.
Text-based systems will allow users
to display information visually at some times and aurally at
others, on high-resolution displays when available and on smaller
low-resolution displays when necessary.
•
Flexibility/adaptability. Interfaces will
be required that can take advantage of fine motor movements and
three-dimensional gestures when a user's situation or abilities
allow but can also be operated by using speech, keyboard, or other
input techniques when this is necessary because of the environment,
the user's activities, or any motor constraints.
•
Straightforwardness and ease of
use. As much of the population as
possible must be able to use these interfaces and to master new
functions and capabilities as they are introduced.
Some Components Necessary To Achieve
Every-Citizen Interfaces
Although this section does not address all possible interface
types, particularly freehand graphic production interfaces (e.g.,
painting), it does address the majority of command-and-control
interfaces.
1.
Modality Independence. For a device or system to be
modality independent or alt-modal (i.e., the user can choose
between alternate sensory modalities when operating the device),
two things are necessary:
a.
All of the basic information must be stored and
available in either modality-independent or modality-redundant
form.
Modality independent
refers to information that is stored in a form that
is not tied to any particular form of presentation. For example,
ASCII text is not inherently visual, auditory, or tactile. It can
be presented easily on a visual display or printer (visually),
through a voice synthesizer (aurally), or through a dynamic braille
display or braille printer (tactually).
Modality redundant
refers to information that is stored in multiple
modalities. For example, a movie might include a visual description
of the audio track (e.g., caption) and an audio and electronic text
description of the video track so that all (or essentially all)
information can be presented visually, aurally, or tactually at the
user's request based on need, preference, or environmental
situation.
b.
The system must be able to display data or
information in different modalities. That is, it should provide a
mechanism for displaying information in all-visual, or
all-auditory, or mixed audiovisual form as well as in electronic
form.
2.
Flexibility/Adjustability. The device must also offer
alternate selection techniques that can accommodate varying
physical and sensory abilities arising from the personal
environment or situation (e.g., walking,
wearing heavy gloves), and/or
personal abilities. Suggested alternate operating modes
follow:
•
Standard mode. This mode often uses multiple simultaneous senses and
fine motor movements. It would offer the most effective device for
individuals who have no restrictions on their abilities (due to
task, environment, or disability).
•
A list mode. In this mode, the user can call up a list of all the
information and action items and use the list to select items for
presentation or action. It would not require vision to operate. It
could be operated using an analog transducer to allow the
individual to move up and down within a list, or a keyboard or
arrow keys combined with a confirm button could be used. This mode
can be used by individuals who are unable to see or look at a
device.
•
External list
mode. This would make the list available
externally through a software or hardware port (e.g., infrared
port) and accept selections through the same port. It can be used
by individuals who are unable to see and hear the display and
therefore must access it from an external auxiliary interface. This
would include artificially intelligent agents, which are unable to
process visual or auditory information that is unavailable in text
form.
•
Select and confirm
mode. This allows individuals to obtain
information about items without activating them (a separate confirm
action is used to activate items after they are selected). It can
be used by individuals with reading difficulties, low vision, or
physical movement problems, as well as by individuals in unstable
environments or whose movements are awkward due to heavy clothing
or other factors.
•
Auto-step scanning
mode. This presents the individual items
in groups or sequentially for the user to select. It can be used by
individuals with severe movement limitations or movement and visual
constraints (e.g., driving a car), and when direct selection (e.g.,
speech input) techniques are not practicable.
•
Direct text control
techniques. These include keyboard or
speech input.
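The list, select-and-confirm, and auto-step scanning modes described above can be sketched over a single flat list of items. The following is a minimal illustration in Python, not a description of any existing system: the names `Item`, `ListSelector`, `announce`, `step`, and `confirm` are hypothetical, and the `announce` callback stands in for whatever presentation channel (speech, screen, or braille) a real device would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    label: str                 # text description, usable by any presenter
    action: Callable[[], str]  # what happens when the item is activated


@dataclass
class ListSelector:
    """Hypothetical selector over a flat list of items."""
    items: List[Item]
    announce: Callable[[str], None]  # e.g., speak, show, or send to braille
    index: Optional[int] = None      # currently selected item, if any

    def select(self, index: int) -> None:
        # Select-and-confirm: selecting only describes the item.
        self.index = index % len(self.items)
        self.announce(self.items[self.index].label)

    def step(self) -> None:
        # Auto-step scanning: move to the next item, wrapping at the end.
        # In a real device a timer or a single switch would call this.
        self.select(0 if self.index is None else self.index + 1)

    def confirm(self) -> Optional[str]:
        # A separate confirm action activates the selected item.
        if self.index is None:
            return None
        return self.items[self.index].action()


spoken: List[str] = []  # stand-in for a speech synthesizer
selector = ListSelector(
    items=[Item("Play message", lambda: "playing"),
           Item("Delete message", lambda: "deleted")],
    announce=spoken.append,
)
selector.step()              # describes the first item
selector.step()              # describes the second item
result = selector.confirm()  # activates the second item
```

Because selection and activation are separate calls, the same object could serve a user who steps through items with one switch and a user who jumps directly to an item by keyboard or speech.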
Example: Using A Uni-List-Based
Architecture As Part Of The Interface
One approach to device design that would support this type of
flexibility is the Uni-List architecture. By maintaining a
continually updated listing of all the information items currently
available to the user, as well as all the actions or commands
available, it is possible to provide a very flexible and adjustable
user interface relatively easily. All the techniques
listed above are easy to implement with such an architecture,
and it can be applied to a range of devices or systems.
Take, for example, a three-dimensional (3D) virtual
reality-based shopping mall. In such an application, a database is
used to provide the information needed to generate the image seen
by the user and the responses to user movements or actions on
objects in the view. If properly constructed, this database could
also provide a continually updated listing of all objects in view
as well as information about any actionable objects presented to
the user at any time. By including verbal (e.g., text) information
about the various objects and items, this 3D virtual shopping
system can be navigated and used in a variety of ways to
accommodate a variety of users or situations.
•
Individuals who are unable to see the screen
(because they are driving their car, their eyes are otherwise
occupied, or they are blind) can have the information and choices
presented vocally (or via braille). They can then select items from
the list in order to act on them, in much the same way that an
individual can pick up or "click on" an object in the
environment.
•
Individuals with movement disabilities can have a
highlight or "sprite" step around to the objects, or they could
indicate the approximate location and have the items in that
location highlighted individually (other methods for disambiguating
also could be used) to select the desired item.
•
Individuals who are unable to read can touch or
select any printed text presented and have it read aloud to
them.
•
Individuals with low vision (or who do not have
their glasses) can use the system in the same way as a fully
sighted individual. When they are unable to see well enough to
identify the objects, they can switch into a mode that lets them
touch the objects (without activating them) and can thereby have
them named or described.
•
Individuals who are deaf-blind could use the
device in the same fashion as an individual who is blind. Instead
of the information being spoken, however, it could be sent to the
individual's dynamic braille display.
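The idea of a continually updated listing of objects and actions can be illustrated with a short sketch. This is one possible reading of the Uni-List approach, written in Python under stated assumptions: the class and function names (`Entry`, `UniList`, `update_view`, `present_visually`, `present_aurally`) are invented for illustration, and the aural presenter simply returns the text that would be handed to a speech synthesizer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    entry_id: str
    description: str  # verbal (text) description of the object
    actionable: bool  # True if the user can act on the object


class UniList:
    """Hypothetical registry of everything currently presented to the user."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def update_view(self, visible: List[Entry]) -> None:
        # Called whenever the scene changes, so the list stays current.
        self._entries = {e.entry_id: e for e in visible}

    def as_text(self) -> List[str]:
        # Modality-independent listing; a presenter decides how to render it.
        return [
            f"{e.description}{' (selectable)' if e.actionable else ''}"
            for e in self._entries.values()
        ]

    def actionable_ids(self) -> List[str]:
        return [e.entry_id for e in self._entries.values() if e.actionable]


def present_visually(lines: List[str]) -> str:
    # Numbered on-screen list.
    return "\n".join(f"{n}. {line}" for n, line in enumerate(lines, start=1))


def present_aurally(lines: List[str]) -> str:
    # Stand-in for text passed to a speech synthesizer, with pauses.
    return " ... ".join(lines)


mall = UniList()
mall.update_view([
    Entry("door-1", "Bookstore entrance", True),
    Entry("bench-3", "Wooden bench", False),
])
print(present_visually(mall.as_text()))
print(present_aurally(mall.as_text()))
```

The same `as_text()` output could equally be sent to a dynamic braille display or exposed through an external port, which is the property the examples above rely on.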
Additional Benefits Of Flexible,
Modality-Independent Architectures And Data Formats
The two key underlying strategies for providing more universal
access are input and display flexibility and the companion
availability of information in sensory/modality-independent or
parallel form.
Both input and display flexibility and presentation independence
have additional benefits beyond the every-citizen interface. These
include the following:
•
Nomadicity support
(discussed above).
•
Searchability. Graphic and auditory information that contains text
streams can be indexed and found by using standard text-based
search engines, which not only can locate items but also can jump
to particular points within a movie or a sound file.
•
Alternate client
support. The same information can be
stored and served to different types of telecommunication and
information devices. For example, information could be accessed
graphically over the Internet, via a telephone by using a verbal
form, or even by intelligent (or not so intelligent) agents using
electronic text form.
•
Display
flexibility. Presentation-independent
information also tends to be display size independent, allowing it
to be more easily accessed using very small, low-resolution
displays. (In fact, some low-resolution displays present exactly
the same issues as low vision.)
•
Low bandwidth. The ability to switch to text or verbal presentation can
speed access over low-bandwidth connections.
•
Future Support. Modality-independent servers will also be better able to
handle future serving needs that may involve access to information
using different modalities. Creating a legacy system that cannot
handle or serve information in different modalities may necessitate
a huge rework job in the future as systems evolve and are
deployed.
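The searchability benefit listed above can be made concrete with a small sketch. It assumes, purely for illustration, that a movie's caption track is available as a list of timed text entries; the names `Caption` and `find_in_captions` are hypothetical, and a production search engine would use an index rather than a linear scan.

```python
from typing import List, NamedTuple, Tuple


class Caption(NamedTuple):
    start_seconds: float  # where in the movie this caption begins
    text: str             # the caption text itself


def find_in_captions(captions: List[Caption], query: str) -> List[Tuple[float, str]]:
    """Return (start time, caption text) for each caption containing the query."""
    needle = query.lower()
    return [(c.start_seconds, c.text) for c in captions if needle in c.text.lower()]


captions = [
    Caption(0.0, "Welcome to the tour."),
    Caption(12.5, "The canyon is a mile deep."),
    Caption(30.0, "Canyon trails open at dawn."),
]
hits = find_in_captions(captions, "canyon")
```

Each hit carries a start time, so a player could jump straight to that point in the movie or sound file.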
Limitations
Today, most of the universal access strategies are limited to
information that can easily be presented verbally (in words).
However, although the Grand Canyon could be presented in three
dimensions through virtual reality, its full impact cannot be
captured in words, nor can a Picasso painting or Mahler symphony
easily be made sensory modality independent. Also, although planes
could be designed to fly themselves, we do not as yet know how to
allow a user who is blind to directly control flight that currently
requires eye-hand coordination (or its equivalent). There are also
situations in which the underlying task requires greater cognitive
skills than an individual may possess, regardless of the cognitive
skills required to operate the interface. It may be a while before
we resolve some of these limitations to access.
On the other hand, we also have many examples where interfaces
that were previously thought to be unusable by individuals with a
particular disability were later made easily accessible. The
difference was
simply the presence or absence of an idea. The challenge,
therefore, is to discover and develop strategies and tools that can
make next-generation interfaces accessible to and usable by greater
numbers of individuals and easier for all to use.
Summary
Through the incorporation of presentation-independent data
structures, an available information/command menu, and several
easy-to-program selection options, it is possible to create
interfaces that begin to approximate the anytime-anywhere-anyone
(AAA) interface goal. Some interfaces of this type have been
constructed and are now being used in public information kiosks to
provide access to individuals with a wide range of abilities. The
same strategies can be incorporated into next-generation
TeleTransInfoCom devices to provide users with the nomadicity they
will require in next-generation Internet appliances.
Before long, individuals will look for systems that allow them
to begin an important communication at their desk, continue it as
they walk to their car, and finish it while driving to their next
appointment. Similarly, users will want the ability to move freely
between high- and low-bandwidth systems to meet their needs and
circumstances. They will want to access their information databases
by using visual displays and perhaps advanced data visualization
and navigation strategies while at a desk, but auditory-only
systems as they walk to their next appointment. They may even wish
to access their personal rolodexes or people databases while
engaged in conversations at a social gathering (by using a pocket
keypad and an earphone to ask, "What is Mary Jones' husband's
name?").
The approaches discussed will also allow these systems to
address issues of equity such as providing access to those with
disabilities or those with lower-technology and lower-bandwidth
devices and providing support for intelligent (or
not-so-intelligent) agent software. The AAA strategies discussed
here do not provide full cross-environment access to all types of
interface or information systems. In particular, as noted above,
fully immersive systems that present inherently graphic (e.g.,
paintings) or auditory (e.g., symphonies) information will not be
accessible to anyone who does not employ the primary senses for
which this information was prepared (text descriptions are
insufficient). However, the majority of today's information and
most services can be made available through these approaches, and
extensions may provide access to even more.
Finally, it is important to note that not only do
environment/situation-independent interfaces and
disability-accessible interfaces appear to be closely related, but
also one of the best ways to explore
environment/situation-independent
nomadic interface strategies may be the exploration of past and
developing means for providing cross-disability access to computer
and information systems.
Challenges And Research Areas
For a system to be more accessible to and usable by every
citizen, it must be (1) perceivable, (2) operable, and (3)
understandable.
The following areas of research can help to address these
needs:
•
Data structures, compression, and transport
formats that allow the incorporation of alternate modalities or
modality-independent data (e.g., text embedded in sound files or
graphic files);
•
Techniques and architectures for partial serving
of information, such as the ability to fetch only the visual, the
auditory, the text, or any combination of these tracks from a
multimedia file or to fetch one part of a file from one location
and another part from a second location (e.g., fetching a movie
from one location and the captions from another);
•
Modality substitution strategies (e.g., techniques
for restructuring data so that ear-hand coordination can be
substituted for eye-hand coordination);
•
Natural language interfaces (e.g., the ability to
have information presented conversationally and to control products
with conversation, whether via speech or "typed" text);
•
Alternate, substitute, and remote interface
communication protocols (e.g., robust communication protocols that
allow sensory- and presentation-independent alternate interfaces to
be connected to and used with devices having less flexible
interfaces);
•
Voice-tolerant speech recognition (ability to deal
with dysarthric and deaf speech);
•
Dynamic tactile displays (two- and
three-dimensional tactile and force feedback 3D);
•
Better random access to information/functions
(instead of tree walking); and
•
Speed-List (e.g., EZ Access) equivalent access to
structured VR environments.
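As one illustration of the partial-serving research area listed above, the sketch below selects only the tracks a client asks for from a description of a multimedia item whose tracks may live at different locations. It is a hedged example rather than a proposed protocol: the function name `select_tracks`, the track names, and the URLs are all hypothetical.

```python
from typing import Dict, Iterable


def select_tracks(tracks: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Return only the requested tracks, in the order requested.

    Tracks the item does not have are skipped rather than treated as errors.
    """
    return {name: tracks[name] for name in wanted if name in tracks}


# Hypothetical item: the movie and its captions are served from different hosts.
movie = {
    "video": "https://media.example/movie.video",
    "audio": "https://media.example/movie.audio",
    "captions": "https://captions.example/movie.txt",
}

# A client on an audio-only device asks for sound and text, not video.
audio_only = select_tracks(movie, ["audio", "captions"])
```

A client on a low-bandwidth link could request only the `captions` track, while a full workstation could request all three.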