AN EMBEDDED, INVISIBLE, EVERY-CITIZEN INTERFACE
Mark Weiser
Xerox Palo Alto Research Center
The nation's information infrastructure is a vast, loosely connected network of informing resources found mostly in people's everyday lives. When considering interfaces to new electronic information sources, and especially when replacing old information sources with new electronic sources, it is crucial to consider how the existing infrastructure really works. Two examples will help.
Consider how you would find a grocery store in a new town. How do you solve this problem on first driving in? Most likely, by looking around, watching the people and the streets, and making a couple of guesses, you could find one in no time. The information infrastructure is everyday knowledge of towns (including economic and practical constraints on layout, walking distances, etc., that are embedded in that knowledge), and physical clues that map that general knowledge into this particular town. Information infrastructure can be physical (see http://www.itp.tsoa.nyu.edu/~review/current/focus2/open00.html).
More conventionally, our national information infrastructure today includes tens of thousands of public and school libraries all across the country. These libraries are in nearly every elementary, junior high, and high school, and in nearly every community, even the very small. Of course, many of these libraries are connected to the Internet. But it is very important to consider the other resources provided by these libraries. Thirty-five percent of all library visitors never use the catalog, and 12 percent use no library materials at all, but bring in their own materials. Clearly, libraries do something more than just supply data that could be gotten over the Web (see http://www.ubiq.com/weiser/SituationalAspectsofElectronicLibraries.html).
As the above examples illustrate, the existing information infrastructure often functions without calling itself to our attention. It stays out of sight, effectively not even noticed. So the first challenge for the every-citizen interface is to be invisible (what I have called "calm technology" elsewhere; see http://www.ubiq.com/weiser/calmtech/calmtech.htm).
As the above examples also illustrate, the existing information infrastructure is extremely widespread, found in every nook and cranny of our lives. The second challenge for the every-citizen interface is to be ubiquitous (see http://www.ubiq.com/ubicomp).
Finally, not addressed by the above examples but presumably clear to everyone, the current Internet is just the beginning. I like to think of it by analogy to television channels. Once upon a time we fretted about how we would manage a TV with 500 channels. How could we ever view them all? The Internet will give us 5 billion channels, one for every person on the planet (only about 30 million so far, but more are coming). And soon these channels will be multimedia, multiway video and sound using the Mbone. This kind of interconnection is a deep technical challenge to the current Web infrastructure, which cannot begin to support even a few multiway Mbone connections, much less 5 billion. I consider this to be a user interface issue because it is just this infrastructure that opens up the Web to use by anyone who can point a camera or talk on the phone. The third challenge for the every-citizen interface is to support billions of multiway real-time interactive connections.
Of these three challenges I believe that the first is currently the most promising for progress, the one most susceptible to interdisciplinary attack, and the one least well addressed by existing projects. How does a technology become invisible? To some degree, anything can, given enough practice. Invisibility is a property of people, technology, and the situation in which we find ourselves (a tiger invisible in the tall grass stands out at the zoo).
Some suggested challenges for developing a "science of invisibility" for an every-citizen interface are as follows:
INTELLIGENT MULTIMEDIA INTERFACES FOR "EACH" CITIZEN
Mark T. Maybury
Mitre Corporation
Future interfaces will take advantage of knowledge of the user,
task, and discourse and exploit multiple input/output media and
associated human sensory modalities to enable intuitive access to information,
tools, and other people (see Figure 1). The more effectively computers
can process heterogeneous information and automatically acquire
knowledge about users, the more efficient they will become at supporting
users' frequently changing tasks, particularly information-seeking ones.
Information-seeking tasks range from directed search to casual browsing,
from looking up facts to predicting trends. Each of these goals can be
achieved more or less effectively by the content, form (i.e., media), and
environment that support the user. Our emphasis at Mitre Corporation has
been on investigating technologies and tools that enable more effective
and efficient information interfaces for a variety of application areas,
including command and control, intelligence analysis, and education and
training. As a consequence of our experience, we believe we should aim not
to build a one-of-a-kind interface for every citizen, but rather common
interfaces that can be tailored to each citizen in accordance with his or
her goals, knowledge, skills, and preferences.
CHALLENGES
Achieving an intelligent interface vision requires addressing some fundamental technology limitations, including:
We believe there exist fundamental tasks associated with communication that underlie all interface activities. These can be viewed as a hierarchy of communicative acts, which can be formalized and computationally implemented and by their nature can be realized in multiple modalities (e.g., linguistic/auditory, gestural, visual). The choice among modalities itself is an important, knowledge-based task, as is the broader task of presentation planning.
In supporting information access, our efforts have focused on
multimedia analysis (in particular, message understanding and video
analysis), including its segmentation into atomic units, extraction of
objects, facts, and relationships, and summarization into compact form. New
requirements have arisen, resulting, for example, in multimedia query
mechanisms that raise issues such as how to integrate visual and linguistic queries.
Multimedia information-seeking tasks remain perhaps the most
important but least well understood area. We believe that careful
experimental design, use of instrumented environments, and task-based
evaluation (e.g., measuring at least time and accuracy (false positives, false
negatives)) will yield new insights.
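As a rough illustration of the task-based evaluation described above, the following sketch scores a set of information-seeking trials on time and accuracy. The `Trial` record and the `score_trials` function are hypothetical names introduced here for illustration; they do not describe any instrumented environment used at Mitre. The sketch assumes each trial records which items a participant selected, which items were judged relevant, and how long the task took.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Trial:
    """One information-seeking trial (hypothetical record format)."""
    retrieved: Set[str]  # items the participant selected
    relevant: Set[str]   # items judged relevant to the task
    seconds: float       # time taken to complete the task


def score_trials(trials: Iterable[Trial]) -> Dict[str, float]:
    """Aggregate time and accuracy over all trials."""
    trials = list(trials)
    # A false positive is a selected item that was not relevant;
    # a false negative is a relevant item that was not selected.
    true_pos = sum(len(t.retrieved & t.relevant) for t in trials)
    false_pos = sum(len(t.retrieved - t.relevant) for t in trials)
    false_neg = sum(len(t.relevant - t.retrieved) for t in trials)
    found = true_pos + false_pos
    wanted = true_pos + false_neg
    total_seconds = sum(t.seconds for t in trials)
    return {
        "false_positives": float(false_pos),
        "false_negatives": float(false_neg),
        "precision": true_pos / found if found else 0.0,
        "recall": true_pos / wanted if wanted else 0.0,
        "mean_seconds": total_seconds / len(trials) if trials else 0.0,
    }
```

Reporting precision and recall alongside the raw counts is one design choice among several; an actual study might weight errors differently depending on the task.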
A PLAN OF ACTION
There are several important recent developments that promise to enable new common facilities to be shared to create more powerful interfaces. A strategy to move forward should include:
Because they affect all who interact with computers, user
interfaces are perhaps the single area of computing that can most radically
alter the ease, efficiency, and effectiveness of human-computer interactions.
RECOMMENDED RESEARCH
Table 1 indicates several functional requirements and associated key technologies that need to be investigated to enable a new generation of human-computer interaction, indicating near-term and far-term research investment recommendations. Key areas for research include:
REFERENCE
Maybury, M.T. (Ed.) 1993. Intelligent Multimedia Interfaces. Menlo Park, Calif: AAAI/MIT Press.
Nathan Shedroff
vivid studios
Over the next 15 years the issues facing interface designers, engineers, programmers, and researchers will become increasingly complex and push farther into currently abstract and, perhaps, esoteric realms. However, we are not without guidance and direction to follow. Our experiences as humans and what little history we have with machines can lead us toward our goals.
Computers and related devices in the future will need to exhibit many of the following qualities:
WHERE DID THE "USER" GO?
Although the word "user" is, admittedly, easy to say and use and has some history, it is important that our understanding of those using computers is broadened to emphasize growing participatory aspects of computer use. While historically, people input, managed the processing, and output data and information, the building of knowledge requires more participation and interaction of the type most closely experienced with other people. People are becoming active audiences and participants instead of merely users. They are increasingly communicating with others and creating meaningful things rather than merely "viewing" and watching.
The next 100 million computer users (who may begin using computers over the next 3 years) have different needs and understandings than current users. Their needs are different not because of their capabilities (all of these people are capable of learning to use existing systems) but mostly because of differences in their perceptions, interests, and understandings of computers. One important reason why these people are not now buying or making use of current computers is that, in their minds, computers don't do much that they are interested in doing. Existing computers are not capable of or equipped for helping these people enjoy, expand, or make meaning of their lives. This is the reason why home computer sales have traditionally been dismal and are currently confined to home-office purchases and purchases for children's education. Fifteen years ago the best use that computer marketers could come up with for people to buy their own computers was to balance their checkbooks and store recipes. Today, while computers have evolved significantly, many people's perceptions have not, and they understand precious few reasons why computers might enhance the lives of this next "user group." Part of this is an education issue (and, perhaps, a marketing one), but mostly it is a failure of computer systems (hardware and software) to respond to the needs and interests of the general public.
The interface starts much before a computer is turned on.
Consider an analogy to shopping. The shopping experience does not start
the moment a transaction is made (perhaps an item is bought or ordered).
It doesn't even start when someone walks into a store or browses a catalog. The shopping experience starts at least when people perceive the need for
something, and often before, when they encounter others shopping,
products and services they do not currently need, and even celebrity athletes
sporting brand names on their outfits. Likewise, the interface to a
computer begins at the fulfillment of life needs and interests and the education
of the participants about the capabilities and possibilities of computers
and interfaces. Automobiles are often held up as examples of
easy-to-learn, universal interfaces, but in reality they are neither. They take
months, sometimes years, to master, are not standardized, and are
sometimes never learned sufficiently well. Yet our understanding of driving a
car and its fulfillment of our needs make us persevere.
WHAT IS A COMPUTER?
What I mean when I use the word "computer" is a specific device
for processing, storing, and transmitting information, aiding the building
of knowledge, and/or facilitating communications more sophisticated than
a current telephone. To be sure, many objects around us will evolve to be
more sophisticated and many already include computers whether this is evident
to their users or not. To what extent computers will disappear as distinct
devices is not a question that can yet be answered. However, it does not
really matter either. The needs and interests of people have changed very little
over the past 100 years and will likely change only slightly over the next 15 to 20.
Most people will still need to work, create, love, interact, communicate,
and be entertained (as well as entertain each other). Interfaces should
concentrate on the activities, not the technologies; nor should they be immediately
concerned with the nature of a device itself (is it distinct or embedded?).
These interfaces may show up in computers, televisions, telephones, door knobs,
or devices not yet invented. What will remain fairly constant, however, are
the needs themselves.
WHAT IS INTERACTIVITY?
When I use the word "interactive" I do not mean what has become the standard industry definition of dynamic media or the ability to make choices when using computer programs. Most "interactive media" is nothing more than multimedia presentations (usually with video and animation) with the ability to click to the next screen of material in a nonlinear way. In this sense interactivity has become bad television where the audience must click for more in order to keep the stream coming. To me interactivity is much richer and includes the abilities to create, share, and communicate rather than merely watch. Interactive experiences should change over time and between different people. Sadly, few products or experiences do this now, which is the main reason why the CD-ROM industry fell apart over the past few years (the products offered little to do that was interesting).
However, this is merely the starting point. As an industry (academics and professionals alike), we understand this word "interactivity" very little and need to explore greatly what it means to people, what it can be, and how to create it. This is one of the points where grants and funding can help. Unfortunately, the commercial end of the interactive media industry offers little chance of exploring and experimenting with the whole notion of interactivity, as the demands of an overhyped market, sky-rocketing costs, too much publicity, and too many expectations prevent most companies from asking these questions. Likewise, on the academic end of the spectrum, demands to produce work-ready students, lack of interdisciplinary programs, and the history of computer science studies (emphasis on software, programming, engineering, and computer languages) prevent students and professors from asking these questions because they seem esoteric and "light" in the face of other research.
About the only people who are explicitly trained in the skills of
interaction are those in the performing arts: dancers, actors, singers,
comedians, improvisational actors, and musicians. However, these fields are
hardly seen as complementary or valid courses of study in computer science,
multimedia, and even design programs. Yet the experience and
knowledge that performers can bring to these disciplines are exactly the answers to
the questions that should be asked. Grants for programs that try to
explore these issues with the help of many different disciplines would help
speed the development of answers badly needed in this industry.
COMPUTERS THAT ARE AWARE
Interfaces need to become more aware of themselves and those around them. This is true in both a physical sense (where am I, where are you, and who else is here?) and a cognitive sense (who am I, what can I do, and how can I communicate with others?). While computers won't have truly "cognitive" capabilities for a long time, if ever, they already have a few elements of these capabilities and information and need even more. What features these capabilities will eventually create are mostly unpredictable right now, but we can count on the facilities to respond to people in a more adaptable and individual manner to make a major improvement in interfaces. Developing processes, standards, and technologies to build these capabilities upon will prove mandatory.
Other technologies that will be needed to develop these more
intuitive and adaptive interfaces include perceptual technologies to
support computer perception in sound, vision, touch, gesture, environment,
temperature, airborne particles, and so forth.
INTERFACES TO KNOWLEDGE
Most interfaces and applications today have sped the transmission, storage, and processing of data but have hardly changed the accumulation, creation, or quantity of either information or knowledge. Certainly, we cannot say that computers have made us more "wise," but the interactions computers offer do give us more chances to communicate our thoughts and build wisdom if we only knew better how to. Our understanding of knowledge and wisdom and their processes is inadequate, yet it is critical to our continued development as a culture and a species. Research into the components of these processes, of our minds, and of our thoughts is needed to advance not only our tools (of which the computer is one of our best) but ourselves. This research is needed not only in terms of software and hardware (perhaps finding form in file formats, applications, operating systems, and products), but in the underlying processes and understandings of how we think, upon which all of the aforementioned are based. This must be coordinated with the fields of education, psychology, and communications, as well as computer science. It may even be helpful, and necessary, to include those in philosophy.
These are the most esoteric and unpredictable of questions (indeed, they have kept us busy for our entire histories), but this should not deter us from seeking their answers. Even if we will never truly answer these questions completely, each part of the answer gives us new insights into building more valuable interfaces that meet more of our needs.
Another aspect of interfaces that facilitate knowledge is the set of technologies involved with representing and displaying data and information. Present tools commonly available on the market such as spreadsheets, word processors, databases, and graphics programs are hardly adequate for representing or visualizing complex relationships and informing communications. The hardware required for better-performing visualization systems includes displays that are high resolution, portable, and low power so that they are more easily used where needed. Standards for evolved displays will need to be established, adopted, and made prevalent so that engineers, programmers, and audiences can come to count on their capabilities and availability. Integration with input/output devices such as scanners, pointing devices, and printers will need to be advanced as well. Devices that enable more direct interaction between display and control are more learnable and more evident to use (in essence, an evolution of what is commonly understood as direct manipulation today). Systems for processing and working with these devices will need to rely on more powerful and faster CPUs (central processing units) and hardware as well.
Software for representing and manipulating more complex data
visualizations will need to build on new understandings of how people
think and work with data. These applications will need to explore new
paradigms in representation and manipulation in order to offer the kinds
of flexibility and understandability required by more complex
processing and less "professional" audiences.
INTERFACES ACROSS MEDIA
Where possible, new interfaces should translate well across
media and devices, whether portable, stationary, shared, public, private,
networked, or personal, and should strive to encompass and integrate
traditional media, such as broadcast and print media, where
possible, not merely electronic and on-line media. This is not to say that
interfaces shouldn't take advantage of their unique capabilities (indeed, they
need to do so more than currently), but they also need to relate to each
other where possible and it should be recognized that printed interfaces such
as newspapers, classifieds, catalogs, documentation, and directories are
in just as dire need of evolution as technological ones. I am certainly
not calling for computer screens to look like little notebooks of paper
with spiral binds nor print paper to look like current computer interfaces
with pull-down menus. However, our interfaces to printed information
and knowledge have evolved little (outside of stylistic appearances) in
the past 100 years, and, since this still represents a huge proportion of
information dissemination and interaction (and will likely continue to
be), some funding for the evolution of these critical interfaces should be
allocated.
INTERFACES ACROSS CULTURES
As interfaces become more complex and deal with more abstract issues, how they address people from different backgrounds and cultures will become more critical. We have been able to achieve a certain amount of standardization and utilization so far with present interfaces, but this is due mainly to the nature of tasks currently completed with computers. As computers become more involved with knowledge building, communications, and community and as interfaces facilitate these more social purposes, they will need to address how differences between people change their understanding of how to use these devices and of the processes themselves. These differences may be based on age, gender, culture, language, or nationality. Interfaces in the future will not have the luxury of requiring the same amount of capitulation on the part of the audience since the next level of computer users (the next 100 million users) will not be as willing to change their approach to problems and their interaction with devices as the enthusiasts and professionals who comprise the present base of computer users. Issues of language, gesture, understanding, privacy, approach, civility, and "life" are not consistent throughout the world (and wonderfully so) and must be discovered and documented. Also, systems, standards, and interfaces must be developed that are sensitive to these differences. Lastly, the knowledge of these differences must be made available to researchers and developers.
Automatic language translation is one of the most critical, and most difficult, problems to solve. It is such a complex problem that it is probably not solvable by conventional programming means. Efforts to "grow" or "evolve" complex software for pattern recognition and processing are probably the best hope for tackling problems of this complexity, and it will probably require several efforts in coordination.
Lastly, greater education is needed to inform researchers,
professionals, participants, and the industry of these issues, their importance,
the state of their progress, and their details. We cannot merely rely on
the media to inform people about computers or their capabilities since
the messages usually get dissolved to the lowest common
denominator, and one of cynical expectations, far lower than actual capabilities to
understand. Movies, news, books, and other instruments of culture
often create unrealistic understandings, expectations, and often fears of
computers and their uses. We must address these and reverse them
ourselves (as no one else will) if we expect future interfaces for
everybody to be more effective.
INTERSPACE AND AN EVERY-CITIZEN INTERFACE TO THE NATIONAL INFORMATION INFRASTRUCTURE
Terry Winograd
Stanford University
With the sudden emergence of widely used Internet-centered applications, it has become glaringly obvious that the computer is not a machine whose main purpose is for a person to pursue a task. The computer (with its attendant peripherals and networks) is a machine for communicating all kinds of information in all kinds of media, with layers of structuring and interaction that could not be provided by traditional print, graphic, or broadcast media.
The traditional idea of "interface" implies that we are focusing on two entities (the person and the machine) and on the space that lies between them. But a more realistic view recognizes the centrality of an "interspace" that is inhabited by multiple people, workstations, servers, and other devices in a complex web of interactions. The hardest task in creating an every-citizen interface will be the design of appropriate theories, models, and representations to do justice to the potential richness of this interaction space.
As a simple example, consider the Web. Today we talk of surfing from place to place on the Web, touching home pages, and following links. These metaphors of spatial locomotion are engaging. They opened up new ways of thinking and doing that had not been explored in the predecessor desktop metaphor. But the Web is not the answer to the future of interfaces any more than the desktop was in its day. Each metaphor is another stepping stone, which in turn creates a way of thinking that creates a blindness to new possibilities.
New research and development activities need to enhance our ability to understand, analyze, and create interaction spaces. The work will be rooted in disciplines that focus on people and communication (such as psychology, communications, graphic design, and linguistics) as well as in disciplines that support computing and communications technologies. The work will start from an assumption that a computer system provides a shared space for multiple people, each in a personal and organizational background that shapes and guides interaction with others. The systems we can build based on new research will support communications structures at all levels, from the generic document structuring of the Web to highly task-specific interactions, like those that go on in an airport control tower.
Some sample areas for research are discussed below.
COLLABORATION STRUCTURES
There is a body of research on the structure of collaborative
work, sponsored under previous National Science Foundation (NSF)
initiatives on collaboration and developed by commercial software developers
under labels such as "workflow." The current state of the art can be
described as having a large "hole in the middle." At the highest level
there are very general (and hence very abstract) theories of how people
get work done through communication. At the low level, there are
thousands, even millions, of specialized applications (from the order
system at a fast food restaurant to the NSF proposal application
process) that support organized group activities. But we have not yet developed
the conceptual and computational tools to make it easy to bring
collaboration into the mainstream of applications. When I work with my
research group on a joint paper, we use sophisticated word processors,
graphics programs, and the like, but our coordination is based on generic
e-mail and calendars and often fails to help us at the places where
breakdowns occur.
SEMANTIC ALIGNMENT
Whenever two people talk, they have only an approximate understanding of each other. When they speak the same language, share intellectual assumptions, and have common backgrounds and training, the alignment may be quite close. As these diverge, there is an increasing need to put effort into constant calibration and readjusting of interpretations. Ordinary language freezes meanings into words and phrases, which then can be "misinterpreted" (or at least differently interpreted).
This problem shows up at every level of computer systems, whenever information is being represented in a well-specified syntax and vocabulary. Even simple databases have this problem. If information is being shared between two company databases that have a table for "employee," they are apparently in alignment. But if one was created for facilities planning and the other for tax accounting, they may not agree on the status of part-time, off-site, on-site contract, or other such "employees." This difference may be nowhere explicit in what is stored on the computer but is a matter of background and context.
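The "employee" mismatch described above can be made concrete with a small sketch. Everything in it is hypothetical: the `Person` record and the two membership rules are assumptions introduced for illustration, standing in for whatever unstated background definitions a facilities-planning database and a tax-accounting database might each use.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Person:
    """A person known to both databases (hypothetical record)."""
    name: str
    status: str    # "full_time", "part_time", or "contractor"
    on_site: bool  # whether the person occupies space in the building


def is_employee_for_facilities(person: Person) -> bool:
    # Assumed rule: facilities planning counts anyone who needs a desk.
    return person.on_site


def is_employee_for_tax(person: Person) -> bool:
    # Assumed rule: tax accounting counts anyone on the payroll.
    return person.status in {"full_time", "part_time"}


def disagreements(people: Iterable[Person]) -> List[str]:
    """Names of people the two 'employee' tables would classify differently."""
    return sorted(
        p.name
        for p in people
        if is_employee_for_facilities(p) != is_employee_for_tax(p)
    )


people = [
    Person("Ana", "full_time", on_site=True),
    Person("Ben", "part_time", on_site=False),
    Person("Chen", "contractor", on_site=True),
    Person("Dee", "full_time", on_site=False),
]
print(disagreements(people))  # prints ['Ben', 'Chen', 'Dee']
```

Both tables are labeled "employee," yet three of the four people fall on different sides of the line. The rules that cause the disagreement live in code and convention, not in the stored data, which is the sense in which the difference is a matter of background and context.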
Ubiquitous networking is leading us to the point where every
computer system supports communication and where every term we use
will be seen and hence interpreted by others. There are traditional
philosophical and linguistic approaches to making sure we have "common
understanding," but these tend to be based on highly abstract or
idealized examples and settings. We need to develop new theoretical
foundations for talking about the kinds of "semantic approximations" that are
needed for something as apparently simple as sharing data between two
databases and as ubiquitous as the nodes of the Internet.
BUILDING NEW "VIRTUALITIES"
In designing new systems and applications, we are not simply providing better tools for working with the objects in the previously existing world. Computer systems and software are a medium for the creation of virtualities: the worlds in which users of the software perceive, act, and respond to experiences. Software is not just a device with which the user interacts; it is also the generator of a space in which the user lives. Software design is like architecture: when an architect designs a home or an office building, a structure is being specified. More significantly, though, the patterns of life for its inhabitants are being shaped. People are thought of as inhabitants rather than as users of buildings.
The creation of a virtual world is immediately evident in computer games, which dramatically engage the player in exploring the vast reaches of space, fighting off the villains, finding the treasures, and actively living in whatever worlds the game designer can imagine and portray. But the creation of worlds is not limited to game designers. There is also a virtual world in a desktop interface, in a spreadsheet, and in a use of the World Wide Web. Researchers in human-computer interfaces have used other terms, such as conceptual model, cognitive model, user's model, interface metaphor, user illusion, virtuality, and ontology, all carrying the connotation that a space of existence, rather than a set of devices or images, is being designed. The term virtuality highlights the perspective that the world is virtual, in a space that is neither a mental construct of the user nor a mental construct of the designer.
Today, we are all familiar with the virtuality of the standard graphical user interface, with its windows, icons, folders, and the like. Although these virtual objects are loosely grounded in analogies with the physical world, they exist in a unique world of their own, with its special logic and potentials for action by the user. The underlying programs manipulate disk sectors, network addresses, data caches, and program segments. These underpinnings do not appear (at least when things are working normally) in the desktop virtuality in which the user works.
There is little theoretical grounding today on which to base the design of new virtualities. Obviously, there are considerations from psychology about how people perceive new kinds of objects and activities. There are philosophical discourses about how we divide the world up into constituent things and properties and how we can formulate our interactions with them. There is also a more common-sense level of understanding how people think about familiar domains and how their expectations from their experiences in life will shape their interactions with computer systems.
As a simple example, consider the three primary modes of interacting with a virtuality that are learned by every normal person in infancy:
What can be said at a theoretical level about the nature of these
modalities and the problems that arise in using (and especially in mixing) them?
Are there other conceptual modalities that are fundamentally different
from these three that will be understandable and practical for people to use?
How does the finer-grained analysis of interaction structure fit into
this larger picture? And, finally, what is the difference in the nature of
multi-person activity in these different modalities, and how does that map
onto the kinds of multiperson collaboration we want to support in an
every-citizen interaction space? (See
<http://www-pcd.stanford.edu/winograd> and <http://www-pcd.stanford.edu/winograd/book.html>.)
BIBLIOGRAPHY
On Coordination
Denning, Peter and Pamela Dargan, Action-Centered Design, in Terry Winograd (ed.), Bringing Design to Software, Reading, MA: Addison-Wesley, 1996, pp. 105-120.
Holt, Anatol, Diplans, A Formalism for Action, ACM Transactions on Office Information Systems 6:2 (April 1988).
Malone, Thomas W. and Kevin Crowston, What Is Coordination Theory and How Can It Help Design Cooperative Work Systems?, MIT CCS report 112, 3402/3183, Cambridge, MA: Massachusetts Institute of Technology, April 1990.
Medina-Mora, Raul, Terry Winograd, Rodrigo Flores, and Fernando Flores, "The Action Workflow Approach to Workflow Management Technology" in The Information Society, Volume 9, Number 4, October-December 1993, p. 391.
Verharen, Egon, Nardo van der Rijst, and Jan Dietz (eds.), Proceedings of the Language/Action Perspective: International Workshop on Communication Modeling, Economisch Institut Tilburg, Tilburg, the Netherlands, 1996. <http://infolabwww.kub.nl:2080/infolab/lap96/>
Winograd, Terry (1988), Introduction to the Language/Action Perspective, ACM Transactions on Office Information Systems 6:2 (April 1988), pp. 83-86.
On Semantic Alignment
There are papers on this topic from the point of view of artificial intelligence ("ontology" matching), databases ("schema" matching), and information retrieval ("attribute set" matching). I have not put together a good list or found an integrative article that cuts across them. Some general considerations are presented in the following:
Winograd, Terry and Fernando Flores, Understanding Computers and Cognition: A New Foundation for Design, Norwood, NJ: Ablex, 1986, 220 pp. Paperback issued by Addison-Wesley, 1987.
On the Design of "Virtualities"
Hutchins, Edwin, James Hollan, and Donald Norman, Direct Manipulation Interfaces, in D. Norman and S. Draper (eds.), User-Centered System Design, Hillsdale, NJ: Erlbaum, 1986, pp. 87-124.
Lakoff, George, and Mark Johnson, Metaphors We Live By, Chicago: University of Chicago Press, 1980.
Winograd, Terry, with John Bennett, Laura De Young, and Bradley Hartfield (Eds.), Bringing Design to Software, Reading, MA: Addison-Wesley, 1996. Available on-line at <http://www-pcd.stanford.edu/winograd/book.html>. See, especially, Introduction, Chapter 2 (David Liddle, "Design of the Conceptual Model"), and Chapter 4 (John Rheinfrank and Shelley Evenson, "Design Languages").
MOBILE ACCESS TO THE NATION'S INFORMATION INFRASTRUCTURE
Daniel P. Siewiorek
Carnegie Mellon University
INTRODUCTION
The focus of this position paper is mobile access to the nation's information infrastructure (NII). The goal should be to provide "the right information to the right person at the right place at the right time." In order for the NII to reach its potential, the average person should be able to take advantage of the information on or off the job. Even while at work, many people do not have desks or spend large portions of time away from their desks. Thus, mobile access is the gating technology required to make the NII available at any place at any time.
The next section describes the time rate of change of computer technology, indicating what might be expected in the form of technology from the computer industry as well as defining a new class of computers: the wearable computer. The third section describes the importance of a variety of modalities of interaction with wearable computers. The paper concludes with some research challenges.
TIME RATE OF CHANGE OF COMPUTER TECHNOLOGY
Computer systems are typically compared using two classes of metrics: capacity and performance. Capacity is how large a component may be or how much information it may store. Performance is measured in functions per unit of time (often referred to as bandwidth or throughput) or, conversely, the time needed to complete a specific function (referred to as latency). Recently, ease of use has become a major differentiating characteristic between computer systems and hence represents a third class of metrics.
Because they directly reflect the state of technology, hardware capacity and performance metrics are the easiest to determine or derive. These measures are usually associated with individual components in a computer system. There are six basic functions in a computer system. In addition, attributes such as energy consumption and physical size gain increasing importance as computers become more mobile. Table 1 summarizes eight metrics for a computer system, including their units of measurement. Capacity is usually measured in the number of information units such as bytes or pixels. Bandwidth/throughput is measured in operations per second for the processor and bits per second for communications. Energy is measured as the reciprocal of kilowatts, while physical size is summarized as the reciprocal of the product of the weight times the volume of space occupied. Notice that for all these metrics the larger the number the better. The three columns in Table 1 include a contemporary workstation (an anchored, unmovable system), a contemporary laptop computer (a luggable system), and a palmtop/personal digital assistant (a portable pocketable system).
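The two "larger is better" reciprocal metrics described above can be written out as a minimal sketch. The function names, the units, and the two sample devices are assumptions made for illustration; none of the numbers are taken from Table 1.

```python
def energy_merit(power_kw: float) -> float:
    """Energy figure of merit: the reciprocal of power draw in kilowatts.

    A system that draws less power gets a larger (better) score.
    """
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    return 1.0 / power_kw


def size_merit(weight: float, volume: float) -> float:
    """Physical-size figure of merit: 1 / (weight * volume).

    The units of weight and volume are an assumption here; any
    consistent pair works as long as all systems use the same units.
    """
    if weight <= 0 or volume <= 0:
        raise ValueError("weight and volume must be positive")
    return 1.0 / (weight * volume)


# Hypothetical placeholder devices (power in kW, weight, volume),
# NOT values from Table 1.
devices = {
    "device_a": (0.5, 2.0, 4.0),
    "device_b": (0.05, 0.5, 1.0),
}

for name, (power_kw, weight, volume) in devices.items():
    print(name, energy_merit(power_kw), size_merit(weight, volume))
```

Taking reciprocals keeps every metric pointing the same way, so a more mobile system scores higher on energy and size just as a more capable one scores higher on capacity and bandwidth.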
Since ease of use is so closely associated with human reaction, it is much more difficult to quantify. There are at least three basic functions related to ease of use: input, output, and information representation. Box 1 summarizes several points for each of these basic functions. Note that, unlike the continuous variables for capacity and performance, the ease-of-use metrics are discrete.
Siewiorek et al. (1982) considered the concept of the computer class. A computer class attempts to integrate many computer system details into an overall evaluation, grouping similarly evaluated systems together. Thus, the workstation, laptop, and palmtop in Table 1 can each be considered representative of a computer class. These researchers also observed that computer classes differ in physical dimensions and price by roughly 1.5 orders of magnitude (i.e., approximately a factor of 30). In addition, it was observed that, as each computer class evolves, new members of the class are expected to have increased capacity and functionality. The increases in technology serve to increase the capacity and functionality of a class. Thus, the boundary of various attributes can be considered to be increasing with time.
On the other hand, technological changes can be used to initiate new computer classes with the same functionality offered by the next higher class several years before. It is extremely important to remember that all classes of computers have followed approximately the same evolutionary paths as their capacity and functionality have increased. The newer computer classes benefit from the evolutionary process of older classes, adapting to proven concepts quickly, where the older classes required a trial-and-error process. Siewiorek and co-workers also observed that computer classes tend to lag each other by approximately 5 years. Thus, the palmtop computer of today could be considered to have approximately the functionality of a laptop 5 years ago or a workstation 10 years ago. By the same reasoning, we can expect the palmtop of the year 2006 to have the attributes of today's workstation.
One can speculate on the emergence of a new class of computers called "wearable computers." Wearable computers will weigh less than a few ounces, operate for months or years on a single battery, and have esthetically pleasing shapes that can adorn various parts of the body. Pagers and electronic watches (complete with calculator and memory to store phone numbers/memos) represent the first examples of the wearable class of computers. Thus, the wearable computer of the year 2006 will have at least the functionality of today's laptop (as depicted in Table 1).
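The class-lag reasoning above reduces to simple arithmetic, sketched below. The ordering of the classes, the placement of the wearable one step below the palmtop, and the value 1996 for "today" are assumptions chosen to match the projections in the text; the function name is hypothetical.

```python
# Classes ordered from larger to smaller. Placing "wearable" directly
# below "palmtop" is an assumption consistent with the 2006 projection.
CLASSES = ["workstation", "laptop", "palmtop", "wearable"]

LAG_YEARS = 5                    # approximate lag between adjacent classes
ADJACENT_CLASS_RATIO = 10 ** 1.5  # 1.5 orders of magnitude, about 31.6


def catch_up_year(smaller: str, larger: str, reference_year: int) -> int:
    """Year in which `smaller` is expected to offer what `larger`
    offered in `reference_year`, given a fixed lag per class step."""
    steps = CLASSES.index(smaller) - CLASSES.index(larger)
    if steps < 0:
        raise ValueError("`smaller` must not be a larger class than `larger`")
    return reference_year + steps * LAG_YEARS


TODAY = 1996  # assumption: "today" in the text

print(catch_up_year("palmtop", "workstation", TODAY))  # 2006
print(catch_up_year("wearable", "laptop", TODAY))      # 2006
print(round(ADJACENT_CLASS_RATIO, 1))                  # 31.6
```

Both projections in the text span two class steps, which is why each lands 10 years out.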
As with the capacity and performance metrics in Table 1, the
ease-of-use metrics in Box 1 are also moving out with time. For example,
the keyboard with an alphanumerical display using textual information is
representative of time-sharing systems of the early 1970s. The keyboard
and mouse, graphical output, and iconic desktop are representative of
personal computers of the early 1980s. The addition of handwriting
recognition input, speech synthesis output, and multimedia information is emerging
in the early 1990s. It takes approximately one decade to completely
assimilate new input, output, and informational representations. By the early part
of the next decade, speech recognition, position sensing, and eye
tracking should be common inputs. Heads-up projection displays should
allow superposition of information onto the user's environment.
MODALITIES OF INTERACTION WITH WEARABLE COMPUTERS
The objective of wearable computer designs is to merge the user's information space with his or her work space. The wearable computer should offer seamless integration of information-processing tools with the existing work environment. To accomplish this, the wearable system must offer functionality in a natural and unobtrusive manner, allowing the user to dedicate all of his or her attention to the task at hand with no distraction provided by the system itself. Conventional methods of interaction, including the keyboard, mouse, joystick, and monitor, all require some fixed physical relationship between user and device, which can considerably reduce the efficiency of the wearable system. Among the most challenging questions facing mobile system designers is that of human interface design. As computing devices move from the desktop to more mobile environments, many conventions of human interfacing must be reconsidered for their effectiveness. How does the mobile system user supply input while performing tasks that preclude use of a keyboard? What layout of visual information most effectively describes system state- or task-related data? To maximize the effectiveness of wearable systems in mobile computing environments, interface design must be carefully matched with user tasks. By constructing mental models of user actions, interface elements may be chosen and tuned to meet the software and hardware requirements of specific procedures.
The efficiency of the human-computer interface is determined by
the simplicity and clarity of the mental model suggested by the system.
By modeling the actual task as well as the human interface, a linkage can
be constructed between user and machine that can be examined to
improve the overall efficiency of the wearable system. We begin with the
assertion that for wearable systems to be efficient the mental model of the
interface design must closely parallel that of the user task; there must be
minimal interference or obstruction posed by the computer in completing
jobs. Although the number of quantifiable metrics suited for interface
evaluation is small, a series of basic observations provides a means for
comparison. One characteristic of an application interface is the number of
user actions required to perform a given subtask. We define a subtask as
an operation, possibly consisting of multiple inputs, that a user completes
in the process of performing a larger coherent task. For example, in
the course of performing an inspection, a user might wish to return from
his or her present location in an application to the main menu. This
subtask may require a single input (perhaps a voice command or an
on-screen button) or multiple inputs (backing out through a hierarchy of
categories to reach the top or main level). We assert that an application
requiring few inputs will allow a user to dedicate more attention to the job at
hand, while a larger number of inputs will require more concentration on
the computing system. A comparison of equivalent subtasks in two
wearable computers (Smailagic and Siewiorek, 1996) is shown in Table 2.
The speech recognition engine accepts complex commands that allow
some subtasks requiring a series of manual inputs to be executed with a
single phrase. However, the response time to a spoken input is longer and
the accuracy is lower. For these reasons the quantitative aspect of
system latency and accuracy must be factored into evaluations of usability.
TABLE 2 Comparison of Number of Steps to Retrieve Information Using Selection Buttons and Speech
| Subtask | Buttons/Menu Selection | Speech |
| Get information | 4 | 1 |
| Get photograph | 5 | 1 |
| Navigate to location | 3 | 2 |
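As a rough illustration of why latency and accuracy must be weighed alongside the step counts in Table 2, the sketch below totals the steps per modality and then applies a simple expected-time model. The step counts come from Table 2; the model (each input takes a fixed time and is retried until recognized) and every latency and accuracy value are assumptions, not measurements from the cited study.

```python
# Step counts copied from Table 2.
STEPS = {
    "Get information": {"buttons": 4, "speech": 1},
    "Get photograph": {"buttons": 5, "speech": 1},
    "Navigate to location": {"buttons": 3, "speech": 2},
}


def total_steps(modality: str) -> int:
    """Total number of inputs across all subtasks for one modality."""
    return sum(row[modality] for row in STEPS.values())


def expected_time(steps: int, latency_s: float, accuracy: float) -> float:
    """Expected seconds to finish a subtask under an assumed model:
    each input takes `latency_s` seconds, succeeds with probability
    `accuracy`, and is repeated until it succeeds."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return steps * latency_s / accuracy


# Hypothetical per-input latency (seconds) and accuracy for each modality.
ASSUMED = {
    "buttons": {"latency_s": 1.0, "accuracy": 1.0},
    "speech": {"latency_s": 3.0, "accuracy": 0.9},
}

print(total_steps("buttons"), total_steps("speech"))  # 12 4

for subtask, row in STEPS.items():
    times = {m: expected_time(row[m], **ASSUMED[m]) for m in row}
    print(subtask, times)
```

Under these assumed values speech still wins on the single-phrase subtasks, but the two-step navigation subtask takes longer by voice than by buttons, which is the kind of reversal that counting steps alone would miss.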
RESEARCH CHALLENGES
There are several challenges that research must address to make mobile access to the NII effective. Following is a partial list of those challenges:
REFERENCES
Siewiorek, D., C. G. Bell, and A. Newell. 1982. Computer Structures: Principles and Examples, McGraw-Hill, Inc., New York.
Smailagic, Asim, and D.P. Siewiorek. 1996. Modalities of Interaction with CMU Wearable Computers. IEEE Personal Communications, Vol. 3, No. 1, February.
ORDINARY CITIZENS AND THE NATIONAL INFORMATION INFRASTRUCTURE
Bruce Tognazzini
Healtheon Corporation
The original working title for this workshop was "Toward an Ordinary-Citizen Interface to the National Information Infrastructure." It was then altered to "Every-Citizen" to be inclusive of all of our citizens. While I support that change, we did lose something quite important in the transition, for of all the people whose lives have been affected by the computer revolution, perhaps none has received as scant attention as our ordinary citizens.
Today, we face the prospect of millions of have-nots shut out of cyberspace, a threat that has little to do with economic status, country of origin, race, creed, color, or physical ability. Instead, it has everything to do with age, gender, education, culture, aptitude, and attitude. If cyberspace today were to have a dead-honest advertising slogan, it would read: Built by Boys, for Boys!
As Margie Wylie (1995) says: "Far from offering a millennial new world of democracy and equal opportunity, the coming Web of information systems could turn the clock back 50 years for women." The 18- to 39-year-old males with technological talent and above-average intelligence and education who built today's cyberspace built it for themselves. Large parts of it reflect the delicate ambiance of an automobile junkyard. We must make fundamental changes in the direction of computer design if the true have-nots of cyberspace are not to be those rare individuals who do not feel instantly comfortable clattering over mounds of twisted metallic wreckage: in other words, ordinary people.
Somewhere along the line, many technology designers lost track of the real goal: empowering users. From video cassette recorders to clock radios, designers are adding every button, switch, and other power-user doodad they can in the mistaken belief that the true power of technology is to be measured in the number of features and controls rather than the impact on people's lives. Our computer software has tracked this trend. Systems and applications today are festooned with every "wangdoodle" imaginable, offering users plenty of power to blow themselves up while at the same time inhibiting them from accomplishing their task.
If the desktop computer is a dark and mysterious closet, the
Internet is a positively terrifying, sucking black hole. The advent of the
World Wide Web is helping to address part of the problem by making at least
the waystations on the Internet visible, but just the sheer immensity of
today's cyberspace is frightening to all but a small group of people. Sure,
the kinds of tasks users attempt on their computers have become more
complex, but something else is leading to the increased difficulty of using
our machines, something we need to address: we are designing our
systems for power users, to the exclusion of everyone else.
POWER USERS VERSUS EXPERT USERS
Most people want to be seen as power users, but then we have the real thing. Power users typically consist of bipedal, testosterone-soaked life forms between the ages of 18 and 39. Yes, I said testosterone-soaked life forms. At the risk of offending certain politically correct parties, there does appear to be a difference, however minor, between boys and girls. And the overwhelming majority of power users I've come across are definitely male.
Let me explain what I mean by power user. A "power user" is a person driven by hormones to want complete and utter control of every function of his or her computer, even if having such control seriously degrades efficiency and productivity. Tim Allen's character on "Home Improvement," the ABC comedy series, is the prototypical power user. He's the only guy in the neighborhood with a 120-horsepower lawn mower that will do 0 to 60 in less than 7 seconds. It's not much use on his suburban lawn, but it makes a really neat noise when he starts it up.
I knew several guys at Apple who had so many weird public-domain extensions in their system folder that virtually none of their applications ran properly. Accomplishing the smallest task was like walking through a mine field. So what? As far as they were concerned, it merely increased the challenge! They wouldn't have thought of paring down their systems.
Most women see their machines as serious productivity tools, there for the express purpose of helping them accomplish their task. Women want to do their work, not "play computer" (Bulkeley, 1994). They are not alone: A high percentage of men don't want to "play computer" either; they just don't dare complain about it.
Many people across the board become expert users. Expert
users understand their craft and are competent at using the tools that will
help them succeed. They may have no interest in tearing apart their
tools, either to understand them or to "improve" them. It's the difference
between someone who is an expert at driving a car and someone who
looks forward to Saturday morning because that is when he can tear the
car apart and perhaps get it back together. The Saturday-morning
power user may very well not be particularly expert at driving the car
(although he will claim to be).
CHANGING TIMES
Thirty years ago computer users consisted of two classes: young male programmers and operators and powerless, minimum-wage females who endlessly keypunched 80-column cards. (Of course, not all keypunch operators were female. I was one of the few powerless male keypunch operators in those days. My cohorts and I quickly escaped, but the women were generally not so lucky.)
Today, two-thirds of personal computer users are women, according to a Logitec Inc. poll (1992), and millions of those female users are now in higher technical and management positions. Those who are not wandering the labyrinths of cyberspace today will be in the very near future.
The majority of users, according to the same poll, are now more
than 36 years old. Most of us above the age of 36, male or
female, have abandoned changing our own motor oil. And we are no longer quite
as amused by the prospect of spending 10 hours tracking down the
reason why our World Wide Web connection has become unresponsive
ever since we installed our new tax planner.
THE ECONOMIC PENALTY FOR "BOYTOYISM"
A 1992 survey by Nolan, Norton and Company pegged the annual cost of ownership for a standard business personal computer at as much as $21,500 per year. That's a lot of money for a $5,000 computer. Where's the money going? A disproportionate percentage can be traced to direct and indirect training costs. The study found that the known visible costs per computer ranged from $2,000 to $6,000 annually. These direct costs cover hardware, software, initial installation, scheduled maintenance, and people taking time from their regular jobs to attend training classes.
The indirect costs are more complex: users waste time pressing buttons and flailing through manuals trying to figure out what went wrong with their machine, when the problem is that they inadvertently triggered some unknown and less-than-obvious system state. They waste time wandering around looking for a warm body in another office to ask for help. Finally, when they find someone who can help, the other party ends up wasting their own time, too. This peer-to-peer training is expensive. The study pegged the cost at $6,000 to $15,000 per year.
According to Bulkeley (1992):
"We all just about fell out of our chairs when we saw the amount
of mutual support," says David J. Baker, a process consultant for
Sprint who participated in organizing the study. "Everyone knew
[peer-to-peer training] was taking place, but when we guessed what the
amount would be beforehand, we missed by a factor of 65."
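The cost figures quoted in this section can be combined with a few lines of arithmetic. The sketch below uses only the dollar ranges stated above; the variable and function names are hypothetical, and treating the low and high ends of the two ranges as directly additive is an assumption.

```python
# Annual cost ranges per computer, in dollars, as quoted in the text.
VISIBLE_COSTS = (2_000, 6_000)        # hardware, software, installation, etc.
PEER_SUPPORT_COSTS = (6_000, 15_000)  # hidden peer-to-peer training


def add_ranges(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two (low, high) ranges end to end."""
    return (a[0] + b[0], a[1] + b[1])


def peer_share(peer: int, total: int) -> float:
    """Fraction of the total that is hidden peer-to-peer support."""
    return peer / total


total = add_ranges(VISIBLE_COSTS, PEER_SUPPORT_COSTS)
print(total)  # (8000, 21000)

low_share = peer_share(PEER_SUPPORT_COSTS[0], total[0])
high_share = peer_share(PEER_SUPPORT_COSTS[1], total[1])
print(f"{low_share:.0%} to {high_share:.0%} of the total is peer support")
```

The summed upper bound is close to, though not exactly, the $21,500 figure quoted for the survey; the text does not itemize the difference. At both ends of the range, hidden peer support makes up roughly three-quarters of the total.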
SUPPORTIVE FAMILY AND FRIENDS
The effect of interface complexity can be felt beyond the workplace. Ordinary people's access to cyberspace is a direct function of their access to a family member or friend who can carry out informed peer-to-peer activities. Should the knowledgeable family member die, divorce, or grow up and move out, the other family members may lose their access to cyberspace with the first disk problem or mandatory upgrade. Even if their hardware and software systems continue to function, they may well be unable to gain the full benefit of services offered, just because they do not have the technical skills to access them.
Ordinary citizens who do not live in a high-tech area of the
country, who had no one to lean on from the beginning, are just as effectively
disenfranchised from the cyber revolution as those who are economically
disadvantaged.
RESEARCH ISSUES
While the solution to the problem of today's excessive complexity will involve applied technology, the problem has not arisen from a lack of technology, and it will not be solved by blindly throwing more technology at it. The solutions, I believe, lie more in the areas of sociology and psychology.
Macintosh used to have the slogan, "the computer for the rest of us." Macintosh was not. From the beginning, the Macintosh was designed to be "the computer for the rest of them." The Macintosh team, like the Lisa team, Alan Kay's Xerox PARC Altos team, and Doug Engelbart's original SRI team before them, was keenly aware that they were designing not for themselves but for others. All these teams held a common understanding of who their users were and chief in that understanding was a rock-solid belief that users were not like themselves.
Ten years later we are expecting ordinary citizens traveling on the World Wide Web to follow a naming convention so foreign to human experience as to be completely incomprehensible: http://www.goliath.com/~grandma.
We need research projects that will enable us to form a bridge between the needs of ordinary people and the inventiveness of our young technological minds. Several studies have shown startling differences between software engineers and ordinary citizens on Jungian psychological-type tests (Sitton and Chmelir, 1984; Tognazzini, 1992). These tests need to be repeated and expanded on, using larger populations and varied and more exacting instruments, answering the question: How are engineers different from ordinary citizens? Those engineers who do want to make technology accessible to ordinary people have little to go by since we still know remarkably little about our ordinary citizens. What are the capabilities, needs, and wants of ordinary citizens?
Today, software engineers master systems of amazing complexity during the course of their education and often graduate with the attitude that others can and should experience the same complexity they have: In what ways can we improve the education of our engineers so they are better able to understand and provide for the needs and wants of ordinary citizens?
Much of the complexity our computer science students face is necessary: they are doing complex things. However, much of it is just bad design, bad design they often end up emulating in their own products. How prevalent is bad design in the systems that computer science students use? What can be done to improve those systems? What can be done to sensitize our students to bad design and its consequences, so they will cease emulating it?
We need case studies of projects that resulted in approachable products or services versus those that only an engineer could love. What makes a project result in a system that people can use? What caused that project to succeed? What changes could have been made early on in projects that resulted in difficult-to-use systems that would have made them more approachable?
While many organizations have embraced the idea of human interaction design as a profession, many still see human interaction design as something to be done by engineers in the normal course of their work. What are the comparative outcomes of projects done in conjunction with human interaction designers versus engineers acting alone? Is the investment in human interface specialists worth it? Does that investment result in designs that are more approachable by ordinary citizens?
Our interfaces to the national information infrastructure must be accessible to ordinary people. The Star-Lisa-Macintosh interface made a fundamental shift in design away from the earlier "black cave" interfaces. People had previously been expected to navigate blindly through cyberspace, leaping from menu to menu, building in their mind's eye an image of what their cyberspace looked like. The new interfaces swept all of that aside. The "lights were turned on," with cyberspace objects and actions represented by icons, menu selections, and other visible objects in the interface. People no longer navigated at all: everything was brought to the user, with the user staying always in one place, seated before the desktop.
The Web represents a step backward to the old black cave metaphor. True, people can see one home page at a time, but they are back to navigating their way around cyberspace and, once again, can see no visual evidence of their movement. It's like the tunnel of love: you see a lot of objects jumping out at you, but you don't really know where you are.
In the early days of the personal computer, we were attempting to sell an unproven technology to a skeptical world. We could not depend on people investing weeks or months of self-education in a system they did not yet know would improve their lives. We had to make things easy. However, sometimes, what was easy in that first 20 minutes was not necessarily the right solution for maximum efficiency over the long haul.
People now recognize the value of the personal computer or workstation. They are willing to make a reasonable commitment toward learning. The interface of today does not necessarily need the training wheels that the Star, Lisa, and Macintosh provided, but we need research to find out how much we can increase the difficulty of the learning experience in an effort to further empower users. We need to establish the relevance today of the principles that drove the design of the original graphical user interfaces, as embodied in such lists as the Principles of Macintosh Design (Apple Computer, Inc., 1986). Which of the design principles for early graphical user interfaces represented "training wheels," and which represented needs, wants, and limitations of ordinary citizens that are just as important today as they were then?
Since the advent of those early graphical user interfaces, users have faced increasing complexity. What are the areas of today's technology that act as a barrier to ordinary citizens? What seeming complexities are not acting as barriers but are in fact embraced by ordinary citizens?
It takes 16 years to learn how to drive, with a lot of formal and informal education along the way. How much education should our ordinary-citizen children be receiving in school? What form should this education take? Based on an understanding that our children would be formally educated in information retrieval and other complex computer tasks, how far can we increase the learning burden for such tasks in our aim to improve overall efficiency and productivity?
Finally, we need to explore what, if anything, government or industry needs to do to bring simple power to our systems. Will competition by itself eventually result in approachable systems, or will we need an "Underwriters' Laboratory" type of institution that can certify our technological efforts? Will we need fast-moving standards organizations that can stay abreast of developments? If so, could there ever be any such thing as a fast-moving standards organization?
We have recently seen the result of the computer industry "putting its foot down" on the issue of the digital versatile disk (DVD, nee digital video disk). Instead of two competing systems being thrown on the market, there will be only one, and that one is better than either of the two that would have arrived. The DVD was also designed, rather than just kind of evolving. It had input from marketers, as well as engineers, marketers who actually went out and spoke to clients. Industry cooperation can work.
On the other hand, industry has had a miserable overall record
of cooperation and consistency, from the VHS versus Beta wars to the
Windows versus Macintosh wars, with ordinary citizens not only paying
the price along the way but often ending up with an inferior alternative in
the end. We know governmental supervision can work. We've seen it
with the standards for NTSC video, for "compatible color," and, more
recently, for HDTV (high-definition television) standards. The question always
is whether any of us will live long enough to see the results. How can
we achieve the certainty of governmental supervision with the
mercurial speed of industry cooperation?
SUMMARY
Many of the above explorations could be carried out by a variety of agencies: industry, government, or academia. Who does what study is probably relatively unimportant. What is critical is that it become a matter of public policy that we make our national information infrastructure accessible to ordinary citizens as well as the technologically gifted. What is critical is that people drop the belief that the realm of cyberspace should rightly be the exclusive province of those boys who worked on cars, that cyberspace is by nature, not by design, a dark, dangerous, and forbidding place.
Today, every aspect of computers, from the out-of-the-box
experience to surfing the Internet, is a joy to "technoguys" and an unpleasant
challenge to ordinary citizens. We have the technology to make the
national information infrastructure accessible and attractive to the vast majority
of our citizens. The time has come to make the investment in research
and education that will enable all of our citizens to participate in the future.
REFERENCES
Apple Computer, Inc. (1986). The Apple Human Interface Guidelines, Addison-Wesley, Reading, Mass.
Bulkeley, William M. (1992). Study Finds Hidden Costs of Computing, Wall Street Journal, November 2.
Bulkeley, William M. (1994). A Tool for Women, A Toy for Men, Wall Street Journal, March 16.
Logitec Inc. (1992). PC's and People Poll, A National Compatibility Study of the Human Experience with Hardware, sponsored by Logitec Inc., 6505 Kaiser Drive, Fremont, CA 94555, (510) 795-8500.
Nolan, Norton and Company (1992). Managing End-User Computing, Nolan, Norton and Company, Boston.
Sitton, Sarah, and Chmelir, Gerard (1984). The Intuitive Computer Programmer, Datamation, October 15.
Tognazzini, Bruce (1992). Tog on Interface, Addison-Wesley, Reading, Mass.
Wylie, Margie (1995). "No Place for Women: Internet Is a Flawed Model for the Infobahn," Digital Media: A Seybold Report, Vol. 4, No. 8, pp. 3-6, January 2.
Ronald A. Cole
Oregon Graduate Institute of Science and Technology
Spoken-language systems allow people to communicate with machines by using speech to accomplish some task. The development of spoken-language systems is a true multidisciplinary endeavor, requiring expertise in areas of electrical engineering, computer science, statistics, linguistics, and psychology. The technologies involved in spoken-language systems include speech coding, speech recognition, natural language understanding, and dialogue modeling and may optionally include speaker recognition, language identification, and speech-to-speech language translation.
Advances in speech technology are of critical importance to the goal of an every-citizen interface to the national information infrastructure (NII). To be sure, speech technology cannot by itself achieve this goal. Many people are unable to speak or hear. Some information (e.g., paintings) is not in a form that can be appreciated using speech. Nevertheless, the vast majority of people in the United States speak and understand a language, and speech is an obvious means for them to access information. As spoken-language technologies mature, we can imagine spoken-language systems performing as cooperative agents, not unlike helpful human operators, to support a wide variety of transactions. For this to take place, significant advances in the technology must occur through fundamental research.
A significant advantage of using speech as an interface modality is that it can be transmitted by existing communications networks using common and inexpensive devices such as telephones and televisions. Today, use of the Internet is limited to people with access to computers and the skills to use them. These requirements exclude a great many Americans: computers are too expensive for many of us to own, and about one-third of our citizens are functionally illiterate (National Center for Education Statistics, 1993). In the future, computers are unlikely to be the major appliance for accessing the NII; telephones, cellular phones, televisions connected to cable networks, and inexpensive information appliances are likely to become the preferred means of access.
The state of the art in human language technology was summarized recently in an international survey entitled State of the Art of Human Language Technology, sponsored by the National Science Foundation (NSF) and the European Community (Cole et al., 1996). Each of the 92 authors who contributed to the survey was asked to define a specific area of human language technology, review the state of the art in that area, and identify key research challenges. The survey is available on the World Wide Web at http://www.cse.ogi.edu/CSLU/HLTsurvey/HLTsurvey.html. A second source of information on the state of the art of speech technology is the report of a workshop sponsored by the NSF in 1992 and published subsequently as a journal article (Cole et al., 1995). The workshop participants identified eight areas in which research advances are essential to the development of spoken-language systems and the infrastructure needed to support research in those areas: (1) robust speech recognition, (2) automatic training and adaptation, (3) spontaneous speech, (4) dialogue models, (5) natural language response generation, (6) speech synthesis and speech generation, (7) multilingual systems, and (8) multi-modal systems.
Given the importance of speech technology to an every-citizen
interface and to U.S. economic competitiveness, it is important to ask if
the activities of the research community will produce the desired
technology in the shortest period of time. In the remainder of this note, I offer
my opinions about the major stumbling blocks to the development of
spoken-dialogue systems for an every-citizen interface. These are (1) an
insufficient focus on interactive systems by speech researchers, (2) limitations
of statistical modeling approaches, and (3) lack of tools for research
and technology transfer.
INSUFFICIENT FOCUS ON INTERACTIVE SYSTEMS
The defining feature of a spoken-dialogue system is the interaction between human and machine. It follows that progress in developing these systems requires the continued study of how people interact with machines using speech. Such studies will highlight the limitations of speech recognition technology in the context of system use and focus research efforts on ways to overcome these limitations.
Today, the primary focus of speech recognition research in the United States is not interactive systems but the transcription of words in continuous speech. Large-vocabulary continuous speech recognition (LVCSR) is a priority of the defense establishment, which plays a major role in defining the priorities of the speech research community. For the past 12 years, progress in speech recognition research has been measured by recognition performance on benchmark tasks in annual competitions. Current benchmark tasks include recognition of articles read from newspapers, recognition of speech in news broadcasts, and recognition of speech in telephone conversations.
Transcription of words in continuous speech is both important
and challenging, but the challenges are different from those needed to
produce spoken-dialogue systems. For example, research in LVCSR does
not focus on such issues as how to phrase a system prompt, how to
determine if a recognition error has occurred, or how to engage in
conversational repair if such a determination is made.
LIMITATIONS OF CURRENT TECHNOLOGY
There is growing evidence that current statistical modeling approaches to speech recognition, which treat speech as a sequence of independent time frames, will not scale to acceptable levels of performance on difficult tasks. For example, current systems are able to recognize about 50 percent of words in telephone conversations. This level of performance is achieved by gathering statistics on the frequency of occurrence of word sequences; performance drops to below 20 percent when word sequence constraints are disabled and word recognition is based solely on acoustic information. Significant effort has been devoted to this task in recent years, with only minor improvements in performance. The ability of statistical modeling techniques to recognize words in natural conversations is not encouraging.
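As a rough illustration of how word sequence statistics constrain recognition, the following Python sketch rescores competing transcriptions by adding a bigram language-model score to an acoustic score. It is a toy under stated assumptions, not a description of any benchmark system: the counts, the acoustic scores, the add-one smoothing, and the names `bigram_log_prob` and `best_hypothesis` are all invented for this example.

```python
import math

# Hypothetical toy statistics, invented for illustration only.
BIGRAM_COUNTS = {
    ("<s>", "recognize"): 8, ("recognize", "speech"): 9,
    ("<s>", "wreck"): 1, ("wreck", "a"): 1,
    ("a", "nice"): 1, ("nice", "beach"): 1,
}
UNIGRAM_COUNTS = {
    "<s>": 10, "recognize": 10, "speech": 10,
    "wreck": 2, "a": 20, "nice": 5, "beach": 3,
}
VOCAB_SIZE = len(UNIGRAM_COUNTS)


def bigram_log_prob(words):
    """Add-one smoothed log probability of a word sequence under the toy counts."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for word in words:
        pair_count = BIGRAM_COUNTS.get((prev, word), 0)
        context_count = UNIGRAM_COUNTS.get(prev, 0)
        total += math.log((pair_count + 1) / (context_count + VOCAB_SIZE))
        prev = word
    return total


def best_hypothesis(hypotheses, lm_weight=1.0):
    """Pick the (words, acoustic_log_score) pair with the best combined score.

    Setting lm_weight to 0 disables the word sequence constraints, so the
    choice rests on the acoustic score alone.
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * bigram_log_prob(h[0]))


if __name__ == "__main__":
    hypotheses = [
        (["wreck", "a", "nice", "beach"], -10.0),  # slightly better acoustic score
        (["recognize", "speech"], -10.5),
    ]
    print(best_hypothesis(hypotheses, lm_weight=0.0)[0])
    print(best_hypothesis(hypotheses, lm_weight=1.0)[0])
```

With the language-model weight set to zero, the acoustically preferred but implausible word string wins; with the weight restored, the word sequence statistics tip the decision toward the more likely phrase.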
A serious limitation of frame-based statistical modeling techniques is the difficulty of incorporating linguistic knowledge into the recognition paradigm. The IBM speech group, one of the pioneers of speech recognition using hidden Markov models, worked with linguists for several years to incorporate syntactic and semantic knowledge into IBM's systems, always with the same result: an increase in word recognition error rates. This led Bob Mercer, then of the IBM speech group, to assert in a keynote address to a speech recognition workshop that the most effective technique IBM has found for decreasing error rates is to fire a linguist.
The difficulty of incorporating linguistic knowledge into the
dominant research paradigm stands as a major stumbling block to progress.
Accurate speech recognition requires the integration of diverse
acoustic cues, such as stop bursts, formant movements, changes in pitch, and
comparison of acoustic features across segments. Similarly, speech
understanding requires the integration of these acoustic cues with
syntactic, semantic, pragmatic, and situational knowledge. No paradigm
exists today that allows these information sources to be combined in a
principled way that improves system performance. The result is that
those with the most knowledge about human communication and
spoken language are largely excluded from the research process. New
paradigms are needed that enable psychologists and linguists to become vital
contributors to the development of human language technology.
LACK OF TOOLS FOR RESEARCH AND TECHNOLOGY TRANSFER
A final obstacle to progress in spoken-dialogue systems is the lack of available tools to support research and technology transfer. The development of spoken-language systems is a complex activity, requiring significant computer resources, integration of sophisticated signal processing, training and recognition algorithms, and language resources such as speech corpora and pronunciation dictionaries. Because of the resources and expertise required, spoken-language systems research is localized in a few specialized laboratories, which produce only five or six Ph.D. students each year. The result is that all but a few of the most fortunate students are denied the opportunity to participate in this exciting area of research, and we are not training enough researchers in an area of great strategic importance.
Without tools to create and manipulate spoken-dialogue systems and to support technology transfer, progress will be limited to the efforts of relatively few researchers at elite laboratories. For progress in spoken-language systems to occur, researchers need tools to rapidly design working systems and manipulate system parameters to test experimental hypotheses.
Despite these obstacles, I see great hope for the future. This
workshop recognizes the importance of interface technologies, as do an
increasing number of NSF initiatives in human language technology
sponsored jointly by the Defense Advanced Research Projects Agency
and other defense agencies. The growing support for interface research
is bringing new researchers and new ideas into the field. Some of
these researchers will focus their efforts on spoken-dialogue systems, and
some will produce more powerful recognition techniques that will limit
the amount of engineering required for each new task. There are also
efforts under way to develop and distribute tools to support research and
development of spoken-language systems. One such toolkit has been
released by the Center for Spoken-language Understanding at the Oregon
Graduate Institute (Sutton et al., 1996).
REFERENCES
Cole, R.A., L. Hirschman, et al. 1995. The Challenge of Spoken-language Systems: Research Directions for the Nineties, IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, pp. 1-21.
Cole, R.A., J. Mariani, H. Uszkoreit, A. Zaenen, and V. Zue (Eds.). 1996. Survey of the State of the Art in Human Language Technology, Cambridge University Press, Stanford University, Stanford, Calif.
National Center for Education Statistics. 1993. Adult Literacy in America, U.S. Department of Education, technical report no. GPO 065-000-00588-3, U.S. Government Printing Office, Washington, DC, September.
Sutton, S., D.G. Novick, R. Cole, P. Vermeulen, J. de Villiers, J. Schalkwyk, and M. Fanty. 1996. Building 10,000 Spoken-Dialogue Systems, to appear in Proceedings of the 1996 International Conference on Spoken-Language Processing, Philadelphia, Pa. (Information on the availability of the toolkit is provided on-line at http://www.cse.ogi.edu/CSLU/toolkit/toolkit.html.)
TOWARD AN EVERY-CITIZEN INTERFACE
Steven K. Feiner
Columbia University
INTRODUCTION
Building user interfaces to the national information infrastructure (NII) that can fulfill the needs of all users, rather than just a privileged subset, will be a difficult task. In this position paper, I state my understanding of what the NII will be, lay out a set of goals for future NII user interfaces, and describe some research issues and projects associated with these goals.
I take the NII to be the public medium supporting all forms of
interaction between people and machines that do not require the transport
of physical matter. A user's interactions with the NII are
accomplished through displays, interaction devices, and controlling software, which
together comprise a user interface. I expect that an interface's displays
and interaction devices would in most cases be the property of an individual
or private company, as would the facilities needed to communicate and
store information within a home, office, car, or pocket. The NII would
include public networks that carry information between these private facilities
and public sources of information and computation. Moreover, it would
also include public displays and interaction facilities (e.g., the global
positioning system (GPS) position tracking infrastructure), and software and
standards needed to make communication and interaction possible.
GOALS
I have tried to capture the properties that I believe user interfaces to
the NII should ideally have in the following list of high-level characteristics.
Multimedia
Interactions should be multimedia and multimodal, taking
maximal advantage of all our senses to communicate information effectively. I
intend this to go beyond the combination of graphics, video, text, sound,
and voice typically implied by popular usage of the term
multimedia to encompass the goals of research in virtual environments and visualization.
Adaptive
User interfaces should adapt to the needs and abilities of the
individual user and situation, interactively tailoring both the form and
content of the material being presented and providing customized help
when necessary. An adaptive user interface would take into account factors
as diverse as the user's education, skills, previous experience, and
physical capabilities or disabilities. Recognizing that many activities are long
lived, it should accommodate fluid and frequent changes in all aspects of
the environment: who, what, where, when, why, and how.
Integrated
Interaction through the NII should be integrated smoothly and
naturally into our daily activities, rather than being, as it is now, a
compartmentalized special-purpose activity accomplished only when sitting
in front of a workstation running special software. That is, the goal is
not just to get the NII into our homes but rather to get it into our lives. In
part this means mobility and wearability but without the compromises
that are built into current PDAs.
Collaborative
Many of the tasks we perform are group activities, not solitary
endeavors. NII user interfaces should support collaborative work and
play, regardless of whether users are collaborating in the same place or at
the same time.
Instructable
A user should be able to describe tasks that are to be carried
out through the NII. Assuming that the lowest-level steps are within
the capability of the resources available, the tasks should ideally be as
rich and complex as those that the user could describe to other people.
I hesitate to use the word "programmable" here to avoid the
implication that this should involve a conventional programming language, such
as Java, or even the simpler languages provided by current systems for
end-user programming.
Responsive
Large enough quantitative differences in performance can make
for qualitative differences in how a user interface feels and how it is used. Sufficient resources must be available to all users to allow certain
baseline tasks to be accomplished comfortably.
Empowering
Independent of a system's style (e.g., "invisible," "intelligent
agent," or "direct manipulation tool"), its users should be able to
accomplish more with it than without it and should feel a sense of satisfaction
in doing so.
RESEARCH ISSUES AND PROJECTS
Each subsection below provides a background overview, followed
by a selected set of issues and projects, keyed to the list of
characteristics presented above.
Multimedia
Background
Interpreting "multimedia" broadly, I see two major research subgoals here: developing user interfaces that support real-time interaction with true three-dimensional (3D) input/output devices (i.e., virtual environments or virtual worlds) and learning how to use these devices to present information effectively, a task known in the graphics community as visualization. I am partial to the term information visualization (Card et al., 1991), which is rapidly gaining currency (e.g., see Gershon and Eick, 1995) and which stresses the diversity of domains and users that can benefit beyond those targeted by research on scientific visualization. While visualization research embraces work that appeals to senses other than the visual, the term sonification has been used to refer explicitly to the ways in which information can be presented through sound (Kramer, 1994).
Most state-of-the-art commercial user interfaces emphasize the use of 2D windows, with which users interact using 2D devices such as mice. Increasing CPU (central processing unit) power, combined with the popularization of VRML (Virtual Reality Modeling Language; VRML, 1996) and the introduction of low-priced sound, video, and 3D graphics cards, is transforming personal computers into 3D multimedia workstations. The results thus far are evolutionary: 3D graphics appear in 2D windows and are manipulated under mouse control. Research in 3D user interfaces extends beyond this to address the use of interactive 3D graphics, audio, and haptics, presented with true 3D stereo displays and 3D interaction devices that monitor the user's actions in three-space. The goal is to harness the physiological capabilities and training that enable us to perform physical tasks effectively in 3D, and apply them to develop effective user interfaces for visualizing and accomplishing computer-based tasks.
Issues
We must develop real-time operating systems support for highly parallel asynchronous input (from large numbers of 3D trackers) and output (to multiple-display modalities). We need to build effective "augmented realities" (Caudell and Mizell, 1992; Bajura et al., 1992; Feiner et al., 1993b) that enrich the user's existing environment with additional information, merging synthesized material with what the user normally sees, hears, and feels, overlaying or replacing it, as appropriate. We need to develop display and interaction device hardware that matches our abilities better than the current offerings do, including high-quality, high-resolution, wide-field displays (e.g., graphics, sound, force, temperature) and tracking (e.g., hand, body, eye). For example, there is a need for lightweight, comfortable, high-quality, "see-through" displays for use in augmented realities. A general-purpose see-through display technology would allow differential visual accommodation, corresponding to real and virtual objects at different distances in the same image. It would also perform full visible-surface determination with all objects, real and virtual: virtual objects should be able to occlude real objects, and real objects should be able to occlude virtual ones.
How can we map abstract task domains effectively to a 3D environment in which we can visualize and manipulate objects in the domain? How can we take advantage of the richness of 3D gesture to reduce our reliance on icons to express actions in current user interfaces? For example, rather than moving an item to the trash can, could we dispose of it by using an appropriate gesture?
In a world of whole-body computer interaction, there may no longer be any distinction between human factors in general and the human factors of computer interfaces. The existing hardware that limits our capabilities (and that also limits our mistakes) will be gone, making it possible to create user interfaces that are both far better and far worse than anything we can create now. How can we ensure that 3D user interfaces are usable, especially in an environment that supports end-user programming and customization?
Projects
Much of this work is and should be multidisciplinary. For
example, the design of display and interaction device hardware and
software should be informed by research in human psychophysics. The design
of user interface software should draw on disciplines that have long
explored the design and use of 3D space, such as architecture,
industrial design, theater (Laurel, 1993), and dance.
Adaptive
Background
This goal is to develop approaches that make it possible for user interfaces to adapt interactively to the needs of the current user, situation, and hardware. Adaptive multimedia user interfaces should be able to design and present information to people through multiple output media and understand user input provided through multiple input media. They should be able to adapt to the user's work mode, be it direct manipulation and exploration or passive observation.
Issues
To design high-quality adaptive multimedia user interfaces, we must first be able to design ones that function well in a single medium. We must be able to perform high-quality automated generation and understanding of individual media, ranging from those that have long been explored by artificial intelligence researchers (e.g., written text and speech) to less well-charted terrain (e.g., graphics, audio, haptics).
How can we predict and evaluate presentation quality? A system should be able to predict the quality of a presentation in the course of designing it. Based on these predictions, it should be able to refine the presentation until it is adequate. This requires the ability to evaluate the presentation (estimating how it will affect the user) and to evaluate the user's response (estimating how it has affected the user). The ability to evaluate the presentation makes possible time-quality tradeoffs. For example, if our time is limited, we might prefer a "rush job" now over a higher-quality presentation later.
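The predict-and-refine loop described above might be sketched as follows, in Python. This is only an illustration under assumptions: `refine` and `predict_quality` are hypothetical callbacks standing in for a real presentation generator and quality estimator, and a step budget stands in for a wall-clock deadline so that the behavior is easy to follow.

```python
def design_presentation(draft, refine, predict_quality, target_quality, max_steps):
    """Refine a draft until its predicted quality is adequate or the budget runs out.

    Returns (presentation, predicted_quality, steps_used). A small max_steps
    yields a "rush job"; a large one allows a higher-quality result later.
    """
    quality = predict_quality(draft)
    steps = 0
    while quality < target_quality and steps < max_steps:
        candidate = refine(draft)
        candidate_quality = predict_quality(candidate)
        if candidate_quality <= quality:
            break  # further refinement is not predicted to help
        draft, quality = candidate, candidate_quality
        steps += 1
    return draft, quality, steps


if __name__ == "__main__":
    # Hypothetical stand-ins: a draft is just a level of detail from 0 upward,
    # and predicted quality grows with detail up to a ceiling of 1.0.
    def add_detail(level):
        return level + 1

    def estimate(level):
        return min(1.0, level / 10)

    print(design_presentation(0, add_detail, estimate, 0.8, max_steps=3))
    print(design_presentation(0, add_detail, estimate, 0.8, max_steps=100))
```

The same loop serves both ends of the tradeoff: the caller decides how much time to spend, and the predicted quality tells the system when it may stop early.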
Temporal media are those in which information content is presented over time in a way that is controlled explicitly by the producer (Feiner et al., 1993a), such as animation, speech, and audio. We must develop generation and understanding capabilities for temporal media. Issues include how to "phrase" information (e.g., for maximal comprehension). For example, we must develop the ability to generate output and understand input that communicates complex temporal relationships. If a system can convey the relative order of actions, information need not be provided in chronological order (e.g., presenting the most important information first, as in a newscast).
We must develop facilities for coordinated generation and understanding of multiple media. The key challenge is to ensure that materials in different media reinforce, rather than interfere with, each other. Multimedia presentations must be temporally coordinated (especially when using temporal media such as animation and speech), so that information presented in all media is synchronized.
Given the ever-increasing amount of information bombarding us, automated multimedia generation offers the potential for automated summarization, selecting the material most relevant to a user's needs and presenting it in a way that meets their time constraints.
We must develop models that can be used as a basis for customizing the interaction between users and systems so that information is presented and obtained as effectively as possible. These models must represent:
Situations. Model current situations (e.g., routine versus crisis, individual work versus multiuser interaction).
These models should not be static; it should be possible to update them on the fly. Difficult problems here include being able to determine how a user is affected by a presentation. For example, can the system determine whether a user has actually learned the material that an explanatory presentation is intended to communicate? Ideally, it should be able to do this based on the user's normal interactions with the system, without requiring explicit testing.
We need to develop the facilities to model the rhetorical structure of multimedia dialogues for real, complex multiuser tasks. This includes what a user tells the system, what the system tells the user, and what users tell other users, in addition to what the user(s) and system(s) each believes the others have communicated. By studying current multimedia interactions and developing cognitive models that account for how information is being communicated among participants, we can lay the groundwork for developing rules for generating and understanding multimedia.
Projects
This research would center in the artificial intelligence and
human-computer interface communities, especially in the fields of
multimedia generation and understanding (Sullivan and Tyler, 1991; Maybury,
1993) and modeling of users (Kobsa and Wahlster, 1989) and how they
perform tasks (Card et al., 1983).
Integrated
Background
Integration of the NII into our lives will mean, in part, accommodating users who are mobile, and who use the NII as they move about. As displays of all sizes proliferate, this will also spell the end of the one-user, one-display metaphor that underlies so many current systems. We need to support interaction in a world in which there are many displays and interaction devices: handheld, head-worn, desktop, and wall-mounted. Some will be private, others public (or at least shared). As users walk about, they will move into and out of the presence of some of these peripherals and of other users. We need to build user interfaces that exploit this rich and constantly changing combination of peripherals.
Issues
Drawing an analogy to window management, the term environment management has been used (MacIntyre and Feiner, 1996) to describe the idea of managing large numbers of objects on large numbers of displays. This is a difficult task: unlike the one or two displays that most window managers typically control, the environment may be continually changing as users and resources move and may include displays and devices that are shared, such as a wall-mounted hallway display. From the user's standpoint, however, environment management should ideally be easier than the current task of window management. This could be possible if environment management were to be carried out through systems that used knowledge of the user's needs and effective information presentation approaches to determine how to structure the surrounding information environment.
Projects
Research projects should build on ongoing research in mobile
computing, wearable computing, ubiquitous computing (Weiser, 1991),
and augmented reality.
Collaborative
Background
User interfaces should support collaborative problem solving and interaction among multiple people and computers cooperating in the same task or in coordinated tasks.
Issues
We must design systems that account for the personal presentation needs of individual users while allowing for communication among users based on material they have been presented in common. An important problem here is how to accommodate users who have been presented with different information and who would like to refer to the presentation as they interact with each other. The system might serve as an intelligent "go-between" that mediates between users so that references made by one user to what she has been presented are translated into references to what another user has been presented.
The NII has the potential to help create a strong sense of national (and global) community. Consider the information infrastructure provided by a residential street, town square, or college dorm hallway. By encouraging citizens to interact with others across the country and providing information about our country's workings on the NII, we could foster a better understanding of how people depend on each other and ultimately provide more opportunity for an informed populace to participate in government. Imagine, for example, a multimedia SimCity-like virtual environment that modeled the country's economy and supported collaborative attempts to see how it responded to different situations and assumptions.
Projects
There are separate communities of researchers in
computer-supported cooperative work (many of whom concentrate on the design of
multimedia systems in the popular sense of the term) and in distributed
multiuser virtual environments. Joint research projects could be especially
fruitful here.
Instructable
Two key research areas for the creation of instructable systems are programming by demonstration and agents.
Programming by Demonstration
Background. Research in end-user programming attempts to develop ways for end users to "program" an application's behavior without the overhead of learning or using a conventional programming language. One promising line of research is "programming by demonstration," in which users demonstrate the tasks to be performed using the application's interface (Cypher, 1993). A simple example is the keyboard macro facility in Emacs: the user can specify that a series of keystrokes issued in the course of editing should be saved (and optionally named and bound to a key), so that it can be applied again, typically at another place in the document being edited. Since the demonstration is a specific example, if it is to be applied to other situations, it must be generalized. In the case of the Emacs keyboard macro, generalization is usually achieved solely by using keystroke commands that operate relative to the current position in the file.
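The keyboard macro idea can be illustrated with a small Python sketch. It is not the editor's actual implementation: `TinyEditor` and its two commands are hypothetical, chosen only to show that a recorded demonstration generalizes to a new place in the document because every command acts relative to the current cursor position.

```python
class TinyEditor:
    """A hypothetical line editor whose commands act relative to the cursor."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.cursor = 0        # index of the current line
        self.recording = None  # list of command names while a macro is being recorded

    def start_macro(self):
        self.recording = []

    def end_macro(self):
        """Stop recording and return the demonstrated command sequence."""
        macro, self.recording = self.recording, None
        return macro

    def do(self, command):
        """Execute one command and, if recording, remember it."""
        if command == "upcase-line":
            self.lines[self.cursor] = self.lines[self.cursor].upper()
        elif command == "next-line":
            self.cursor = min(self.cursor + 1, len(self.lines) - 1)
        else:
            raise ValueError(f"unknown command: {command}")
        if self.recording is not None:
            self.recording.append(command)

    def run_macro(self, macro):
        """Replay a recorded sequence starting from the current cursor position."""
        for command in macro:
            self.do(command)


if __name__ == "__main__":
    editor = TinyEditor(["alpha", "beta", "gamma", "delta"])
    editor.start_macro()
    editor.do("upcase-line")  # the demonstration: capitalize, then skip a line
    editor.do("next-line")
    editor.do("next-line")
    macro = editor.end_macro()
    editor.run_macro(macro)   # reapplied two lines further down
    print(editor.lines)
```

Because the macro stores only relative commands and no line numbers, replaying it affects whichever line the cursor happens to be on; no inference about the user's intent is involved.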
An allied notion is that of having the system learn patterns in the user's behavior and volunteer to complete some recognized pattern when it guesses that the user has begun to perform it. Existing research systems monitor the user's interactions during a session, can present "graphical histories" of a session, allow the creation of macros using about-to-be-executed (or previously executed) commands, and can perform primitive inferencing to support simple generalization.
Issues. Most existing end-user programming facilities rely on simple straight-line flow of control (or escape into conventional programming syntax to perform all but the simplest conditionals and loops). How can end-user programs allow complex flow of control without looking and feeling like conventional programming? How can they incorporate multimodal interaction into the programming user interface itself?
How can we generalize demonstrational programs in a way that minimizes the amount of end-user involvement while maximizing the places where the system guesses right? When should generalization be performed: at program creation time or at execution time? What sources of information can be used?
If large numbers of user-developed programs exist, how can the user find the ones that are relevant to some specified task? How can the user determine what each does (without necessarily having to execute it)? Note that this is a particularly difficult example of on-line search: the user isn't looking for a match on a text string but rather on a set of capabilities, which may be implicit in the program.
How can a user-developed program be modified? How can one develop an end-user programming capability that intrinsically supports cooperation among multiple users and systems?
Projects. Most work in this domain has concentrated on 2D
user interfaces. I think there is much to be gained in trying to take
advantage of interactive multimedia and 3D in the design of the language itself.
Agents
Background. One kind of instructable interface is based on the metaphor of an "agent" that carries out a task on the user's behalf, often using knowledge and abilities that the user may not have herself. There has been a fair amount of heated debate in the human-computer interface community, pitting proponents of agent-based user interfaces against those who favor direct-manipulation user interfaces. Among the arguments against agents are claims that people may prefer interfaces over which they feel they have direct control and that agent-based interfaces are being unfairly touted as having some responsibility for their actions beyond that of their programmer or user.
Issues. I believe that much of the controversy is due to the popular conception of the agent as a busy-body anthropomorphic assistant, in the manner of the nattering bow-tied helper in Apple's "Knowledge Navigator" videotape. While the argument has been made that users will not want to sacrifice control to such agent-based systems, people willingly give up control in other matters that do not involve computers. For example, although it is common to compare the relative ease of using cars and computers, consider instead the car's predecessors: horses, mules, and donkeys. Environmental issues aside, would you really send even an experienced driver hurtling down the Grand Canyon's trails on a motorcycle? Yet each year thousands of folks with no previous riding experience travel those same trails safely on mules. A mule's rider exerts only discretionary high-level control with regard to general speed and direction, especially so for inexperienced riders. Riders are even told that, if acrophobia sets in, they should just close their eyes, hold on, and let the mule find its way: the original intelligent user interface. Mules are hardly anthropomorphic (although the reverse is sometimes true), yet they are possessed of skills and abilities that we don't have. While we may be amazed at how much more surefooted they are than us, we find this reassuring, not intimidating. Instead of asking why computers can't be more like cars, perhaps we should ask why computers can't be more like mules.
Projects. There is already overlap between the programming-by-demonstration and agent-based systems communities, particularly in addressing how agents can be instructed to perform tasks. Coordinated projects could address how users would determine what these systems can do (including what they can learn and what they already know). There is also potential for joint research with the multimedia generation community.
Responsive
Background
The goal is to build systems that can utilize the power available throughout the NII in a way that doesn't compromise the responsiveness of the user interface.
Issues
Resources needed to make a responsive system include not only network bandwidth and computational power but also appropriately sized and sited storage. While we can assume that users will have personal storage space at home, permanent or temporary mirroring of material at additional sites throughout the NII might be able to significantly decrease network load and response time. For example, we might have a system of large public storage caches located throughout the country to provide users with relatively local copies of frequently referenced material. This could include both conventional "mirror" sites and caches of currently accessed material controlled by some automated paging strategy. This could be the next tier in a caching system that would include the individual local memory and disk caches of current Web browsers.
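The tiered arrangement described above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a design for the NII: the class `LRUCache`, the function `fetch`, the tier names, and the least-recently-used eviction rule are all hypothetical choices standing in for "some automated paging strategy." A request tries the nearest tier first, falls back to the original source only when every tier misses, and leaves a copy in each tier it passed through.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class LRUCache:
    """One cache tier with fixed capacity and least-recently-used eviction.

    LRU is an assumed stand-in for the unspecified paging strategy.
    """

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


def fetch(
    key: str,
    tiers: List[LRUCache],
    origin: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (value, name of the place that supplied it).

    Tiers are searched nearest-first. On a hit, the value is copied into
    every nearer tier that missed; if all tiers miss, the origin is asked
    and the value is copied into all tiers.
    """
    missed: List[LRUCache] = []
    for tier in tiers:
        value = tier.get(key)
        if value is not None:
            for nearer in missed:
                nearer.put(key, value)
            return value, tier.name
        missed.append(tier)
    value = origin(key)
    for tier in missed:
        tier.put(key, value)
    return value, "origin"


if __name__ == "__main__":
    # Hypothetical tiers: browser memory, browser disk, regional public cache.
    tiers = [LRUCache("memory", 2), LRUCache("disk", 4), LRUCache("regional", 8)]
    origin = lambda key: ("content of " + key).encode()
    for key in ["a", "a", "b", "c", "a"]:
        _, source = fetch(key, tiers, origin)
        print(key, "served from", source)
```

In this sketch the small memory tier soon evicts an item that the larger disk tier still holds, so a repeated request is answered locally rather than by the distant origin. That is the effect on network load and response time that the public caches are meant to provide at a larger scale.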
Projects
Many of the issues here build on research being done in the systems (OS and distributed systems) and multimedia storage/transport communities.
Empowering
Background
Plainly put, we need to study the kinds of things that people do and determine how the NII can best assist in doing them. In part, this will involve building the models of activities mentioned previously.
Issues
I trust that falling prices will ultimately put any technology that has the potential to be popular within the reach of all. This has happened with television, microwave ovens, Walkman-style tape players, digital watches, and compact-disc changers. It is about to happen with computers, be they net-tops, set-tops, palm-tops, or something else. Unlike fixed-function devices, however, computers (in particular, computer programs) have an essentially unlimited potential to confuse and intimidate. While much of this potential can be mitigated through better user interface design, there is no substitute for users having the right skills and mindset. Even if we can build powerful systems that are truly "self-teaching," users will still need time to learn how to use them effectively. We need to ensure not only that computer skills (whatever that might mean in the future) are taught in school but also that there is ample opportunity and time for people who are not in school to acquire them.
Projects
Experimental studies and model building by academic and industrial researchers address only one part of the problem. Enlightened social and governmental policies also will be key.
REFERENCES
Bajura, M., Fuchs, H., and Ohbuchi, R. 1992. Merging Virtual Objects with the Real World: Seeing Ultrasound Imagery Within the Patient. Computer Graphics 26(2):203-210.
Card, S., Moran, T., and Newell, A. 1983. The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, N.J.
Card, S.K., Robertson, G.G., and Mackinlay, J.D. 1991. The Information Visualizer, An Information Workspace. Proceedings of the Computer Human Interactions: Human Factors in Computing Systems, pp. 181-188. New Orleans, La., April 28-May 2.
Caudell, T., and Mizell, D. 1992. Augmented Reality: An Application of Heads-Up Display Technology to Manual Manufacturing Processes, Proceedings of the Hawaii International Conference on System Science, January.
Cypher, A. (Ed.). 1993. Watch What I Do: Programming by Demonstration, MIT Press, Cambridge, Mass.
Feiner, S., Litman, D., McKeown, K., and Passonneau, R. 1993a. Towards Coordinated Temporal Multimedia Presentations. Intelligent Multimedia Interfaces, M. Maybury (Ed.), pp. 139-147. AAAI/MIT Press, Menlo Park, Calif.
Feiner, S., MacIntyre, B., and Seligmann, D. 1993b. Knowledge-Based Augmented Reality. Communications of the ACM 36(7):52-2.
Gershon, N., and Eick, S. (Eds.). 1995. Proc. Information Visualization '95. IEEE Computer Society Press, Los Alamitos, Calif.
Kobsa, A., and Wahlster, W. (Eds.). 1989. User Models in Dialogue Systems. Springer-Verlag, Berlin.
Kramer, G. (Ed.). 1994. Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley, Reading, Mass.
Laurel, B. 1993. Computers as Theatre. Addison-Wesley, Reading, Mass.
MacIntyre, B., and Feiner, S. 1996. Future Multimedia User Interfaces. Multimedia Systems.
Maybury, M. (Ed.). 1993. Intelligent Multimedia Interfaces. AAAI/MIT Press, Menlo Park, Calif.
Sullivan, J., and Tyler, S. (Eds.). 1991. Intelligent User Interfaces. Addison-Wesley, Reading, Mass.
VRML (Virtual Reality Modeling Language). 1996. The VRML Forum (available on-line at http://www.vrml.org/www-vrml).
Weiser, M. 1991. The Computer for the 21st Century. Scientific American 265(3):94-104.
NOMADICITY, DISABILITY ACCESS, AND THE EVERY-CITIZEN INTERFACE
Gregg C. Vanderheiden
University of Wisconsin-Madison
THE CHALLENGE
With the rapid evolution of the national information infrastructure (NII) and the global information infrastructure (GII), attention has turned to the issue of information equality and universal access. Basically, if information systems become as integral to our future life-styles as electricity is today, access to these systems will be essential for people to have equal access to education, employment, and even daily entertainment or enrichment activities.
Although the goal of equal access seems noble, it can seem somewhat less achievable when one considers the full range of abilities or disabilities that must be dealt with to achieve an every-citizen interface. It must be usable even if people have any combination of these difficulties (e.g., are deaf-blind, or have reduced visual, hearing, physical, or cognitive abilities, as occurs in many older individuals).
In addition, the products and their interfaces must remain equally efficient and easy to use and understand for those who (1) have no problems seeing, hearing, moving, remembering, and so forth; and (2) are power users.
Is It Possible?
A list like this can bring a designer up short. At first blush, it appears that even if such an interface were possible, it would be impractical or inefficient to use for people with all of their abilities intact. Packages such as the EZ Access approach developed for kiosks (http://trace.wisc.edu/world/kiosks), PDAs (personal digital assistants), and other touchscreen devices, however, demonstrate how close we can come to such an ideal, at least for some types of devices or systems. Using a combination of Talking Fingertip and Speed List technologies, the EZ Access package (for information, see http://trace.wisc.edu/text/kiosks/minimum.html) provides efficient access for individuals with low vision, blindness, and poor or no reading skills. A ShowSounds/caption feature provides access for individuals with hearing impairments or deafness, as well as access for all users in very noisy locations. An infrared link allows the system to be used easily with alternate displays and controllers, so that even individuals who are deaf-blind or paralyzed can access and use the system. Thus, with a relatively modest set of interface variations, almost all the needs listed above can be addressed.
Is It Practical?
Practicality is a complex issue which involves cost, complexity, impact on overall marketability, support, and so forth. To use the EZ Access approach as an example, the hardware cost to provide all of these dimensions of accessibility to a standard multimedia kiosk is less than