National Academies Press: OpenBook

More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure (1997)

Chapter: 3 Input/Output Technologies: Current Status and Research Needs

Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.

3 Input/Output Technologies: Current Status and Research Needs

Meeting the every-citizen interface (ECI) criteria described in Chapter 2 will require advances in a number of technology areas. Some involve advances in basic underlying display and interface technologies (higher-resolution visual displays, three-dimensional displays, better voice recognition, better tactile displays, and so on). Others involve advances in our understanding of how to best match these input/output technologies to the sensory, motor, and cognitive capabilities of different users in different and changing environments carrying out a wide variety of tasks. But the new interfaces will need to do more than just physically couple the user to the devices. To meet these visions, the interfaces must have the ability to assist, facilitate, and collaborate with the user in accomplishing tasks.

Subsequent chapters address interface design (the creation of interfaces that make the best possible use of these human-machine communications technologies) and system attributes that lie beneath the veneer of the interface, such as system intelligence and software support for collaborative activities. This chapter examines the current state of and prospective advances in technology areas related directly to communication between a person and a system: hardware and software for input (to the system) and output (to a human). The emphasis is on technical advances that, if implemented in well-designed systems (as stressed in Chapter 4), hold the potential to expand accessibility and usability to many more people than at present. The discussion includes a cluster of speech input/output technologies; natural language understanding (including restricted languages with limited vocabularies); keyboard input; gesture recognition and machine vision; auditory and touch-based output; interfaces that combine multiple modes of input and output; and visual displays, including immersive or virtual reality systems. Because the ECI challenge involves connecting to the information infrastructure, rather than just to stand-alone systems, this chapter reviews the current status of and research challenges for interfaces for systems in large-scale national networks. The chapter ends with the steering committee's conclusions, based on workshop discussions and other inputs, about the research priorities to advance these technologies and our understanding of how to use them to support every citizen.

Framing the Input/Output Discussion: Layers of Communication

The interface is the means by which a user communicates with a system, whether to get it to perform some function or computation directly (e.g., compute a trajectory, change a word in a text file, display a video); to find and deliver information (e.g., get a paper from the Web or information from a database); or to provide ways of interacting with other people (e.g., participate in a chat group, send e-mail, jointly edit a document). As a communications vehicle, interfaces can be assessed and compared in terms of three key dimensions: (1) the language(s) they use, (2) the ways in which they allow users to say things in the language(s), and (3) the surface(s) or device(s) used to produce output (or register input) expressions of the language. The design and implementation of an interface entail choosing (or designing) the language for communication, specifying the ways in which users may express "statements" of that language (e.g., by typing words or by pointing at icons), and selecting the input/output devices that allow communication to be realized.

Box 3.1 gives some examples of choices at each of these levels. Although the selection and integration of input/output devices will generally involve hardware concerns (e.g., choices among keyboard, mouse, drawing surfaces, sensor-equipped apparel), decisions about the language definition and means of expression affect interpretation processes that are largely treated in software. The rest of this section briefly describes each of the dimensions and then examines how they can be used to characterize some currently standard interface choices; the remainder of the chapter provides an examination of the state of the art.

BOX 3.1 Layers of Communications

1. Language Layer

Natural language: complex syntax, complex semantics (whatever a human can say)

Restricted verbal language (e.g., operating systems command language, air traffic control language): limited syntax, constrained semantics

Direct manipulation languages: objects are "noun-like," get "verb equivalents" from manipulations (e.g., drag file X to Trash means "erase X"; drag message onto Outgoing Mailbox means "send message"; draw circle around object Y and click means "I'm referring to Y, so I can say something about it.")

2. Expression Layer

Most of these types of realization can be used to express statements in most of the above types of languages. For instance, one can speak or write natural language; one can say or write a restricted language, such as a command-line interface; and one can say or write/draw a direct manipulation language.

Speaking: continuous speech recognition, isolated-word speech recognition

Writing: typing on a keyboard, handwriting

Drawing

Gesturing (American Sign Language provides an example of gesture as the realization (expression layer choice) for a full-scale natural language.)

Pick-from-set: various forms of menus

Pointing, clicking, dragging

Various three-dimensional manipulations: stretching, rotating, etc.

Manipulations within a virtual reality environment: same range of speech, gesture, point, click, drag, etc., as above, but with three dimensions and broader field of view

Manipulation unique to virtual reality environment: locomotion (flying through/over things as a means of manipulating them or at least looking at them)

3. Devices

Hardware mechanisms (and associated device-specific software) that provide a way to express a statement. Again, more than one technology at this layer can be used to implement items at the layer above.

Keyboards (many different kinds of typing)

Microphones

Light pen/drawing pads, touch-sensitive screens, whiteboards

Video display screen and mouse

Video display screen and keypad (e.g., automated teller machine)

Touch-sensitive screen (touch with pen; touch with finger)

Telephone (audible menu with keypad and/or speech input)

Push-button interface, with different button for each choice (like big buttons on an appliance)

Joystick

Virtual reality input gear: glove, helmet, suit, etc.; also body position detectors

Language Contrasts and Continuum

There are two language classes of interest in the design of interfaces: natural languages (e.g., English, Spanish, Japanese) and artificial languages (e.g., programming languages, such as C++, Java, Prolog; database query languages, such as SQL; mathematical languages, such as logic; command languages, such as the one the C shell provides). Natural languages are derived evolutionarily; they typically have unrestricted and complex syntax and semantics (assignment of meaning to symbols and to the structures built from those symbols). Artificial languages are created by computer scientists or mathematicians to meet certain design and functional criteria; the syntax is typically tightly constrained and designed to minimize semantic complexity and ambiguity.

Because an artificial language has a language definition, construction of an interpreter for the language is a more straightforward task than construction of a system for interpreting sentences in a natural language. The grammar of a programming language is given; defining a grammar for English (or any other natural language) remains a challenging task (though there are now several extensive grammars used in computational systems). Furthermore, the interactions between syntax and semantics can be tightly controlled in an artificial language (because people design them) but can be quite complex in a natural language.1,2

Natural languages are thus more difficult to process. However, they allow for a wider range of expression and as a result are more powerful (and more "natural"). It is likely that the expressivity of natural languages and the ways they allow for incompleteness and indirectness may matter more to their being easy to use than the fact that people already "know them." For example, the phrase "the letter to Aunt Jenny I wrote last March" may be a more natural way to identify a letter in one's files than trying to recall the file name, identify a particular icon, or grep (a UNIX search command) for a certain string that must be in the letter. The complex requests that may arise in seeking information from on-line databases provide another example of the advantages of complex languages near the natural language end of this dimension. Constraint specifications that are natural to users (e.g., "display the protein structures having more than 40 percent alpha helix") are both diverse and rich in structure, whereas menu- or form-based paradigms cannot readily cover the space of possible queries. Although natural language processing remains a challenging long-range problem in artificial intelligence (as discussed under "Natural Language Processing" below in this chapter), progress continues to be made, and better understanding of the ways in which it makes communication easier may be used to inform the design of more restricted languages.

However, the fact that restricted languages have limitations is not, per se, a shortcoming for their use in ECIs. Limiting the range of language in using a system can (if done right) promote correct interpretation by the system by limiting ambiguity and allowing more effective communication.

For instance, the use of domain- and task-specific restricted languages for certain applications of speech recognition systems has produced results, allowing people to use speech to communicate when they cannot see (either because they are limited by the communication device being used, such as the telephone, or because of physical impairment). Radiologists' workstations, for example, allow the use of speech as the primary means of inputting reports on X-rays or other radiographic tests. Direct manipulation languages may be ideal if there is a close match to what the user wants to do (and hence is able to "say"), that is, if the user's needs are anticipated and the user will not need to program or alter what the system does; they can be a robust means of control that limits the risk of system crashes from misdirected user actions.

In short, the design of an interface along the language dimension entails choices of syntax (which may be simple or complex) and semantics (which can be simple or complex either in itself or in how it relates to syntax). More complex languages typically allow the user to say more but make it harder for the system to figure out what a person means.

Expression Contrasts

A natural language sentence can be spoken, written, typed, gestured, or selected from a menu. An artificial language statement also can be spoken, written, typed, gestured, or selected from a menu.

Language expression can take many forms, generally differentiated as being more or less continuous or involving selection from a set of options (e.g., a menu). Speaking can involve isolated words or continuous speech recognition. Writing can involve handwriting or typing; drawing can be free form or can use prespecified options. Gesturing, whether independently or to manipulate objects, can be free form, can involve a full-scale natural language (e.g., American Sign Language), or can involve a more restricted set of prespecified options (e.g., pointing, dragging, stretching, rotating). Virtual reality and other visualization techniques represent a multimedia form of expression that may involve speech, gesture, direct manipulation, and haptic and other elements.

Thus, the different ways of saying things in a language may also be divided into two structural categories (free form and structured) and several different realization categories: typing, speaking, pointing. Free-form expression is usually more difficult to process than structured expression. For example, a sentence in natural language can be spoken "free form" (this is what we usually think of with natural language), or it might be specified by picking one word at a time out of a structured menu.3 In the structured form the system can control what the user gets to choose to "say" next, and so it is much easier for a system to interpret and handle. Within a given form, some means of realization may be easier to handle than others (e.g., correctly typed words are easier to interpret than handwritten words; freehand drawings are more difficult than structured CAD/CAM (computer-aided design/computer-aided manufacturing) diagrams). It is also important to note that more structured systems may be preferable for certain applications, such as those involving completion of forms (Oviatt et al., 1994).
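The idea that a structured menu lets the system control what the user may "say" next can be sketched as a small finite-state grammar. The following is a minimal illustration only: the grammar, its state names, and its vocabulary are hypothetical and are not taken from any system discussed in this chapter.

```python
# Hypothetical finite-state grammar for a tiny restricted language.
# Each state maps the words a menu may offer to the state that follows.
GRAMMAR = {
    "start": {"show": "object", "delete": "object"},
    "object": {"letter": "qualifier", "report": "qualifier"},
    "qualifier": {"from-last-march": "end", "to-aunt-jenny": "end"},
}


def next_choices(words):
    """Return the menu entries allowed after the words chosen so far."""
    state = "start"
    for word in words:
        options = GRAMMAR.get(state, {})
        if word not in options:
            raise ValueError(f"{word!r} is not allowed in state {state!r}")
        state = options[word]
    # The "end" state has no entry in GRAMMAR, so it offers no choices.
    return sorted(GRAMMAR.get(state, {}))


def is_complete(words):
    """A statement is complete when the grammar offers nothing further."""
    return next_choices(words) == []


print(next_choices([]))
print(next_choices(["show", "letter"]))
print(is_complete(["show", "letter", "to-aunt-jenny"]))
```

Because every statement the user can build is one the grammar already accepts, interpretation reduces to a table lookup; the cost is that the user cannot say anything the designer did not anticipate.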

Menu/icon systems thus provide an alternative way of expressing command-like languages. They have underlying languages, typically very much like command languages. The commands (natural language verb equivalents) are often menu items (e.g., "select," "edit"); the parameters (natural language noun equivalents) are icons (or open files); and the statements (natural language sentence equivalents) are sequences of select "nouns" and "verbs." The menus and icons provide the structure within which a user can say something in the language.
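The correspondence between menu/icon actions and an underlying command language can be made concrete with a short sketch. The drop targets and verbs below follow the Trash and Outgoing Mailbox examples in Box 3.1; the `Statement` type and the function names are hypothetical and serve only to illustrate the mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A 'sentence' in the underlying command language: verb plus noun."""
    verb: str
    noun: str


# Assumed mapping from a drop target to its verb equivalent (after Box 3.1).
DROP_TARGET_VERBS = {"Trash": "erase", "Outgoing Mailbox": "send"}


def interpret_drag(dragged_icon, drop_target):
    """Translate a drag-and-drop gesture into a command statement."""
    verb = DROP_TARGET_VERBS.get(drop_target)
    if verb is None:
        return None  # the gesture has no meaning in this language
    return Statement(verb=verb, noun=dragged_icon)


def interpret_menu(selected_icon, menu_item):
    """Translate 'select an icon, then pick a menu item' into a statement."""
    return Statement(verb=menu_item.lower(), noun=selected_icon)


print(interpret_drag("file X", "Trash"))
print(interpret_menu("report.txt", "Edit"))
```

Both functions produce the same kind of statement, which is the point of the framework: the expression (dragging versus menu selection) can vary while the language stays fixed.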

Devices

The hardware realization of communication can take many forms; common ones include microphones and speakers, keyboards and mice, drawing pads, touch-sensitive screens, light pens, and push buttons. The choice of device interacts with the choice of medium: display, film/videotape, speaker/audiotape, and so on. There may also be interactions between expression and device (an obvious example is the connection between pointing device (mouse, trackball, joystick) and pull-down menus or icons). On the other hand, it is also possible to relax some of these associations to allow for alternative surfaces (e.g., keyboard alternatives to pointers, aural alternatives to visual outputs). Producing interfaces for every citizen will entail providing for alternative input/output devices for a given language-expression combination; it might also call for alternative approaches to expression.

Comparisons Among Graphical User Interfaces, Natural Language, and Speech

The language-expression-device framework can be used to gain perspective on current standard interface types and on the research opportunities and challenges presented by ECIs. For example, it makes clear that natural language processing and speech recognition (and other technologies that may be associated colloquially) introduce different issues and different tradeoffs. A speech-based interface such as AT&T's long-distance voice recognition system, which can recognize phrases such as "collect call" and "calling card,"4 can combine a restricted language with speech as a means of expression. As this example illustrates, neither speech recognition with unlimited vocabulary nor complete/comprehensive language understanding is necessary to provide natural language-like input to a system within a restricted domain and task. Similarly, it is possible to improve restricted language interfaces by applying principles from natural language communication.

Current graphical user interface/menu/icon systems tightly constrain what one can say, both by starting with a very constrained language and by having a structured way in which one can express things in that language. They are at the opposite end of both the language and the expression spectrum from natural languages. It is thus clear why they are easier to process, but also why they are more constraining (Cohen and Oviatt, 1994).

Ongoing efforts to develop speech interfaces for Web browsers provide a concrete example of the importance of understanding the different tradeoffs of each of these dimensions. Choosing speech on the expression layer rather than pointing and clicking would lead to being able to "speak" the icons and hyperlinks that are designed for keyboard and mouse. Although this may suffice in certain settings (substituting one modality for another can be useful in hands-free contexts and for those with physical limitations), it does not necessarily expand a system's capabilities or lead to new paradigms for interactions. An alternative approach would be to explore how spoken language technology can expand the user's ability to obtain the desired information easily and quickly from the Web, leading to a different, probably more expressive, language. From this perspective, speech would augment rather than replace the mouse and keyboard, and a user would be able to choose among many interface language-expression options to achieve a task in the most natural and efficient manner.

Natural language interaction is particularly appropriate when the information space is broad and diverse or when the user's request contains complex constraints. Both of these situations occur frequently on the Web. For example, finding a specific home page or document now requires remembering a uniform resource locator (URL), searching through the Web for a pointer to the desired document, or using one of the keyword search engines available. Current interfaces present the user with a fixed set of choices at any point, of which one is to be selected. Only by stepping through the offered choices and conforming to the prescribed organization of the Web can users reach the documents they desire. The multitude of indexes and meta-indexes on the Web is testimony to the reality and magnitude of the problem. The power of a human/natural language in this situation is that it allows the user to specify what information or document is desired (e.g., "Show me the White House home page," "Will it rain tomorrow in Seattle?" or "What is the ZIP code for Orlando, Florida?") without having to know where and how the information is stored. A natural language, regardless of whether it is expressed using speech, typing, or handwriting, offers a user significantly more power in expressing constraints, thereby freeing the user from having to adhere to a rigid, preconceived indexing and command hierarchy.

In examining the state of the art of various input/output technologies, it is important to recognize that no single choice is right for all interfaces. In fact, one of the major challenges of interface design may be designing a language that is powerful enough for a user to say what needs to be said yet as constrained as possible, so that processing is easier and misinterpretation is less likely. In looking at input/output options, it will be useful to keep in mind where various options fall on one or another of these scales and the tradeoffs implicit in choosing a given option.

Technologies for Communicating with Systems

Humans modulate energy in many ways. Recognizing that fact allows for exploration of a rich set of alternatives and complements (at any time, a user-chosen subset of controls and displays) that a focus on simplicity of interface design as the primary goal can obscure. Current direct manipulation interfaces with two-dimensional display and mouse input make use, minimally, of one arm with two fingers and a thumb and one eye, which is about what is used to control a television remote. It was considered a stroke of genius, of course, to reduce all computer interactions to this simple set as a transition mechanism to enable people to learn to use computers without much training. There are no longer any reasons (including cost) to remain stuck in this transition mode. We need to develop a fuller coupling of human and computer, with attention to flexibility of input and output.

In some interactive situations, for example, all a computer or information appliance needs for input is a modulated signal that it can use to direct rendered data to the user's eyes, ears, and skin. Over 200 different transducers have been used to date with people having disabilities. In work with severely disabled children, David Warner, of Syracuse University, has developed a suite of sensors to let kids control computer displays with muscle twitches, eye movement, facial expressions, voice, or whatever signal they can manage to modulate. The results evoke profound emotion in patients, doctors, and observers and demonstrate the value of research on human capabilities to modulate energy in real time, the sensors that can transduce those energies, and effective ways to render the information affected by such interactions.

The state of the art in a range of technologies for communicating with systems is reviewed below. The review addresses the device and expression layers of the model described in the previous section and summarized in Box 3.1. The choice of language (natural, restricted, or direct manipulation) influences but does not dictate the technologies discussed here. The exception is the subsection "Natural Language Processing," which also encompasses the language layer of the model and discusses how choices along a spectrum from fully natural languages to relatively restricted languages influence the performance of various expression modes, particularly speech input.

Speech Synthesis

Text-to-speech systems, or speech synthesizers, take unrestricted text as input and produce a synthetic spoken version of that text as output. Most current commercial synthesizers exhibit a high degree of intelligibility, but none sound truly natural. The major barriers to naturalness are deficiencies of text normalization, intonational assignment, and synthesized voice quality. Female speech and children's speech are generally less acceptable than adult male synthetic speech, probably because they have been studied less (Roe and Wilpon, 1994).

In the course of transforming text into speech, all text-to-speech systems must do the following:

Identify words and determine their pronunciations;

Decide how such items as abbreviations and numbers should be pronounced (text normalization);

Determine which words should be made prominent in the output, where pauses should be inserted, and what the overall shape of the intonational contour should be (intonation assignment);

Compute appropriate durations and amplitudes for each of the words that will be synthesized;

Determine how the overall intonational contour will be realized for the text to be synthesized;

Identify which acoustic elements will be used to produce the spoken text (for concatenative synthesizers) or retrieve the sequences of appropriate parameters to generate synthetic elements (for formant synthesizers);5 and

Synthesize the utterance from the specifications and/or acoustic elements identified.
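One of the steps listed above, text normalization, can be illustrated with a minimal sketch. The abbreviation table and the number range handled here are assumptions chosen for brevity; they are not drawn from any synthesizer described in this chapter, and a real system would need context to decide, for example, whether "St." means "Street" or "Saint."

```python
# Hypothetical abbreviation table; real systems use context to disambiguate.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 99 (the only range this sketch covers)."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0 through 99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text):
    """Expand known abbreviations and small numbers; leave other tokens as is."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdecimal() and int(token) < 100:
            words.append(number_to_words(int(token)))
        else:
            words.append(token)
    return " ".join(words)


print(normalize("Dr. Smith lives at 42 Elm St."))
```

The later stages (intonation assignment, duration computation, and synthesis) would consume the normalized word sequence; they are omitted here.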

While most systems permit some form of user control over various parameters at many of these stages to fine-tune system defaults, documentation and tools for such control are usually lacking, and most users lack the requisite background to produce satisfying results.

Particularly for concatenative synthesizers, it is difficult and time consuming to produce new voices, since each voice requires that a new set of concatenative units be recorded and segmented. While most research groups are developing tools in an attempt to automate this process (often by using automatic speech recognition systems to produce a first-pass segmentation), none have succeeded in eliminating the need for laborious hand correction of the database. There have also been efforts in recent years to automate the production of other components of synthesis, to facilitate the production of synthesizers in many languages from a single architecture.

We know that synthetic speech should sound better. It is not clear, exactly, how to decide what is better: More natural and more human-like? More intelligible? More intelligible at normal talking speeds or at high speeds? Speech is usually used for conversational modes of interaction. When speech is being used for presenting a Web page, for example, there is additional information that needs to be provided: Which words form links? Which words are italicized? How is this information presented most effectively? How should words be dealt with that have multiple different pronunciations in different parts of the country or to different individuals?

Speech Input/Recognition

The full integration of voice as an input medium, if achievable, could alleviate many of the known limitations of existing human-machine interfaces. People with poor or no literacy skills, people whose hands are busy, and people suffering from cumulative trauma disorders associated with typing and pointing (or seeking to avoid them) could all benefit from spoken communication with systems. While the capabilities envisioned in such a system are well beyond the state of the art in both speech recognition and language understanding at present, the technology has advanced sufficiently to allow very simple voice-based applications to emerge (see below).

Speech recognition research has made significant progress in the past 15 years (Roe and Wilpon, 1994; Cole and Hirschman, 1995; Cole et al., 1996). The gains have come from the convergence of several technologies: higher-accuracy continuous speech recognition based on better speech modeling techniques, better recognition search strategies that reduce the time needed for high-accuracy recognition, and increased power of audio-capable, off-the-shelf workstations. As a result of these advances, real-time, speaker-independent, continuous speech recognition, with vocabularies of a few thousand words, is now possible in software on regular workstations.

In terms of recognition performance, word error rates have dropped by more than an order of magnitude in the past decade and are expected to continue to fall with further research. These improvements have come about as a result of technical as well as programmatic innovations. Technically, there have been advances in two areas. First, a paradigm shift from rule-based to model-based methods has taken place. In particular, probabilistic hidden Markov models (HMMs) have proven to be an excellent method of modeling phonemes in various contexts. This model-based paradigm, with its ability to estimate model parameters automatically from training data, has shown its power and versatility by being applied to various languages using the same software. Second, the use of statistical grammars, which estimate the probability of two- and three-word sequences, has been instrumental in improving recognition accuracy, especially for large-vocabulary tasks. These simple statistical grammars have, so far, proven to be superior to traditional rule-based grammars for speech recognition purposes.
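A statistical grammar over two-word sequences (a bigram model) can be sketched in a few lines. The training sentences below are invented for illustration, and the estimate is a plain relative-frequency count with no smoothing, which is an assumption of this sketch; practical recognizers smooth these estimates so that unseen word pairs do not receive zero probability.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and word pairs, with sentence boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(words[:-1])            # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs
    return context_counts, bigram_counts


def bigram_probability(model, previous, word):
    """Relative-frequency estimate of P(word | previous); 0.0 if unseen."""
    context_counts, bigram_counts = model
    total = context_counts[previous]
    if total == 0:
        return 0.0
    return bigram_counts[(previous, word)] / total


# Hypothetical training data in the style of a travel information task.
corpus = [
    "show me flights to boston",
    "show me fares to denver",
    "list flights to boston",
]
model = train_bigram_model(corpus)
print(bigram_probability(model, "to", "boston"))
print(bigram_probability(model, "to", "denver"))
```

During recognition, such probabilities let the search prefer word sequences that are likely in the task domain when the acoustic evidence alone is ambiguous.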

Programmatically, the collection and dissemination of standard, common training and test corpora worldwide, the sponsorship of common evaluations, and the dissemination at workshops of information about competing methods have all ensured very rapid progress in the technology. This programmatic approach was pioneered by the Defense Advanced Research Projects Agency (DARPA), which continues to sponsor common evaluations and initiated the establishment of the Linguistic Data Consortium, which has been in charge of the collection and dissemination of common corpora. A similar approach is now being taken in Europe.

Word error rates for speaker-independent continuous speech recognition vary a great deal, depending on the difficulty of the task: from less than 0.3 percent for connected digits, to 3 percent for a 2,500-word travel information task, to 10 percent for articles read from the Wall Street Journal, to 27 percent for transcription of broadcast news programs, to 40 percent for conversational speech over the telephone. Although word error rates in the laboratory can be quite small for some tasks, error rates can increase by a factor of four or more when the same systems are used in the field. This increase has various causes: heavy accents, ambient noise, different microphones, hesitations and restarts, and straying from the system's vocabulary.
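The chapter does not spell out how a word error rate is computed. Under the common convention, assumed here, it is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. A minimal sketch with invented example utterances:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from output
            insertion = current[j - 1] + 1   # extra word in the output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("call the operator please", "call operator please"))
```

Because insertions count as errors, the rate can exceed 100 percent when the recognizer outputs many spurious words.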

Speech recognition has begun to enter the mainstream of everyday life, chiefly through telephone-based applications (Margulies, 1995). The most visible of these applications involve directory assistance services, such as the recognition of a few words (e.g., the digits and words such as "operator," "yes/no," "collect") or recognition of the names of cities in a particular area code. Speaker-independent recognition of over-the-phone digit strings (more difficult than single-digit recognition) has been deployed since 1990.[6] Other applications include voice-activated dialing (especially useful for cellular phones), personal assistant services (to manage one's telephone at work), and call router applications (where the caller says the person's full name instead of dialing). Other less prevalent applications include obtaining stock and mutual fund quotes by voice, simple banking services, and bill payment by telephone.[7]

Other operational applications of speech recognition include air traffic control training, dictation, and Internet access. Large-vocabulary dictation systems capable of recognizing discrete speech are available on the market and have been used for years. For continuous speech there are systems that are capable of recognizing a few thousand words in real time; at least one of these systems is now being marketed for the dictation of radiology reports. Systems for using voice for Internet access have recently been announced.

Simply making speech recognition available with machines, however, does not necessarily make it immediately useful; it will have to be interfaced properly with the other modalities so that it appears seamless to the user (Martin et al., 1996). (Several vendors have been shipping speech recognition capabilities with personal computers, but there is little evidence of wide usage.) Optimism for general use of speech technologies comes from the facts that performance levels are continuing to improve and that many applications do not require large vocabulary sizes. However, applications must be designed to take into account the fact that recognition errors will occur, either by allowing the user to correct errors or by designing additional error correction mechanisms, such as proper inclusion of human-machine dialogue capabilities. These include the ability to deal with issues such as how to phrase a system prompt, how to determine if a recognition error has occurred, and how to engage in conversational repair if such a determination is made. Other speech integration issues include habitability (the ability of a user to stay within the system's vocabulary most of the time), portability (the ease with which a speech recognition system can be ported to a new domain), and user experience (different users, depending on their experience, may require different types of interaction).

Looking into the future of the national information infrastructure (NII), speech recognition could have many applications, such as command and control, information access and retrieval, training and education, e-mail and memo dictation, and voice mail transcription. The current state of the art in speech recognition can support these applications at various levels of performance, some quite well (e.g., command and control) and others not well at all (e.g., voice mail transcription). Functions that perform information access, such as making an airline reservation, may require the use of a certain level of language understanding technology. The state of the art in that field only allows for the simplest of such applications at this time (see "Natural Language Processing" below).

Despite significant progress in speech recognition technology in the past decade, the fact remains that machine performance may still not be good enough for many applications. As a barometer of how much progress we may need for certain advanced applications, experiments have shown that human speech recognition performance is still at least an order of magnitude better than that of machines. One optimistic note, however, is that commercialization of the technology is proceeding very vigorously and is lagging the corresponding research capabilities by only a few years, so that any advances in the laboratory can be expected to appear on the market with a delay of only a few years.

Speaker Verification

A related but quite different technology is speaker verification. There has been much concern about private and secure communications over the Internet, especially for business information and financial transactions. Although encryption methods will be used more and more to protect digital data, it will still be necessary to make a more positive identification of customers for certain types of transactions. Speaker verification technology can be used to help provide additional security.

In an initial enrollment phase, each user is enrolled in a system by providing samples of his or her voice. System performance improves over time as the user supplies more voice samples. Using those voice samples, the system creates a model for the voice of each user. Then, when in operation, the system prompts the user to say a (random) phrase and, using the stored model of the user with the claimed identity, computes the likelihood that the speech came from that person. The user is then either accepted or rejected.

The performance of a speaker verification system is often measured by the Equal Error Rate (EER), which is the operating point in a system where the false rejection rate is equal to the false acceptance rate. In the laboratory, an EER of less than 0.5 percent can be achieved. Performance typically degrades to an EER of 2 to 4 percent in the field.
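
As an illustration of the EER definition, the following minimal sketch sweeps a decision threshold over two lists of verification scores and reports the point where the false acceptance and false rejection rates are closest. The scores, the function names, and the convention that a claim is accepted when its score is at or above the threshold are all assumptions made for this example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold.

    Assumes a claim is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate the EER by testing every observed score as a threshold."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Hypothetical scores: higher means "more likely the claimed speaker".
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor_scores = [0.5, 0.3, 0.2, 0.1, 0.05]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With only a handful of scores the two rates may never be exactly equal, which is why the sketch averages them at the closest point.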

While the current state of the art may be sufficient for low-security applications, it would not, by itself, be adequate for high-security applications. However, if combined with other security measures, such as use of a PIN (Personal Identification Number), speaker verification can provide the added desired security for many applications of interest.

For users with physical disabilities who would like to have voice-only access to devices and systems, speaker verification could be of great benefit. It should be noted, however, that there are a significant number of people who are unable to speak clearly or reliably. For those people, alternate means of verification will be necessary if they are to use systems that rely on voice verification.

Alternate Keying/Typing Approaches: Strategies and Accelerators

As speech recognition becomes accurate and reliable, it will play a much larger role in future interface systems than it does today. It will not, however, ever completely replace keyboard or keypad input to systems or make it obsolete. Keying information into systems will continue to be a quiet, accurate, noise-immune (and, for some applications, faster) means of inputting data or commands. Furthermore, even as the performance of natural language understanding improves, free-form typing of natural language will remain a viable alternative to spoken input to such systems.

Today, keypads and keyboards range from systems that are as small as a wristwatch and are operated with a pen tip, to large, wall-sized keyboards operated with a light pen. Common keyboards are operated by using all 10 fingers, which push keys one at a time. Other keyboards have been developed that are chordic in nature and involve the pressing of multiple keys simultaneously. Many of these do not require the user to ever remove his or her hand.

In addition to pressing discrete keys, data can also be input using gestures. Finger spelling is one technique. Today, there are gloves that allow the wearer to spell out the desired characters using finger-spelling gestures. Techniques are also being explored that use cameras to take data via both finger spelling and sign language.

Handwriting is another common method for entering alphanumeric data. There are techniques for recognizing letters formed in the standard way, as well as techniques (such as "Graffiti") that increase recognition accuracy by having the user write letters that are similar to, but different from, the standard characters people are familiar with.

To increase the rate of data entry, a number of abbreviation and prediction techniques have been developed. Abbreviation techniques allow an individual to use a smaller set of letters (which can resemble the target word, such as "abv" for "abbreviation," or be completely arbitrary, such as "T1" for "please call home"). Prediction techniques look at what a person has typed and try to guess what the next word or words would be. Prediction techniques are less useful for people who can enter data quickly, since the time spent looking at the system's guesses may slow one down to the point where it is faster to just enter the data. However, for individuals who have to enter data very slowly or for those who have difficulty spelling (e.g., because of a learning disability, a cognitive impairment, or use of a second language), systems that can guess words correctly can significantly increase their rate of communication. If a system always guesses consistently (e.g., when "t" is typed, it guesses "the"; when "th" is typed, it guesses "there"), the user can begin by using it for prediction but very quickly switch over to using it for abbreviation expansion (e.g., the user types "t" and then the confirm button because the user knows that the system will have guessed "the"). Ironically, systems that monitor the context and change their guesses to better match the context prevent an individual from getting into the faster abbreviation expansion mode. If systems could predict whole sentences or phrases, however, their utility would increase. This is usually possible only for stereotypic communication (Vanderheiden et al., 1986).
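
The difference between abbreviation expansion and consistent prediction can be sketched as follows. This is a minimal illustration under stated assumptions: the word counts are invented, and the rule that a word already offered for a shorter prefix is not offered again is one hypothetical way to reproduce the "t" gives "the", "th" gives "there" behavior described above.

```python
# Fixed abbreviation table; the two entries are the examples from the text.
ABBREVIATIONS = {"abv": "abbreviation", "T1": "please call home"}

def expand(token):
    """Return the stored expansion for an abbreviation, else the token itself."""
    return ABBREVIATIONS.get(token, token)

def build_predictor(word_counts):
    """Return a predictor whose guess for a given prefix never changes."""
    # Most frequent first; ties broken alphabetically so the order is stable.
    ranked = sorted(word_counts, key=lambda w: (-word_counts[w], w))

    def predict(prefix):
        offered = set()
        guess = None
        # Replay the guesses for each shorter prefix so that a word already
        # offered (and implicitly rejected by further typing) is skipped.
        for n in range(1, len(prefix) + 1):
            stem = prefix[:n]
            guess = next((w for w in ranked
                          if w.startswith(stem) and w not in offered), None)
            if guess is not None:
                offered.add(guess)
        return guess

    return predict

# Hypothetical word frequencies, used only for this illustration.
predict = build_predictor({"the": 100, "to": 90, "there": 50, "this": 40})
first_guess = predict("t")     # most frequent word starting with "t"
second_guess = predict("th")   # "the" was already offered, so the next best
```

Because the predictor ignores context, a user can learn that "t" plus the confirm key always yields the same word, which is the abbreviation-like mode the text describes.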

In some respects this area is one of the more thoroughly researched. However, it is not clear how best to combine these techniques when keyboard input is used in connection with speech and with virtual reality and gestural input systems. What is the best way to use a minimalist keyboard with a voice response system, either in a key-in/voice-out paradigm or to help handle error correction in voice recognition systems? Also, currently there are no good mechanisms for providing keyboard-based input when people are walking or moving about in virtual reality-like environments.

Natural Language Processing [8]

Natural language (spoken, written, or even signed) is at the heart of human communication. It is key to interaction between humans and is the medium for much of the vast amount of information stored in books, newspapers, scientific journals, audio and video tapes, and now Web pages. As a means of interaction with computers, it requires no special training on the part of users, but it remains uncommon because of the difficulties in supporting it technically. To date, there have been a number of successful commercial applications of natural language processing, including grammar- and style-checking programs; text indexing and retrieval systems, particularly for the named-entity task;[9] database query products that utilize natural language as input, which are being marketed for targeted applications; abstracting software (for summarizing blocks of text), which has been introduced commercially; and machine-aided translation programs. Access to the NII could be made easier and more productive if people could interact with a computer using natural language and if the computer could better retrieve, summarize, and understand the wealth of linguistic information at its disposal.[10]

Over the years, natural language processing (NLP) has focused primarily on three tasks: (1) database access, from typed or spoken queries; (2) information extraction, or the generation of formatted summaries from texts such as newspaper stories, military messages, and Web pages; and (3) machine translation of typed or spoken utterances from one language to another. The challenge of NLP is to build systems that can distinguish in the input language as many significantly different meanings as are relevant to the applications of interest; to interpret correctly as large a variety of linguistic expressions of these meanings as would naturally occur; and to do so in as many task settings as possible, with the computational resources available.

Until recently, most NLP systems shared the same gross architecture, roughly analogous to that of programming language compilers: a syntactic analyzer, or parser, to identify the lexical category of the words of the input sentence [11] and their hierarchical organization into phrases and clauses; a semantic analyzer, to construct a representation of the meaning of the input sentence, generally independent of the specific task or application domain; and, finally, a domain- and task-specific mapping from the semantic interpretation to a representation suitable to the task at hand, such as a database query for query systems, a filled template for information extraction systems, or an input into a language generation module for a machine translation system.[12] In current practice, several hundred rules may need to be hand-coded for a new application, even in a limited domain.[13]

In the early 1990s, NLP took several new directions, largely at the instigation of a succession of DARPA program managers. First, after years of working in parallel, researchers in speech recognition and NLP were encouraged to construct integrated speech understanding systems, for which the chosen task was to answer spoken queries to databases (e.g., of air travel information). Second, information extraction was made a major task of interest. Finally, the performance of NLP and speech understanding systems was to be systematically evaluated.

It was thus necessary to reject the then-prevailing assumption that the NLP system needed to understand only syntactically and semantically well-formed utterances or that the entire content of an utterance or text needed to be understood. Spoken language systems had to deal with the inevitable recognition errors of even the best speech recognition systems as well as queries such as "Boston San Francisco after 8 a.m." and "I'd like to go to Boston, ah, to Atlanta, tomorrow." Systems were designed that tolerated not understanding some parts of the utterance, combined partial analyses of other parts, and explicitly corrected certain forms of disfluencies. Even with such difficult input, it now became possible to actually improve the accuracy of even the best speech recognition programs by applying syntactic and semantic constraints, at least in limited domains.
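
A minimal sketch of this kind of partial, error-tolerant understanding is given here. It is not a description of any of the evaluated systems: the three-city vocabulary, the slot names, and the rules (a later "to X" overrides an earlier one; two bare city names are read as origin then destination) are assumptions chosen so that the two example queries above yield usable travel frames.

```python
import re

CITIES = ["san francisco", "boston", "atlanta"]   # hypothetical toy vocabulary
CITY = "(" + "|".join(CITIES) + ")"

def understand(utterance):
    """Fill whatever travel slots can be spotted; ignore everything else."""
    text = utterance.lower()
    frame = {}
    # "to X": a later mention overrides an earlier one (simple self-repair).
    to_cities = re.findall(r"\bto " + CITY, text)
    if to_cities:
        frame["destination"] = to_cities[-1]
    from_cities = re.findall(r"\bfrom " + CITY, text)
    if from_cities:
        frame["origin"] = from_cities[-1]
    # Telegraphic query with bare city names: assume origin, then destination.
    if "origin" not in frame and "destination" not in frame:
        bare = re.findall(CITY, text)
        if len(bare) >= 2:
            frame["origin"], frame["destination"] = bare[0], bare[1]
    # Departure-time constraint such as "after 8 a.m.", stored as a 24-hour hour.
    m = re.search(r"\bafter (\d{1,2})\s*(a\.m\.|p\.m\.)", text)
    if m:
        frame["depart_after_hour"] = int(m.group(1)) % 12 + (12 if m.group(2) == "p.m." else 0)
    if "tomorrow" in text:
        frame["date"] = "tomorrow"
    return frame

telegraphic = understand("Boston San Francisco after 8 a.m.")
repaired = understand("I'd like to go to Boston, ah, to Atlanta, tomorrow.")
```

The frames are deliberately incomplete; a dialogue component could ask for a missing slot (for example, the origin in the second query) rather than rejecting the utterance.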

Systematic evaluation of NLP systems is not possible without the collection of large corpora of linguistic data, both raw and annotated, such as with correct transcriptions and correct answers to spoken queries.[14] Although the rule-based paradigm that has dominated computational linguistics so far has produced a few large-scale systems that have been reused over several different projects (e.g., the CORE Language Engine at SRI), it has been difficult to share large grammars, lexicons, and semantic rules across sites, making it difficult to build on previous results.

The domain specificity of rule-based NLP systems suggests that it would be attractive to be able to automatically train an NLP system, as is done with the hidden Markov models used in speech recognition. Significant effort is being devoted to this direction. The results are promising but still not comparable to what is routinely achievable with rule-based systems. Some of the problems are the amount of training data required, the difficulty of obtaining such data in a wide range of domains, and the cost of annotating the input data with the correct task-specific semantic representation. The annotation problem is exacerbated by the fact that it is much more difficult to get human annotators to agree on correct semantic annotations than on transcriptions of spoken utterances.

Many researchers believe that for some time yet the most effective strategy for the development of NLP systems in new domains will be hybrid systems, based on a core of hand-coded rules but tuned to a domain by automatic training methods. Domain-specific corpora can be used, for example, to assign probabilities to the rules, providing a mechanism by which probabilities can be assigned to rule-based interpretations. This approach, used by most of the currently best-performing systems, can be seen as a way of adapting a set of general rules to a particular domain. Farther down the road are ways of circumventing the data and annotation requirements of fully automatic training methods by dynamically adapting to one domain a system developed in another.
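
One way to picture the hybrid approach is a minimal sketch in which hand-coded rules receive relative-frequency probabilities from a domain corpus, and competing rule-based interpretations are then ranked by the product of their rule probabilities. The rule notation, the counts, and the function names are hypothetical, and real systems use more elaborate estimation.

```python
from collections import Counter, defaultdict
from math import prod

def estimate_rule_probs(rule_uses):
    """Relative frequency of each (lhs, rhs) rule among rules sharing its lhs."""
    counts = Counter(rule_uses)
    totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

def score(interpretation, probs):
    """Probability of an interpretation as the product of its rule probabilities."""
    return prod(probs.get(rule, 0.0) for rule in interpretation)

# Hypothetical record of which hand-coded rules fired in an annotated domain corpus.
rule_uses = (
    [("NP", "city")] * 3 + [("NP", "airline")] * 1 +
    [("PP", "to NP")] * 2 + [("PP", "from NP")] * 2
)
probs = estimate_rule_probs(rule_uses)
city_reading = score([("PP", "to NP"), ("NP", "city")], probs)
airline_reading = score([("PP", "to NP"), ("NP", "airline")], probs)
```

In this toy domain the city reading outranks the airline reading, so the same general rules are steered toward the interpretations the domain actually uses.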

NLP systems vary widely, from those that perform full and deep understanding of an utterance in narrowly construed domains to those that perform partial and shallow understanding of very wide domains. Query systems tend to be at the full and deep understanding end, and information extraction systems at the partial and shallow end.

Several systems have been implemented to answer queries in the DARPA-sponsored Airline Travel Information Service task (DARPA, 1995b), where the user asks for information about flights and schedules using speech. The utterance error rate, measured as the percentage of queries for which the system gives the wrong answer, is currently about 6 percent for spoken input and 4 percent for the corresponding text input.

The standards for evaluation of information extraction systems are set by the DARPA-sponsored Message Understanding Conferences (MUCs). For the "named entity" application, where the system must find all named organizations, locations, persons, dates, times, monetary amounts, and percentages, the error rates are below 5 percent. For the "scenario template" application, where the system extracts complex relationships in well-defined domains (such as joint ventures) in an open source (such as the Wall Street Journal), the error rate for finding the correct elements of the templates is about 45 percent.

In the area of machine translation, the most significant advances continue to occur in Europe. Recent work in the United States using texts written with an eye toward translation also shows promise (Carbonell, 1992). Several speech-to-speech translation systems in limited domains, combining speech understanding, machine translation, and speech generation, have been demonstrated.

Still in their infancy are systems with which a human can conduct a coherent dialogue in service of a complex and extended task. Early examples include the TRAINS system (in use at the University of Rochester), which allows a human to control a system that plans the transport of materials, and the CommandTalk system, which provides a spoken interface to a large military simulator. The approach of Sadek and co-workers at France Telecom (Bretier and Sadek, 1996; Bretier et al., 1995) offers compelling evidence that spoken language systems can have sophisticated models of dialogue and can benefit from them. Future systems will need to allow for a variety of speech acts (e.g., requests, assertions, questions, rejections) and contain dialogue models that enable the establishment of correlations between occurrences of phrases used to refer to the same entities and events in the discourse. Coreference resolution has been the subject of much research, and systems using it are being evaluated in the MUC benchmarks.

Gesture Recognition

Gesture input can come in many forms from a variety of devices (e.g., mouse, pen, data glove). Its role is to convey information (e.g., identify, make reference to, explain, shift focus) in a manner similar to the other more studied forms of language. Gesture replaces the click of the mouse (the mouse's only word) with a wide range of commands. It eliminates the myriad objects on the screen intended to let the user communicate his or her desires. Rather than having to find the word "duplicate" and click on it, the user can make simpler movements involving only the hand. For example, at the workshop, Bruce Tognazzini's "Starfire" video showed a user separating her fingers to indicate a desire to duplicate an object: leave it here and move it. Gesture can relieve problems of repetitive stress by varying the user's movements, thereby lowering the repetition of any particular action.

Rimé and Schiaratura (1991) characterize several classes of gesture. Symbolic gestures are conventional, context-independent, and typically unambiguous expressions (e.g., "OK" or peace sign). In contrast, deictic gestures are pointers to entities, analogous to natural language deixis (e.g., "this not that"). Iconic gestures are used to display objects, spatial relations, and actions (e.g., illustrating the orientation of two cars at an accident). Finally, pantomimic gestures display an invisible object or tool (e.g., making a fist and moving to indicate a hammer). Gestural languages exist as well. These include sign languages and signing systems for use in environments where alternative communication is difficult. Early experience with glove interfaces indicates that some users have difficulty remembering the gesture equivalents to commands (Herndon et al., 1994).

Gesture recognition plays a role in immersive environments such as virtual reality or simulation environments. It also should find widespread application in helping to give directions to computers or computerized agents. Pointing and gesturing with the hand or with other objects are natural communication behaviors and will likely form an important component in a natural intuitive interface. In addition, for individuals who are deaf and who communicate primarily through gestural languages (such as American Sign Language), machine recognition of American Sign Language gestures is the equivalent of speech recognition for those of us who can speak.

Machine Vision and Passive Input

Machine vision is likely to play a number of roles in future interface systems. Primary roles are likely to be:

Data input (including text, graphics, movement)

Context recognition (as discussed above)

Gesture recognition (particularly in graphic and virtual reality environments)

Artificial sight for people with visual impairments

Experience with text and image recognition provides a number of insights relevant to future interface development, especially in the context of aiding individuals with physical disabilities. In particular, systems that are difficult to use by blind people would pose the same problems to people who can see but who are trying to access information aurally because their vision is otherwise occupied. Similar problems may arise as well for intelligent agents.

Text Recognition

Today, there are powerful tools for turning images of text into electronic text (such as ASCII). Optical character recognition (OCR) is quite good and is improving daily. Driven by a desire to turn warehouses of printed documents into electronic searchable form, companies have been and are making steady advances. Some OCR programs will convert documents into electronic text that is compatible with particular word processing packages, preserving the text layout, emphasis, font, and so on. The problem with OCR is that it is not 100 percent accurate. When it makes a mistake, however, the mistake is usually no longer a single wrong character (since word lookup is used to improve accuracy). As a result, when an error is made, it is often a legal (but wrong) word. Thus, it is often impossible to look at a document and figure out exactly what it did say; some sentences may not be accurate (or even make sense). One company gets around this by pasting a picture of any words the system is not sure about into the text where the unknown word would go. This works well for sighted persons, allows human editors to easily fix the mistakes, and preserves the image for later processing by a more powerful image recognizer. It does not help blind users much except that they are not misled by a wrong word and can ask a sighted person for help if they cannot figure something out. (Most helpful would be to have an OCR system include its guess as to the letters of a word in question as hidden text, which a person who is blind could call up to assist in guessing the word.) Highly stylized or embellished characters or words are not recognizable. Text that is wrapped around, tied in knots, or arranged on the page or laid out in an unusual way may be difficult to interpret even if available in electronic text. This is a separate problem from image recognition, though.
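
The hidden-text suggestion in the parenthetical above can be sketched as a small data structure. Everything here is hypothetical: the confidence value, the 0.9 threshold, the image handle, and the names are assumptions made for illustration and do not describe any particular OCR product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: Optional[str]          # recognized word, or None when shown as an image
    image_ref: Optional[str]     # hypothetical handle to the cropped word image
    hidden_guess: Optional[str]  # best guess kept as hidden text for nonvisual access

def emit_token(guess, confidence, image_ref, threshold=0.9):
    """Emit plain text when confident; otherwise keep the image plus a hidden guess."""
    if confidence >= threshold:
        return Token(text=guess, image_ref=None, hidden_guess=None)
    return Token(text=None, image_ref=image_ref, hidden_guess=guess)

def render_for_speech(tokens):
    """Text for a screen reader: uncertain words are flagged, not silently trusted."""
    parts = []
    for t in tokens:
        parts.append(t.text if t.text is not None else f"[unclear: {t.hidden_guess}?]")
    return " ".join(parts)

tokens = [emit_token("the", 0.99, "img0"), emit_token("modem", 0.55, "img1")]
spoken = render_for_speech(tokens)
```

A sighted reader would still see the pasted word image, while a blind reader hears the guess together with a warning that it may be wrong.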

Image Recognition

Despite great strides by the military, weather, intelligence, and other communities, image interpretation remains quite specialized and focused on looking for particular features. The ability to identify and describe arbitrary images is still beyond us. However, advances in artificial intelligence, neural networks, and image processing in combination with large data banks of image information may make it possible in the future to provide verbal interpretation or description for many types of information. A major impetus comes from the desire to make image information searchable by computers. The combination of a tactile representation with feature or texture information presented aurally may provide the best early access to graphic information by users who are blind or cannot use their sight.

Some images, such as pie charts and line graphs, can be recognized easily and turned into raw data or a text description. Standard software has been available for some time that will take a scanned image of a chart and provide a spreadsheet of the data represented in the chart. Other images, such as electronic schematic diagrams, could be recognized but are difficult to describe. A house plan illustrates the kind of diagram that may be describable in general terms and would benefit from combining a verbal description with a tactile representation for those who cannot see to deal with this type of information.

Visual Displays

Visual display progress begins with the screen design (graphics, layouts, icons, metaphors, widget sets, animation, color, fisheye views, overviews, zooming) and other aspects of how information is visualized. The human eye can see far more than current computer displays can show. The bandwidth of our visual channel is many orders of magnitude greater than that of the other senses: approximately 1 gigabit/second. It has a dynamic range of 10^13 to 1 (10 trillion to 1). No human-made sensor or graphics display has this dynamic range. The eye/brain can detect very small displacements at very low rates of motion and sees change up to a rate of about 50 times a second. The eye has a very focused view that is optimized for perceiving movement. Humans cannot see clearly outside an approximately 5-degree cone of foveal vision and cannot see behind them.

State-of-the-art visualization systems (as of 1996) can render scenes of approximately 4,000 polygons at 50 Hz per eye. Modern graphics engines also filter the image to remove sampling artifacts on polygon edges and, more importantly, textures. Partial transparency is also possible, which allows fog and atmospheric contrast attenuation to be rendered in a natural-looking way. Occlusion (called "hidden surface removal" in graphics) is provided, as is perspective transformation of vertices. Smooth shading in hardware is also common now.

Thus, the images look rather good in real time, although the scene complexity is limited to several thousand polygons and the resolution to 1,280 × 1,024. Typical computer-aided design constructions or animated graphics for television commercials involve scenes with millions of polygons; these are not rendered in real time. Magazine illustrations are rendered at resolutions in excess of 4,000 × 3,000. Thus, the imagery used in real-time systems is portrayed at rather less than optimal resolution, often at much less than the effective visual acuity required to drive a car. In addition, there are better ways of rendering scenes, as when the physics of light is more accurately simulated, but these techniques are not currently achievable in real time. A six-order-of-magnitude increase in computer speed and graphics generation would be easy to absorb; a teraflop personal computer would therefore be rather desirable but is probably 10 years off.

Visual Input/Output Hardware

The computer industry provides a range of display devices, from small embedded liquid-crystal displays (LCDs) in personal digital assistants (PDAs) and navigational devices to large cathode-ray tubes (CRTs) and projectors. Clearly, desirable goals are lower cost, power consumption, latency, weight, and both much larger and much smaller screens. Current commercial CRTs achieve up to 2,048 × 2,048 pixels at great cost. Projectors can do ~1,900 × 1,200 displays. It is possible to tessellate projectors at will to achieve arbitrarily higher resolution (Woodward, 1993) and/or brightness (e.g., video walls shown at trade shows and conventions). Screens with > 5,000-pixel resolution are desirable. Durability could be improved, especially for portable units.15 Some increase in the capability of television sets to handle computer output, which may be furthered by recent industry-based standards negotiations for advanced television (sometimes referred to as high-definition television), is expected to help lower costs.16 How, when, and where to trade off the generality of personal computers against other qualities that may inhere in more specialized or cheaper devices is an issue for which there may be no one answer.

Hollywood and science fiction have described most of the conceivable, highly futuristic display devices: direct retinal projection, direct cerebral input, Holodecks, and so on. Less futuristic displays still have a long way to go to enable natural-appearing virtual reality (VR). Liquid crystal displays do not have the resolution and low weight needed for acceptable head-mounted displays to be built; users of currently available head-mounted displays are effectively legally blind given the lack of acuity offered. Projected VR displays are usable, although they are large and are not portable.

The acceptance of VR is also hindered by the extreme cost of the high-end graphics engines required to render realistic scenes in real time, the enormous computing power needed to simulate meaningful situations, and the nonlinearity and/or short range of tracking devices. Given that the powerful graphics hardware in the $200 Nintendo 64 game system is only incremental steps away from supporting the stereo graphics needed for VR, it is clear that the barriers are now in building consumer-level tracking gear and some kind of rugged stereo glasses, at least in the home game context. Once these barriers are overcome, VR will be open for wider application.

High-resolution visual input devices are becoming available to nonprofessionals, allowing them to produce their own visual content. Digital snapshot cameras and scanners, for example, have become available at high-end consumer levels. These devices, while costly, are reasonable in quality and are a great aid to people creating visual materials for the NII.17 Compositing and nonlinear editing software assist greatly as well. Similarly, two-dimensional illustration and three-dimensional animation software make extraordinary graphics achievable by the motivated and talented citizen. The cost of such software will continue to come down as the market widens, and the availability of more memory, processing, graphics power, and disk space will make results more achievable and usable.

As a future goal that defines a conceptual outer limit for input and output, one might choose the Holodeck from the movie Star Trek, a device that apparently stores and replays the molecular reconstruction information from the transporter that beams people up and down. In The Physics of Star Trek, physicist Lawrence Krauss (1995, pp. 76-77) works out the information needed to store the molecular dynamics of a single human body: 10^31 bytes, some 10^16 times the storage needed for all the books ever written. Krauss points out the other difficulties in transporter/Holodeck reconstruction as well.

Auditory Displays

The ear collects sound waves and encodes the spatial characteristics of the sound source into temporal and spectral attributes. Intensity difference and temporal/phase difference in sound reaching the two ears provide mechanisms for horizontal (left to right) sound localization. The ear gets information from the whole space via movement in time.

Hearing individual components of sound requires frequency identification. The ear acts like a series of narrowly tuned filters. Sound cues can be used to catch attention with localization, indicate near or far positions with reverberation, indicate collisions and other events, and even portray abstract events such as change over time. Low-frequency sound can vibrate the user's body to somewhat simulate physical displacement.

Speakers and headphones as output devices for synthesized sound match the ears well, unlike the case with visual displays. However, which sounds to create as part of the human-computer interface is much less well understood than in the visual case.

About 50 million instructions per second are required for each synthesized sound source. Computing reverberation off six surfaces for four sound sources might easily require a billion-instruction-per-second computer, one that is within today's range but is rarely dedicated to audio synthesis in practice. Audio sampling and playback are far simpler and are most often used for primitive cues such as beacons and alarms.
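The audio budget quoted above can be checked with a few lines of arithmetic. The sketch below assumes, as one possible reading of the text, that every first-order reflection off a surface costs about as much as a separately synthesized source; that pricing rule, the function name, and the option to count the direct path are illustrative assumptions and do not come from the chapter.

```python
# Rough instruction budget for spatialized audio, using the chapter's
# figure of about 50 million instructions per second (MIPS) per source.
MIPS_PER_SOURCE = 50  # from the text


def reverb_budget_mips(sources: int, surfaces: int, include_direct: bool = True) -> int:
    """Estimate MIPS if every first-order reflection costs as much as a source.

    Assumption (not from the text): one reflected path per source per
    surface, each priced like a full synthesized source.
    """
    paths_per_source = surfaces + (1 if include_direct else 0)
    return sources * paths_per_source * MIPS_PER_SOURCE


if __name__ == "__main__":
    # Four sources in a six-surface room (four walls, floor, ceiling).
    print(reverb_budget_mips(4, 6))         # direct plus reflected paths
    print(reverb_budget_mips(4, 6, False))  # reflected paths only
```

Under either counting rule the estimate comes out on the order of a thousand MIPS, which is consistent with the billion-instruction-per-second figure in the text.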

Thus, the barriers to good matching to human hearing have to do with computing the right sound and getting it to each ear in a properly weighted way. Although in many ways producing sound by computer is simpler than displaying imagery, many orders of magnitude more research and development have been devoted to graphics than sound synthesis.

Haptic and Tactile Displays18

Human touch is achieved by the parallel operation of many sensor systems in the body (Kandel and Schwartz, 1981). The hand alone has 19 bones, 19 joints, and 20 muscles with 22 degrees of freedom and many classes of receptors and nerve endings in the joints, skin, tendons, and muscles. The hand can squeeze, stroke, grasp, and press; it can also feel texture, shape, softness, and temperature.

The fingerpad has hairless ridged skin enclosing soft tissues made of fat in a semiliquid state. Fingers can glide over a surface without losing contact or grab an object to manipulate it. Computed output and input of human touch (called "haptics") is currently very primitive compared to graphics and sound. Haptic tasks are of two types: exploration and manipulation. Exploration involves the extraction of object properties such as shape and surface texture, mass, and solidity. Manipulation concerns modification of the environment, from watch repair to using a sledge hammer.

Kinesthetic information (e.g., limb posture, finger position), conveyed by receptors in the tendons and muscles together with neural signals from motor commands, communicates a sense of position. Joint rotations of a fraction of a degree can be perceived. Other nerve endings signal skin temperature, mechanical and thermal pain, chemical pain, and itch.

Responses range from fast spinal reflex to slow deliberate conscious action. Experiments on lifting objects show that slipping is counteracted in 70 milliseconds. Humans can perceive a 2-micrometer-high single dot on a glass plate and a 6-micrometer-high grating, using different types of receptors (Kalawsky, 1993). Tactile and kinesthetic perception extends into the kilohertz range (Shimoga, 1993). Tactile interfaces aim to reproduce sensations arising from contact with textures and edges but do not support the ability to modify the underlying model.

Haptic interfaces are high-performance mechanical devices that support bidirectional input and output of displacement and forces. They measure positions, contact forces, and time derivatives and output new forces and positions (Burdea, 1996). Output to the skin can be point, multipoint, patterned, and time-varying. Consider David Warner, who makes his rounds in a "cyberwear" buzz suit that captures information from his patients' monitors, communicating it with bar charts tingling his arms, pulse rates sent down to his fingertips, and test results whispered in his ears, yet allowing him to maintain critical eye contact with his patients (http://www.pulsar.org).

There are many parallels and differences between haptics and visual (computer graphics) interfaces. The history of computing technology over the past 30 to 40 years is dominated by the exponential growth in computing power enabled by semiconductor technology. Most of this new computing power has supported enriched high-bandwidth user interfaces. Haptics is a sensory/motor interaction modality that is just now being exploited in the quest for seamless interaction with computers. Haptics can be qualitatively different from graphics and audio input/output because it is bidirectional. The computer model both delivers information to the human and is modified by the human during the haptic interaction. Another way to look at this difference is to note that, unlike graphics or audio output, physical energy flows in both directions between the user and the computer through a haptic display device.

In 1996 three distinct market segments emerged for haptic technology: low-end (2 degrees of freedom (DOF), entertainment); mid-range (3 DOF, visualization and training); and high-end (6 DOF, advanced computer-aided engineering). The lesson of video games has been to optimize for real-time feedback and feel. The joysticks or other interfaces for video games are very carefully handled so that they feel continuous. The obviously cheap joystick on the Nintendo 64 game is very smooth, such that a 2-year-old has no problem with it. Such smoothness is necessary to be a good extension of a person's hand motion, since halting response changes the dynamics, causing one to overcompensate, slow down, etc.

A video game joystick with haptic feedback, the "Force FX," is now on the market from CH Products (Vista, Calif.) using technology licensed from Immersion Corp. It is currently supported by about 10 video game software vendors. Other joystick vendors are readying haptic feedback joysticks for this low-priced, high-volume market. In April 1996, Microsoft bought Exos, Inc. (Cambridge, Mass.) to acquire its haptic interaction software interface.

Haptic interaction will play a major role in all simulation-based training involving manual skill (Buttolo et al., 1995). For example, force feedback devices for surgical training are already in the initial stages of commercialization by such companies as Boston Dynamics (Cambridge, Mass.), Immersion Corp. (Palo Alto, Calif.), SensAble Devices (Cambridge, Mass.), and HT Medical (Rockville, Md.).

Advanced CAD users at major industrial corporations such as Boeing (McNeely, 1993) and Ford (Buttolo et al., 1995) are actively funding internal and external research and development in haptic technologies to solve critical bottlenecks they have identified in their computer-aided product development processes.

These are the first signs of a new and broad-based high-technology industry with great potential for U.S. leadership. Research (as discussed below) is necessary to foster and accelerate the development of these and other emerging areas into full-fledged industries.

A number of science and technology issues arise in the haptics and tactile display arena. Haptics is attracting the attention of a growing number of researchers because of the many fascinating problems that must be solved to realize the vision of a rich set of haptic-enabled applications. Because haptic interaction intimately involves high-performance computing, advanced mechanical engineering, and human psychophysics and biomechanics, there are pressing needs for interdisciplinary collaborations as well as basic disciplinary advances. These key areas include the following:

Better understanding of the biomechanics of human interaction with haptic displays. For example, stability of the haptic interaction goes beyond the traditional control analysis to include simulated geometry and nonlinear time-varying properties of human biomechanics.

Faster algorithms for rendering geometric models into haptic input/output maps. Although many ideas can be adapted from computer graphics, haptic devices require at least 1,000-Hz update rates and a latency of no more than 1 millisecond for stability and performance. Thus, the bar is raised for the definition of "real-time" performance for algorithms such as collision detection, shading, and dynamic multibody simulation.

Advanced design of mechanisms for haptic interactions. Real haptic interaction uses all of the degrees of freedom of the human hand and arm (as many as 29; see above). The most advanced haptic devices have 6 or 7 degrees of freedom for the whole arm/hand. Providing high-quality haptic interaction over many degrees of freedom will continue to pose research challenges in mechanism design, actuator design, and control for many years to come.
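The 1,000-Hz requirement in the second item can be made concrete with a minimal sketch of a one-degree-of-freedom haptic loop that renders a virtual wall as a stiff spring (a penalty force). This is only an illustration under stated assumptions: the stiffness, the wall position, and the simulated probe motion are made-up values, and no real device interface is modeled.

```python
from typing import List

# One-degree-of-freedom "virtual wall" rendered with a penalty (spring) force.
# All constants and the simulated probe motion are illustrative assumptions.
UPDATE_HZ = 1000            # minimum update rate cited in the text
DT = 1.0 / UPDATE_HZ        # 1 ms per cycle
WALL_POSITION_M = 0.0       # the wall occupies x < 0
STIFFNESS_N_PER_M = 2000.0  # hypothetical wall stiffness


def wall_force(position_m: float) -> float:
    """Return the force (N) pushing the probe out of the wall, or 0 in free space."""
    penetration = WALL_POSITION_M - position_m
    return STIFFNESS_N_PER_M * penetration if penetration > 0 else 0.0


def simulate(steps: int, start_m: float, velocity_m_per_s: float) -> List[float]:
    """Step a probe moving at constant velocity and record the commanded force."""
    forces = []
    position = start_m
    for _ in range(steps):
        forces.append(wall_force(position))  # read position, compute force
        position += velocity_m_per_s * DT    # the hand keeps moving during the cycle
    return forces


if __name__ == "__main__":
    # Probe starts 2 mm outside the wall and moves inward at 0.5 m/s.
    forces = simulate(steps=10, start_m=0.002, velocity_m_per_s=-0.5)
    print([round(f, 2) for f in forces])
```

With these made-up constants, the commanded force grows in steps of about 1 N per cycle once the probe is inside the wall. A loop running at a graphics-like 50 Hz would let the same probe travel 10 mm between updates, so the force would change in jumps of about 20 N, which gives one intuition for why stiff contact calls for a much higher update rate than graphics.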

Some of the applications of haptics that are practical today may seem arcane and specialized. This was also true for the first applications of computer graphics in the 1960s. Emerging applications today are the ones with the most urgent need for haptic interaction. Below are some examples of what may become possible:

1999: A medical student is practicing the administration of spinal epidural anesthesia for the first time. She must insert the needle by feel (without visual guidance) through the skin, fat, muscle, and spinal dura and inject the anesthetic. Like all physicians trained before this year, her instructor learned the procedure on actual human patients. Now, she is using a haptic display device hidden inside a plastic model of the human back. The device simulates the distinct feel of each of these tissues as well as the hard bones that she must avoid with the needle. After a few sessions with the simulator and a quantitative evaluation of her physical proficiency, she graduates to her first real patient with confidence and skill.

2000: An automotive design engineer wants to verify that an oil filter that he knows will require routine maintenance can be removed from a crowded engine compartment without disassembly of the radiator, transmission, and so forth. He brings the complete engine compartment model up on the graphics screen and clicks the oil filter to link it to the six-axis haptic display device on his desk next to the workstation. Holding the haptic device, he removes the oil filter, feeling collisions with nearby engine objects. He finds that the filter cannot be removed because coolant hoses block the way. The engine compartment is thus redesigned early in the design process, saving hundreds of thousands of dollars.

The first of these examples is technically possible today; the second is not. There are critical computational and mechatronic challenges that will be crucial to successful implementation of ever-more realistic haptic interfaces.

Because haptics is such a basic human interaction mode for so many activities, there is little doubt that, as the technology matures, new and unforeseen applications and a substantial new industry will develop to give people the ability to physically interact with computational models. Once user interfaces are as responsive as musical instruments, for example, virtuosity is more achievable. As in music, there will always be a phase appropriate to contemplation (such as composing/programming) and a phase for playing/exploring. The consumer will do more of the latter, of course. Better feedback continuously delivered appears to take less prediction. Being able to predict is what expertise is mostly about in a technical/scientific world, and we want systems to be usable by nonexperts, hence the need for real-time interactions with as much multisensory realism as is helpful in each circumstance. Research is necessary now to provide the intellectual capital upon which such an industry can be based.

Tactile Displays for Low- or No-Vision Environments or Users

Tactile displays can help add realism to multisensory virtual reality environments. For people who are blind, however, tactile displays are much more important for the provision of information that would be provided visually to those who can see. For people who are deaf and blind and who cannot use auditory displays or synthetic speech, they are the principal display form.

Vibration has been used for adding realism to movies and virtual reality environments and also as a signaling technique for people with hearing impairments. It can be used for alarm clocks or doorbells, but is limited in the information it can present even when different frequencies are used for different signals. Vibration can also be used effectively in combination with other tactile displays to provide supplemental information. For example, vibratory information can be used in combination with Braille to indicate text that is highlighted, italicized, or underlined, or to indicate text that is a hyperlink on a hypertext page.

Vibrotactile displays provide a higher-bandwidth channel. With a vibrotactile display, small pins are vibrated up and down to stimulate tactile sensors. The most widespread use of this technique is the Optacon (OPtical to TActile CONverter), which has 144 pins (6 × 24) in its array (100 pins on the Optacon II). The tactile array is usually used in conjunction with a small handheld camera but can also be connected directly to a computer to provide a tactile image around a mouse or other pointing device on the screen.

Electrocutaneous displays have also been explored as a way to create solid-state tactile arrays. Arrays have been constructed for use on the abdomen, back, forearm, and, most recently, the fingertip. Resolution for these displays is much lower than for vibrotactile displays.

Raised-line drawings have long been "king of the hill" for displaying tactile information. The principal problem has been finding an inexpensive and fast way to generate them "on the fly." Wax jet printers showed the greatest potential (especially for high resolution), but none are currently available. For lower resolution, there is a paper onto which one can photocopy and then process with heat, causing it to swell wherever there are black lines. Printers that create embossed Braille pages can also be programmed to create tactile images that consist of raised dots. The resolution of these is lower still (the best having a resolution of about 12 dots per inch), but the raised-dot form of the graphics actually has some advantages for tactile perception.

Braille is a system for representing alphanumeric characters in tactile form. The system consists of six dots in a two-wide by three-high pattern. To cover the full ASCII character set, an eight-pin Braille cell (two by four) was developed. Braille is most commonly thought of as being printed or embossed, where paper is punched upward to form Braille cells or characters as raised dots on the page.19 There are also dynamic Braille displays, where cells having (typically) 8 pins that can be raised or lowered independently are arranged in lines of 12 to 20 or more cells on small portable devices and 20 to 40 cells on desktop displays. A few 80-cell displays have been developed, but they are quite expensive and large. By raising or lowering the pins, a line of Braille can be dynamically changed, rather like a single line of text.
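The dot-pattern encoding described above can be illustrated with the Unicode Braille Patterns block, in which U+2800 is the blank cell and raised dot n sets bit n-1 of the offset. The sketch below is an illustration only: the function names are hypothetical, and the letter table covers just a handful of letters from the standard six-dot alphabet.

```python
# Map raised-dot numbers to a Unicode braille cell.
# Dots are numbered 1-3 down the left column and 4-6 down the right;
# eight-dot cells add dots 7 (left) and 8 (right) as a fourth row.
BRAILLE_BASE = 0x2800  # blank cell in the Unicode Braille Patterns block


def cell(*dots: int) -> str:
    """Return the braille character with the given dots (1-8) raised."""
    bits = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError(f"dot number out of range: {d}")
        bits |= 1 << (d - 1)
    return chr(BRAILLE_BASE + bits)


# A few letters of the six-dot alphabet (illustrative subset).
LETTERS = {"a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5)}


def to_braille(text: str) -> str:
    """Transliterate text using LETTERS; unknown characters become blank cells."""
    return "".join(cell(*LETTERS.get(ch, ())) for ch in text.lower())


if __name__ == "__main__":
    print(to_braille("abcde"))
    print(2 ** 6, 2 ** 8)  # distinct patterns for six- and eight-dot cells
```

Six dots give 64 distinct patterns (including the blank cell), while eight dots give 256, which is why the eight-pin cell can represent a full computer character set one cell per character.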

Virtual Page Displays. Because of the difficulties of creating full-page tactile displays, a number of people have tried techniques to create a "virtual" full-page display. One example was the Systems 3 prototype, where an Optacon tactile array was placed on a mouse-like puck on a graphics tablet. As the person moved the puck around on the tablet, he or she felt a vibrating image of the screen that corresponded to that location on the tablet. The same technique has been tried with a dynamic Braille display. The resolution, of course, is much lower. In neither case did the tactile recognition approach that of raised lines.

Full-Page Displays. Some attempts have been made to create full-page Braille-resolution displays. The greatest difficulty has been in trying to create something with that many moving parts that is still reliable and inexpensive. More recently, some interesting strategies using ferro-electric liquids and other materials have been tried. In each case the objective was to create a system that involves the minimum number of moving parts and yet provides a very high-resolution tactile display.

Ideal Displays. A dream of the blindness community has been the development of a large plate of hard material that would provide a high-resolution solid-state tactile display. It would be addressable like a liquid-crystal display, with instant response, very high resolution, and variable height. It would be low cost, lightweight, and rugged. Finally, it would be best if it could easily track the position of fingers on the display, so that the tactile display could be easily coupled with voice and other audio to allow parallel presentation of tactile and auditory information for the area of the display currently being touched.

An even better solution, both for blind people and for virtual reality applications, would be a glove that somehow provided both full tactile sensation over the palm and fingertips and force feedback. Elements of this have been demonstrated, but nothing approaching full tactile sensation or any free-field force feedback.

Integrating Input/Output Technologies

Filling out the range of technologies for people to communicate with systems (filling in the research and development gaps in the preceding section) is only part of the input/output requirement for ECIs. Integration of these technologies into systems that use multiple communications modalities simultaneously (multimodal systems) can improve people's performance. (These ideas are discussed in more detail in Chapter 6.) Integration can also ensure that at least one mechanism is available for every person and situation, independent of temporary and/or permanent constraints on their physical and cognitive abilities. Virtual reality involves the integration of multiple input and output technologies into an immersive experience that, ideally, will permit people to interact with systems as naturally as they do with real-world places and objects.

Multimodal Interfaces

People effortlessly integrate information gathered across modalities during conversational interactions. Facial cues and gestures are combined with speech and situational cues, such as objects and events in the environment, to communicate meaning. Almost 100 years of research in experimental psychology attests to our remarkable abilities to bring all knowledge to bear during human communication.

The ability to integrate information across modalities is essential for accurate and robust comprehension of language by machines and to enable machines to communicate effectively with people. In noisy environments, when speech is difficult to understand, facial cues provide both redundant and complementary information that dramatically improves recognition performance over either modality alone. To improve recognition in noisy environments, researchers must discover effective procedures to recognize and combine speech and facial cues. Similarly, textual information may be transmitted more effectively under some conditions by turning the text into natural-sounding speech, produced by an animated "talking head" with appropriate facial movements. While a great deal of excellent research is being undertaken in the laboratory, research in this area has not yet reached the stage where commercial applications have appeared, and fundamental problems remain to be solved. In particular, basic research is needed into the science of understanding how humans use multiple modalities.

Ability-Independent Interfaces

Standard mass-market products are still largely designed with single interfaces (e.g., they are designed to work only with a keyboard or only with a touchscreen). There are systems designed to work with keyboard or mouse, and some cross-modality efforts have been made (e.g., systems that support both keyboard and speech input). Usually, though, these multiple input systems are accomplished by having a second input technique simulate input on the first (for example, having the speech interface create simulated keystrokes or mouse clicks) rather than having the systems designed from the beginning to accommodate alternate interface modalities. This approach usually results when companies decide to add voice or pen support (or other input technique support) to their applications after the applications have been architected. This generates both compatibility problems and very complicated user configuration and programming problems.
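The alternative to simulated keystrokes can be sketched as a thin layer of modality-independent actions: each input technique translates its own events into a shared vocabulary of actions, and the application responds only to those actions. The sketch below is a minimal illustration of that design idea; the class, the bindings, and the phrases are all hypothetical and are not drawn from any product mentioned in the text.

```python
from typing import Callable, Dict, List, Optional

# Inputs from different modalities are mapped to shared semantic actions
# instead of one modality faking another's raw events. Names are hypothetical.


class App:
    def __init__(self) -> None:
        self.log: List[str] = []
        self.actions: Dict[str, Callable[[], None]] = {
            "save": lambda: self.log.append("saved"),
            "print": lambda: self.log.append("printed"),
        }

    def perform(self, action: Optional[str]) -> bool:
        """Run a named action; return False if the input was not understood."""
        handler = self.actions.get(action) if action is not None else None
        if handler is None:
            return False
        handler()
        return True


# Each modality owns its own translation table.
KEY_BINDINGS = {"ctrl+s": "save", "ctrl+p": "print"}
SPEECH_PHRASES = {"save this": "save", "print it": "print"}


def from_keyboard(chord: str) -> Optional[str]:
    return KEY_BINDINGS.get(chord.lower())


def from_speech(utterance: str) -> Optional[str]:
    return SPEECH_PHRASES.get(utterance.strip().lower())


if __name__ == "__main__":
    app = App()
    app.perform(from_keyboard("Ctrl+S"))
    app.perform(from_speech("Print it"))
    print(app.log)
```

In this arrangement, adding a pen or switch interface would mean adding one more translation table rather than teaching the new device to imitate keystrokes.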

A similar problem exists with media, materials, databases, or educational programs designed to be used in a visual-only presentation format. Companies (and users) run into problems when the materials need to be presented aurally. For example, systems designed for visual viewing often need to be reengineered if the data are going to be presented over a phone-based information system.

The area where the greatest cross-modality interface research has been carried out has been the disability access area. Strategies for creating audiovisual materials that also include time-synchronized text (e.g., captions) as well as audio descriptions of the visual information have been developed. Interestingly, although closed captioning was added to television sets for people who are deaf, it is used much more in noisy bars, by people learning to read a new language, by children, and by people who have muted their television sets. The captions are also useful for institutions wishing to index or search audiovisual files, and they allow "agent" software to comprehend and work with the audio materials.

In the area of public information systems, such as public kiosks, interfaces are now being developed that are flexible enough to accommodate individuals with an extremely wide range of type, degree, and combination of disabilities. These systems are set up so that the standard touchscreen interface supports variations that allow individuals with different disabilities to use them. Extremely wide variation in human sensory motor abilities can be accommodated without changing the user interface for people without disabilities.

For example, by providing a "touch and hear" feature, a kiosk can be made usable by individuals who cannot read or by those who have low vision. Holding down a switch would cause the touchscreen to become inactive (e.g., touching buttons on the screen would cause no action). However, any buttons or text that were touched would be read aloud to the user. Releasing the switch would reactivate the screen. A "touch and confirm" mode would allow individuals with moderate to severe physical disabilities to use the kiosk by having it accept only input that is followed by a confirmation. An option that provides a listing of the items (e.g., text and buttons) down the left edge of the screen can be combined with the talking "touch and confirm" mode to allow individuals who are completely blind to easily and accurately use a kiosk. The use of captions for audiovisual materials on kiosks can allow individuals who have hearing impairments to access a kiosk (as well as anyone else trying to use a kiosk in a noisy mall). Finally, by sending the information on the pop-up list out through the computer's Infrared Data Association (IrDA) port, it is possible for individuals who are completely paralyzed or deaf and blind to access and use a kiosk via their personal assistive technologies.
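The "touch and hear" and "touch and confirm" behaviors described above amount to a small state machine, sketched below. This is a hedged illustration of the logic only: the class, attribute, and item names are hypothetical, speech output is stood in for by a list, and the sketch does not describe how any deployed kiosk was actually implemented.

```python
from typing import List, Optional

# Minimal model of a touchscreen kiosk with the two access modes described above.
# Class, method, and item names are hypothetical.


class Kiosk:
    def __init__(self) -> None:
        self.switch_held = False            # "touch and hear" switch
        self.confirm_mode = False           # "touch and confirm" mode
        self.pending: Optional[str] = None  # item awaiting confirmation
        self.spoken: List[str] = []         # stand-in for speech output
        self.activated: List[str] = []      # stand-in for actions carried out

    def touch(self, item: str) -> None:
        if self.switch_held:
            # Screen is inactive: read the item aloud and take no action.
            self.spoken.append(item)
        elif self.confirm_mode:
            # Hold the selection until the user confirms it.
            self.pending = item
        else:
            # Standard touchscreen behavior.
            self.activated.append(item)

    def confirm(self) -> None:
        if self.pending is not None:
            self.activated.append(self.pending)
            self.pending = None


if __name__ == "__main__":
    kiosk = Kiosk()
    kiosk.switch_held = True
    kiosk.touch("Print map")   # read aloud only
    kiosk.switch_held = False
    kiosk.touch("Print map")   # now activates
    print(kiosk.spoken, kiosk.activated)
```

Because the default branch is untouched, a user who never holds the switch or enables confirmation sees an ordinary touchscreen, which mirrors the point that these features need not alter the standard interface.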

All of these features can be added to a standard multimedia touchscreen kiosk without adding any hardware beyond a single switch and without altering the interface experienced by individuals who do not have disabilities. By adding interface enhancements such as these, it is possible to create a single public kiosk that looks and operates like any traditional touchscreen kiosk but is also accessible and usable by individuals who cannot read, who have low vision, who are blind, who are hearing impaired, who are deaf, who have physical disabilities, who are paralyzed, or who are deaf and blind. Kiosks with flexible user-configurable interfaces have been distributed in Minnesota (including the Mall of America), Washington State, and other states.

These and similar techniques have been implemented in other environments as well. Since the 1980s, Apple Computer has had options built into its human interface to make it more useful to people with functional limitations (look in any Macintosh control panel for Easy Access). IBM has them built into its hardware and software (AccessDos and OS/2), and UNIX has both options in its human interface and modifications in its underlying structure to support connection to specialized interfaces. Windows 95 has over a dozen adjustments and variations built into its human interface to allow it to be used by individuals with a very wide range of disabilities or environmental limitations, including those with difficulty hearing, seeing, physically operating a keyboard, and operating a mouse from the keyboard.

As we move into more immersive environments and environments that use a greater percentage of an individual's different sensory and motor systems simultaneously (e.g., VR, multimedia), identifying and developing cross-modal interface techniques will become increasingly challenging. In the techniques developed to date, however, building interfaces that allow for cross-modality interaction has generally made for more robust and flexible interfaces and systems that can better adapt to new interface technologies as they emerge (e.g., allowing WIMP (windows, icons, menus, pointers) systems to accommodate a verbal interface).


Virtual Reality and Artificial Immersive Environments

The past 10 years have brought a nearly complete changeover from command line to WIMP interfaces as the dominant every-citizen's connection to computation. This happened because hardware (memory, display chips) became cheap enough to be engineered into every product. It also happened because the first step to the office of the future required replacing the typewriter with the laser printer, an event neatly handled with the "desktop metaphor" and word processing/spreadsheet software. However, the NII implies a complex of technologies relevant to far more than office work, which is a practical reason not to expect it to be accessed by every citizen with mice and windows interfaces alone (van Dam, 1997).20 Another transition is now at hand, one that potentially liberates the interface from the desktop, one that presents information more like objects in a shopping mall than printing on a pile of paper. The virtual shopping mall (or museum) is the next likely application metaphor; the parking lots will be unneeded, of course, as will attention to the laws of physics when inappropriate, but as in three-dimensional user interfaces generally, the metaphor can help in teaching users how to operate in a synthetic environment. Such a metaphor helps also to avoid the constraints that may derive from metaphors linked to one class of activity (e.g., desks and white-collar work) at a time when researchers should think about the needs and challenges posed by all kinds of people.

At SIGGRAPH 96, the major conference for computer graphics and interactive techniques, full-quality, real-time, interactive, three-dimensional, textured flight simulation was presented as the next desirable feature in every product. This visual capability, usually augmented with sound and multidimensional interactive controls, presents information as landscapes and friendly/hostile objects with which the user interacts at high speed. Visual representations of users, known as avatars, are one trend that has been recognized in the popular press. Typing is not usually required or desirable. The world portrayed is spatially three-dimensional, and it continues well beyond the boundaries of the display device. In this context, input and output devices with more than 2 degrees of freedom are being developed to support true direct manipulation of objects, as opposed to the indirect control provided by two- and three-dimensional widgets, and user interfaces appear to require support for many degrees of freedom,21 higher-bandwidth input and output, real-time response, continuous response and feedback, probabilistic input, and multiple simultaneous input and output streams from multiple users (Herndon et al., 1994). Note that virtual reality also expands on the challenges posed by speech synthesis to include synthesis of arbitrary sounds, a problem that is hampered by the lack of available sound samples analogous to the voice samples used in voice synthesis.

Economic factors will pace the broader accessibility of technologies that are currently priced out of the reach of every citizen, such as high-end virtual reality. Virtual reality technology, deriving from 30 years of government and industry funding, will see its cost plummet as development is amortized over millions of chip sets, allowing it to come into the mainstream. Initially, the software for these new chips will be crafted and optimized by legions of video game programmers driven by teenage mass-market consumption of the best play and graphics attainable. Coupled with the development of relatively cheap wide-angle immersive displays and hundredfold increases in computing power, personal access to data will come through navigation of complex artificial spaces. However, providing the every-citizen interface to this shared information infrastructure will need some help on the design front.

Ten- to 20-Year Challenges for Virtual Reality Systems

Very little cognitive neuroscience and perceptual physiology is understood, much less applied, by human interface developers. The Decade of the Brain is well into its second half now; a flood of information will be available to alert practitioners in the computing community that will be of great use in designing the every-citizen interface. Teams of sensory psychologists, industrial designers, electrical engineers, computer scientists, and marketing experts need to explore, together, the needs of governance, commerce, education, and entertainment. The neuroplasticity of children's cognitive development when they are computationally immersed early in life is barely acknowledged, much less understood.

1. A prioritized list of challenges includes the following:

a. Enumerate and prioritize human capabilities to modulate energy. This requires a comprehensive compilation of published bioengineering and medical research on human performance measurement techniques, filtering for the instrumentation modalities that the human subjects can use to willfully generate continuous or binary output. Modalities should be ranked according to quality/repeatability of output, comfort, intrusiveness, cost, durability, portability, and power consumption. Note that much is known about human input capacity, by contrast.

 

b. Develop navigational techniques, etc. This is akin to understanding the functional transitions in moving around in the WIMP desktop metaphor and is critical to nontrivial exploitation of the shopping mall metaphor of VR. Note that directional surround-sound audio and tactile feedback rich enough to assist a vision-impaired person in navigating a mall would also likely help a fully sighted person. Schematic means need to be developed to display the shopping mall metaphor on conventional desktop computers, small video projectors, and embedded displays.

 

c. Develop several universal methods for input/output device connectivity. Currently, the personal computer clone is the universal input/output adapter because of its open architecture and the availability of cheap mail-order input/output devices, but many personal computers, each doing one filtering task, trying to communicate with one another on serial lines, are not directly adaptable to the ECI set of needs. Both software and hardware need to be provided in a form that allows "plug and play." Custom chip sets will drive the cost down to consumer level; adapting video game input/output devices where possible will help in achieving price-performance improvements similar to those of computing itself.

 

d. Store and retrieve visualization/VR sessions. Despite the easily available technology in chip form, it is still clumsy if not impossible for an ordinary user to make and edit a video recording to document a computer session, unless it is a video game! Imagine text editing if you could only cut and paste but never store the results. One would like to play back and edit visualization/VR sessions in ways akin to the revision mode in word processors. A key technological development here is extension of the videoserver concept to visualization/VR capture and playback.

e. Connect to remote computations and data sources. This is inevitable and will be driven by every sector of computing and Web usage.

f. Understand the computer as an instrument. This is inevitable and will be market-driven as customers become more exposed to good interfaces. Note that the competition between Web browser companies is not about megahertz and memory size!

g. Create audio output matched to the dynamic range of human hearing. Digital sound synthesis is in its infancy. Given the speed of currently available high-end microprocessors, this is almost entirely a software tools problem from the engineering side and a training problem from the creative side. (Note that flawless voice recognition is left out here!)

2. Controversial challenges (controversial because they seem to be developing for a postliterate society whose members will no longer (!) be able to type and read from screens):

a. Eliminate typing as a required input technique. Many computer applications, of course, already do this (e.g., CAD, video games, touch-screen map displays). Related: provide for typing when necessary in walk-around situations such as VR or warehouse data entry. Possible solutions are wearable chord keyboards, voice recognition, and gesture recognition. Issues include whether training will be essential, ranging from the effort needed to learn a video game or new word processor to that required to play a musical instrument or to drive a bulldozer.

 

b. Reduce reliance on reading. Road signs have been highly developed and standardized to reduce the reliance on reading for navigation in the real world. The controversy here may stem from the copyright (if not patent) protection asserted by commercial developers on each new wrinkle of look and feel on computer screens. A fine role for government here is to encourage public domain development of the symbolism needed to navigate complex multidimensional spaces.

3. Barrier-laden challenges:

a. Develop haptic devices. Safe force-feedback devices capable of delivering fine touch sensations under computer control are still largely a dream. Keyboards and mice injure without the help of force feedback; devices capable of providing substantial feedback could do real injury. Some heavy earth-moving equipment designs are now "fly-by-wire"; force feedback is being simulated to give the operator the feel once transmitted by mechanical linkage. The barriers are providing fail-safe mechanisms, finding the applications warranting force feedback, and providing the software and hardware that are up to the task.

 

b. Provide enough antialiased image resolution to match human vision (minimally 5,000 × 5,000 pixels at 50 Hz). CRT technology seems limited by market forces and development to 2,048 × 2,048 this decade. LCD screen sizes and resolutions seem to be driven by market needs for laptop computers. Twenty-twenty vision is roughly 5,000 pixels (at a 90-degree angle of view); less is needed at the angle people normally view television or workstation screens, more for wide-angle VR applications. A magazine advertisement is typically equivalent to 8,000 pixels across, on average, which is what a mature industry provides and is paid for, a suggested benchmark for the next decade or so. More resolution can be used to facilitate simple panning (which is what a person does when reading the Wall Street Journal, for example) or zooming in (as a person does when looking closely at a real item with a magnifying glass), both of which can be digitally realized with processing and memory without requiring more resolution from the display device. Certain quality enhancements may be achieved with higher refresh rates (e.g., 100 Hz), including less strobing during panning or the capability of doing stereo visuals by sending two 50-Hz images, one for each eye. Low latency, not currently a feature of LCD displays, is needed for devices running at 100 Hz or greater. Micromirror projectors show promise in this area.

Desirable, of course, would be wall-sized screens with very high resolution (>20,000 pixels) whose fidelity would be matched to a person's vision even when closely examined. Multiple projectors tiled together may achieve such an effect (Woodward, 1993) where warranted; monitors and LCD screens do not lend themselves to tiling because the borders around the individual displays do not allow seamless configurations. Truly borderless flat displays are clearly desirable as a way to build truly high-resolution displays.

 

c. Provide enough computing power for the ECI. (This is probably the least of the problems because the microprocessor industry, having nearly achieved the capability of 1990-vintage Crays in single chips, is now ganging them together by fours and eights into packages.) Gigaflop personal computers are close; teraflop desktop units are clearly on the horizon as massive parallelism becomes understood. Taking advantage of all this power is the challenge and will drive the cost down through mass production as the interfaces make the power accessible and desirable. More futuristic goals such as the petaflop computer and biological "computing" will likely happen in our lifetimes.

d. Provide adequate network bandwidth to the end user. Some of the challenges in network infrastructure are discussed in the next section ("The Communications Infrastructure"). With respect to VR specifically, current data transfer rates between disk drives and screens are not up to the task of playing back full-screen movies uncompressed. The 1997 state of the art for national backbone and regional networking is 622 megabits per second. The goal of providing adequate bandwidth depends on the definition of "adequate" and how much computing is used to compress and decompress information. Fiber optics is capable of tremendous information transmission; it is the switches (which are computers) that govern the speeds and capacity now. Assuming that network bandwidth will be provided as demand increases, it seems likely that within 10 years a significant fraction of the population will be able to afford truly extraordinary bandwidth (CSTB, 1996).
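The resolution and bandwidth figures in the display and network items above can be checked with rough arithmetic. The sketch below assumes that 20/20 acuity corresponds to resolving about one arcminute per pixel and that an uncompressed image carries 24 bits per pixel; both values are illustrative assumptions rather than figures from the text, and the function names are hypothetical.

```python
ARCMIN_PER_DEGREE = 60


def pixels_for_field_of_view(fov_degrees, arcmin_per_pixel=1.0):
    """Pixels across so that each pixel subtends `arcmin_per_pixel`.

    One arcminute per pixel is an assumed stand-in for 20/20 acuity.
    """
    return fov_degrees * ARCMIN_PER_DEGREE / arcmin_per_pixel


def uncompressed_rate_bps(width, height, refresh_hz, bits_per_pixel=24):
    """Raw (uncompressed) display data rate in bits per second."""
    return width * height * refresh_hz * bits_per_pixel


if __name__ == "__main__":
    # A 90-degree field of view at one arcminute per pixel.
    print(pixels_for_field_of_view(90))            # 5400.0

    # A 5,000 x 5,000 display refreshed at 50 Hz, 24 bits per pixel (assumed).
    rate = uncompressed_rate_bps(5000, 5000, 50)
    print(rate / 1e9)                              # 30.0 (gigabits per second)

    # Ratio to a 622-Mbps backbone link.
    print(rate / 622e6)
```

Under these assumptions, a 90-degree view needs about 5,400 pixels across, in line with the "roughly 5,000" figure, and the uncompressed stream is tens of times larger than a 622-Mbps link, which is why the definition of "adequate" depends so heavily on compression.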


The Communications Infrastructure

Because ECIs must work in a networked environment, interface design involves choices that depend on the performance of network access and network-based services and features. What ramifications does connection to networks have for ECIs? This question is relevant because a user interface for any networked application is much more than the immediate set of controls, transducers, and displays that face the user. It is the entire experience that the user has, including the following:

Response time (how close to immediate the response from an information site, or the setup of a communications connection, is);

Media quality (of audio, video, images, computer-generated environments), including delay for real-time communications and being able to send as well as receive with acceptable quality;

Ability to control media quality and trade-off between applications and against cost;

"Always on": the availability of services and information, such as stock quotes on a personal computer screen saver, without "dialing up" each time the user wants to look;

Transparent mobility (anytime, anywhere) of terminals, services, and applications over time;

Portable "plug and play" of devices such as cable television set-top boxes and wireless devices;

Integrity and reliability of nomadic computing and communications despite temporary outages and changes in available access bandwidth;

Consistency of service interfaces in different locations (not restricted to the United States); and

The feeling the user has of navigating in a logically configured, consistent, extensible space of information objects and services.

To understand how networking affects user interfaces, consider the two most common interface paradigms for networked applications: speech (telephony) and the "point and click" Web browser. These are so widely accepted and accessible to all kinds of people that they can already be regarded as "almost" every-citizen user interfaces. Research to extend the functionality and performance of these interfaces, without complicating their most common applications, would further NII accessibility for ordinary people.

Speech, understood here to describe information exchange with other people and machines more than an immediate interface with a device, is a natural and popular medium for most people. It is remarkably robust under varying conditions, including a wide range of communications facilities. The rise of Internet telephony and other voice and video-oriented Internet services reinforces the impression that voice will always be a leading paradigm. Voice also illustrates that the difference between a curiosity such as today's Internet telephony and a widely used and expected service depends significantly on performance:22 Technological advances in the Internet, such as IPv6 (Internet Protocol version 6) and routers with quality-of-service features, together with increased capacity and better management of the performance of Internet facilities, are likely to result in much better performance for voice-based applications in the early twenty-first century.

Research that would help make the NII as a whole more usable includes making Internet-based information resources as accessible as possible from a telephone; improving the delay performance and other aspects of voice quality in the Internet and data networks generally; implementing voice interfaces in embedded systems as well as computers; and furthering a comfortable integration of voice and data services, as in computer-controlled telephony, integrated voice mail/e-mail, and data-augmented telephony.

The "point and click" Web browser reflects basic human behavior, apparent in any child in a toy store who points to something and says (click!) "I want that." Because of the familiarity of this paradigm, people all over the world use Web browsers. For reaching information and people, a Web browser is actually far more standard than telephony, which has different dial tones and service measurement systems in different countries. Research issues include multimedia extensions (including clicking with a spoken "I want that"), adaptation to the increasing skill of a user in features such as multiple windows and navigation speed, and adapting to a variety of devices and communication resources that will offer more or less processing power and communications performance.

The Network Hierarchy and How It Affects User End-to-End Performance

Among the elements of communications infrastructure that affect performance, the access network is one of several (along with networking in the local area of the user and networking within the public network) that have considerable influence. Access network bandwidth is an important parameter affecting performance.


Local-Area Communications

Physical communications networking can be categorized as an interworking of three networking levels: local, access, and core (or "wide area"). Almost any network-based activity of a residential user is likely to use all three.

Local area networks (LANs) are on the end-user's premises, such as a house, apartment or office building, or university campus. Ethernet, the most widely deployed LAN technology, is already appearing in homes for computer access to cable-based data access systems such as TimeWarner's RoadRunner, Com21's access system, and @Home's access system. It could be in millions of American homes by the year 2000. In general, the 10-megabit-per-second (Mbps) Ethernet is the favored communications interface for connecting personal computers and computing devices to set-top boxes and other network interface devices being developed for high-speed subscriber access networks. A properly engineered shared-bandwidth architecture such as Ethernet allows multiple devices to have the high "burst rate" capability needed for good performance, such as fast transfer of an image, with only rare degradation from congestion. It is "always on," allowing devices always to be connected and ready to satisfy user needs immediately, as opposed to a tedious connection setup.

A residence will be able to simultaneously operate not only several human-oriented user interfaces in personal computers, heating/cooling and appliance controls, light switches, communicating pocket calendars and watches, and so on, but also user interfaces used by such devices as furnaces, garage doors, and washing machines. The introduction of IPv6 in the next decade will create an extremely large pool of Internet addresses, allowing each human being in the world to own hundreds or thousands of them. This development will foster the interconnection of a wide range of devices with embedded systems, a phenomenon that underscores the concern not to cast the NII or ECI challenges in overly personal computer-centric terms.
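The size of the IPv6 address pool can be illustrated with a short calculation. This is a minimal sketch: IPv6 addresses are 128 bits long and IPv4 addresses 32 bits, but the world population figure is an assumption chosen for illustration, and the function name is hypothetical. Raw address-space arithmetic ignores how addresses are actually allocated in prefixes, so it gives only an upper bound.

```python
IPV6_ADDRESS_BITS = 128
IPV4_ADDRESS_BITS = 32

# Assumption: a round figure of roughly six billion people.
ASSUMED_WORLD_POPULATION = 6_000_000_000


def addresses_per_person(address_bits, population=ASSUMED_WORLD_POPULATION):
    """Whole addresses available per person if the space were split evenly."""
    return 2 ** address_bits // population


if __name__ == "__main__":
    # IPv4 cannot give even one address to each person under this assumption.
    print(addresses_per_person(IPV4_ADDRESS_BITS))   # 0

    # IPv6 leaves an enormous surplus per person (on the order of 10**28).
    print(addresses_per_person(IPV6_ADDRESS_BITS))
```

Even after large portions of the space are reserved or allocated sparsely, "hundreds or thousands" of addresses per person is a conservative reading of what a 128-bit address permits.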

Local networking is not necessarily restricted to one shared wired facility such as Ethernet, which is beginning in the home at 10 Mbps but will likely evolve to "Fast Ethernet" commercial versions or to ATM (asynchronous transfer mode) connection-oriented communications, at 100 Mbps and higher. It can include wireless local networking, generalizations of the cordless phone to cordless personal computers and other devices, with burst rates of at least several megabits per second. Local networking is likely to include assigned (not shared) digital channels in various media for such applications as video programming and other stream or bulk uses, at aggregate data rates of hundreds of megabits per second.


How much bandwidth is enough? Assuming "always connected" and good performance from the other network elements to be described, 10 Mbps symmetric should be adequate for almost all processor-based applications including fast response image transfers (a 5-megabit image in 0.5 second) and high-quality MPEG-2 or H.323 (conferencing/videophone) video at 4 Mbps. For streaming media such as video, additional requirements of reserved capacity and minimal queuing delay may be needed, requirements for which ATM is well suited. ATM breaks traffic into uniformly sized "cells" that can be efficiently switched and reassembled with specified quality-of-service guarantees. Forecasts of how soon ATM will be available directly at consumer communicating devices vary, but there is likely to be significant availability in 5 to 10 years. For future applications with very complex immersive environments, multiple high-definition video streams, or other bandwidth-intensive needs, fast Ethernet and ATM should suffice. Additional transmission facilities for program distribution could use these or other technologies.
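The transfer-time arithmetic behind the 10-Mbps estimate can be sketched as follows. The sketch assumes decimal units (1 megabit = 10^6 bits, 1 megabyte = 8 × 10^6 bits) and ignores protocol overhead and congestion; the function name is hypothetical.

```python
BITS_PER_MEGABIT = 1_000_000
BITS_PER_MEGABYTE = 8 * BITS_PER_MEGABIT


def transfer_seconds(size_bits, link_bps):
    """Ideal transfer time, ignoring protocol overhead and congestion."""
    return size_bits / link_bps


if __name__ == "__main__":
    link_bps = 10 * BITS_PER_MEGABIT  # 10-Mbps Ethernet

    # Half a second at 10 Mbps carries 5 megabits.
    print(transfer_seconds(5 * BITS_PER_MEGABIT, link_bps))    # 0.5

    # A 5-megabyte image is eight times larger and takes about 4 seconds.
    print(transfer_seconds(5 * BITS_PER_MEGABYTE, link_bps))   # 4.0

    # Capacity left over while one 4-Mbps video stream is running.
    print(link_bps - 4 * BITS_PER_MEGABIT)                     # 6000000
```

The distinction between megabits and megabytes matters here: a half-second response at 10 Mbps corresponds to an image of about 5 megabits, while a 5-megabyte image would need roughly 4 seconds on the same link.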

Both shared-bandwidth networks such as Ethernet and dedicated high-capacity channels could reside in the same physical medium, which might be fiber, coax, or twisted-pair. The cost of a LAN has been falling steadily, with Ethernet cards for personal computers well below $100. The cost of wiring a new house or apartment building with cable for Ethernet is low, but the cost could be substantial for rewiring an old residence. Wireless networking, to bypass the wiring problem, is available now, and it may be priced comparably to Ethernet, for comparable capacity, in 4 to 5 years.

Access Communications

The access network is the set of transmission facilities, control features, and network-based services that sits between a user's premises and the core public network. The twisted-pair subscriber line running from a telephone office to a user's residence is part of the telephone access network, for example. There are four basic paradigms offered (and in development): telephone company services via the twisted-pair subscriber line, cable company services via a coaxial cable (coax) feed, wireless access via higher-powered cellular mobile or lower-powered PCS (personal communications services), and direct broadcast satellite service. There are additional paradigms, such as terrestrial microwave, that are of secondary importance compared with these four. The access network has long been regarded as a performance bottleneck. The telephone channel, restricted to 3-kHz (kilohertz) bandwidth (and data rates of about 30 kbps for reliable transmission) by filters and transmission systems designed for voice, presents both bandwidth limitations and connection delays that seriously degrade performance.
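The link between a 3-kHz channel and a data rate of about 30 kbps can be illustrated with the Shannon capacity formula, C = B log2(1 + S/N). This is a sketch under a stated assumption: the 30-dB signal-to-noise ratio is an illustrative value for a voice-grade line, not a figure given in the text, and the function name is hypothetical.

```python
import math


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Upper bound on error-free data rate for a noisy band-limited channel."""
    snr_linear = 10 ** (snr_db / 10)   # convert decibels to a power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


if __name__ == "__main__":
    # 3-kHz voice channel with an assumed 30-dB signal-to-noise ratio.
    capacity = shannon_capacity_bps(3000, 30)
    print(round(capacity))   # roughly 29,900 bits per second
```

With these inputs the bound comes out close to 30 kbps, which suggests the limit is a property of the voice-band filtering itself rather than of any particular modem design.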

"Access" can be a confusing term. An Internet service provider offers access service to the Internet and some access facilities such as TCP/IP software, but may not provide the physical pipe into the home. For the moment, the discussion is restricted to access networks that include the physical transmission facilities but returns later to Internet service provider facilities because they have a critical influence on the performance of Web browsers and other Internet-oriented user interfaces.

Twisted Pair-based Telephone Company Services. The first paradigm, access via a twisted-pair subscriber line, is advancing with ISDN (integrated services digital network), ADSL (asymmetric digital subscriber line), VDSL (very-high-speed digital subscriber line), and HDSL (high-speed digital subscriber line).23

Cable-based Access Services. A local cable television (CATV) service company maintains a cable distribution system that is still largely dedicated to broadcasting video programming. The coaxial cable network, now actually combining optical fiber trunks with coaxial branches and "drops" to subscribers, is a "tree and branch" architecture well suited to broadcast and not so well suited to upstream communications from the user. It is not well suited to upstream communications because of noise aggregation problems from many drops and branches coming together and because the capacity of the cable, however large, is being shared with bandwidth-hungry downstream video services and by a great many subscribers.

Nevertheless, the cable industry has succeeded in evolving a promising HFC (hybrid fiber coax) network architecture that can service both video distribution and interactive communications needs.24 The HFC system provides digital channels with signals produced by cable modems, for which a downstream channel may generate a 30-Mbps signal within a 6-MHz bandwidth. Instead of one analog video signal, this digital transmission can carry seven or eight high-quality MPEG-2 digital video signals or one digital HDTV (high-definition television) signal plus two MPEG-2 ordinary digital video signals. More important for the NII, the digital capacity can be used for an arbitrary mix of signals, supporting medical imaging, language instruction, software downloads, and an infinite array of other applications. A cable system could typically implement up to a few dozen such 30-Mbps channels plus 80 old-fashioned analog channels for subscribers who have not yet purchased the digital TV sets expected to hit the U.S. market in 1998.
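The stream counts quoted for a 30-Mbps digital channel follow from a simple capacity budget. The sketch below assumes about 4 Mbps per MPEG-2 video stream (or slightly less) and about 20 Mbps for an HDTV signal; these per-stream rates are illustrative assumptions, and the function names are hypothetical.

```python
CHANNEL_MBPS = 30.0

# Assumed per-stream rates, for illustration only.
ASSUMED_MPEG2_MBPS = 4.0
ASSUMED_HDTV_MBPS = 20.0


def streams_that_fit(channel_mbps, stream_mbps):
    """Number of equal-rate streams that fit in one digital channel."""
    return int(channel_mbps // stream_mbps)


def mix_fits(channel_mbps, stream_rates_mbps):
    """True if an arbitrary mix of streams fits within the channel."""
    return sum(stream_rates_mbps) <= channel_mbps


if __name__ == "__main__":
    print(streams_that_fit(CHANNEL_MBPS, ASSUMED_MPEG2_MBPS))   # 7
    print(streams_that_fit(CHANNEL_MBPS, 3.75))                 # 8

    # One HDTV signal plus two ordinary MPEG-2 signals.
    mix = [ASSUMED_HDTV_MBPS, ASSUMED_MPEG2_MBPS, ASSUMED_MPEG2_MBPS]
    print(mix_fits(CHANNEL_MBPS, mix))                          # True
```

The same budget check applies to any mix of traffic, which is the point made about using the digital capacity for applications other than video.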

Upstream capacity shared among many subscribers is much more constrained. Standards are being developed that will allow a user to share with neighbors a 1.5-Mbps upstream channel (one of about 20 such channels serving a group of 125 to 500 subscribers). Other modem designs allow a pool of users to share a 10-Mbps upstream channel, mimicking the behavior of Ethernet. Here, just as with ADSL, the operator is betting that traffic will be asymmetric and that the user will not have a performance complaint even though the upstream bandwidth is not especially generous.
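How constrained the shared upstream path is can be estimated by dividing the aggregate capacity evenly among subscribers. This is a worst-case sketch that assumes every subscriber in the group transmits at once, which real traffic rarely does; the function name is hypothetical.

```python
def average_upstream_kbps(channels, channel_mbps, subscribers):
    """Even per-subscriber share of aggregate upstream capacity, in kbps.

    Assumes all subscribers are sending simultaneously (worst case).
    """
    return channels * channel_mbps * 1000 / subscribers


if __name__ == "__main__":
    # About 20 upstream channels of 1.5 Mbps each, shared by the group.
    print(average_upstream_kbps(20, 1.5, 125))   # 240.0
    print(average_upstream_kbps(20, 1.5, 500))   # 60.0
```

Because upstream traffic is bursty, an individual user would often see much more than this even share, up to the full 1.5 Mbps of a channel when neighbors are idle; the bet on asymmetric traffic is a bet that the worst case is uncommon.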

Above this physical channel level, the cable industry's model usually includes IP services with the same "always on" flavor that professionals enjoy at work. This is an important performance advantage for cable access, supporting broadcast information services that flash the latest bulletin on a computer or TV screen, quick Internet telephony perhaps by touching a miniature picture in the screen directory, and immediate linking to a distant Web site (contingent on performance being good farther upstream). If the service, including getting started and customer premise setup,25 is done well, the popular conception of Internet service as difficult to get started and unreliable after that could change radically, and the Web browser could indeed become a universal user interface.

Wireless Access Services: Location Transparency and Consciousness and Power/Bandwidth Tradeoffs. Wireless access, currently in cellular mobile networks and soon in PCS networks, supports mobility of persons, devices, and services. It makes possible carrying wearable or pocket devices, doing computing in a car (perhaps with a "heads-up" display on the windshield, used only when it is safe to do so, of course), reading documents and messages on an electronic "infopad" at meetings, and sending "electronic postcards" from digital cameras and camcorders. The new and large unlicensed NII Supernet spectrum authorized by the Federal Communications Commission, in the relatively high 5-GHz band, will give a large boost to interactive multimedia services when mass-produced, low-cost radio modems become a reality. That could happen within 3 to 4 years.

Wireless access can support both location transparency, in which the user's application appears the same regardless of location, and location consciousness, in which the application finds and exploits local resources and can offer location-dependent services, such as giving directions to the nearest drugstore. These two features are not incompatible, and both contribute to the utility and usability of a user interface.

Because of the power constraints imposed by small portable devices, including but not limited to pocket telephones, medical monitoring and alerting devices, communicating digital cameras and camcorders, communicating watches, communicating pocket calendars, and even some laptop computers, it is important for the quality of the user interface that the wireless access system offer appropriate tradeoffs between communications and processing resources. One way this is realized is to concentrate the power of the portable device on display functions, such as a bright sharp display, and leave media processing (such as MPEG and videoconferencing digital video coding/decoding) to processors accessed through the wireless network. However, this balance of function may imply an unacceptable cost for the substantial communications capacity to carry the unencoded video information. Another issue is how to minimize power use on portable systems that are always listening. Further research will be required to identify a reasonable balance between processing and communications power in the system.

The microcellular PCS and Supernet networks are well suited to this need, aiming for burst transmission rates of 25 Mbps or more in small (perhaps 300-meter-wide) microcells. This compares very favorably with present-day telephony-oriented cellular mobile networks, where modems may provide a communications rate of up to about 20 kbps. Higher rates are possible in the digital cellular mobile systems becoming widely deployed now, but probably not more than 256 kbps, still far below microcellular networks.
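
To make the gap between these rates concrete, the following sketch computes how long one file would take to transfer at each of the three rates quoted above. The 1-megabyte file size is a hypothetical example, and the calculation ignores protocol overhead, channel sharing, and the fact that 25 Mbps is a burst rate rather than a sustained one.

```python
# Rates quoted in the text, in bits per second.
RATES_BPS = {
    "cellular modem (about 20 kbps)": 20_000,
    "digital cellular (at most 256 kbps)": 256_000,
    "microcellular burst (25 Mbps)": 25_000_000,
}


def transfer_seconds(size_bytes, rate_bps):
    """Idealized transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / rate_bps


FILE_SIZE_BYTES = 1_000_000  # hypothetical 1-megabyte image or document

for name, rate in RATES_BPS.items():
    print(f"{name}: {transfer_seconds(FILE_SIZE_BYTES, rate):.2f} s")
```

Under these idealized assumptions the same file takes several minutes over the cellular modem, about half a minute over digital cellular, and well under a second in a microcell.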

The low Earth-orbiting satellite (LEO) systems planned for personal communications from anywhere in the world, which will compete to some extent with terrestrial microcellular PCS systems, could offer the significant user interface advantage of having exactly the same user interface anywhere in the world. This would remove a major anxiety for many users.

Direct Broadcast Satellite Distribution Services. Satellite services could augment wired facilities to improve the performance of the user interface. In particular, downloading of large information files to proxy servers in nearby network offices or in the end-user's equipment itself would reduce the delays of access to information in distant servers. There are cache memories in Web browsers that save Web HTML objects requested by users because there is a high probability that the objects will be requested again, but a proxy server does something else. It caches information when it has been requested by one user, under the assumption that if the material is popular other users may request it as well. This has the effect of improving response time considerably for those users and offering added possibilities for customization. There are many important research questions in selecting material for proxy servers, updating strategies, customization for users, and integrating the satellite facility smoothly with the wired network.
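
The difference between a browser cache and a shared proxy cache can be sketched in a few lines. The class and function names below are hypothetical, and the sketch deliberately omits expiry and update strategies, which the text identifies as open research questions.

```python
class ProxyCache:
    """A shared cache: an object fetched for one user is kept for all users."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable that contacts the distant server
        self._store = {}                 # url -> cached object
        self.origin_requests = 0         # how often the distant server was contacted

    def get(self, url):
        if url not in self._store:
            # First request by any user: go to the origin and remember the result.
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


def distant_server(url):
    """Stand-in for a slow, far-away origin server (hypothetical)."""
    return f"<html>content of {url}</html>"


proxy = ProxyCache(distant_server)
for user in ("first user", "second user", "third user"):
    page = proxy.get("http://example.org/popular.html")

print(proxy.origin_requests)  # prints 1: only the first request reached the origin
```

A browser cache would give each of the three users a private store, so the origin would be contacted three times; the shared store is what lets later users benefit from the first request.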

Direct broadcast satellite service in the NII would also include its present function of distributing video programming directly to user TVs. In the future it is possible that continual magazine-style broadcasting of video information clips, captured and displayed immediately by user devices rather than retrieved from cache servers, also will be part of the nation's information infrastructure. This would offer the freshest-possible material, supporting, for example, a customized user information service in which information is updated even as the reader observes it.

Core Network Communications: QoS, Interoperability, and Value-Added Services

The core network interconnects access networks. It aggregates traffic, and is, or should be, designed to provide differing quality-of-service (QoS) treatment for different classes of traffic.26 Continuous voice and video media should enjoy minimum delay, and data files should be transferred with minimum error rate. ATM is already widely deployed in the core network. Research and development on QoS control is already extensive, and further work, on topics such as renegotiation of offered capacity and dynamic user control over QoS, would improve the performance of future user interfaces. For example, a user with several applications running could trade QoS among them, improving video resolution at the expense of the rate of transfer of a new software module being downloaded in the background.
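
The idea of a user trading QoS among running applications can be sketched as moving capacity between entries of a fixed allocation. The function name, the application names, and the kbps figures below are all hypothetical; a real system would renegotiate with the network rather than edit a local table.

```python
def trade_capacity(allocation, giver, taker, kbps):
    """Return a new allocation with `kbps` moved from `giver` to `taker`.

    The total stays constant: the user is redistributing capacity already
    granted by the network, not asking for more.
    """
    if kbps < 0:
        raise ValueError("amount to trade must be non-negative")
    if allocation[giver] < kbps:
        raise ValueError(f"{giver} has only {allocation[giver]} kbps to give")
    updated = dict(allocation)  # leave the caller's allocation untouched
    updated[giver] -= kbps
    updated[taker] += kbps
    return updated


# Hypothetical starting split of a 1,500-kbps access link.
allocation = {"video": 500, "software_download": 900, "email": 100}

# Sharpen the video picture at the expense of the background download.
boosted = trade_capacity(allocation, "software_download", "video", 400)
print(boosted)
print(sum(boosted.values()) == sum(allocation.values()))
```

Keeping the total fixed mirrors the example in the text, in which better video resolution is paid for by a slower background transfer.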

There are additional services that either the core network or the access network, or indeed parties other than the network operators, can provide to enhance user interface performance. For example, a multiparty desktop audio/video conference can be displayed on one user's screen as a custom combination of pictures of the other participants with a corresponding spatial distribution of their voices. This can be done either in the user equipment by processing multiple audio/video information streams all coming to that user or by a processing service in the network (or offered by a third party) called a "multimedia bridge" that creates the customized display for the user and supplies that user with only a single audio/video information stream. If access bandwidth is at a premium, the network-based bridging service provides a high-quality user interface at a minimum cost in access bandwidth.
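
The access-bandwidth saving from a multimedia bridge follows from simple counting, sketched below. The 384-kbps per-stream rate is a hypothetical figure, and the sketch assumes the bridge's composite stream needs no more capacity than one participant's stream.

```python
def downstream_kbps(participants, stream_kbps, use_bridge):
    """Downstream access capacity one conference participant needs.

    Without a bridge, the user's equipment receives one stream from each
    of the other participants; with a bridge it receives a single
    composite stream prepared in the network.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    incoming_streams = 1 if use_bridge else participants - 1
    return incoming_streams * stream_kbps


STREAM_KBPS = 384  # hypothetical rate of one audio/video stream

for n in (2, 4, 8):
    local = downstream_kbps(n, STREAM_KBPS, use_bridge=False)
    bridged = downstream_kbps(n, STREAM_KBPS, use_bridge=True)
    print(f"{n} participants: {local} kbps without bridge, {bridged} kbps with bridge")
```

With two participants the two approaches cost the same; the bridge's advantage grows linearly with the number of participants.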

Performance Impacts of Internet Services Architecture

The Internet, which utilizes all of the communications hierarchy outlined above, is considered by many to be the heart of the NII. As the multimedia Internet evolves and assumes much of the quality control (and charge for service!) functionality of the telephone network, this is likely to become true by definition. The Internet is defined by use of IP, which carries packets from one kind of network to another without the application having to directly control any services in those networks.

Although they do not, in general, provide the access transmission facilities, Internet service providers do supply other access facilities that have a large influence on the performance of user interfaces. These include at least the following:

Adequate modem pools and fast log-on for dial-up service;

Direct low-level packet interconnection to the Internet, as well as higher-level services such as e-mail, UseNet servers, domain name servers, and proxy Web servers;

Gateway services between Internet telephony and public network telephony (evolving in the near future to multimedia real-time communications); and

Documentation and instruction for use of browser applications, e-mail, and various Internet services and resources.

It would require a lengthy report to describe how each of these affects user interface performance. Suffice it to say that a major objective in providing good service is the avoidance of server congestion, through the use of proxy Web servers that give users the impression of fast response from uncongested access to a nearby server, when in fact the originating server is far away and highly congested. Fast response time is, as emphasized earlier, an important measure of good performance of the user interface.

We might also reemphasize the importance of being "always connected" to Internet access for applications such as receiving timely information from "push" servers (such as the fast-developing customized current information services producing ever-changing displays in screen savers), immediate delivery of e-mail, fast receipt and initiation of real-time audio/video calls, and participating in the on-line work of a distributed group. An always-connected transmission access facility is required, of course, which must be matched by similar facilities27 for the Internet service provider.

As with providers of wireless access services, Internet service providers will soon be required to support mobility services, such as locating and characterizing nomadic users. There are significant research questions in coordinating Internet routing and service-class support policies with the movement of individuals, in transferring customer profiles for Internet services, and in other aspects of mobility support.

Software Architecture: Distributed Object Environments and Transportable Software

Management of a mobility environment, particularly location transparency and location consciousness, is complex, and further research is needed. Distributed object environments, a software structure being used more and more in communications as well as applications systems, have a large potential to help resolve the complexities and improve performance.28 For example, the global availability of a distributed object environment would make abstract service objects available in a consistent format everywhere, with those objects translating user needs into instructions to local systems.
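
The notion of an abstract service object that looks the same everywhere but translates requests into instructions for local systems can be sketched with an abstract interface and per-location implementations. This is plain Python rather than CORBA or any real distributed object system, and every class name, location, and reply string below is a made-up placeholder.

```python
from abc import ABC, abstractmethod


class DirectionsService(ABC):
    """Abstract service object: the same interface wherever the user is."""

    @abstractmethod
    def nearest(self, kind: str) -> str:
        """Return directions to the nearest place of the given kind."""


class CityADirections(DirectionsService):
    def nearest(self, kind: str) -> str:
        # A real object would translate this call into a query on a local system.
        return f"City A directory: nearest {kind} is two blocks north"


class CityBDirections(DirectionsService):
    def nearest(self, kind: str) -> str:
        return f"City B directory: nearest {kind} is inside the station"


# Stand-in for the environment's object location service (hypothetical).
LOCAL_SERVICES = {"city-a": CityADirections(), "city-b": CityBDirections()}


def find_drugstore(current_location: str) -> str:
    """Application code: identical everywhere (location transparency),
    yet the answer depends on where the user is (location consciousness)."""
    service = LOCAL_SERVICES[current_location]
    return service.nearest("drugstore")


print(find_drugstore("city-a"))
print(find_drugstore("city-b"))
```

The application calls one interface regardless of location, while the object it is handed encapsulates the local details.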

Transportable software is another important object-oriented technology that proceeds from a different assumption, that a common "virtual machine" (a special operating system on top of the real one) can be created on different platforms, so that software in "applets" (and applications) can be moved around from one machine to another.29 Java is a widely accepted language and virtual machine structure. Web browsers now commonly implement the Java virtual machine, allowing application applets to be downloaded from Web sites and executed in the user's computer. This facilitates animated displays and other features in the user interface, with much better performance than if the software executed in the Web server and large quantities of display information had to be transmitted to the user's browser. It also facilitates customization of the Web browser user interface for users with special needs and constraints.

Transportable software also has great potential for "programmable networks" in which communications protocols and services are not fixed but can be changed on user request by sending the appropriate applets to network elements, such as switches, where they execute. This, too, can improve performance where alternative protocols are better matched to applications needs, making the user interface more responsive and pleasant to use.

Notes

1. See Gunter (1992), Semantics of Programming Languages, for more extensive discussion.

2. For example, the two sentences below differ only in a single word, but the resulting structure of the preferred interpretation is significantly different (Frazier and Fodor, 1978; Shieber, 1983, gives a computational model that elegantly handles this particular psycholinguistic feature). In the first sentence, "on the rack" modifies "positioned," whereas in the second, it modifies "dress": Susan positioned the dress on the rack. Susan wanted the dress on the rack.

3. Texas Instruments had an early natural language system that did this.

4. This example was discussed by John Thomas, of NYNEX, at the workshop.

5. Concatenative synthesizers achieve synthesis by concatenating small segments (e.g., diphones) of stored digitized speech. Formant synthesizers use a rule-based approach to achieve synthesis, by specifying acoustic parameters that characterize a digital filter and how these parameters change as different phonemes are sequenced.

6. Personal communication, John C. Thomas, NYNEX, December 12, 1996.

7. A voice recognition system introduced by IBM in 1996 was designed to enable radiologists to dictate reports into a personal computer. The system recognized 2,000 words and required some training. Its support for conversational discourse, in a context where certain technical phrases may be used frequently, was contrasted in the press with the need to pause after individual words in older commercial software (Zuckerman, 1996).

8. Candace Sidner, of Lotus Development, and Raymond Perrault, of SRI, contributed much of the content of this subsection.

9. Indexing and retrieval constitute a growing application area, especially with the increased desire to organize and access large amounts of data, much of which is available as text.

10. This section concentrates on the state of the art of complete end-to-end natural language processing systems and does not describe research in individual areas. The steering committee notes that there has been significant progress, ranging from new grammatical formalisms to approaches to lexical semantics to dialogue models.

11. There is much promising research on syntactic models, such as the TAG (tree-adjoining grammars) work (see Joshi et al., 1981, 1983, 1995; Shieber, 1983), which are computationally tractable syntactic formalisms with greater power than context-free grammars, and on lexical semantics.

12. Although space prevents including detailed references here, the interested reader is directed in particular to the recent years' conference proceedings of the Association for Computational Linguistics, the European Association for Computational Linguistics, the international meeting in Computational Linguistics (COLING), the DARPA Spoken Language and MUC workshops, and the journals Artificial Intelligence, Computational Linguistics, and Machine Translation.

13. For applications involving database query, or for more sophisticated command and control, the mapping between the sequence of words and their meaning can be very complicated indeed. DARPA has funded applications-oriented research in language understanding (Roe and Wilpon, 1994; Cole et al., 1996) in the context of database query, where the user requests the answer to a query by typing or uttering the query. In most language understanding systems to date, a set of syntactic and/or semantic rules is applied to the query to obtain its meaning, which is then used to retrieve the answer. If the query refers to information obtained in previous queries, another set of rules that deal with discourse is used to disambiguate the meaning. Pragmatic information about the specific application is often encoded in the rules as well. Even for a simple application like retrieval of air travel information, hundreds of linguistic rules are hand coded by computational linguists. Many of these rules must be rewritten for each new application.

14. The Linguistic Data Consortium at the University of Pennsylvania, which is sponsored by government and industry, now makes much of this data available, from different sources, for different tasks, and in different languages.

15. Note that portable devices raise the larger issue of data durability: portable devices may be easier to lose or break, which raises questions about ease of backup for the data they contain.

16. Much of the cross-industry disagreement revolved around interlacing, a technique that has long been used in television to increase resolution and that takes advantage of the extremely high line-to-line and temporal coherence of images produced by television cameras. Computer output, especially text and graphics, tends to be hard edged and to flicker badly when displayed on interlace monitors. Although one can convert interlaced broadcast TV to noninterlaced at the receiver end easily enough, there is a cost issue that affects the likelihood of flooding the market with the cheapest sets possible, hence affecting penetration and return on investment. The computer industry (hardware, software, and netware), of course, wants the low-end TVs of the future to handle digital output in a reasonable format; the television industry wants a 16 × 9 interlaced format (which is really a 32 × 9 format non-interlaced).

17. The Web is, of course, a great source for visual input. Copyright concepts of fair use and royalties will necessarily adapt, as they will for text and audio quotations, samples, and outright theft.

18. Blake Hannaford, of the University of Washington, contributed much of the content of this subsection.

19. In fact, the graphics produced are not Braille but simply dot graphics printed on a Braille printer with the same resolution or dot spacing as Braille. This is a common technique, but it produces relatively low resolution graphics.

20. The WIMP interface will not serve this future, though elements will be involved (keypads, pointing, etc.). In its current form it is arguably dangerous to people susceptible to repetitive stress disorders, unusable by a large segment of the population with disabilities, and far too simple for navigation in complex spaces.

21. As noted in Herndon et al. (1994), a slider or dial for volume control has 1 degree of freedom; the mouse for picking, drawing, or two-dimensional location has 2 DOF, a 6D mouse or head-tracker for docking or view control has 6 DOF, a glove or face device for hand/face animation can have 16 or more DOF, and a body suit for whole-body animation can have over 100 DOF.

22. Many users of today's Internet telephony services experience a long delay, sometimes of the order of a second, in transmission, actually due more to buffering in the user's computer to smooth out arriving packets.

23. "Basic rate" ISDN, providing an aggregate 144-kbps symmetric service to a subscriber, suffered from a too-long development, unattractive rate structure, and general ambivalence on the part of telephone operators, but is now widely available and popular for Internet and "work at home" access needs. The usual access rate is 128 kbps symmetric, obtained by tying together the two 64-kbps channels provided within the 144-kbps aggregate service. From the user's point of view, ISDN still suffers from the need to set up a connection, although setup is usually quite fast, and from per-minute charges even for local calls.

ADSL, now focused on the generally asymmetric traffic requirements of computer communications sessions, offers 1.5 to 6 Mbps downstream (network to subscriber) and up to 384 kbps upstream (from the subscriber). A subscriber's ADSL service has the potential to be always connected, permanently linked, for example, through a router in the telephone office into a high-speed data network. It is not yet clear that telephone companies have the "always-connected" paradigm in mind. Telephone companies have wavered in their commitment to ADSL, so it is a very tentative forecast that ADSL service at acceptable cost will be available to millions of telephone subscribers in 5 years.

Although ADSL could vastly improve the performance of multimedia user interfaces, it should be recognized (and this will hold for the other broadband access mechanisms as well) that contention for capacity on networks upstream, and congestion at servers, may also seriously constrain performance.

HDSL, which provides symmetric capacity of 1.5 Mbps and up and usually is designed to work over two twisted pair lines, is not generally associated with residential users but could quickly overtake ADSL if households begin to generate high-capacity traffic.

VDSL, at rates of 25 Mbps or higher, requires a distribution point closer to the subscriber than a present-day telephone office. Its potential penetration is difficult to predict and depends a great deal on the success and competitive implications of cable-based data services.

24. Cable interactive access services are just beginning to be commercially available. It is a fairly safe prediction that by mid-1999 millions of cable subscribers will be offered this service.

25. It is a challenge to the cable industry to make subscription and service provisioning simple and fast, and some standards interoperability questions discussed later, such as "plug and play" of digital set-top boxes, remain to be resolved.

26. The conventional "best-effort" IP service does not require any special capabilities from the core network, but the new QoS-conscious IP services and, of course, ATM do. The core network must deploy technologies such as edge switches and access multiplexers that aggregate traffic arriving under various communications protocols, and must closely control QoS parameters for multiswitch routings.

27. For the modem-based ISPs this implies higher rates, but the cable model may allow "always-on" capability without major increases in hardware investment.

28. CORBA (Common Object Request Broker Architecture), standardized by the Object Management Group, is a leading candidate for a universally accepted architecture, although there are other distributed object systems proposed by major software vendors, such as Microsoft's ActiveX.

29. Transportable software and object broker systems such as CORBA are complementary more than competitive. CORBA provides important object location and management services and facilitates use of existing applications software by wrapping applications (written in whatever computer language) in CORBA objects with standard IDL interfaces. The Java virtual machine requires new applications, all in the Java language, and applets may not execute as efficiently as software written for the underlying operating system, but it facilitates the movement of executable software, with appropriate security constraints, with the benefits outlined above. There are many examples now of CORBA-based systems in which CORBA objects are invoked by transportable Java applets.

×
Page 106
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 107
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 108
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 109
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 110
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 111
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 112
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 113
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 114
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 115
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 116
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 117
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 118
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 119
Suggested Citation:"3 Input/Output Technologies: Current Status and Research Needs." National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Washington, DC: The National Academies Press. doi: 10.17226/5780.
×
Page 120

The national information infrastructure (NII) holds the promise of connecting people of all ages and descriptions—bringing them opportunities to interact with businesses, government agencies, entertainment sources, and social networks. Whether the NII fulfills this promise for everyone depends largely on interfaces—technologies by which people communicate with the computing systems of the NII.

More Than Screen Deep addresses how to ensure NII access for every citizen, regardless of age, physical ability, race/ethnicity, education, cognitive style, or economic level. This thoughtful document explores current issues and prioritizes research directions in creating interface technologies that accommodate every citizen's needs.

The committee provides an overview of NII users, tasks, and environments and identifies the desired characteristics in every-citizen interfaces, from power and efficiency to an element of fun. The book explores:

  • Technological advances that allow a person to communicate with a computer system.
  • Methods for designing, evaluating, and improving interfaces to increase their ultimate utility to all people.
  • Theories of communication and collaboration as they affect person-computer interactions and person-person interactions through the NII.
  • Development of agents: intelligent computer systems that "understand" the user's needs and find the solutions.

Offering data, examples, and expert commentary, More Than Screen Deep charts a path toward enabling the broadest-possible spectrum of citizens to interact easily and effectively with the NII. This volume will be important to policymakers, information system designers and engineers, human factors professionals, and advocates for special populations.
