Statement on National Information Infrastructure Issues
Oscar Garcia for the IEEE Computer Society
STATEMENT OF THE PROBLEM
A key challenge facing the information technology community is to integrate human-computer interaction
(HCI) research with that of broadband communication in an economically effective manner serving the broadest
possible community. Over the next 5 to 7 years, we expect this issue to become one of the central economic and
scientific drivers of the entertainment industry, as well as a key influence on how science and education are
conducted in technologically advanced nations. Our organization is particularly concerned with the latter arena.
We want to ensure that scientists, students, and other serious information seekers are able to exploit the new
technology and, in the case of a large segment of our membership, also contribute to it. One of the many
nontechnical barriers is the lack of a forum for disseminating information about the national information
infrastructure (NII); the IEEE Computer Society, with its technical groups and wide-ranging publications, can
be a significant facilitator of such a forum. The deployment should strongly leverage the NII's educational
potential, even though entertainment and business applications are more likely to dominate. Few universities,
high schools, and elementary schools are prepared to profit from the NII deployment, and even fewer understand
what changes in their educational modus operandi are likely to take place if they participate. These issues are
seldom included in public policy studies of technology because the curriculum designers and implementers
responsible for them are often absent from such studies. One thing is certain: given the broad spectrum of users,
usability through good interfaces must be taken seriously.
AREAS AND EXAMPLES
Five key areas of HCI research activity are summarized below, together with related key developments,
enabling technologies, and capabilities:
- Human-computer dialogue. Natural language, voice, and other modalities (pen, gestures, haptic, etc.)
can be combined to produce dialogue techniques for HCI that can be robust enough to tolerate acoustic noise,
disfluencies, and so on. Relevant topics here include speech recognition and synthesis; natural
language processing and understanding;1 lip-reading technologies in noisy environments;2 and search, navigation,3
and retrieval from cyberspace using multimedia.4 Speaker identification techniques can be combined with secure
passwords to control computer access, and language identification may support improved multilingual
interfaces.5 Understanding natural language and speech also involves semantics, prosody (in the case of speech),
and probably the semantic nets and semantic encyclopedias now under construction. Understanding technologies
that go beyond symbolic matching or statistical techniques would allow users to search by describing the desired
object in a less restrictive spoken or written form. Roe and Wilpon6 present a thorough analysis of the potential
uses of speech recognition and synthesis, including their use in aiding physically disabled people. Speech could
play a very significant role in the new asynchronous transfer mode (ATM) technologies, which were designed
precisely as a compromise between the needs of voice communication and video transmission. Multiuser dialogue
is likely to be extremely useful in areas such as scientific research and engineering design, provided usable
software is available.7
- Machine sensors for HCI and general I/O to facilitate telepresence and teleoperation. Current
computers are "sensory deprived," which is a barrier to both human-computer interaction and machine
learning. Sensors that should be studied include microphones in single and array configurations, infrared and
other means of scanning and computing distances, optical sensors such as small charge-coupled device (CCD)
cameras, haptic interfaces, and alternatives to point-and-click devices. Combining fused sensor inputs with
intelligent or learned action-response behavior would provide a more realistic basis for machine learning and for
complex inferencing that draws on symbolic, fuzzy, and probabilistic techniques. This area has been researched
with various objectives, but seldom with the aim of improving the human-computer interface. Standardizing
experimental environments, for example through a human-computer interaction workbench (HCI-WB), can
improve measurements. Such an environment is also useful for studying human behavior in real and virtual
modalities related to the NII, and it allows human-subject variability in behavior, navigation, and orientation to be
compared between real and virtual environments. The potential for research in the fusion of modalities is
enormous.8 The challenge of this research area is to fuse multiple sensor inputs to the computer in a cohesive and
well-coordinated manner. One example would be the integration of CCD camera input with a haptic
force-feedback experiment and synthesized video output. Another useful experiment could involve mechanisms for
the localization of sound in virtual environments9 using the HCI-WB.
- Large storage (archival and nonarchival), database, and indexing technologies, including
multiresolution and compression for different modalities. Video and audio will require large compression factors
and mechanisms for rapid encoding and decoding; they are also difficult to index and access for retrieval, and even
when compressed they will require mass-storage database techniques. This area is also indirectly related to the
speech and video synthesis technologies, since high-resolution synthesis approaches imply efficient encoding,
possibly at different resolution levels. Similarly, virtual environment research requires efficient storage and
compression technologies for input and output. There are good reasons to believe, for example, that high-quality
audio can be encoded at 2,000 bps using dynamic adaptation of perceptual criteria in coding and
articulatory modeling of the speech signal. Encoding research should therefore include both generation and
perceptual factors.10 Additionally, multimedia databases require techniques for temporal modeling and delivery.
A novel interface between the client and the database system, called "query scripts," adds temporal presentation
modeling to queries. Query scripts capture multimedia objects and their temporal relationships for presentation.
They thus extend the database system's ability to select and define a set of objects for retrieval and delivery by
providing a priori knowledge about client requests. This information allows the database system to schedule
optimal access plans for delivering multimedia content objects to clients. Another area of concern for the overall
throughput capability of the NII is NASA's Earth Observing System (EOS). This system is coupled with a Data and
Information System (DIS) in a composite EOSDIS, which is expected, when operational in 1998, to require
transport of one terabyte per day of unprocessed data and possibly an order of magnitude more when processed,
roughly equivalent to the total daily transport capacity of the current Internet. The question is whether the NII
will provide the capacity for even a fraction of such volumes of data.
- Virtual environments and their use in networking and wireless communication (tethered and
untethered networked environments).11 These will have an impact on the NII. Virtual environments relate to
telepresence and telecommuting, as well as to personal communication services for digital voice. The technologies
for telepresence and telecommuting involve a mixture of multimedia and networking. Wireless communication
technology also includes techniques such as geopositioning, local indoor infrared sensors for location, and
communication at low, medium, and high bandwidths. The technical challenges of wireless messaging are well
known.12 In particular, the proposed use of ATM LANs will integrate virtual environment research at different sites
with communication research.13 The concept of virtual environments is taken here in a broad sense, including
head-mounted and enclosed CAVE-like environments,14 telepresence, and the associated human-factors
considerations, such as the real-time and residual long-term psychological effects of immersion.
Strong encouragement for a research emphasis on the human-computer interface is provided by the National
Research Council's Committee on Virtual Reality Research and Development, whose final report15 makes specific
recommendations for federal research and development investments. Virtual environments are also likely to be
used for education and experimentation at a distance once sufficient educational and psychological technologies
and supporting network communication have been developed. For example, realistic virtual environments could be
used for technology education in areas that involve high cost and risk, such as welding training.16
- Applications of software engineering and CASE to the R&D of complex software systems and
browsers to be used in HCI.17 Many software modules and interfaces for the different modalities might be
developed in a compact and reusable manner, taking advantage of existing and newly developed software
techniques. It has been found that 50 to 90 percent of all lines of code in industrial and military software are
dedicated to human-computer interfaces.18 In this sense, we include usability studies in the scope of software
engineering measurements for interfaces.19 We anticipate special interest in experiments aimed at facilitating the
development of interactive multimedia educational software, particularly for science and engineering topics.
Financial investment in NII application software will depend on how easily that software can be accessed by the
broad community of NII users. Software engineering for the NII is likely to have a flavor quite different from past
work at research institutes such as the Software Engineering Institute, which has been strongly based on
Ada environments.
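The sensor-fusion challenge in the second area can be made concrete with one of the simplest probabilistic fusion rules. The following is a minimal Python sketch, not the HCI-WB's method: it assumes two or more independent, unbiased distance readings (for example, from a hypothetical infrared rangefinder and an acoustic sensor) whose error variances are known, and combines them by inverse-variance weighting. The function name `fuse_estimates` and all numbers are illustrative assumptions.

```python
from typing import Sequence, Tuple


def fuse_estimates(values: Sequence[float],
                   variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of independent, unbiased estimates
    of one quantity. Returns (fused value, fused variance).

    Assumption: sensor errors are independent with known variances; the
    function name and interface are hypothetical, for illustration only.
    """
    if not values or len(values) != len(variances):
        raise ValueError("need at least one value and one variance per value")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]      # more precise sensors weigh more
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, values)) / total
    return fused, 1.0 / total                   # never larger than the smallest input variance


if __name__ == "__main__":
    # Hypothetical readings in meters: infrared 2.0 (variance 0.01),
    # acoustic 2.3 (variance 0.04).
    distance, variance = fuse_estimates([2.0, 2.3], [0.01, 0.04])
    print(f"fused distance {distance:.2f} m, variance {variance:.3f} m^2")
```

With these numbers the fused estimate is about 2.06 m with variance 0.008 m², closer to the more precise infrared reading. The rule holds only under the stated independence assumption; correlated or biased sensors would need a different model.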
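The third area's call for multiresolution encoding, "possibly at different resolution levels," can be illustrated with the unnormalized Haar transform, one standard multiresolution technique chosen here for simplicity; the statement does not name a specific transform. This Python sketch repeatedly replaces each pair of samples by its average and half-difference, so a coarse version of a signal can be sent first and refined later with detail coefficients, while dropping the finest details gives a lossy, lower-resolution approximation. The function names and sample values are hypothetical.

```python
from typing import List, Tuple


def haar_decompose(signal: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Split a signal into its coarsest average plus per-level details.

    Each pass maps a pair (a, b) to average (a + b) / 2 and detail (a - b) / 2.
    Details are returned finest level first. Length must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a positive power of two")
    approx = [float(x) for x in signal]
    details: List[List[float]] = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx: List[float], details: List[List[float]]) -> List[float]:
    """Invert haar_decompose: a = avg + d, b = avg - d, coarsest level first."""
    current = list(approx)
    for level in reversed(details):
        refined: List[float] = []
        for avg, d in zip(current, level):
            refined.extend([avg + d, avg - d])
        current = refined
    return current


if __name__ == "__main__":
    samples = [4, 6, 10, 12, 8, 6, 5, 5]           # hypothetical sample values
    coarse, det = haar_decompose(samples)
    print(coarse, det)                              # coarsest average, then details
    # Lower-resolution version: discard the finest level of detail.
    lossy = haar_reconstruct(coarse, [[0.0] * len(det[0])] + det[1:])
    print(lossy)
```

For these samples the coarsest average is 7.0, and discarding the finest details yields [5.0, 5.0, 11.0, 11.0, 7.0, 7.0, 5.0, 5.0], a half-resolution approximation. Practical audio and video coders use more elaborate transforms and perceptual quantization; the sketch shows only the multiresolution idea.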
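The "query scripts" idea in the third area rests on a simple observation: if the database knows in advance when each object will be presented, it can order retrievals to meet those times. The Python sketch below is a hypothetical illustration of that reasoning, not the query-script interface itself. It assumes a single sequential retrieval channel of fixed bandwidth, orders fetches by presentation start time (earliest-due-date order, which minimizes the maximum lateness on one channel), and reports the startup delay needed so that every object has fully arrived before it is due. The `ScriptItem` fields and the `access_plan` function are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScriptItem:
    """One object in a hypothetical query script."""
    object_id: str   # illustrative name of a stored multimedia object
    start_s: float   # presentation start, seconds after playback begins
    size_mb: float   # object size, megabytes


def access_plan(items: List[ScriptItem],
                bandwidth_mb_s: float) -> Tuple[List[str], float]:
    """Return (fetch order, startup delay in seconds).

    Assumption: one sequential channel of fixed bandwidth and whole-object
    delivery. Objects are fetched in order of presentation start time.
    """
    order = sorted(items, key=lambda it: (it.start_s, it.object_id))
    elapsed = 0.0          # time at which the current object has fully arrived
    worst_lateness = 0.0   # largest amount by which any object would be late
    for it in order:
        elapsed += it.size_mb / bandwidth_mb_s
        worst_lateness = max(worst_lateness, elapsed - it.start_s)
    return [it.object_id for it in order], worst_lateness


if __name__ == "__main__":
    script = [
        ScriptItem("intro", 0.0, 10.0),
        ScriptItem("narration", 0.0, 2.0),
        ScriptItem("map", 5.0, 1.0),
        ScriptItem("clip", 8.0, 20.0),
    ]
    order, delay = access_plan(script, bandwidth_mb_s=4.0)
    print(order, delay)
```

For these four hypothetical objects and 4 MB/s of bandwidth, the plan fetches `intro`, `narration`, `map`, and `clip` in that order and needs a 3-second startup delay. A real system would also have to handle concurrent clients, variable bandwidth, and streaming rather than whole-object delivery.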
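The EOSDIS figures in the third area translate into a sustained network rate with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes), traffic spread evenly over 24 hours, and no protocol overhead; the T3 (DS3) line rate of 44.736 Mbit/s appears only as a familiar yardstick. The function names are illustrative.

```python
import math

# Assumptions: decimal units (1 TB = 10**12 bytes), data spread evenly over
# 24 hours, no protocol overhead or retransmission.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
T3_MBPS = 44.736                 # DS3/T3 line rate, Mbit/s


def sustained_mbps(terabytes_per_day: float) -> float:
    """Average rate in Mbit/s needed to move the given volume every day."""
    bits = terabytes_per_day * 10**12 * 8
    return bits / SECONDS_PER_DAY / 10**6


def t3_lines_needed(terabytes_per_day: float) -> int:
    """Number of fully utilized T3 lines that would carry that average rate."""
    return math.ceil(sustained_mbps(terabytes_per_day) / T3_MBPS)


if __name__ == "__main__":
    for label, tb in [("unprocessed", 1.0), ("processed (10x)", 10.0)]:
        print(f"{label}: {sustained_mbps(tb):.1f} Mbit/s, "
              f"{t3_lines_needed(tb)} T3 lines")
```

Under these assumptions, one terabyte per day requires about 92.6 Mbit/s sustained, roughly three fully utilized T3 lines, and ten terabytes per day about 926 Mbit/s, or 21 such lines. Real links would need headroom above these averages for bursts and overhead.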
INTERACTION AMONG TECHNICAL AND NONTECHNICAL (LEGAL/REGULATORY, ECONOMIC, AND SOCIAL) FACTORS
There are legal concerns about the balance between security and freedom of communication. In
particular, a thorny issue to be discussed is the degree of responsibility, if any, that carriers have for transmitting
illegal material or for the theft or penetration that may take place when security is breached. There are new
socially explosive issues (pornography, copyright, etc.) that need to be addressed in the context of networks
and information systems. These issues bear on the financial viability of human-computer interaction by very large
populations and have a tremendous impact on the publishing industry. A new type of "NII electronic
forensics" needs to be established, and it must have a strong enough technical basis to withstand legal scrutiny.
This is an area that only highly secret intelligence agencies have dealt with and that universities have incorporated
only sporadically into their research. It is a delicate area of concern for the public, since it is often related to
security and privacy.
CONTINGENCIES AND UNCERTAINTIES
The entertainment industry is most likely to dominate the field. It is likely (but uncertain) that only
a few educational institutions will be able to afford the expenditures associated with supplying educational
services to their constituencies. It is not clear how the telephone and publishing industries will react or what
their investments will be, but much will depend on intellectual property rights protection and the availability
of sources of materials. How users will react can so far be gleaned only from experiments such as the
"electronic village" at Virginia Polytechnic Institute's Department of Computer Science. The Digital Libraries
Initiative of NSF, ARPA, and NASA needs to continue and be more widely coordinated in a national forum
accessible to all.
USERS
Classes of users to be served include the following:
- The public. The public will have access to the media. In general, "public users" span a broad range of
sophistication. The distribution of know-how is likely to include a large number of naive users (mostly
browsers and e-mail users) and a small number of highly sophisticated users. Age is not a factor in
experiential know-how. A study of this population should provide a useful sociotechnical profile. These are
people not associated with an educational, government, or industrial institution but rather "home users." They
may seek information from educational, government, or industrial sources, but on an irregular basis.
- Those associated with educational institutions, such as students, teachers, administrators, and
librarians.
- Public servants who deal with the public and are in charge of disseminating government information,
as well as those who provide services to citizens who must be licensed, counted in a census, taxed, certified
for licenses and renewals, and so on. This population could be categorized into federal, state, and local
government public servants.
- Industrial users. This category includes a large entertainment subcategory and possibly "edutainment."
It includes salespeople on the network and electronic-commerce providers of sources, technology, and products,
as well as, of course, video-on-demand providers. There are many further subcategories here.
Disadvantaged persons and those in geographical areas remote from broadband access will be the most
difficult to serve, partly because of their technical access problems and partly because they are, in general,
likely to be at the low end of user sophistication. Yet they are also likely to benefit the most from access to
resources that would otherwise be unreachable.
MARKET RAMP-UP
The market will have to provide "substance" or content. The cost of providing substance is high. How to
provide substantive content, create a cottage industry of providers, allow those potential providers the opportunity
to access and sell in a free market, and draw lines of responsibility and legality are but some of the issues that will
determine the speed of the ramp-up. Interactivity is expensive, as is any two-way communication, but the
bandwidth need not be the same in both directions. This is an area where technology could have an
impact if we understand the human-computer aspects of interactive "dialogue" in a broad sense. Openness should
mean equal access for all users who fall within a provider's potential service population, restricted, of course,
by registration where financial transactions are to take place. Determining viable means of charging for services
is a techno-economic question whose early resolution is fundamental to a fast ramp-up. Scalability may also be
viewed from the standpoint of users' sophistication and needs. Current "help" menus are insufficient, and too
slow, to solve the problems that nonspecialist but legitimate users face in specialized use of the facilities. New
approaches to diagnosing users' difficulties are part of the "HCI problem" and are required for fast progress by
the public user and even by the moderately sophisticated industrial or government user.
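The point that interactive bandwidth need not be symmetric can be checked with back-of-the-envelope arithmetic. Every traffic figure in this Python sketch is an assumption chosen for illustration, not a measurement: a downstream video stream of 1.5 Mbit/s (on the order of MPEG-1 rates) and, upstream, ten user actions per minute of 100 bytes each.

```python
def upstream_bps(events_per_minute: float, bytes_per_event: int) -> float:
    """Average upstream bit rate for sporadic user actions (clicks, keystrokes).

    Assumption: each action is one small message of bytes_per_event bytes.
    """
    return events_per_minute * bytes_per_event * 8 / 60


def asymmetry_ratio(downstream_bps: float, events_per_minute: float,
                    bytes_per_event: int) -> float:
    """How many times larger the downstream rate is than the upstream rate."""
    return downstream_bps / upstream_bps(events_per_minute, bytes_per_event)


if __name__ == "__main__":
    DOWNSTREAM_BPS = 1_500_000      # assumed video stream, roughly MPEG-1 scale
    up = upstream_bps(events_per_minute=10, bytes_per_event=100)
    ratio = asymmetry_ratio(DOWNSTREAM_BPS, 10, 100)
    print(f"upstream ~{up:.0f} bit/s, downstream/upstream ~{ratio:,.0f}:1")
```

With these assumed figures, upstream traffic averages about 133 bit/s, roughly four orders of magnitude below the downstream stream, which is why a narrow return channel can suffice for many interactive services.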
References
1. Cole, R., O.N. Garcia, et al. 1995. "The Challenge of Spoken Language Systems: Research Directions for the
Nineties," IEEE Transactions on Speech and Audio Processing, January.
2. Garcia, O.N., A.J. Goldschen, and E. Petajan. 1994. "Continuous Optical Automatic Speech
Recognition by Lipreading," Proceedings of the Twenty-Eighth Annual Asilomar Conference on Signals, Systems, and
Computers, October 31-November 2, Pacific Grove, Calif.
3. Shank, Gary. 1993. "Abductive Multiloguing: The Semiotic Dynamics of Navigating the Net," Electronic
Journal on Virtual Culture 1(1).
4. Vin, Harrick M., et al. 1991. "Hierarchical Conferencing Architectures for Inter-Group Multimedia
Collaboration," Proceedings of the ACM Conference on Organizational Computing Systems, Atlanta, Ga., November.
5. Wilpon, J., L. Rabiner, C.-H. Lee, and E. Goldman. 1990. "Automatic Recognition of Keywords in
Unconstrained Speech Using Hidden Markov Models," IEEE Transactions on Acoustics, Speech, and Signal Processing,
ASSP-38, November, pp. 1870-1878.
6. Roe, D.V., and J. Wilpon (eds.). 1994. Voice Communication Between Humans and Machines. National
Academy Press, Washington, D.C.
7. Anupam, V., and C.L. Bajaj. 1994. "Shastra: Multimedia Collaborative Design Environment," IEEE
Multimedia, Summer, pp. 39-49.
8. Koons, David B., C.J. Sparrell, and K.R. Thorisson. 1993. "Integrating Simultaneous Inputs from Speech,
Gaze, and Hand Gestures," in Intelligent Multimedia Interfaces, M. Maybury (ed.). AAAI Press/MIT Press,
Cambridge, Mass., Chapter 11, pp. 257-276.
9. Gilkey, R.H., and T.R. Anderson. 1995. "The Accuracy of Absolute Localization Judgments for
Speech Stimuli," submitted to the Journal of Vestibular Research.
10. Flanagan, J.L. 1994. "Speech Communication: An Overview," in Voice Communication Between Humans
and Machines, D.V. Roe and J. Wilpon (eds.). National Academy Press, Washington, D.C.
11. Kobb, B.Z. 1993. "Personal Wireless," IEEE Spectrum, June, p. 25.
12. Rattay, K. 1994. "Wireless Messaging," AT&T Technical Journal, May/June.
13. Vetter, R.J., and D.H.C. Du. 1995. "Issues and Challenges in ATM Networks," Communications of the
ACM, special issue dedicated to ATM networks, February.
14. DeFanti, T.A., C. Cruz-Neira, and D. Sandin. 1993. "Surround-Screen Projection-Based Virtual Reality:
The Design and Implementation of CAVE," Computer Graphics Proceedings, Annual Conference Series, pp. 135-142.
15. Durlach, N.I., and A.S. Mavor. 1995. Virtual Reality: Scientific and Technological Challenges. National
Academy Press, Washington, D.C.
16. Wu, Chuansong. 1992. "Microcomputer-based Welder Training Simulator," Computers in Industry.
Elsevier Science Publishers, pp. 321-325.
17. Andreessen, M. 1993. "NCSA Mosaic Technical Summary," National Center for Supercomputing
Applications, Software Development Group, University of Illinois, Urbana, Ill.
18. Myers, Brad A., and Mary Beth Rosson. 1992. "Survey on User Interface Programming," Proceedings
SIGCHI'92: Human Factors in Computing Systems. Monterey, Calif., May 3-7, p. 195.
19. Hix, D., and H.R. Hartson. 1993. Developing User Interfaces: Ensuring Usability through Product and
Process, Wiley.