White Papers of the NII: Statement on National Information Infrastructure Issues

Oscar Garcia for the IEEE Computer Society

STATEMENT OF THE PROBLEM

A key challenge facing the information technology community is to integrate human-computer interaction (HCI) research with broadband communication research in an economically effective manner that serves the broadest possible community. Over the next 5 to 7 years, we expect this issue to become one of the central economic and scientific drivers of the entertainment industry, as well as a key influence on how science and education are conducted in technologically advanced nations. Our organization is particularly concerned with the latter arena. We want to ensure that scientists, students, and other serious information seekers are able to exploit the new technology and, in the case of a large segment of our membership, also contribute to it. One of the many nontechnical barriers is the lack of a forum for disseminating information about the national information infrastructure (NII). The IEEE Computer Society, with its technical groups and wide-ranging publications, can be a significant facilitator of such a forum. Because entertainment and business uses are more likely to dominate, the deployment should make a strong effort to leverage the NII's educational potential. Few universities, high schools, and elementary schools are prepared to profit from the NII deployment, and even fewer understand what changes in their educational modus operandi are likely to take place if they participate. These issues are seldom included in public policy studies of technology because those responsible for designing and implementing curricula are often absent from them. One thing is certain: given the broad spectrum of users, usability through good interfaces must be taken seriously.

AREAS AND EXAMPLES

Five key areas of HCI research activity are summarized below, together with related key developments, enabling technologies, and capabilities:

Human-computer dialogue. Natural language, voice, and other modalities (pen, gestures, haptic input, etc.) can be combined to produce dialogue techniques for HCI that are robust enough to tolerate acoustic noise, disfluencies, and similar imperfections. Relevant topics include speech recognition and synthesis; natural language processing and understanding;¹ lip-reading technologies for noisy environments;² and search, navigation,³ and retrieval from cyberspace using multimedia.⁴ Speaker identification techniques can be combined with secure passwords to control computer access, and language identification may support improved multilingual interfaces.⁵ Understanding natural language and speech also involves semantics and, for speech, prosody, and will probably draw on the semantic nets and semantic encyclopedias now under construction. Understanding technologies that go beyond symbolic matching or statistical techniques would allow users to search by describing the object sought in less restrictive spoken or written form. Roe and Wilpon⁶ present a thorough analysis of the potential uses of speech recognition and synthesis, including their use in aiding physically disabled people. Speech could play a very significant role if carried over the new asynchronous transfer mode (ATM) technologies, which were designed precisely as a compromise between the needs of voice communication and video transmission. Multiuser dialogue is likely to be extremely useful, for example in scientific research and engineering design, provided usable software is available.⁷
Machine sensors for HCI and general I/O to facilitate telepresence and teleoperation. Current computers are "sensory deprived," which is a barrier to both human-computer interaction and machine learning. Sensors that should be studied include single microphones and microphone arrays; infrared and other means of scanning and measuring distances; optical sensors in the light-wave range, including small charge-coupled device (CCD) cameras; haptic interfaces; and alternatives to point-and-click devices. Fusing sensor inputs to the computer with intelligent or learned action-response behavior would provide a more realistic basis for machine learning and complex inferencing, using symbolic, fuzzy, and probabilistic approaches. This area has been researched with other objectives, but seldom with the aim of improving the human-computer interface. Standardizing environments (e.g., through a human-computer interaction workbench, or HCI-WB) can improve measurements. Such an experimental environment is also useful for studying human behavior in the real and virtual modalities related to the NII, and it allows comparison of variability among human subjects in behavior, navigation, and orientation between real and virtual environments. The potential for research in the fusion of modalities is enormous.⁸ The challenge in this area is to fuse multiple sensor inputs to the computer in a cohesive and well-coordinated manner. One example would be integrating CCD camera input with a haptic experiment that uses force feedback and synthesized video output. Another helpful experiment could involve mechanisms for the localization of sound in virtual environments⁹ using the HCI-WB.
Large storage (archival and nonarchival), database, and indexing technologies, including multiresolution representation and compression for different modalities. Video and audio will require large compression factors and mechanisms for rapid encoding and decoding; they are also difficult to index and access for retrieval, and even when compressed they will require mass-storage database techniques. This area is also indirectly related to speech and video synthesis technologies, since high-resolution synthesis approaches imply efficient encoding, possibly at different resolution levels. Similarly, virtual environment research requires efficient storage and compression technologies for input and output. There are good reasons to believe, for example, that high-quality audio can be encoded at rates of 2,000 bps using dynamic adaptation of perceptual criteria in coding and articulatory modeling of the speech signal. Encoding research should therefore include both generation and perceptual factors.¹⁰ Multimedia databases also require techniques for temporal modeling and delivery. A novel interface between the client and the database system, called "query scripts," adds temporal presentation modeling capabilities to queries. Query scripts capture multimedia objects and their temporal relationships for presentation. By providing a priori knowledge about client requests, query scripts thus extend the database system's ability to select and define a set of objects for retrieval and delivery. This information allows the database system to schedule optimal access plans for delivering multimedia content objects to clients. Another concern, related to the overall throughput capacity of the NII, is NASA's Earth Observing System (EOS). EOS is coupled with a data and information system (DIS) to form the composite EOSDIS, which, when operational in 1998, is expected to require transport of one terabyte of unprocessed data per day, and possibly an order of magnitude more once processed, roughly equivalent to the total daily transport capacity of the current Internet. The question is whether the NII will provide the capacity for even a fraction of such data volumes.
Virtual environments and their use in networked and wireless (tethered and untethered) communication environments¹¹ will have an impact on the NII. Virtual environments relate to telepresence and telecommuting, as well as to personal communication services for digital voice. The technologies for telepresence and telecommuting involve a mixture of multimedia and networking. Wireless communication technology also includes geopositioning, indoor infrared sensors for location, and communication at low, medium, and high bandwidths. The technical challenges of wireless messaging are well known.¹² In particular, the proposed use of ATM LANs will integrate virtual environment research at different sites with communication research.¹³ The concept of virtual environments is taken here in a broad sense, including head-mounted and enclosed CAVE-like environments,¹⁴ and telepresence, together with the human factors of immersion, both its real-time and its residual long-term psychological effects. The National Research Council's Committee on Virtual Reality Research and Development strongly encourages a research emphasis on the human-computer interface; its final report¹⁵ makes specific recommendations for federal research and development investment. Virtual environments are also likely to be used for education and experimentation at a distance once sufficient educational and psychological development technologies, and the related network communications, are available. For example, realistic virtual environments could be used for technology education in high-cost, high-risk areas such as welding training.¹⁶
Applications of software engineering and computer-aided software engineering (CASE) to the research and development of complex software systems and browsers to be used in HCI.¹⁷ Many modules of the software and interfaces for the different modalities might be developed in a compact and reusable manner, taking advantage of existing and newly developed software techniques. It has been found that 50 to 90 percent of all lines of code in industrial and military software are dedicated to human-computer interfaces.¹⁸ Accordingly, we include usability studies within the scope of software engineering measurement for interfaces.¹⁹ We anticipate special interest in experiments that facilitate the development of interactive multimedia educational software, particularly for science and engineering topics. Financial investment in NII application software will depend on how easily that software can be accessed by the broad community of NII users. Software engineering for the NII is likely to differ considerably in flavor from past work at research institutes such as the Software Engineering Institute, which has been strongly based on Ada environments.
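
To give a sense of the compression factors involved in speech coding, the short Python sketch below compares the 2,000 bps rate cited for high-quality speech audio with two uncompressed reference rates. Only the 2,000 bps figure comes from the text. The reference rates are illustrative assumptions: 64,000 bps telephone PCM (8,000 samples per second at 8 bits) and a 16 kHz, 16-bit wideband PCM stream. The names REFERENCE_BPS, compression_factor, and kilobytes_per_hour are hypothetical.

```python
# Only the 2,000 bps target comes from the text; the reference rates are
# illustrative assumptions for uncompressed (linear or companded) PCM.
REFERENCE_BPS = {
    "telephone PCM, 8 kHz x 8 bits": 8_000 * 8,     # 64,000 bps
    "wideband PCM, 16 kHz x 16 bits": 16_000 * 16,  # 256,000 bps
}
TARGET_BPS = 2_000


def compression_factor(reference_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the reference stream."""
    return reference_bps / coded_bps


def kilobytes_per_hour(bps: int) -> float:
    """Storage for one hour of audio at the given bit rate (1 kB = 1,000 bytes)."""
    return bps * 3600 / 8 / 1000


for label, bps in REFERENCE_BPS.items():
    print(f"{label}: {compression_factor(bps, TARGET_BPS):.0f}:1 compression, "
          f"{kilobytes_per_hour(bps):,.0f} kB per hour")
print(f"coded at {TARGET_BPS:,} bps: {kilobytes_per_hour(TARGET_BPS):,.0f} kB per hour")
```

Under these assumptions the coded stream is 32 times smaller than telephone PCM and 128 times smaller than the wideband stream, which is the scale of reduction that makes large audio archives practical to store.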
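
The text does not specify how query scripts are represented or how access plans are computed. As one illustration only, the Python sketch below models a query script as a set of media objects whose temporal relationships take the form "starts when X ends," resolves them into presentation start times, and orders retrievals from a single storage device earliest-deadline-first. Earliest-deadline-first ordering (Jackson's rule) minimizes the worst lateness on one sequential server, so that lateness is the smallest startup delay that permits uninterrupted playback. The names Clip, start_times, and fetch_plan and the per-object fetch-time estimates are hypothetical. The sketch also assumes that the "after" relations contain no cycles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Clip:
    """One multimedia object in a hypothetical query script."""
    name: str
    duration_s: float             # presentation length in seconds
    fetch_s: float                # assumed time to retrieve it from storage
    after: Optional[str] = None   # temporal relation: start when that clip ends


def start_times(clips: List[Clip]) -> Dict[str, float]:
    """Resolve the 'after' relations into presentation start times (seconds)."""
    by_name = {c.name: c for c in clips}
    starts: Dict[str, float] = {}

    def resolve(name: str) -> float:
        if name not in starts:
            clip = by_name[name]
            if clip.after is None:
                starts[name] = 0.0  # no predecessor: starts with the script
            else:
                prev = by_name[clip.after]
                starts[name] = resolve(prev.name) + prev.duration_s
        return starts[name]

    for c in clips:
        resolve(c.name)
    return starts


def fetch_plan(clips: List[Clip]) -> Tuple[List[str], float]:
    """Order retrievals earliest-deadline-first on one sequential device.

    Each clip's deadline is its presentation start. Fetching in deadline order
    minimizes the worst lateness (Jackson's rule); that lateness is the
    smallest startup delay that lets every clip arrive before it is needed.
    """
    starts = start_times(clips)
    order = sorted(clips, key=lambda c: starts[c.name])
    clock = 0.0
    worst_lateness = 0.0
    for c in order:
        clock += c.fetch_s  # fetches run back to back on one device
        worst_lateness = max(worst_lateness, clock - starts[c.name])
    return [c.name for c in order], worst_lateness


# Hypothetical script: narration and video both follow the title.
script = [
    Clip("title", duration_s=5.0, fetch_s=1.0),
    Clip("narration", duration_s=30.0, fetch_s=2.0, after="title"),
    Clip("video", duration_s=30.0, fetch_s=6.0, after="title"),
    Clip("credits", duration_s=5.0, fetch_s=1.0, after="video"),
]
order, startup_delay = fetch_plan(script)
print(order, startup_delay)
```

In this example the video cannot be fetched before its 5-second start, so the plan reports a 4-second startup delay. Because the script tells the database the full set of objects and their timing before delivery begins, this kind of a priori plan becomes possible.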
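
The EOSDIS volumes quoted above can be translated into sustained line rates with simple arithmetic. The Python sketch below assumes a decimal terabyte (10^12 bytes) and takes "an order of magnitude more" to mean ten terabytes per day. For comparison it uses the T3 (DS3) line rate of 44.736 Mbit/s. The results are 24-hour averages: peak demand, protocol overhead, and retransmissions would all push the required capacity higher. The function name sustained_mbps is hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
TERABYTE_BYTES = 10**12         # assumption: decimal terabyte
T3_MBPS = 44.736                # DS3 (T3) line rate, for comparison


def sustained_mbps(bytes_per_day: float) -> float:
    """Average rate in megabits per second needed to move bytes_per_day in 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


raw_mbps = sustained_mbps(1 * TERABYTE_BYTES)         # unprocessed data, per the text
processed_mbps = sustained_mbps(10 * TERABYTE_BYTES)  # "an order of magnitude more"

print(f"unprocessed: {raw_mbps:.1f} Mbit/s "
      f"(at least {math.ceil(raw_mbps / T3_MBPS)} T3 lines at full capacity)")
print(f"processed:   {processed_mbps:.1f} Mbit/s "
      f"(at least {math.ceil(processed_mbps / T3_MBPS)} T3 lines at full capacity)")
```

One terabyte per day works out to roughly 92.6 Mbit/s of continuous traffic, already more than two fully used T3 lines, and the processed volume to roughly 926 Mbit/s.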

INTERACTION AMONG TECHNICAL AND NONTECHNICAL (LEGAL/REGULATORY, ECONOMIC, AND SOCIAL) FACTORS

Legal concerns arise over the balance between security and freedom of communication. In particular, a thorny issue to be discussed is the degree of responsibility, if any, that carriers have for transmitting illegal material or for the theft or penetration that may take place when security is breached. New socially explosive issues, such as pornography and copyright, need to be addressed in the context of networks and information systems. These issues bear on the financial viability of large-scale human-computer interaction by large populations, and they have a tremendous impact on the publishing industry. A new type of "NII electronic forensics" needs to be established, and it must have a strong technical basis to withstand legal scrutiny. This is an area that only highly secret intelligence agencies have dealt with, and one that universities have incorporated only sporadically into their research. It is a delicate area of concern for the public, since it is often related to security and privacy.

CONTINGENCIES AND UNCERTAINTIES

The entertainment industry is most likely to dominate the field. It is likely, though uncertain, that only a few educational institutions will be able to afford the expenditures associated with supplying educational services to their constituencies. It is not clear how the telephone and publishing industries will react or what their investments will be, but much will depend on the protection of intellectual property rights and the availability of sources of materials. How users will react can so far be gleaned only from experiments such as the "electronic village" at Virginia Polytechnic Institute's Department of Computer Science. The Digital Libraries Initiative of NSF, ARPA, and NASA needs to continue and to be more widely coordinated in a national forum accessible to all.

USERS

Classes of users to be served include the following:

The public. These are people not associated with an educational, government, or industrial institution, but rather "home users." They may seek information from educational, government, or industrial sources, but on an irregular basis. The public will have access to the media. In general, "public users" span a broad range of sophistication. The distribution of know-how would likely include a large number of naive users (mostly browsers and e-mail users) and a small number of highly sophisticated users. Age is not a factor in experiential know-how. A study of this population should provide a useful sociotechnical profile.
Those associated with educational institutions, such as students, teachers, administrators, and librarians.
Public servants who deal with the public and are in charge of disseminating government information, as well as those who provide services to citizens who must be licensed, counted in a census, taxed, certified for licenses and renewals, and so on. This population can be divided into federal, state, and local government public servants.
Industrial users. This category includes a large entertainment subcategory, and possibly "edutainment." It covers salespeople on the network and electronic-commerce providers of sources, technology, and products, including, of course, video-on-demand providers. There are many subcategories here.

Disadvantaged persons and those in geographical areas remote from broadband access will be the most difficult to serve, partly because of their technical access problems and partly because, in general, they are likely to be at the low end of user sophistication. Yet they are also likely to benefit the most from access to resources that would otherwise be unreachable.

MARKET RAMP-UP

The market will have to provide "substance," or content, and the cost of providing substance is high. How to provide substantive content, create a cottage industry of providers, give those potential providers the opportunity to access and sell in a free market, and draw lines of responsibility and legality are but some of the issues that will determine the speed of the ramp-up. Interactivity is expensive, as is any two-way communication, but the bandwidth need not be the same in both directions. This is an area where technology could have an impact if we understand the human-computer aspects of interactive "dialogue" in a broad sense. Openness should mean equal access for all users who fall within a provider's potential service area, although access should, of course, be restricted to registered users where financial transactions are to take place. Determining viable ways to charge for services is a techno-economic question of fundamental importance; resolving it early is key to a fast ramp-up. Scalability may also be viewed in terms of users' sophistication and needs. Our "help" menus are insufficient, and too slow, to solve the problems that nonspecialist but legitimate users encounter in specialized uses of the facilities. New approaches to diagnosing users' difficulties are part of the "HCI problem" and are required for fast progress by public users and even by moderately sophisticated industrial or government users.

References

1. Cole, R., O.N. Garcia, et al. 1995. "The Challenge of Spoken Language Systems: Research Directions for the Nineties," IEEE Transactions on Speech and Audio Processing, January.

2. Garcia, O.N., with A.J. Goldschen and E. Petajan. 1994. "Continuous Optical Automatic Speech Recognition by Lipreading," Proceedings of the Twenty-Eighth Annual Asilomar Conference on Signals, Systems, and Computers, October 31-November 2, Pacific Grove, Calif.

3. Shank, G. 1993. "Abductive Multiloguing: The Semiotic Dynamics of Navigating the Net," Electronic Journal on Virtual Culture 1(1).

4. Vin, H.M., et al. 1991. "Hierarchical Conferencing Architectures for Inter-Group Multimedia Collaboration," Proceedings of the ACM Conference on Organizational Computing Systems, Atlanta, Ga., November.

5. Wilpon, J., L. Rabiner, C.-H. Lee, and E. Goldman. 1990. "Automatic Recognition of Keywords in Unconstrained Speech Using Hidden Markov Models," IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-38, November, pp. 1870-1878.

6. Roe, D.B., and J. Wilpon (eds.). 1994. Voice Communication Between Humans and Machines. National Academy Press, Washington, D.C.

7. Anupam, V., and C.L. Bajaj. 1994. "Shastra: Multimedia Collaborative Design Environment," IEEE Multimedia, Summer, pp. 39-49.

8. Koons, D.B., C.J. Sparrell, and K.R. Thorisson. 1993. "Integrating Simultaneous Inputs from Speech, Gaze, and Hand Gestures," in Intelligent Multi-Media Interfaces, M. Maybury (ed.). AAAI Press/MIT Press, Cambridge, Mass., Chapter 11, pp. 257-276.

9. Gilkey, R.H., and T.R. Anderson. 1995. "The Accuracy of Absolute Localization Judgments for Speech Stimuli," submitted to the Journal of Vestibular Research.

10. Flanagan, J.L. 1994. "Speech Communication: An Overview," in Voice Communication Between Humans and Machines, D.B. Roe and J. Wilpon (eds.). National Academy Press, Washington, D.C.

11. Kobb, B.Z. 1993. "Personal Wireless," IEEE Spectrum, June, p. 25.

12. Rattay, K. 1994. "Wireless Messaging," AT&T Technical Journal, May/June.

13. Vetter, R.J., and D.H.C. Du. 1995. "Issues and Challenges in ATM Networks," Communications of the ACM, special issue dedicated to ATM networks, February.

14. DeFanti, T.A., C. Cruz-Neira, and D. Sandin. 1993. "Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of CAVE," Computer Graphics Proceedings, Annual Conference Series, pp. 135-142.

15. Durlach, N.I., and A.S. Mavor. 1995. Virtual Reality: Scientific and Technological Challenges. National Academy Press, Washington, D.C.

16. Wu, C. 1992. "Microcomputer-based Welder Training Simulator," Computers in Industry. Elsevier Science Publishers, pp. 321-325.

17. Andreessen, M. 1993. "NCSA Mosaic Technical Summary," National Center for Supercomputing Applications, Software Development Group, University of Illinois, Urbana, Ill.

18. Myers, B.A., and M.B. Rosson. 1992. "Survey on User Interface Programming," Proceedings SIGCHI'92: Human Factors in Computing Systems. Monterey, Calif., May 3-7, p. 195.

19. Hix, D., and H.R. Hartson. 1993. Developing User Interfaces: Ensuring Usability through Product and Process, Wiley.