At the request of a consortium of federal government agencies, the Committee on Virtual Reality Research and Development was established to provide guidance and direction on the allocation of resources for a coordinated federal program in the area of virtual reality. In responding to this charge, the committee has included both virtual environments and teleoperation in its assessment of the field. Such an extension is required not only for logical and scientific reasons, but also because many of the examples cited in the charge feature the use of teleoperator systems.
In a synthetic environment (SE) system, the human operator is transported into a new interactive environment by means of devices that display signals to the operator's sense organs and devices that sense various actions of the operator. In teleoperator systems, the human operator is connected by means of such displays and controls to a telerobot that can sense, travel through, and manipulate the real world. In virtual reality (VR) or virtual environment (VE) systems, the human operator is connected to a computer that can simulate a wide variety of worlds, both real and imaginary. Simple remote manipulators are an example of the first type of system; video games of the second type.
Teleoperator systems effectively provide the operator with a transformed sensorimotor system that enables him or her to perform new types of actions in the real world. Virtual environment systems effectively provide the operator with controllable methods for generating new types of experiences. Using both teleoperator and virtual environment systems, one can (or will be able to) explore the ocean floor and outer space, visit Samarkand while staying in Elmira, try out products not yet manufactured, dig up a 10-ton container of hazardous waste, take a canoe trip through the human circulatory system, and have one's hair trimmed by a barber in Seville.
SCOPE OF THE SYNTHETIC ENVIRONMENT FIELD
The research and development required to realize the potential of SE systems is extremely challenging. The systems are complicated because they involve both complex artificial devices and a complex biological system (the human operator). There is a crucial need for cooperation among many disciplines, including computer science, electrical and mechanical engineering, sensorimotor psychophysics, cognitive psychology, and human factors. Also, the range of possible applications is exceedingly broad. Overall, the committee believes that the SE field has great potential, that the research and development required to realize this potential is just beginning, and that work in this area should be vigorously pursued by a wide variety of specialists in a wide variety of institutions.
There is currently a great deal of excitement, a great deal of "hype," and a great deal of confusion associated with the SE area. A major source of the confusion is the combination of rapid acceleration of interest in the area and the coming together of individuals from widely varying disciplines. In some cases, individuals are coming together because the problem to be solved requires expertise in diverse areas. In other cases, they are coming together because it has suddenly become apparent that essentially the same problems are being addressed by individuals in different fields who have never had the benefit of communicating with each other about them.
Associated with this interdisciplinary feature of the SE field is confusion over terminology: each discipline brings to the field its own language and its own biases. For example, whereas computer scientists naturally use the terms input and output in reference to the computer, psychologists use these terms in reference to the human user. Thus, in a virtual environment system, what is output to the psychologist is input to the computer scientist. Similar confusions often arise with the term interface. Whereas computer scientists frequently use this term to designate a component internal to the computer's hardware or software, many others use the term as a shorthand for human-computer interface devices external to the computer. Also, of course, in addition to the communication difficulties associated with the interdisciplinary nature of the field, there are communication difficulties associated with the tendency of different individuals, institutions, and countries to compete rather than to cooperate.
Another source of confusion results from political and public relations considerations. Virtual reality and virtual environment (two terms that we regard as equivalent) are such "hot" terms that many people tend to use them even when their use is logically inappropriate. Thus, for example, these terms are often used in a manner that implies that teleoperator systems are a special case of virtual reality systems. At the same time, however, when describing the origins of virtual reality systems, the history of teleoperator systems (in particular, the use of head-mounted displays in these systems) is entirely ignored. Similar distortions often occur in connection with simulator systems. Although simulator systems, like teleoperator systems, are closely related to virtual environment systems and have a long and distinguished history, past accomplishments in the simulation area are often inappropriately downplayed. Further discussion of the basic concepts and terminology is presented in the next section.
Generally speaking, virtual reality currently has an extremely high "talk-to-work" or "excitement-to-accomplishment" ratio. Between 1992 and 1994, roughly 12 new books have been published, 4 new journals or magazines have been started, and 200 new articles have been published on the topic of virtual reality. Major professional meetings and trade shows are occurring at a rate of roughly one per month. Over 10 government agencies have held conferences or written reports on VE during the same two-year period. And practically everyone in the field is spending substantial time traveling to other laboratories that are working on VE and providing demonstrations of their own facilities in their own laboratories.
Despite this high talk-to-work or excitement-to-accomplishment ratio, substantial efforts are, in fact, under way in various research and development areas and in various application domains. Significant research and development programs, as well as applications of currently available technology, are being pursued in government, in academia, and in industry. Also, some attempts are being made to develop adequate course material for educational programs in the SE area; however, it is likely to be some time before most academic departments recognize SE as a legitimate field of specialization (e.g., one in which faculty can achieve tenure).
Current research and development efforts directly relevant to the creation of useful SE technology are concerned with (1) computer generation of virtual environments, (2) design of telerobots, (3) improvement of human-machine interfaces, (4) study of relevant aspects of human behavior, and (5) development of communication systems that are adequate to support networking of SE systems. Items (3) and (4) are relevant to all the kinds of systems considered, item (1) to VE systems, item (2) to teleoperation systems, and item (5) to networked systems. An additional item of importance when augmented-reality systems are considered is (6) merging of computer-generated images with images derived directly from the real world.
The "SE Challenge" is related to the High Performance Computing and Communications (HPCC) Grand Challenge program initiated by the federal government through both the computer generation of VEs and networked systems. For many applications, adequate computer generation of the associated virtual worlds is going to require very high-performance computing. Similarly, the networking of SE systems is going to require very high-performance communications. In general, SE systems will provide both a major application area for HPCC and an important source of constraints for the design of HPCC systems.
Currently, the main commercial driving force for the development of VE systems is the entertainment application. There is no equivalent commercial driving force for the development of teleoperators or augmented-reality systems at this time.
Programs on SE technologies and applications are under way in almost every developed country (Thompson, 1993). Major players are the United States, Japan, and the European Economic Community; other players include South Korea, Singapore, the Netherlands, and Sweden. Although each of these regions is engaged in a full range of research, development, and commercial activities, the work in each region bears the marks of its distinctive culture.
Today, more than 25 universities, at least 15 federal agencies, and more than 100 large and small companies throughout the United States are contributing to the growth of research and development in the SE field. In industry, research and development directed toward defense, space, scientific visualization, and medicine are more prominent in the United States than elsewhere. The European Economic Community and Japan have regional or national initiatives on SE, but such initiatives are still being debated in this country.
Although the recession of the early 1990s in Europe has slowed down investment, a variety of SE projects are under way in industry and, to a lesser extent, in universities. Interests in the United Kingdom are similar to those in the United States but place more emphasis on education, training, and entertainment. The United Kingdom may well be the world leader in SE entertainment systems. On the continent, work on SE applications is being conducted at the European Space Research Center in Noordwijk, the Netherlands. Research on computer-aided architectural software and a virtual railroad environment are also being supported in the Netherlands. In France, the university at Metz is developing an autonomous motor vehicle for people with disabilities that uses SE technology. In Lille, the University of Technology is exploring the use of teleoperation in surgery. At the University of Paderborn in Germany, a new method for walk-through animation in three-dimensional scenes is under development. In Italy, the University of Genoa is developing a knowledge-based simulation for production engineers.
Japan entered the VE part of the SE world later than the United States and Europe. Recently, however, that country has realized that VE, as well as teleoperation, is a logical extension of its strong national interest and background in robotics, automation, and high-definition television. Concern with haptic interfaces and force-feedback sensor display systems is also intense. As a consequence, Japan has established 10 national consortia for research and development in the SE area that, taken together, provide more funds per year than all SE investment in the United States (Larinaji, 1994). In 1992, the Japan Technology Transfer Association formed an Artificial Reality and Tele-Existence Research Committee of 90 participating companies from the SE industry. Knowledge and technology sharing among companies, generally a boon to Japanese industry, are extensive. These indicators, together with its typical long-range financial horizon, large targeted investments, and a national technology agenda, could give Japan a major competitive advantage in SE. The extent to which this advantage is actually realized will depend, at least in part, on the extent to which Japan can become a leader in the relevant computer software areas.
This overview begins by presenting some basic concepts and terminology that are important in talking about virtual environments and teleoperator systems. We then present some visions of where we think the technology may be leading. The visions section differs from the rest of this report in the speculative nature of the material and in the incorporation of societal issues into the scenarios. The overview then goes on to summarize the current state of the synthetic environment (SE) field, covering application domains, knowledge about human behavior and performance, technology issues, and evaluation issues. The committee's assessment of needs and priorities completes the overview. In making these recommendations, we include consideration of the extent to which various research goals are likely to be realized without special government funding efforts or are likely to require such efforts. Similarly, we consider issues related to the infrastructure required to carry out various research and development programs.
BASIC CONCEPTS AND TERMINOLOGY
There are currently no precise and generally accepted definitions of the terms being used in our area of interest. This is due in part, as already discussed, to the interdisciplinary nature of the field and to public relations matters. It is also due to fundamental problems of the type usually encountered in efforts to create language that faithfully reflects the structures and processes to which the language refers. For example, whereas language is fundamentally discrete, the evolutionary process by which virtual environment systems have developed from antecedent systems (such as desktop computing systems, simulators, teleoperator systems, etc.) is effectively continuous. Thus, either the definition of virtual environment systems must remain rather fuzzy, or one must set arbitrary thresholds on the complex, continuous evolutionary process.
Here, we outline some of the principal defining ideas and indicate how the terms virtual environment, teleoperator, and augmented reality are related to each other and to other closely related terms such as simulator, telerobot, and robot. Our purpose is to provide background on the meaning of the terms we use in order to permit readers to understand later sections of the report. The process of creating and defining terms in this area will of course continue for many years.
A teleoperator system consists of a human operator, a human-machine interface, and a telerobot (Figure 1). Environmental signals are sensed by sensors (cameras, microphones, etc.) located in the telerobot, transmitted to the human-machine interface, and presented to the human by means of display devices (e.g., cathode ray tubes, earphones) in the interface. Human responses, usually motor actions, are sensed by the interface and used to control the actions of the telerobot. Thus, a teleoperator system can be viewed as a system for extending the sensorimotor system of the human organism. The purpose of such a system is to facilitate the human operator's ability to sense, maneuver in, and manipulate the environment. Teleoperator systems vary along many dimensions, including the structure of the human-machine interface and the telerobot and the nature of the control algorithms.
Teleoperator systems have been used to conduct work in outer space and under the ocean; to perform a variety of tasks in connection with security, firefighting, nuclear plants, and hazardous waste removal; to assist in various types of military operations; to perform microsurgery; and to aid in the rehabilitation of individuals with severe physical disabilities. In some teleoperator systems, the human operator has direct and detailed control of all the telerobot's actions. In other systems, the human's control occurs only at a supervisory level and many of the telerobot's detailed actions are controlled locally and automatically. In the extreme, there is no human control, all actions of the telerobot are automatic and autonomous, and the telerobot is called simply a robot.
A virtual environment system (also illustrated in Figure 1) consists of a human operator, a human-machine interface, and a computer. The computer and the displays and controls in the interface are configured to immerse the operator in an environment containing three-dimensional objects with three-dimensional locations and orientations in three-dimensional space. Each virtual object has a location and orientation in the surrounding space that is independent of the operator's viewpoint, and the operator can interact with these objects in real time using a variety of motor output channels to manipulate them. The extent to which a virtual environment is designed to simulate a real environment depends on the specific application in mind.
As illustrated in Figure 1, teleoperator and virtual environment systems are similar in that they both involve human operators and elaborate human-machine interfaces. They differ, however, with respect to what takes place on the nonhuman side of the interface. Whereas in a teleoperator system the interface is connected to a telerobot that operates in a master-slave or supervisory control mode in a real-world environment, in a VE system the interface is connected to a computer.
Consistent with this difference in structure is the difference in purpose between the two types of systems: whereas the purpose of a teleoperator system is to sense, manipulate, and transform the state of the real-world environment, the purpose of a VE system is to sense, manipulate, and transform the state of the human operator (as in training or in scientific visualization) or to modify the state of the information stored in the computer (e.g., the virtual environment or some theoretical model represented in the computer software). Virtual environment systems are being used in the areas of telecommunication, information visualization, health care, education and training, product design, manufacturing, marketing, and entertainment. In the near future, such systems are likely to find further applications in various areas of psychology, including basic psychophysical research, biofeedback, and psychotherapy.
Many systems are now being developed that are mixtures or blends of teleoperator and virtual environment systems. Thus, for example, VE systems are now being introduced as subsystems of teleoperator systems in order to assist the human operator in controlling the telerobot. In particular, when the telerobot is sufficiently far removed from the human operator to cause significant time delays in the transmission of information between the telerobot and the human operator, virtual environments can be used to present computer-generated information derived from predictive models in the computer.
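The idea of a predictive model can be illustrated with a minimal sketch. It assumes, purely for illustration, that the telerobot reports its position and velocity with a known delay and that its motion is roughly constant-velocity over that interval; a practical predictor would rely on a fuller dynamic model. The function name `predict_position` and its parameters are hypothetical and do not come from any system described in this report.

```python
def predict_position(last_position, last_velocity, delay_s):
    """Extrapolate a delayed telerobot position to the present time.

    Assumes constant velocity over the delay interval (an illustrative
    simplification, not a description of any particular system).
    """
    return tuple(p + v * delay_s for p, v in zip(last_position, last_velocity))


# Telemetry that is 2 seconds old: position in metres, velocity in m/s.
delayed_position = (10.0, 0.0, -3.0)
delayed_velocity = (0.5, 0.0, 0.25)

# The virtual environment would display this estimate instead of the
# stale measurement.
predicted = predict_position(delayed_position, delayed_velocity, delay_s=2.0)
print(predicted)
```

The operator then sees the computer-generated estimate of where the telerobot is now, rather than where it was when the delayed signal was sent.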
People are also designing systems in which virtual and real environments are combined (Figure 2). The use of such augmented-reality systems is being explored in medical applications, manufacturing applications, and driving applications (both airplanes and cars). In many such cases, information from the real environment is sensed directly by means of a see-through display, and the supplementary information from the virtual environment is overlaid on this display. In other cases, the real-environment information to be combined with the virtual-environment information is derived by means of a teleoperator system. Although currently receiving less attention in the SE community, it is also possible of course to consider augmented-reality systems in which, instead of combining input channels, output channels are combined. For example, speech sounds or commands uttered by the human operator might be combined with those uttered by an automatic speech-synthesis system, or physical objects in the environment might be manipulated by systems that include both the hand of the human operator and a telerobotic hand controlled by the operator. There are certainly many tasks in which it would be extremely useful to have a third hand (with special features perhaps) that could work cooperatively with one's own two hands and be controlled, perhaps, by simple speech commands. A further way of picturing some possible relations between teleoperator and virtual environment systems within an SE system is illustrated in Figure 3.
In all of these systems, the human operator is projected into a new interactive environment that is mediated by artificial electronic and electromechanical devices, and in all of these systems, the operator's performance and subjective experience in the new environments depend strongly on the human-machine interface and the associated environmental (real or virtual) interactions. In general, we refer to all of these systems (teleoperator systems, virtual environment systems, augmented-reality systems, etc.) as synthetic environment (SE) systems.
In considering these different kinds of systems, it should be noted that many of the problems now facing designers of VE systems have been studied previously in the field of telerobotics. This is the case, for example, in the area of human-machine interfaces. Although the constraints on such interfaces for VE systems and for teleoperator systems are not identical, there is considerable overlap. Similarly, many of the problems now facing VE designers in the area of autonomous agents (i.e., computer-generated entities with programmed-in behaviors that enable the entity to function without direct commands or supervision by the human operator) have been studied for many years by the designers of autonomous electromechanical robots.
One aspect of the subjective experience in SE systems that has received considerable attention is the extent to which the human operator loses his or her awareness of being present at the site of the interface and instead feels present in the artificial environment. This feature, often referred to under the headings of telepresence, virtual presence, or synthetic presence, is dependent on many factors, including the extent to which the interface is transparent and attenuates stimulation from the immediate environment, as well as the amounts and kinds of interaction that take place in the artificial environment.
The distinction between VE systems and simulator systems is more subtle than the distinction between VE systems and teleoperator systems. Also, there is a more or less continuous transition from simulator systems to VE systems. Generally speaking, the term VE system rather than simulator system is increasingly used as the following conditions are more fully satisfied:
(1) the system is easily reconfigurable by changes in software;
(2) the system can be used to create highly unnatural environments as well as a wide variety of natural ones;
(3) the system is highly interactive and adaptive;
(4) the system makes use of a wide variety of human sensing modalities and human sensorimotor systems; and
(5) the user becomes highly immersed in the computer-synthesized environment and experiences a strong sense of presence in the artificial environment.
It has also been suggested (Breglia, personal communication, 1992) that, whereas a simulator is most intimately tied to the given physical system with which the user is expected to interact (i.e., it is designed to simulate this physical system), a VE system is most intimately tied to the human operator (i.e., it is designed to include a general-purpose interface to match the human organism, as well as the capability for generating a large range of virtual worlds). Accordingly, it is not surprising that a large fraction of VE equipment constitutes a kind of high-tech clothing (head-mounted displays, gloves, body suits, etc.). A further suggestion (Allard, personal communication, 1994) is that simulators and VE systems differ in the extent to which the near field (i.e., the world within the user's reach) is real or simulated. A simulator ordinarily simulates only the far field and uses real physical mock-ups for the near field, whereas VEs can simulate the near field as well as the far field.
The distinction between VE systems and other types of highly flexible computer systems (e.g., conventional desktop computing systems) is based mainly on the extent to which the system is interactive, multimodal, and immersive (items 3, 4, and 5 listed above). When focused on the visual channel, the characteristics of three-dimensional rather than two-dimensional presentations, plus a large field of view, are often cited as distinguishing characteristics.
Most current systems involve visual and auditory displays; very few involve olfactory or gustatory displays (one exception is discussed in Chapter 7). Often the displays are presented by means of devices mounted on the operator's head in a head-mounted display system. Control signals are usually derived from the human operator's motor behavior: actions of the head, hands, feet, or speech production mechanism. The use of control signals derived from neural behavior (e.g., electroencephalogram signals) is still rare. In the case of head-mounted displays, the interface usually includes a system for monitoring head position, and the visual images displayed to the eyes and the auditory images displayed to the ears are modified in real time according to the measured head position. By monitoring head position, the visual image seen by the user can be continually modified so that, no matter how he or she moves, the objects in the virtual world remain in stable locations, just as they would in the real world. The user is given the impression that she or he is moving about in a stable world even though the stable world is created artificially. (In a teleoperator system, the position of the human's head is used to control the position of the telerobot's optical and acoustic sensors and thereby the images presented on the displays; in a VE system, the human's head position is used to control the characteristics of the synthesized visual and auditory images.)
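The way head tracking keeps virtual objects stable can be sketched as a coordinate transformation. The example below is a deliberately reduced illustration: it works in two dimensions and treats head orientation as a single yaw angle, whereas an actual tracker reports full three-dimensional position and orientation. The function name `world_to_head` and the axis convention (head x-axis pointing forward, y-axis pointing left, yaw measured counterclockwise) are assumptions made for this sketch.

```python
import math


def world_to_head(point_xy, head_xy, head_yaw_rad):
    """Express a world-fixed point in head-centered coordinates.

    The point is first shifted by the head position and then rotated by
    the inverse of the head's yaw, so that the result says where the
    point lies relative to the direction the head is facing.
    """
    dx = point_xy[0] - head_xy[0]
    dy = point_xy[1] - head_xy[1]
    cos_y = math.cos(-head_yaw_rad)
    sin_y = math.sin(-head_yaw_rad)
    return (dx * cos_y - dy * sin_y, dx * sin_y + dy * cos_y)


# A virtual object fixed one metre along the world x-axis.
virtual_object = (1.0, 0.0)

# Head at the origin, facing along the x-axis: the object is straight ahead.
facing_forward = world_to_head(virtual_object, (0.0, 0.0), 0.0)

# Head turned 90 degrees to the left: the same object now lies to the right.
turned_left = world_to_head(virtual_object, (0.0, 0.0), math.pi / 2)
```

The object's world coordinates never change; only its head-relative coordinates do. Redrawing the image from the head-relative coordinates on every tracker update is what makes the object appear to stay put as the user moves.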
The term haptic interfaces refers to interfaces involving the human hand and to manual sensing and manipulation. One common type of haptic interface currently in use is a device consisting of a glove or manual exoskeleton that monitors hand position and posture (i.e., finger joint angles). These devices, like head-position trackers, provide no feedback and are used solely for control. Other devices, such as force-reflecting joysticks, act as tool handles and serve not only a control function but also a display function because they are capable of providing force feedback. Haptic interfaces in which hand position and posture are tracked and object properties such as texture and temperature are displayed to the hand (as well as simple force information) are not yet commercially available.
The availability of force feedback is a powerful addition to a virtual environment. By sensing the position of the fingers relative to a virtual object, such as a simulated rubber ball, the system can introduce force cues as the user closes his or her hand around the virtual object. With suitable sensors and actuators, the object can be made to feel stiff or spongy by systematically manipulating the characteristics of the force cues as a function of the position and motion of the fingers relative to the position of the object. In this manner, it is possible to create haptic images of virtual objects (a further defining characteristic of VEs).
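One simple way to picture how force cues can make an object feel stiff or spongy is a spring-damper contact model, sketched below. This is an illustrative assumption rather than the method of any particular device: the force grows with how far the fingertip has penetrated the surface of a virtual ball (position) and with how fast it is closing in (motion). The function name `contact_force` and all parameter names and values are hypothetical.

```python
def contact_force(distance_m, radius_m, stiffness_n_per_m,
                  damping_ns_per_m=0.0, closing_speed_m_s=0.0):
    """Return the outward force (in newtons) on a fingertip touching a virtual ball.

    distance_m is the fingertip's distance from the ball's center. No
    force is produced until the fingertip is inside the ball's surface.
    """
    penetration = radius_m - distance_m
    if penetration <= 0.0:
        return 0.0  # fingertip is outside the ball: no contact
    force = (stiffness_n_per_m * penetration
             + damping_ns_per_m * closing_speed_m_s)
    return max(force, 0.0)  # the ball can push outward but never pull


# Same fingertip position (5 mm inside a ball of 3 cm radius), two materials.
stiff_ball = contact_force(0.025, 0.03, stiffness_n_per_m=2000.0)
spongy_ball = contact_force(0.025, 0.03, stiffness_n_per_m=200.0)
```

With these illustrative values the stiff ball pushes back with roughly 10 N and the spongy ball with roughly 1 N at the same finger position, so changing a single parameter changes how the object feels.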
In this section, we attempt to provide the reader with a glimpse of the future that we foresee if SE systems continue to develop at the current rapid rate. To indicate the special nature of this discussion, presented in the form of speculative vignettes, we have used a different typography in the sections that follow.
For specificity, we have chosen to convey this picture of the future in terms of the activities of a family of two adults and two children in their home. Although many of these activities clearly require advances in certain components of the technology, we believe that most such advances will take place within the next 5 to 10 years. In cases in which there is substantial uncertainty about the achievability of a given hypothesized development, we have indicated such uncertainty by referring to it as a research project. (Perhaps the most unrealistic aspect of the picture we paint is that of a traditional nuclear family at home together: current statistics indicate both a decrease in the incidence of traditional two-parent families and an increase in the incidence of multigenerational families.) For convenience, we have chosen to focus on activities inside rather than outside the home, despite the fact that most of the activities considered in our discussion of applications in Chapter 12 take place outside the home.
The envisioned family, the Roberts family, includes a mother, Jennifer, a father, Henry, a 12-year-old daughter, Samantha, and a 16-year-old son, Peter. Henry does not appear until the end of the sketch because he is not actively involved in any SE activity; he is suffering from "SE overdose."
Finally, it should be noted that in writing this section we have not hesitated to interweave images based on assumed future technology with images based on assumed future social and psychological phenomena. We have included the latter not because we have any particular expertise in predicting such phenomena, but because we believe that technology must be considered in the light of such phenomena. It is our hope that those who follow up on this report will have the expertise appropriate to serious consideration of these issues.
Jennifer Roberts, the mother, is training to become a surgeon and is at her SE station studying past heart operations.
She previously spent many hours familiarizing herself with the structure and function of the heart by working with the virtual-heart system she acquired after deciding to return to medical school and to specialize in heart surgery. This system includes a special virtual-heart computer program obtained from the National Medical Library of Physical/Computational Models of Human Body Systems and a special haptic interface that enables her to interact manually with the virtual heart. Special scientific visualization subroutines enable her to see, hear, and feel the heart (and its various component subsystems) from various vantage points and at various scales. Also, the haptic interface, which includes a special suite of surgical tool handles for use in surgical simulation (analogous to the force-feedback controls used in advanced simulations of flying or driving), enables her to practice various types of surgical operations on the heart. As part of this practice, she sometimes deliberately deviates from the recommended surgical procedures in order to observe the effects of such deviations. However, in order to prevent her medical school tutor (who has access to stored versions of these practice runs on his own SE station) from thinking that these deviations are unintentional (and therefore that she is poor material for surgical training), she always indicates her intention to deviate at the beginning of the surgical run.
Her training also includes studying heart action in real humans by using see-through displays (augmented reality) that enable the viewer to combine normal visual images of the subject with images of the beating heart derived (in real time) from ultrasound scans. Although there are still some minor imperfections in the performance of the subsystem used to align the two types of visual images, the overall system provides the user with what many years ago (in Superman comics) was called X-ray vision. In this portion of her training, Jennifer examines the effect of position, respiration, exercise, and medication on heart action using both the see-through display and the traditional auditory display of heart sounds.
Today, Jennifer is studying recordings of a number of past real heart operations that had been recorded at the Master Surgical Center in Baltimore. In all of these operations, the surgery was performed by means of a surgical teleoperator system. Such systems not only enable remote surgery to be performed, but also increase surgical precision (e.g., elimination of hand tremor) and decrease need for immobilization of the heart during surgery (the surgical telerobot is designed to track the motion of the heart and to move the scalpel along with the heart in such a way that the relative position of the scalpel and the target can be precisely controlled even when the heart is beating).
The human operator of these surgical teleoperator systems generally has access not only to real-time visual images of the heart via the telerobotic cameras employed in the system, but also to augmented-reality information derived from other forms of sensing and overlaid on the real images. Some of these other images, like the ultrasound image mentioned above, are derived in real time; others summarize information obtained at previous times and contribute to the surgeon's awareness of the patient's heart history.
All the operations performed with such telerobotic surgery systems are recorded and stored using visual, auditory, and mechanical recording and storage systems. These operations can then be replayed at any time (and the operation felt as well as seen and heard) by any individual, such as Jennifer, who has the appropriate replay equipment available. Recordings are generally labeled "master," "ordinary," and "botched," according to the quality of the operation performed. As one might expect, the American Medical Association initially objected to the recording of operations; however, they agreed to it when a system was developed that guaranteed anonymity of the surgeon and the Supreme Court ruled that patients and insurance companies would not have access to the information. This particular evening, Jennifer is examining two master double-bypass operations and one botched triple-bypass operation.
During her training time on the following day, she is going to monitor a heart operation in real time being performed by a surgeon at the Master Surgical Center in Baltimore on a patient in a rural area of Maryland roughly 200 miles away. Although substantial advances have been made in combatting problems of transport delay in remote surgery (by means of new supervisory control techniques), very few heart operations are being conducted remotely at ranges over 500 miles.
In addition to spending time on her basic surgical training, Jennifer is participating in a research project being conducted at the center that is concerned with the use of microtelerobots for the diagnosis and treatment of circulatory disorders. Microtelerobots with dimensions less than 1 mm are being designed to enter the circulatory system, make measurements of various circulatory parameters at various locations within the system, and then perform local microsurgery under remote supervisory control.
Samantha Roberts, the 12-year-old daughter, is spending the early hours of the evening shopping for a dress via her SE station.
A week ago she underwent the periodic body scan essential for individuals whose measurements are rapidly changing. This scan was performed using the video camera associated with the SE station, a standard body-scan program provided by the shopping network to which the family had subscribed, and a special body stocking with grid lines to facilitate the automatic measurement process.
Dress material (design and fabric) is selected through the use of an interactive program in which a sequence of samples is displayed to Samantha (visually and tactually); she responds to each element in the sequence by rating the material on a scale of 1 to 10. The presentation sequence is adaptive in the sense that the choice of material to be presented on step n in the sequence is based on the presentations and responses for steps 1 through n – 1. All such sequences are stored in the Samantha file of the shopping system to provide background for future marketing efforts. The universe of patterns and fabrics considered is defined by the currently available manufacturing techniques. The notion of inventory is no longer relevant because all clothing is manufactured to design specifications of individual clients after the specifications are determined. The subjective ratings supplied by Samantha to guide the presentation sequence are based on feeling the fabric by means of a special tactile display as well as seeing it by means of the standard visual display. The tactile display, an experimental component of the special SE clothing package sold by the shopping network, consists of a rectangular array of microactuators that allow one to feel different fabrics and textures by stroking the array with the hand.
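The adaptive presentation sequence described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the scenario does not say how the shopping system chooses the next sample, so the two-number fabric descriptors, the `next_sample` and `run_session` helpers, and the rule "present the unseen sample closest to the rating-weighted average of the samples already rated" are all hypothetical.

```python
import math

# Hypothetical catalogue: each fabric is described by two made-up features
# (softness, sheen), each in the range 0..1.
CATALOGUE = {
    "denim": (0.2, 0.1),
    "cotton": (0.6, 0.2),
    "velvet": (0.9, 0.5),
    "silk": (0.8, 0.9),
    "satin": (0.7, 1.0),
}


def next_sample(history):
    """Choose the sample for step n from the (name, rating) pairs of steps 1..n-1.

    Assumed rule: present the not-yet-seen sample closest to the
    rating-weighted average of the samples rated so far.
    """
    seen = {name for name, _ in history}
    unseen = [name for name in CATALOGUE if name not in seen]
    if not unseen:
        return None  # every sample has already been presented
    if not history:
        return unseen[0]  # no ratings yet: start with the first catalogue entry

    total = sum(rating for _, rating in history)  # ratings are 1..10, so total > 0
    dims = range(len(next(iter(CATALOGUE.values()))))
    target = tuple(
        sum(CATALOGUE[name][d] * rating for name, rating in history) / total
        for d in dims
    )
    return min(unseen, key=lambda name: math.dist(CATALOGUE[name], target))


def run_session(rate, steps):
    """Present up to `steps` samples, recording each rating (1 to 10)."""
    history = []
    for _ in range(steps):
        name = next_sample(history)
        if name is None:
            break
        history.append((name, rate(name)))
    return history
```

Calling `run_session` with a function that returns the shopper's rating for a named fabric yields the stored sequence of presentations and responses; a real system would presumably use a far richer description of each material than two numbers.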
After a tentative decision is made on the material to be used, a similar process is employed to pick out a style. In this case, the ratings are based solely on visual displays—tactile images are irrelevant.
Given a tentative choice of fabric and style, the next step involves virtual modeling of the dress by Samantha herself. Given the record of Samantha's physical measurements and images in the shopping network's file, the system now synthesizes visual images of Samantha wearing the dress she picked out by her sequence of rating responses. Moreover, the actions of the synthesized Samantha model are controllable by Samantha herself by means of the shopping network's clothing-model interface (again supplied by the shopping network as part of the special shopping package sold to the family). After a modest amount of practice with the interface, Samantha is able to cause her image to perform routines similar to those she has seen professional models perform in conventional fashion shows. Initially, the shopping network's synthesis program was intentionally distorted to make the client's image appear more like his or her ideal image (derived from a "get-to-know-you" program included in the initial package) than it actually does; however, a special regulatory rule was introduced to control such distortions. In the future, Samantha may also be able to "feel" the fit of each dress. The industry realizes the importance of the sense of fit and has initiated an intensive, long-range research effort to develop the complex tactile displays required.
The cost of each of the virtual dresses considered by Samantha is presented to her as soon as the fabric and style are selected. Occasionally, Samantha scans through the fabric/style/cost matrix of the dresses she is considering in order to refresh her memory about these dresses.
By paying a special fee, it is possible for the shopper to inspect the files of other shoppers on the system. In particular, when considering the purchase of a specific dress, it is possible to call up a file that provides information on all the other shoppers in the network who purchased a similar dress. (A substantial fraction of the fees collected in this manner is paid out to the shoppers on the network in order to entice them to grant permission for such file inspection by other shoppers.)
Once a final decision is made as to the dress to be purchased, the shopper's decision is communicated to the manufacturing component of the shopping network; the selected "marketing design" is mapped into the appropriate "manufacturing design" and then manufactured in the shopping network's programmable factory—a demand-activated, computer-controlled, manufacturing system in which marketing and sales are fully integrated with design and production. Simultaneously, the funds for purchasing the dress are deducted from the shopper's account in the shopping network bank. In most cases, the dress is delivered within three days of its selection.
No returns of merchandise are permitted until the subscriber has spent a threshold amount of funds via the network; thereafter, returns are permitted, but the cost of such returns to the shopping network is factored into the cost of the shopping service.
High School Education
Peter Roberts, the 16-year-old son, is doing what previously was called school homework. The distinction between doing school work at school and doing homework for school at home has become very muddy; in both cases, much of the time is spent interacting with teachers, other students, and virtual worlds via networks of SE stations. In addition, along with the deemphasis of school as a geographic location, the distinction between school and not-school has diminished. The defining characteristics of students' experiences are the network to which they belong and the network courses or projects in which they become involved. Among the major consequences of these fundamental changes are adaptive time schedules to accommodate collaboration among individuals who live in different time zones across the world and the inclusion of children and teachers who are homebound because of severe physical disabilities.
Currently, Peter is participating in four network courses: mathematics, environmental science, empathy, and dance-music-dance.
In the mathematics course being taken by Peter, the SE facilities are used to provide course participants with an intuitive understanding of various non-Euclidean geometries. Participants in the course enter into virtual worlds in which the properties of the space are determined by the axioms of the particular geometry being studied. These properties are explored not only by virtually traveling through the space, but also by building virtual towers, bridges, and houses within the space. The effects of changing the axioms of the geometry, which the students are encouraged to explore, are immediately realized in terms of the virtual world structure. When the changes lead to axiomatic systems that are internally inconsistent, the space "blows up" and a tombstone appears with an inscription describing the inconsistency.
In the special term project Peter has selected (each student is required to teach some aspect of mathematics to younger children), he is designing a method for showing younger children how speed is represented by the slope of the distance-versus-time graph (all in Euclidean space). Peter's basic idea is to construct a virtual train that will move across a virtual horizontal track at variable speed and to present associated graphs in virtual graph space. The first graph will plot the distance the train goes from its initial position in the train station as a function of time, and the second the instantaneous speed as a function of time. In addition to augmenting the graphs with a conventional clock face icon, Peter plans to display the tangent to the first curve at the current time on the first graph continuously as the curve evolves over time, using the same color for plotting this tangent line on the first graph and for plotting the speed curve on the second graph.
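The relationship at the center of Peter's project, that the speed at any instant equals the slope of the tangent to the distance-versus-time curve, can be checked numerically. The sketch below is only an illustration: the assumed motion of the train (distance equal to the square of the elapsed time) and the helper names `distance`, `speed`, and `tangent_line` are hypothetical and are not part of the project as described.

```python
def distance(t):
    """Assumed motion of the virtual train: distance from the station in metres
    after t seconds (a steadily accelerating train)."""
    return t ** 2


def speed(t, dt=1e-6):
    """Instantaneous speed, estimated as the slope of the distance curve at t
    (central difference)."""
    return (distance(t + dt) - distance(t - dt)) / (2 * dt)


def tangent_line(t0):
    """Return the tangent to the distance curve at time t0 as a function of t.

    Its slope is the speed at t0, which is the value plotted on the second graph.
    """
    slope = speed(t0)
    return lambda t: distance(t0) + slope * (t - t0)


if __name__ == "__main__":
    # Print one row per sampled instant: time, distance, and estimated speed.
    for t0 in (1.0, 2.0, 3.0):
        print(f"t = {t0:.0f} s  distance = {distance(t0):.1f} m  speed = {speed(t0):.2f} m/s")
```

Drawing `tangent_line(t0)` on the first graph and the value `speed(t0)` on the second in the same color is one way the shared-color idea could be realized.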
In the environmental science course, Peter is participating in three projects, all of which are led by a professional meteorologist in Indiana who is donating four hours per week to the course via the network. The first project focuses on the gathering of information on atmospheric conditions around the world by means of atmospheric measurement kits located in the homes of all the students taking the network course around the world, entering this information into the network computer assigned to the course, and then studying the atmospheric condition displays generated by the system. Inasmuch as each measurement kit records not only temperature, pressure, and humidity, but also certain molecular constituents of the sampled air, there are many parameters that have to be represented in the display. Some of the course participants are comparing current conditions with those predicted by the model developed by the meteorologist in Indiana. Others, including Peter, are working on improved methods for displaying and interacting with the empirical information, the model's predictions, and the deviations between the two.
In the second project, which the meteorologist introduced by explaining the concept of a microclimate, Peter and his student collaborators are studying a hypothetical environmental accident in Birmingham, Alabama. The specific question being addressed by the students on this day is the following: If a hypothetical accident released 8,000 kg of chlorine gas into the air at the Vulcan Tower in Birmingham at 9:00 a.m. today, what portions of the city should be evacuated? A geographic profile of Birmingham's topography was made available to the network course, and virtual sensors indicating current temperature, precipitation, barometric pressure, and wind velocity were distributed throughout virtual Birmingham. Based on the readings from these sensors and information provided by the meteorologist on how chlorine gas dissipates by bonding with various materials, estimates are being made of the chlorine-content contours associated with the chlorine gas cloud as a function of time. These estimates, combined with information on human tolerance to chlorine gas and on the capacity of various transportation facilities in Birmingham, are being used to construct evacuation plans.
In the third project, the students are learning about the formation and behavior of tornadoes. The meteorologist provided a computational model of tornadoes to the network school and the students are learning about tornadoes by virtually locating themselves at different positions in the tornado and also observing how the velocity vector at different points of the tornado varies as a function of the values assigned to the parameters of the computational model. Production of the visual images seen by the students results from a blending of visual images generated by the computational model and images derived from video recordings of real tornadoes. Appreciation of the forces associated with the tornado is facilitated by providing the students with an ability to place different virtual objects (people, cars, houses, etc.) in the path of the tornado and observing the effects of the tornado on these objects.
The empathy course was developed by a multidisciplinary team of physical scientists, biologists, anthropologists, sociologists, and psychologists.
The goal of the course is to familiarize the network students not only with the behavior of different kinds of people, animals, and physical systems, but also with how it feels to participate in these other worlds. Many of the techniques used in this course are refinements of techniques previously developed in connection with interactive virtual environment theater, and many of the supporting personnel for the course are college students participating in various types of internship programs.
Each student visiting a virtual world is assigned a virtual actor in this world and must learn to control this actor in a manner that satisfies the constraints designed into the specific scenario considered. In general, these constraints are used to give the participants experience in living in different physical environments (e.g., in the desert, in the arctic, on the moon); in different social or anthropological settings (e.g., as a member of an ancient culture, a highly discriminated-against minority, a person with severe physical disabilities); or even as a member of a different animal species (e.g., as a member of an insect society or as a sea-dwelling creature low down on the food chain). The role assignments are typically a month in duration and students are expected to refine their understanding of the assigned role to the point at which other observers cannot distinguish the behavior of the characters controlled by the students from that of the "natives" (played by trained personnel or by highly developed computer-controlled autonomous agents).
In order to make the simulations employed in this course practical for real-time use by the students, they are greatly simplified. In most cases, such as those concerned with other physical environments and other animal species, these simplifications are readily accepted. However, in some cases, particularly those that focus on human social issues, such simplifications have occasionally been seen as offensive stereotypes and strongly resented. Thus, these portions of the courses have become very controversial and have led to considerable turmoil.
Dance-music-dance is a new course introduced by a research associate in the arts department of the network. She is a professional dancer who has become highly skilled in the use of the human body as a musical instrument and in composing musical compositions by means of dance routines. Her main work during the past year has been concerned with evaluating different mappings from the outputs of the body tracker she has been using (an optical system that does not encumber her body or interfere with her dance movements) into the control parameters used for generating sounds via the computer-music system at her disposal. Recently, she has become interested in the relations between the dance routines used to generate the music and the dance routines the resultant music inspires in other dancers listening to the music. Similarly, she is studying the relations between the initial music generated by her own dance routines and the secondary music generated by the other dancers. In the network course she has constructed, students select dance-to-music mappings to be used, choreograph their own dance routines and thereby compose their own music, arrange for the other students to dance to this music, and then do an analytic study of the above-described relationships. Peter, who believes himself to be rather clumsy and is rather inhibited about performing a full free-body dance, choreographed his initial dance piece using only the index finger of his right hand.
Later in the evening, Jennifer, Samantha, and Peter participate in the weekly network telemeeting. The main focus for this particular meeting is a discussion with the network candidate for Congress. Except for a few minor functions, the members of Congress now represent networks rather than geographic regions such as states. Whereas the last portion of the meeting is intended to accommodate free-ranging discussion, the first portion is structured to cover four specific sets of issues.
The first set concerns the cost structure of network participation, the extent to which members of different networks are becoming isolated from each other and incapable of communicating across network boundaries, and the increasing problem of "ghetto networks."
The second set of issues concerns the rapid rise of network gambling. The amount of money involved in this activity now exceeds that involved in medical care and education combined, and gambling taxes now exceed income taxes. The principal issue of concern is how the tax money collected should be split between the federal government and the network.
The third set of issues focuses on the creation of appropriate laws for governing behavior within virtual environments. The number of cases in which VE crimes and misdemeanors are occurring is increasing, and no significant body of law is available to handle these cases. Also, problems are arising that involve crossing the VE boundary, i.e., unlawful acts being committed in the real world in response to injuries suffered in VEs. Thus, for example, when a virtual pet salamander who was being guided by a man's son was deliberately stepped on by a virtual actor being guided by a stranger on the network, the man located the home of the stranger and went over to his home and shot him.
The fourth set of issues concerns the use of the "information highway" and SE stations for purposes of lovemaking at a distance. Although the use of these facilities for this purpose was clearly predictable (e.g., based on the use of the mail for transmitting love letters, the use of the telephone for vocal lovemaking, and the use of interactive video for including visual images), it nevertheless is the center of considerable controversy. Of particular concern is the commercial introduction of special devices that contain sensors and actuators to facilitate tactual lovemaking at a distance ("telesex kits"). The inclusion of the tactual channel, and the associated increased blurring between "direct sex" and "telesex," has caused a number of attitudinal tolerance thresholds to be exceeded. For example, in addition to the usual strong feelings evidenced by different groups in society about who should have what kinds of sexual relations with whom under what circumstances, and about which kinds of commercial exploitation of sex should be allowed and which kinds disallowed, the legal community is now concerned about having to handle cases in which sexual relations are conducted across the boundaries of states with different laws governing sexual behavior. Similarly, on the basis of a study in France that documented a decrease in sexually transmitted diseases associated with the increased use of telesex, the Center for the Control of Communicable Diseases is now considering adding the use of telesex kits to the use of condoms as an important means for controlling the still-exploding incidence of the AIDS virus.
Two kinds of facilities are available for participating in these network telemeetings. With the first kind, each individual in the family wears a head-mounted display and works through his or her own SE station; in the second, rather than using the traditional head-mounted displays, the family sits together in a special room outfitted with a wall-sized visual display, a set of acoustical loudspeakers, and a set of video cameras. Each individual is assigned one of the video cameras, which then tracks that individual as he or she moves around the room. The Roberts family purchased both kinds of facilities because they believed that neither one alone is adequate for all purposes.
Henry Roberts, the father, is not participating in the telemeeting because he is suffering from SE overdose. Although problems associated with conventional simulator sickness were brought under control years ago, a set of deeper problems emerged as individuals began to spend substantial portions of their waking hours living in synthetic environments.
One aspect of these problems is evidenced in the choices Henry makes about how to spend his leisure time. Initially, he spent much of his leisure time playing SE games, taking SE trips to foreign lands and planets (both real and imaginary), participating in interactive SE theater, and exploring the world from the viewpoint of different types of creatures (again, real and imaginary). One of his favorite activities had been to interact with real bees inside a real beehive using a telerobotic model bee system. (A whole variety of such telerobotic model animal systems was developed in connection with scientific study of animal behavior at the Tinbergen-Lorenz Institute in Munich.) Now, however, he wants to get away from all this "electromechanically mediated stuff" and interact with the world directly. Accordingly, he spent his last three vacations in the surrounding mountains camping out with some friends who are having similar problems. Air quality has improved substantially with the introduction of SE systems because of the associated reduction in auto travel. The fact that Henry's desire to get back to nature is rather common is evidenced by the enormous growth taking place in the camping equipment industry.
Henry has also increased the amount of time he spends exercising in the real world. When he first acquired his SE station, he made extensive use of SE jogging (which involved the use of a six-degrees-of-freedom treadmill and synthetic scenery) and SE golf; however, as part of his reaction to too much SE, he has switched back to the "real thing." Also, he refuses to join the political movement concerned with the large amounts of energy being wasted by the exercise mania that has swept the country and with finding a practical scheme for capturing, storing, and making use of this energy.
Henry is also undergoing therapy in connection with his SE experiences. Although many aspects of these experiences seem disturbing to Henry and his therapist, one aspect of central importance concerns Henry's body image, his sense of presence, and his underlying identity. Apparently, the ease with which Henry is able to transform himself into other creatures in other environments, and become realistically immersed in these other roles and other worlds, is becoming a real psychological problem for him.
Ordinarily, the therapist administers treatment to his patients via an SE network that incorporates a biofeedback mode. However, in cases such as Henry's (loss of presence and loss of identity due to SE overdose), the use of an SE system in the therapeutic procedure seems unwise.
Another factor of critical importance to Henry's mental state concerns the articles he has been reading about research on human-machine interfaces that are designed to tap directly into the human's neural system. Although he fully understands the advantages of such interfaces for individuals with severe sensorimotor disabilities, the idea of them makes him uneasy.
Henry's job as an architect adds another important dimension to his mental state. Initially, SE played only a supporting role in his work; it was used merely as a tool for design or as a tool for marketing to the client. Recently, however, the company for which Henry works received some large contracts to design virtual spaces for use in virtual worlds. Apparently, the large amounts of time now being spent in virtual spaces, combined with the limitations of computer scientists in their abilities to design virtual spaces that are not only functional but also aesthetically pleasing, are leading to a new market for architectural firms. However, this new market is of no interest to Henry; in fact, it increases his desire to switch fields. Unfortunately, when he scans the job opening information available to him on his SE network, he finds that the most common type of opening involving interaction with the real world concerns the installation and maintenance of SE systems.
CURRENT STATE OF THE SE FIELD
Although some of the technologies assumed in our visions of the future are already available and others are the subject of current research, these visions are without doubt visions of the future, not the present. In this section, we briefly depict the current state of the SE field. We begin by describing the application areas that are currently receiving the most attention. We then discuss a number of topics in the field of psychology relevant to the design, use, and evaluation of SE systems and the human component of these systems. Next we summarize the status of the associated technologies that make SEs possible: the interfaces used to link the machine and the human operator, the computer hardware and software used to generate VEs, the telerobots used in teleoperator systems, and the communication networks used to integrate multiple SE systems. The section ends with a brief assessment of the SE evaluation efforts to date. More detailed information on most of these topics can be found in the chapters of the report.
Application Domains of SE Systems
The range of potential applications for SE systems is extremely large. Application domains currently receiving considerable attention include: (1) entertainment, (2) national defense, (3) design, manufacturing, and marketing, (4) medicine and health care, (5) hazardous operations, (6) training, (7) education, (8) information visualization, and (9) telecommunication and teletravel.
The entertainment domain is serving both as a massive informal test bed and as a major economic driving force for the development of new VE technology. Although some of this technology can be expensive (particularly that associated with the entertainment of large groups), on the whole the VE technology associated with the entertainment industry is "low end." For example, the head-mounted displays being used for entertainment purposes are, as they would have to be to make the enterprise commercially viable, orders of magnitude less expensive (and correspondingly less capable) than those being developed for military purposes. Even though applications in the entertainment domain are still in their infancy, they are by far the most widely implemented of all VE applications. In essentially all of the other domains, the activities are in the stage of research and development rather than commercial application or practical use. Not only is there much to be learned about how best to utilize SE technology, but also the cost-effectiveness of most current SE technology (i.e., the bang for the buck) is inadequate for any application domain other than that of entertainment.
The national defense domain, like the entertainment domain, constitutes both a major test bed and a major driving force for VE technology. It differs in that (with the exception of the use of traditional simulator systems) research and development activities still dominate, the associated technology tends to be high end rather than low end, the systems of interest include teleoperators as well as VEs, and the networking of large numbers of active participants is emphasized.
This report discusses neither the entertainment domain nor the national defense domain in detail. The former domain is omitted because it is already receiving extensive commercial support, many of the scientific and technical research issues that arise in this domain also arise in other domains, and improved entertainment technology does not appear to us as one of society's most pressing needs. The latter domain is omitted because it is currently receiving substantial attention within the government (e.g., Thorpe, 1993), significant information may be classified for security reasons and therefore inaccessible, and, again, many of the research issues that arise in this domain also arise in others. This last reason is especially relevant to the national defense domain because so many of the other domains considered, such as training, information visualization, and telecommunication and teletravel, are directly relevant to national defense.
Finally, although not included here as a formal application domain, VE systems are beginning to be envisioned as highly desirable facilities for research groups concerned with experimental psychology. Clearly, not only is knowledge and understanding of psychological phenomena essential for efficient design and productive use of VE systems, but also a high-quality VE system that makes available a wide variety of precisely controlled stimuli, response measurements, and adaptive testing procedures constitutes an ideal tool for conducting research in experimental psychology.
In the following subsections, we discuss briefly the other application domains listed above. As indicated in these subsections, significant research and development is taking place in a wide variety of applications and, in a few cases, the results of these efforts are beginning to be applied on at least an experimental basis. It is not yet clear, however, how to choose the tasks that will eventually prove most appropriate for the application of SE technology. Not only are the results obtained in the various application domains still too meager to allow one to specify the nature of such tasks from empirical data, but also there is no evidence that much effort has been given to answering the question "What is SE technology good for?" theoretically.
Individuals with computer graphics backgrounds usually point to tasks involving three-dimensional spatial information and to immersion in three-dimensional space; those focused on multimodal interactive interfaces often point to tasks that depend strongly on sensorimotor involvement. In any case, in order to fully specify what SE is good for, one must estimate the cost-effectiveness of the envisioned SE system both compared with the way in which the same task is now being performed and compared with alternative new systems that could be developed (e.g., that might achieve equivalent task performance at substantially reduced cost). Eventually, of course, in addition to comparative cost-effectiveness estimates to help select tasks for which SE systems are likely to be appropriate, one must evaluate such systems once they are developed. The important and often neglected topic of SE evaluation is considered further in Chapter 11.
Design, Manufacturing, and Marketing
Design, manufacturing, and marketing are generally recognized as a major application domain for SE technology, and this domain is currently receiving substantial attention. Although much of the activity in this domain is still in the development phase, it is clearly in the process of moving to actual usage. The procedures and technologies used for design are progressing from those associated with conventional computer-assisted operations to those involving VE and augmented-reality systems. Similarly, we are beginning to see at least experimental use of VE in the marketing of products and services. It appears that it will not be many years before design, manufacturing, and marketing will all take place within a unified system that makes substantial use of SE technology.
Independent of whether the item to be sold is a haircut, a kitchen, or an office building, the ability of the client to see and interact with realistic representations of a variety of possible versions or realizations of the item can positively influence both the evolution of the design and the attitude of the client. Furthermore, when very complex and expensive systems, such as an aircraft or submarine, are being designed, the potential for cost savings by using virtual mock-ups and prototypes rather than real physical ones is enormous.
Medicine and Health Care
Medicine and health care, like design, manufacturing, and marketing, are considered to be a major SE application domain. Although much of the work is still at the experimental stage, applications of both VE technology and teleoperator technology are being pursued very actively.
In addition to developing improved communication networks for providing the right medical information to the right place at the right time, much of the current research is directed toward improved methods for diagnosis; planning of treatment; provision of information to the patient; provision of treatment; and training of medical personnel. VE systems are being developed and studied experimentally to extend conventional consultations and telediagnosis performed over the telephone to include interactive visual displays of both participants and medical information. Such systems are also being studied for use in planning surgical procedures and in helping to increase patients' awareness and understanding of these procedures and of the possible outcomes. Augmented-reality systems are being studied to present visual displays in which information previously obtained from special imaging techniques is overlaid on the normal direct view of the patient; integrated VE and teleoperator systems are being developed for use in telediagnosis and telesurgery and for the training of surgeons. In general, the potential benefits of telemedicine that are being considered include not only the ability to obtain medical information and perform medical actions at a distance, but also the ability, as in any other application of teleoperation, to effectively transform the sensorimotor system of the operator to better match the task at hand. The rapidly increasing use of laparoscopic surgical procedures illustrates the importance of these other benefits.
Aside from the efforts required to realize technology that is adequate for the various medical applications, substantial research is being initiated to realize adequate physically based models of the human body (e.g., for VE training of surgeons). However, current success in creating virtual human skeletons, organs, and physiological subsystems constitutes only a tiny fraction of what needs to be achieved over the long term.
Additional health-related research and development activities in the SE area are taking place in connection with physical rehabilitation. Individuals with sensory or motor disabilities constitute a uniquely challenging domain for application of SE systems with specially designed human-machine interfaces (e.g., gestural tracking and recognition devices for individuals who have lost both the ability to articulate speech and the manual dexterity required to operate a keyboard). The application of SE technology to psychological rehabilitation (for example, to reduce phobic reactions) is also beginning to receive attention.
Hazardous Operations
One of the driving forces for the creation and development of teleoperator systems has been the need to perform operations that are hazardous, and the application of SE systems to this domain is certainly one of the older application areas considered. Thus, unlike the situation in some of the other domains, current activities in this domain include actual use as well as research and development. Among the specific applications in this domain that are receiving attention are the handling of dangerous materials, operating heavy machinery, firefighting and policing, conducting military operations, and exploring the ocean floor and outer space.
Despite the potential benefits that can be obtained by using teleoperator systems in many of these areas, and despite the benefits that have already been demonstrated in some of them (e.g., handling nuclear materials, undersea exploration), neither the government nor the public has evidenced great enthusiasm about this domain. Aside from the general lack of excitement engendered by visions of teleoperation compared with visions of virtual reality, perhaps interest in the use of teleoperator systems for hazardous operations is limited by the lack of personal experience most people have with hazardous operations (e.g., defusing a bomb or locating and carrying to safety a child from a burning building). It is even conceivable that the use of teleoperation for hazardous operations may lack support from potential operators because it is inconsistent with a macho self-image.
Although for many applications in this area further research and development is required to achieve teleoperator systems that are both reliable and cost-effective, there is no evidence that such goals cannot be achieved. Also, and quite apart from the use of teleoperator systems in conducting hazardous operations, substantial opportunity for the application of SE systems in this area arises in connection with the training of individuals to conduct hazardous operations (with or without the use of teleoperation). As discussed in the next subsection, the use of VE systems for training constitutes a major application domain for such systems.
Training
Because most activities require at least some training, it is not surprising that the use of VE technology for training is a major application area in almost all domains considered. Thus, for example, it is of major interest for national defense, medicine and health care, and hazardous operations, among others.
On one hand, the use of simulators for training is quite extensive. Simulations of various types have been used for a long time and, judging by their continued use, are relatively cost-effective (although appropriate analyses have rarely been performed). The apparently successful results obtained with simulators in training various tasks (e.g., flying an airplane) constitute a major motivation for interest in the exploitation of VE technology for training.
On the other hand, the extent to which current VE technology is actually being used in the training area is very limited: essentially all current work on VE training is at the stage of research and development. Given the existing background in the use of simulation for training, it is clear that one of the factors responsible for this situation is the inadequacy of the currently available VE technology. However, that is not the only problem; others relate to our inadequate understanding of basic psychological issues related to training and training transfer. The flexibility inherent in the use of VE systems for training and, in particular, the opportunity to create learning situations that are superior to those that are realizable without such systems (e.g., by the use of special multisensory instructional cues, by purposefully distorting the real situation, by providing multiple viewpoints and various levels of abstraction, and by adapting the system automatically to the individual and the individual's state of training) seriously challenge our basic understanding of the learning process. Of particular concern is the issue of training transfer. Much remains to be known about which of the possible differences between the real task and the task as realized in the envisioned VE training systems are likely to be important, either positively or negatively, and which are insignificant.
Education
Although the term education can be used very broadly to cover almost any situation in which learning takes place, in this report we use the term to refer to the goals and activities normally associated with K-12 education in schools.
One major set of applications currently being explored in this domain focuses on the communication component of SE networks. Examples include communication between students, between teachers, and between students and teachers at different sites; televisits to places of interest that would normally involve costly travel (to explore another culture, to learn a foreign language, to visit a site in outer space or under the ocean); and even teleoperation of remote telerobots. Other applications focus more on the use of VEs as immersive, interactive, experimental, and play facilities. At one extreme, a VE can be used to present a well-defined situation in a highly structured course. At another extreme, a VE system can be used to encourage free play and various types of model building, or even the construction of virtual environment tools.
As in the training domain, much of the current work in the education domain is being directed toward research to determine the ways in which technology can be usefully applied.
Despite the potential of SE technology to provide cost-effective improvements in K-12 education, many people judge societal infrastructure problems in education to be so overwhelming that attempts to exploit SE technology within the current education system would have only marginal benefits. The history of attempts to introduce computers into the classroom is cited as an example of useful technology being available but not well used. In general it is believed that, unless the infrastructure surrounding the education system is radically changed, the best opportunity for using SE technology to help educate children is likely to occur through the entertainment industry and the entertainment facilities that will be available in many homes (leading to a new meaning for the phrase dual use that is now being so frequently used in government circles). It is also conceivable, of course, that SE technology, together with associated networking features, can play an important role in helping to change the infrastructure.
Information Visualization
The dependence of our culture on information and the amount of information that one needs to perceive, digest, understand, and act on are steadily increasing. In attempts to prevent information overload or, alternatively, to prevent ignoring information that is vital to action, research is being conducted to determine methods of information visualization that are superior to those now used. (The term visualization is used here in its most general sense and only for historical purposes; we do not mean to imply that the information is necessarily presented only through the visual channel.) This application domain, like the training domain, cuts across the other application domains considered: effective visualization of information is important in essentially all domains.
In general, the problem of information visualization is an extremely old one. Inventive pictorial representations of important events go back to cave paintings. Descartes' invention of analytic geometry and the associated use of graphs to represent tables of numbers constitute a truly major advance in this area. Less dramatic but more technologically relevant advances have taken place in the area of computer graphics. The extension of two-dimensional to three-dimensional graphics and controlled manipulation of scale and viewpoint are illustrative examples of these advances. Unfortunately, there is relatively little guiding theory available, and relatively little systematic evaluation has been performed to determine the benefits of these new graphics techniques (although they are clearly commercially successful). Furthermore, the use of modalities other than vision and exploration of the benefits of different kinds of sensorimotor involvement in understanding information are only now beginning to be seriously considered. Perhaps the most advanced application area in this domain concerns the visualization of scientific information. Scientific visualization is generally recognized as a major and growing application of VE and is starting to receive substantial attention. Specialists in various fields of science are beginning to make use of advanced computer graphics techniques for improved visualization, and preliminary research and development are being conducted on the use of the auditory and haptic channels for this purpose.
Telecommunication and Teletravel
The domain of telecommunication and teletravel, like training and information visualization, cuts across essentially all of the other domains considered. Telecommunication, which is intended to include teleconferencing as a special case, enables two or more people at disparate locations to interact in any manner permitted by the technology and chosen by the participants. Although sophisticated telecommunication systems involving multiple participants and real-time video as well as audio have been envisioned for many years, their use is still primarily experimental. Currently, only phone, facsimile transmission, and electronic mail are being extensively used by large groups of people. The incorporation of a real-time haptic channel into the telecommunications network by means of which individuals can communicate tactually is only beginning to be considered. Clearly, the ability to hold conferences with people at different locations in a manner that closely approximates real conferences (i.e., that permits one to see all the participants interacting together in a room, to focus one's attention arbitrarily on any member of the group, to direct remarks to specific individuals in the group by directing one's gaze toward that individual, etc.) will require substantial advances in the application of SE ideas to the telecommunications domain.
Teletravel, which would allow individuals to effectively visit remote locations for purposes of work or pleasure, also has not advanced very far; only the phone and noninteractive video are commonly available. The use of teleoperation to facilitate active exploration and work at distant locations still appears to be confined to the domain of hazardous operations and to experimental work in the domain of health care. Little consideration is being given to the use of teleoperation to enable individuals who are homebound (e.g., because of physical disabilities or because they are incarcerated) to work at jobs that are located elsewhere and that require substantial interactive sensing and manipulation.
Finally, advances in the technology for telecommunications and teletravel are being accompanied by new ventures in the sex-for-profit business. Unless appropriate societal strictures are imposed, it appears that interactive live audio-video will be heavily exploited for this purpose within the next 2 to 3 years, with equivalent exploitation involving the tactual channel occurring within the next 5 to 10 years.
Some Psychological Considerations
By definition, human operators constitute a major component of all SE systems. Furthermore, the range of experiences to which the operator is subjected in these systems can be extremely broad. Thus, there are very few topics concerning human behavior (sensorimotor performance, perception, cognition, etc.) that are not relevant to the design, use, and evaluation of SE systems. A number of the modality-specific topics in this general area are discussed in the section below in connection with human-machine interfaces; some of the more general ones are considered here.
One set of such topics focuses on human performance characteristics and includes, for example, sensorimotor resolution, perceptual illusions, information transfer rates, and manual tracking. Knowledge about all of these topics is essential to cost-effective design of SE systems. For example, the limits on human sensory resolution place an upper bound on the resolution required in sensory displays. Similarly, unintentional variability (noise) in motor responses puts an upper bound on the resolution required in control devices. Both sensory input and motor output characteristics are included in human operator models used to interpret performance in various types of manual tracking tasks. And information transfer rates help characterize the operator's ability to receive information via displays, process information centrally, and transmit information via controls. Perceptual illusions can be used to simplify (and thereby reduce the cost of) stimulus generation procedures. If not thought about in advance, however, they can also lead to unexpected failures in performance.
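As a rough illustration of the display-resolution bound described above, the following Python sketch estimates how many pixels a display would need across its field of view before added resolution exceeds what the eye can resolve. The function name and the default acuity of 1 arcminute (a commonly cited figure for normal foveal vision) are assumptions introduced here, not values taken from this report, and the calculation makes the simplifying assumption that one pixel per resolvable angle is sufficient.

```python
import math


def pixels_needed(fov_deg: float, acuity_arcmin: float = 1.0) -> int:
    """Pixels across a field of view so that each pixel subtends
    no more than the assumed acuity limit (in arcminutes)."""
    if fov_deg <= 0 or acuity_arcmin <= 0:
        raise ValueError("field of view and acuity must be positive")
    # One degree is 60 arcminutes; divide the total angle by the
    # smallest angle the eye is assumed to resolve.
    return math.ceil(fov_deg * 60.0 / acuity_arcmin)


if __name__ == "__main__":
    # Hypothetical display with a 100-degree horizontal field of view.
    print(pixels_needed(100.0))
```

Under these assumptions, a 100-degree field of view would call for about 6,000 pixels across; resolution beyond that would add cost without perceptual benefit.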
A second set of such factors arises in connection with the alterations in sensorimotor loops that occur when a human operator "drives" an SE system, and the extent to which and the manner in which the human operator adapts to such alterations as a function of his or her experience with the alteration. Consider, for example, the case in which the operator is viewing a virtual image of his or her hand as his or her real hand moves through space. The alteration may be caused by defects in the technology and can involve a spatial discrepancy between the position of the seen hand and the position of the kinesthetically sensed (felt) hand, a time delay between the felt motion and the seen motion, or a statistical decorrelation between the seen and felt hand positions (e.g., due to noise in the visual display channel). Similar examples are often encountered in the sensorimotor loop involving the visual (or auditory) scene presented by a head-mounted display and the kinesthetically felt orientation of the viewer's (listener's) head being tracked by an appropriate sensor mounted on the head to control the displayed scene.
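The three kinds of alteration in the hand example (spatial discrepancy, time delay, and noise) can be made concrete with a minimal one-dimensional Python sketch. The function and parameter names are hypothetical, and the additive model (a fixed offset, a whole-sample delay, and Gaussian noise) is an assumption chosen for simplicity rather than a description of any particular SE system.

```python
import random


def seen_positions(felt, offset=0.0, delay_samples=0, noise_sd=0.0, seed=0):
    """Return the displayed (seen) hand positions for a sequence of
    felt hand positions under an assumed additive alteration model."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    seen = []
    for i in range(len(felt)):
        # Time delay: show an earlier sample; before the delay has
        # elapsed, hold the first sample.
        j = max(i - delay_samples, 0)
        # Spatial discrepancy (offset) plus display-channel noise.
        seen.append(felt[j] + offset + rng.gauss(0.0, noise_sd))
    return seen


if __name__ == "__main__":
    felt = [0.0, 1.0, 2.0, 3.0]
    # Offset of 0.5 units and a delay of one sample, no noise.
    print(seen_positions(felt, offset=0.5, delay_samples=1))
```

A sketch of this kind could be used to generate controlled alterations for adaptation experiments, with each parameter varied independently or in combination.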
Although alterations involving time delays or noise generally have negative effects and are therefore to be avoided in designing and constructing SE systems, alterations involving fixed transformations or distortions may be introduced intentionally to enhance performance. For example, such conditions arise automatically in any teleoperator system that employs a nonanthropomorphic telerobot (e.g., a telerobot that has four eyes, six arms, and moves about on wheels). Because the telerobot and human operator in such cases are nonisomorphic, a special (unnatural) mapping must be employed to relate the sensing and motor actions performed by the telerobot to the sensing and motor actions performed by the human operator. Similar issues arise when sensory substitution is used (e.g., auditory signals are used to represent force feedback) or when VE interfaces are designed to achieve supernormal resolution by magnification (e.g., by simulating increased distance between the eyes to achieve improved visual depth perception or increased distance between the ears to achieve improved auditory localization).
Additional significant issues that arise in connection with such alterations concern the role these alterations play in eliciting the sopite syndrome (chronic fatigue, lethargy, drowsiness, nausea, etc.) and reducing the subjective sense of telepresence, and the extent to which subjects can adapt to such alterations. Unfortunately, there are as yet no adequate theoretical models for enabling SE system designers to predict how subjects adapt to such alterations as a function of the alteration and the kinds and magnitudes of exposure to the alteration. Issues related to interactive effects among multiple alterations (e.g., a distortion plus a time delay plus some jitter in time or space) have hardly begun to be considered.
A third set of topics of major importance in this general area concerns the development of appropriate cognitive models. In the design of systems, like VE systems, in which the goal is to alter the human operator, understanding the human mental processes involved in knowledge acquisition and knowledge organization, and the application of knowledge to tasks, such as concept understanding, problem solving, decision making, and skill mastery, is critical. Of particular importance are the organization, sequence, amount, and pace of information presented. If the information characteristics of the system correspond to the cognitive processing features of the individual, then the system will be more effective in facilitating training or education and enhancing task performance.
Cognitive scientists have been working for many years to describe the processes used by humans to acquire and build knowledge structures. This work has led to a variety of hypotheses and some research results suggesting that knowledge acquisition strategies and features of effective information presentation depend on a person's level of knowledge, the characteristics of the content area, and the type of task performance required.
Although significant progress has been made and strong research efforts by cognitive scientists will undoubtedly continue, a large number of questions remain about the compatibility between types of tasks and preferred features of information presentation. For example, should novices be presented with different visualizations of scientific data than experts? If so, what features should be different? What is the relationship between image fidelity, amount of information presented, and knowledge or skill acquisition for different types of tasks? And how do we facilitate transfer from a training task to an operational task? These questions are not new, but efforts to date have only begun to develop preliminary answers. In general, we do not yet understand the relationship between information presentation in an immersive SE environment and learning and performance by the SE user.
A fourth set of topics in this area concerns what might be called cognitive side effects. Quite apart from how various features of the SE influence the learning and performance exhibited by the user with respect to the specific task of interest, it is important to know how these features influence other aspects of the user's cognitive structure and behavior. For example, might not extensive use of certain kinds of SE systems significantly decrease the user's sense of presence in his or her usual environment or alter the mental model held of his or her own body? At present, experience with truly immersive SE systems is too limited to provide reliable answers to such questions. Another set of questions involves how experiences in SEs might affect an individual's attitudes toward such social behaviors as violence, sex, and fantasy role playing. There is at least some anecdotal evidence of a connection between aggressive behavior in children and playing violent video games. In addition, there have been several cases of individuals who were reportedly so completely drawn into computer role-playing games that they devoted all their time to them. It is possible that experience in immersive SEs could have even greater impact; however, the current experience with SE systems is too limited to allow one to draw any conclusions in this area.
Current State of SE Technology
Generally speaking, at the time of this writing, a substantial gap exists between the SE technology that is commercially available and the SE technology that is needed to realize the potential envisioned in the various application domains. Even the demonstrations of what are considered advanced SE research systems that can be seen at various universities, military installations, and industrial laboratories sometimes leave technically sophisticated observers who have no vested interest in the technology unimpressed.
There are, of course, important exceptions to this relatively negative assessment. Current VE technology is certainly adequate to be used for some applications in the entertainment domain. Similarly, current teleoperator technology is adequate to be used for some applications in the hazardous operations domain. Also, newly developed SE technology of various types (including that associated with augmented reality) is beginning to be applied in the domains of design, manufacturing, and marketing; medicine and health care; and information visualization. Nevertheless, it is clear that the development of significantly improved technology is a major requirement for most SE applications to be truly successful.
In the following sections, we summarize the current state of technology in the areas of human-machine interfaces, computer generation of VEs, telerobotics, and networks. As in the body of the report, the material on human-machine interfaces and networks has been separated from the material on computer generation of VEs and telerobotics because it is generally applicable to both VE and teleoperator systems. Furthermore, the section on human-machine interfaces covers the visual, auditory, and haptic channels, whereas the section on the computer generation of VEs covers the visual channel only. This asymmetry arises because the overwhelming majority of previous work on computer generation of VEs has been restricted to the visual channel. As the field matures and the computer science community becomes more involved in the generation of auditory and haptic images as well as visual images and the community concerned with the auditory and haptic channels becomes more involved with computer synthesis of environmental signals and objects, this imbalance will become less severe.
Human-Machine Interfaces
The human-machine interface in SE systems consists of all devices used to present information to the human operator and to sense the actions and responses of the human operator that control the machine in question. Although the problems associated with human-machine interfaces differ to some extent according to whether the system is a VE system or a teleoperator system (i.e., whether the given machine is a computer or telerobot), there is clearly substantial overlap between these two sets of problems. In the following sections, we briefly discuss interface issues for the visual channel, the auditory channel, and the haptic channel. In addition, we consider position tracking and mapping, motion interfaces, speech communication, physiological responses, and a display system that presents information by means of odors and radiant heat.
Visual Channel
Of all the devices associated with the human-machine interface component of VE systems, visual displays have received the greatest attention. In addition to continuing efforts directed toward the general use of displays (home television, scientific research, etc.), substantial efforts have been directed toward the development of visual displays specifically for SE systems.
The visual displays currently available for SE use include both head-mounted displays (HMDs) and off-head displays (OHDs). HMDs, which also include devices for presenting auditory signals (earphones) and for measuring the position and orientation of the head (head trackers), would be ideally suited to the SE field; however, they still suffer from information loss (poor resolution, limited field of view) as well as a variety of ergonomic problems, including excessive weight and poor fit (both mechanically and optically). They may also cause the user to experience the sopite syndrome.
In addition to HMDs in which all of the visual images are computer-generated, HMDs are being developed in which the computer-generated images are combined with directly sensed environmental images (see-through displays) or with environmental images sensed and possibly transformed by a telerobotic optical sensing system. To date such augmented-reality systems have not been much in demand by the entertainment industry and therefore remain largely in the research domain.
Low-end HMDs, the development of which has been driven mainly by the entertainment industry, can be obtained for less than $10,000; high-end HMDs, the development of which has been supported mainly by the military, can cost as much as $1 million. Although the HMD area has been and continues to be extremely active, and although there is a wide range of HMDs now available and a substantial number of research projects exploring new technologies for use in HMDs, all of the HMDs now available have major drawbacks. In fact, given the current drawbacks, we think it is extremely unlikely that any individual would choose to wear an HMD on a regular basis (e.g., 40 hr/week) without special incentives. At present, and for some years to come, OHDs will provide significantly better performance than HMDs for most tasks in most application domains. One intermediate technology currently available combines lightweight stereographic glasses with desktop stereo display screens. Another is a display mounted on a boom that can be moved about manually, with its position and orientation appropriately tracked. Many other types of OHDs, some of which involve tracking and some of which do not, are discussed in the chapter on visual displays.
To a large extent, the design of the visual displays being used in the SE field takes little account of the dual structures found in the visual system (foveal versus peripheral vision, focal versus ambient systems, etc.). Some research has been done on the use of a high-resolution inset and the tracking of gaze direction to locate the inset within the field so that it is always at the right place for stimulating the fovea. However, the design and use of such systems, which reduce the computational requirements associated with presenting continuously varying complex scenes at the cost of the complexities associated with continuous eye tracking, are still mainly in the experimentation phase.
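The computational trade-off behind a gaze-tracked high-resolution inset can be illustrated with a short Python sketch that compares pixel counts. All names and numbers here are hypothetical: the field of view, the inset size, and the two resolutions (in pixels per degree) are assumptions chosen for illustration, and the sketch ignores the cost of the eye tracking itself.

```python
def pixel_count(fov_h_deg, fov_v_deg, pixels_per_deg):
    """Total pixels for a rectangular field rendered at a uniform
    angular resolution."""
    return round(fov_h_deg * pixels_per_deg) * round(fov_v_deg * pixels_per_deg)


def inset_savings(fov_h_deg, fov_v_deg, inset_deg, high_ppd, low_ppd):
    """Ratio of pixels in a uniformly high-resolution display to pixels
    in a low-resolution background plus a square high-resolution inset."""
    uniform = pixel_count(fov_h_deg, fov_v_deg, high_ppd)
    background = pixel_count(fov_h_deg, fov_v_deg, low_ppd)
    inset = pixel_count(inset_deg, inset_deg, high_ppd)
    return uniform / (background + inset)


if __name__ == "__main__":
    # Assumed 100 x 60 degree field, 10-degree inset,
    # 60 pixels/degree in the inset and 10 pixels/degree elsewhere.
    print(inset_savings(100, 60, 10, 60, 10))
```

With these assumed values the inset design renders roughly one twenty-second of the pixels of the uniform design, which suggests why the approach is attractive despite the added complexity of continuous eye tracking.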
An important set of issues concerning perceptual effects in the visual channel that is only now beginning to be addressed concerns the above-mentioned augmented-reality displays. Not only is relatively little known about the detailed perceptual effects of misregistration (misalignment) of visual images, but there are few guidelines available to help designers choose how to merge real and synthesized information (a problem with a strong cognitive component) even when there is no detailed registration problem.
Many of the perceptual issues that have important implications for the design of technology for the visual channel, and about which current knowledge is inadequate, concern how humans respond to various types of sensorimotor alterations associated with the visual display. As mentioned earlier, such alterations can result from the intentional introduction of distortions to achieve superior performance (e.g., simulating greater interocular distance to achieve improved depth perception, using a telerobot with a nonanthropomorphic optical sensory system), from unintentional optical distortions in an HMD, or from time delays and noise generated somewhere within the visual channel. Also, we still know relatively little about how various characteristics of the visual display influence performance on various types of tasks. Although some deficiencies, like those that are likely to induce the sopite syndrome, may be more or less task independent, other deficiencies are likely to depend strongly on the task.
Auditory Channel
Unlike the situation for the visual channel, currently available hardware for the auditory component of HMDs is adequate for essentially all SE applications. Earphones to present desired signals and "ear defenders" (active or passive) to attenuate unwanted signals from the immediate real environment are effective, inexpensive, and ergonomically reasonable. (The main ergonomic problem occurs when earphones and visual displays have to be used together and the design of the helmet that includes the visual display does not take proper account of the need to also stimulate the auditory channel.) Although considered much less frequently in the context of SE systems, roughly the same conclusions apply to off-head displays and hear-through displays. Loudspeaker technology is sufficiently advanced to provide effective off-head displays in various types of spaces, and hear-through displays can easily be achieved by the use of earphones with controlled acoustic leakage or by placing microphones in the environment and adding the synthetic signals and the environmental signals electronically. Although earphones are preferred to loudspeakers for most applications, in some areas, such as those in which the system is intended to simulate battlefield sounds with sufficient energy to shake portions of the body other than the eardrums, loudspeakers are clearly preferred.
The main current inadequacies associated with the use of the auditory channel in VE systems concern the synthesis of the signals to be presented via the interface. One component of this problem concerns the spatialization of sounds. Despite extensive recent work on spatialization using earphones, the results are far from perfect. Particularly noteworthy is the inability of most current HMD spatialization systems to cause sound sources to be perceived as located in front of the listener (as opposed to behind the head, above the head, or inside the head). Similarly, although the audio industry has devoted a great deal of attention to issues of spatialization for loudspeaker systems, the results are still overly sensitive to the precise location of the listener's head; relatively small movements away from the designated "sweet spot" can cause serious degradation of spatialization. The character of the spatialization response depends on both technology factors and perceptual factors.
A second set of inadequacies involves the generation of acoustic signals. Record and playback (sampling) methods suffer from the need to store an enormous number of sounds or be satisfied with crude approximations of the desired sounds. Reasonably satisfactory sound synthesis methods have been developed only for speech and music; they do not yet exist for the generation of environmental sounds. Also, if the sounds are generated in real time rather than ahead of time and then stored, the process may consume substantial portions of the system's computing power.
Relevant perceptual issues that are being studied include those related to the spatialization of sounds and issues similar to those already discussed in connection with the visual channel and in the section on human responses to alterations in sensorimotor loops (e.g., concerning human responses to distortions, time delays, and noise within the channel). A further set of issues that are beginning to receive attention involves the use of the auditory system for sensory substitution. Such substitution is being considered both when the visual channel is overloaded and when appropriate haptic feedback is unavailable. Another set of issues being studied, which is highly relevant to synthetic auditory displays (independent of whether they are being used for sensory substitution), concerns the manner in which the auditory system organizes temporal sequences of acoustic signals into a coherent perception of the auditory scene (auditory scene analysis). Understanding scene analysis in the auditory system is quite different from understanding that of the visual system, because in the auditory system there is no peripheral representation of source location (i.e., most information on source location is derived by comparing the signals received at the two ears, a process that involves central processing).
Position Tracking and Mapping
By position tracking is meant the real-time measurement of the pose (defined as the three-dimensional position and three-dimensional orientation) of a moving object. Position tracking is required in VEs to control computer-generated stimuli and in teleoperators to control the behavior of the telerobot. In many applications, position tracking of the head, the hand, or the fingers is crucial. In other applications, position tracking of the eyes, the torso, and the arms and legs may also be required. In such applications, partial pose measurement may be sufficient: for example, three-dimensional orientation just for head tracking or three-dimensional position just for hand tracking.
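The definition of pose above can be made concrete with a small data structure. The following is only an illustrative sketch: the class name `Pose`, its field names, the units, and the yaw/pitch/roll orientation convention are assumptions chosen for the example, not something specified in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Pose:
    """One tracker sample; either part may be absent for partial pose measurement."""
    timestamp_s: float
    position_m: Optional[Vec3] = None        # x, y, z (assumed units: meters)
    orientation_rad: Optional[Vec3] = None   # yaw, pitch, roll (assumed convention)

    def is_full(self) -> bool:
        """True when both position and orientation were measured."""
        return self.position_m is not None and self.orientation_rad is not None


# Partial measurements of the kind mentioned in the text:
head = Pose(timestamp_s=0.0, orientation_rad=(0.1, 0.0, 0.0))  # orientation only
hand = Pose(timestamp_s=0.0, position_m=(0.3, 1.1, 0.4))       # position only
```

Making each component optional lets the same record describe both full six-degrees-of-freedom tracking and the partial cases.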
By position mapping is meant the determination of a surface, such as that of the body or of the environment, by measuring a dense set of three-dimensional positions on that surface. Position mapping is required for determining bodily dimensions, for recognizing facial expressions, and for environment mapping to create a geometrical model for simulation. In applications such as environment mapping, real-time requirements may be absent. When position mapping is used for body tracking, however, real-time constraints must be met. Constraints on position mapping are likely to be exceptionally severe when augmented-reality applications are considered and registration issues arise.
Knowledge of the values that can be assumed by the various motion parameters (e.g., velocity, acceleration, bandwidth) for different kinds of bodily actions is reasonably adequate for purposes of tracker design. Less adequate is our knowledge of how various deficiencies in tracker technology degrade performance or contribute to the sopite syndrome in various types of tasks.
Currently, there are four basic technologies for position tracking and mapping in SE work: mechanical linkages, magnetic sensors, optical sensors, and acoustic sensors. SE systems are likely to include a mix of such systems, because each type of system has particular strengths and weaknesses and the requirements depend on the particular application. Although none of the inertial trackers currently available is adequate for SE applications, research is now under way to develop such trackers. As for many other kinds of devices, commercial specifications of position trackers and mappers are not reliable or consistent.
Mechanical trackers are relatively inexpensive, have very small intrinsic latencies, and can be reasonably accurate. Yet body-based linkage devices (called goniometers) may be cumbersome, whereas ground-based linkage devices (e.g., hand controllers) suffer from workspace limitations. The use of goniometers involves problems of fit and measurement related to alignment with joints, rigidity of attachments, calibration of linkages mounted on human limbs, and variations among individuals. The use of ground-based linkage devices involves the difficulty of tracking multiple limb segments and limb redundancies. Hybrid systems, in which body-based and ground-based devices are combined, are also likely to be required for some applications (e.g., both to track finger motion and to provide force feedback to the hand without causing forces to be applied to other portions of the body).
Magnetic trackers are commonly used because of their convenience, low cost, reasonable accuracy, and lack of obscuration problems. Significant current disadvantages that limit their usefulness include modest accuracy, short range, high latency (20-30 ms), and susceptibility to magnetic interference.
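To give a feel for what a latency of 20-30 ms can mean, the sketch below estimates how far a displayed scene would lag behind the head during a steady rotation. The head rotation rate of 100 deg/s is a hypothetical value chosen only for illustration, and the model (lag equals rate times latency) ignores every other source of delay in the system.

```python
def angular_lag_deg(head_rate_deg_per_s: float, latency_s: float) -> float:
    """Angle the head turns during the tracker latency (constant-rate assumption)."""
    return head_rate_deg_per_s * latency_s


HEAD_RATE_DEG_PER_S = 100.0  # hypothetical head rotation rate

# Latency range quoted in the text for magnetic trackers
lags = {ms: angular_lag_deg(HEAD_RATE_DEG_PER_S, ms / 1000.0) for ms in (20, 30)}
for ms, lag in lags.items():
    print(f"{ms} ms latency -> about {lag:.1f} deg of lag")
```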
Optical sensing is one of the most convenient methods to use for certain kinds of tracking and is capable of providing accuracies and sampling rates that meet many VE requirements. The main drawbacks include visibility constraints and especially high costs.
Acoustic trackers are very attractive for VE because the costs are relatively modest and the accuracies and sampling rates are often sufficient. Efforts are being made to improve accuracies by taking into account atmospheric effects and by using echo rejection.
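One atmospheric effect of the kind alluded to above is the dependence of the speed of sound on air temperature. The sketch below assumes a time-of-flight ranging scheme and uses a common linear approximation for the speed of sound in dry air; the function names and example values are illustrative assumptions, not a description of any particular tracker.

```python
def speed_of_sound_m_per_s(air_temp_c: float) -> float:
    """Common linear approximation for dry air near room temperature."""
    return 331.3 + 0.606 * air_temp_c


def range_from_time_of_flight(tof_s: float, air_temp_c: float = 20.0) -> float:
    """Distance travelled by a one-way acoustic pulse in the given time."""
    return speed_of_sound_m_per_s(air_temp_c) * tof_s


# Hypothetical one-way pulse with a 3 ms time of flight
d20 = range_from_time_of_flight(0.003, air_temp_c=20.0)
d30 = range_from_time_of_flight(0.003, air_temp_c=30.0)
print(f"20 C: {d20:.4f} m, 30 C: {d30:.4f} m, difference: {d30 - d20:.4f} m")
```

Under these assumptions, ignoring a 10-degree temperature change shifts the range estimate by roughly 2 cm at a distance of about 1 m, which suggests why compensation can matter.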
Inertial trackers, despite having played a distinguished role in the field of long-range navigation, have received little attention in the SE area. Their unique advantage is that they are unconstrained by range limitations, interference, and obscuration; also, latencies are low. Further reductions in sensor size and cost are needed to make inertial trackers a convenient and economical alternative to other trackers.
An ideal eye tracker would satisfy three requirements: linear response over a large range (roughly 50 deg), high bandwidth (1 kHz), and tolerance to relative motion of the head. Although many eye-tracking devices are available, none of them satisfies all three requirements.
Haptic Channel
The haptic channel differs from the visual and auditory channels in two major ways. First, it involves manipulation as well as sensing. Second, it has received less attention than the other two channels with respect to both basic science and device development. One reason for the relatively backward state of the haptics field appears to be the intrinsic difficulties in studying haptics associated with the complexity of combining sensory functions with manipulative functions and with the use of electromechanical systems (for example, the control and measurement of the effective stimulus in haptics has always been difficult). Another reason is the lack of a recognized societal need to develop the haptics field. Whereas research and development related to the visual and auditory channels has been strongly driven by both medical needs and entertainment considerations, haptics-related research and development has had no such support. Until very recently, the main support for research and development on haptic interfaces has come from the field of telerobotics and, as indicated previously, this field has had limited support. It should be noted, however, that the introduction of the relatively simple haptic interfaces known as mice is having a major effect on the interest in such interfaces by computer scientists as well as those concerned primarily with human-machine interface issues.
The most useful currently available haptic interfaces fall into one of two categories: (1) body-based gloves or exoskeletons that track the position and posture of the hand (as discussed in position tracking) and (2) ground-based devices, such as "joysticks," that both sense certain actions of the hand and provide force feedback. Whereas many of the latter devices have been developed in connection with human-machine interfaces for teleoperators and have a relatively long history, many of the former devices have been developed recently with VE systems in mind. Although exoskeletons that provide force feedback have been developed for research and development purposes, they tend to be both cumbersome and expensive and are not in widespread use even experimentally. At present, one of the main experimental thrusts is directed toward the development of tool-handle systems, in which the human user manipulates a real tool handle, the actions of the tool handle are used to control some feature of a telerobot or a VE, and force feedback is displayed through the tool handle according to interactions of the telerobot with the real environment or of the virtual tool with virtual objects in the VE.
Relatively little work, even at the research and development level, has been directed toward haptic interface devices that provide feedback (related to perception of texture or temperature) of kinds other than simple force feedback, and no such device is yet available commercially. To convey more detailed information through the skin, tactile displays of the type used to convey visual or auditory information to individuals who cannot see or hear need to be considered. Although a variety of such displays has been developed (e.g., involving vibratory or electrocutaneous arrays), none has been successfully incorporated into SE systems.
Current research associated with the development of haptic interfaces involves investigation of human haptics, development of technology, and optimizing the interactions between the two. Basic research on human haptics includes biomechanical studies of the hand and psychophysical studies of the sensorimotor and cognitive systems associated with the hand. Illustrative issues of particular concern in this research include determination of the mechanical properties of the soft tissues in contact with haptic interfaces; quantification of limits on human sensing and control of contact forces and hand displacements; identification of stimulus cues in the perception of contact conditions and object properties; and characterization of human sensorimotor performance in the presence of time delays, distortions, and noise. Basic research on technology includes development of novel technologies for sensor and actuator hardware; design of computer architectures for fast computation of physical models; and development of algorithms for real-time control of devices that render tactual images.
Among the more applied topics that are beginning to be studied are those that are related to the inclusion of tactual images in multimodal VEs. These include the design of high-performance haptic interfaces with appropriate sensors, actuators, linkages, and control as well as their evaluation in uni- or multimodal VEs. The interfaces may be ground- or body-based and may or may not include force reflection or tactile displays. Avoidance of mechanical instabilities and false cues in contact tasks requires capabilities that are at the limits of current technology in the areas of range, resolution, and bandwidth of forces and displacements. Evaluation of the effectiveness of the interfaces is critical to the design of improved versions. Systematic studies to evaluate the human user's comfort in operating the devices and to investigate multimodal display methods for achieving optimal task performance and telepresence (immersion) are barely at the planning stages at this point.
Motion Interfaces
In the real world, many kinds of motion occur, including whole-body passive motion (passive transport), whole-body active motion (locomotion), and part-body active and passive motion (e.g., when an arm is moved passively or actively). Also, in many cases such motion is accompanied by a wide variety of stimuli in a wide variety of sensory channels: motion cues may be contained in signals from the vestibular system, the motor system, the visual and auditory systems, and the proprioceptive/kinesthetic and tactile systems. It is no great surprise, therefore, that, in addition to the existence of many types of motion and many types of motion cues, there are many ways in which motion can be simulated and many types of motion interfaces.
Currently, motion interfaces for passive transport are being used primarily in flight simulation for flight training, in the entertainment industry for "thrill rides," and in research projects directed toward improved understanding of human perception and performance (including motion sickness) in a wide variety of contexts involving real or simulated passive transport.
Motion interfaces for passive transport can be divided into two categories: inertial displays, in which body mass is actually moved, and noninertial displays, in which motion is simulated without moving body mass but by stimulating various sensory channels in appropriate ways. Often, inertial and noninertial techniques are used in combination.
With inertial displays, patterns of force vectors are applied to the body that approximate to varying degrees of completeness and accuracy the patterns that would be present in the real situation being simulated. Such displays are generated by the use of centrifuges, rotating rooms, motion bases, tilting platforms, spinning chairs, etc. In some sense, G-seats, in which the seat and back can be inflated or deflated as well as vibrated, also fall into this category.
With noninertial displays, the body remains stationary, but patterns of stimuli are presented that are usually associated with movement of the body through the environment. The most obvious example in this category, one that has been studied in some detail and frequently applied, involves altering the visual scene in a manner that corresponds to the changes that would occur if the body moved through the given environment (e.g., in a car or plane). Similar results can be achieved using the auditory channel; however, auditory-induced self-movement has been studied less and applied less and, as one would expect because of the poor resolution, the results appear to be less dramatic. Other techniques for inducing the perception of self-motion include stimulating the vestibular system by changing temperature (caloric irrigation) or applying electrical currents (galvanic stimulation); stimulating the cutaneous system by sliding surfaces over the skin (e.g., beneath the soles of the feet); and stimulating muscle spindles by vibrating muscle tendons. However, none of these techniques, with the possible exception of the one involving cutaneous stimulation, appears to be practical for use in SE applications.
Motion interfaces for active transport (i.e., locomotion) permit the user to experience active sensations of walking, running, climbing, etc., while remaining within a constrained volume of space. The best known and most widely used interfaces of this type are the common linear, one-dimensional treadmill and the stair-climbing exercise machine. The ideal system would incorporate a six-degrees-of-freedom platform (shoe) for each foot with position and force sensors and force feedback. Although a number of systems that are considered more advanced than the common treadmills and exercise machines are beginning to be developed and evaluated by research groups, no such systems are yet available commercially.
Current research in this general area, apart from work associated with attempts to develop improved technology, is focused on evaluating the different methods of movement simulation, with respect to both achieving the desired sensation of movement and minimizing the extent to which the simulation results in motion sickness (or, more generally, the sopite syndrome).
Other Types of Interfaces
The above discussion by no means covers all the interface communication channels of interest. For example, nothing has been said about the olfactory (smell) or gustatory (taste) channels, or about interfaces related to the sensing of heat, wind, and humidity. Apart from influencing the general sense of presence and immersion in various types of environments, such sensations might be of specific use for conveying core information in some major application areas, for example, olfactory information in training firefighters or medical personnel concerned with low-tech diagnostic procedures. Furthermore, the technological problems in creating an olfactory interface do not appear overwhelming. Not only have odor-releasing systems for use in theaters and odor (scratch) records for novelty home use been available for many years, but also significant current research is now being conducted in this area.
Perhaps the two most important methods for interfacing that have not yet been discussed concern speech communication (automatic speech recognition and speech synthesis) and direct physiological sensing and control.
Although the discrete nature of speech makes it less appropriate for conveying information that is represented by continuous variables than by discrete variables, it clearly is one of the most natural methods for humans to use in communicating with another entity, human or machine. Previous research on automatic speech recognition and speech synthesis, which has been driven by needs outside the SE area, has produced a variety of systems that are now available commercially and that can be usefully applied in the SE field. Current speech recognition systems differ from the ideal in that they have limited capacity to handle large vocabularies, different voices, continuous speech (as opposed to isolated words or phrases), interference produced by background noise, and degraded speech production. Nevertheless, reasonably high accuracies (e.g., 95 percent correct word identification) can be obtained for task-specific applications in which modest demands are made along the just-mentioned dimensions of difficulty. Current speech synthesis systems appear relatively adequate in their ability to produce speech that is highly intelligible (and in comparison to synthesis systems for environmental sounds); however, they are only now beginning to produce speech that sounds reasonably natural or that mimics the idiosyncratic speech patterns of individual talkers. In general, however, there is no question that speech communication interfaces are now becoming available that can be usefully integrated into SE systems for a variety of practical applications, and that the overall quality of such interfaces will continue to improve over the next few years (with or without help from the SE field).
Physiological interfaces (direct stimulation of neural systems or sensing of physiological responses or states of the human organism) have received very little attention in the SE field. Direct stimulation of neural systems seems relatively inappropriate except in cases in which the subject is disabled by loss of sensory function. Even in these cases, however, the appropriate transducers are likely to become part of the subject (i.e., to be implanted on a more or less permanent basis, as in the case of cochlear implants to mitigate loss of hearing), so that no special requirements are imposed on the SE interface. Internally generated physiological signals associated with activities of the brain, muscles, circulatory system, respiratory system, etc. can be used, at least in principle, to indicate general emotional and cognitive human states and to control specific variables in the SE system. Although significant research is being conducted in this area in an effort to determine and improve the reliability of such signals for purposes of control (by the military as well as by those concerned with aiding individuals who have severe motor disabilities), extensive practical use of such signals for SE control purposes appears several years away.
Computer Generation of VEs
To many people, and certainly to most computer scientists, computer generation of VEs is the core of the SE field; to them, human-machine interfaces, telerobots, and even the human operators are of secondary importance. Furthermore, most past and current work in this area has focused on the generation of visual images. Apart from individuals who are themselves involved in the development or use of teleoperators, when people think of SE systems they tend to think of interactive, computer-generated visual images. Except for speech synthesis, which has been developed primarily by speech scientists rather than computer scientists, the generation of auditory and haptic images has been ignored. Accordingly, our discussion in this section is focused primarily on the visual channel. Information related to the computer generation of the auditory or haptic components of VEs is found primarily in discussions of human-machine interfaces.
It is possible to imagine a VE system that can create photorealistic images, that can be fully interactive in real time, and that has graphics, computation, and communication capabilities to handle all possible environments of interest with equal ease—that is, a general-purpose system that can generate environments relevant to manufacturing, health care, military training, etc. Such a system is beyond the current technology, and it is anticipated that for a long time to come, trade-offs between realism and interactive capacity will be required. Furthermore, due to these limitations, effective VE implementation will depend on targeted application domains. Some applications, such as architectural visualization, may require photorealistic rendering, whereas others, such as training, may not. Many manufacturing and medical applications may require a much higher level of real-time interaction than an architectural walkthrough. Although there are many applications in which a realistic visual environment is unnecessary (and maybe even undesirable), the ability to generate such an environment is clearly an important target for the development of VE technology.
One requirement for creating a realistic visual environment concerns the frame rate, that is, the number of still images that must be presented per second to provide the illusion of continuous motion. It has been demonstrated that frame rates must be greater than 8 to 10 per s to maintain this illusion. A second requirement concerns the response time a VE system must exhibit to preserve an illusion of instantaneous interactive control. Research shows that such delays must be less than 0.1 s. A third requirement concerns the picture resolution needed for realism. According to some VE technologists, a scene can be rendered in all of the detail resolvable to the human eye with 80 million polygons (Catmull et al., 1984). However, using today's hardware, a system that used 80 million polygons per picture would be far too slow to be truly interactive—thus the current major trade-off between realistic images and realistic interactivity. These requirements, of course, can be highly application dependent. Applications with rapidly moving objects may require significantly higher frame rates and shorter delays, whereas highly abstract or stylized applications may require fewer polygons or lower resolution.
Hardware
Maintaining an adequate graphic frame rate is so computationally demanding that special-purpose hardware is often necessary. The main purpose of this graphics hardware is to provide rapid geometric transformations, clipping, hidden surface elimination, polygon fill, and surface texturing.
Several of today's leading graphic workstations are RealityEngine2 (produced by Silicon Graphics), Pixel Planes 5 and PixelFlow (developed at the University of North Carolina), and the Evans & Sutherland Freedom systems. All of these systems run on parallel architectures (in which graphic rendering operations occur on parallel paths); however, they differ on a variety of characteristics, including frame rate, processing speed, and anti-aliasing capabilities. Both RealityEngine2 and Pixel Planes 5 can process approximately 2 million texture-mapped polygons per s. PixelFlow promises significantly higher polygon processing rates than are available in current designs. None of these machines is able to render photorealistic time-varying visual scenes at high frame rates.
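The gap between realism and interactivity can be checked with simple arithmetic using only figures quoted in the text: 80 million polygons for a fully detailed scene, roughly 2 million texture-mapped polygons per s for the fastest workstations, and a minimum frame rate of about 10 per s. The variable names below are illustrative.

```python
POLYGONS_PER_FULL_DETAIL_FRAME = 80_000_000  # figure cited in the text (Catmull et al., 1984)
POLYGONS_PER_SECOND = 2_000_000              # approximate rate quoted for RealityEngine2 and Pixel Planes 5
MIN_FRAME_RATE_PER_S = 10                    # upper end of the 8-to-10 per s threshold in the text

# Time to draw one fully detailed frame at the quoted rate
seconds_per_full_frame = POLYGONS_PER_FULL_DETAIL_FRAME / POLYGONS_PER_SECOND

# Polygons that can be drawn per frame while holding the minimum frame rate
polygon_budget_per_frame = POLYGONS_PER_SECOND // MIN_FRAME_RATE_PER_S

# Factor by which the full-detail scene exceeds that budget
shortfall_factor = POLYGONS_PER_FULL_DETAIL_FRAME / polygon_budget_per_frame

print(seconds_per_full_frame, polygon_budget_per_frame, shortfall_factor)
```

On these figures a fully detailed frame would take about 40 s to draw, and holding 10 frames per s leaves a budget of about 200,000 polygons per frame, a factor of 400 below the full-detail figure.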
Vectorized or massively parallel supercomputers are also in use in VE applications to improve computational throughput. However, the use of parallel supercomputers may not significantly reduce the run times of VE applications that cannot be performed in parallel or that require large amounts of data movement. One approach suggested for maximizing versatility is to base computations in VE systems on a few parallel high-powered scalar processors with large shared memories. In such systems, different processors would handle different parts of the VE. Alternatively, different processors might be dedicated to different types of data (e.g., one might handle all computations related to the density field of a fluid, whereas another might handle all those related to its velocity field).
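The limited benefit of parallel machines for work that cannot be parallelized is commonly expressed by Amdahl's law, which the text does not name but which fits the point being made. The sketch below is a minimal illustration; the parallel fractions and processor counts are hypothetical, and the model ignores the data-movement costs also mentioned above.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the run time can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workloads on a 1000-processor machine
for fraction in (0.5, 0.95):
    print(f"{fraction:.0%} parallel -> speedup {amdahl_speedup(fraction, 1000):.2f}x")
```

Under this model, a computation that is only half parallelizable can never run more than twice as fast, however many processors are added.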
Other limiting factors relate to speed of data access. These include the time required to find the data in a mass storage device (seek time) and the time required to read the data (bandwidth). In circumstances in which very large datasets are needed for a single computation or picture, the bandwidth is critical and semiconductor memory may be the only viable storage medium in the near future.
Software
In order to provide a fully interactive, real-time, natural-appearing environment, software development is required in a wide variety of areas. The real-time generation of VEs requires consideration of interaction, navigation, modeling, the creation of augmented reality, hypermedia integration, and operating system software.
INTERACTION
Interaction software makes use of the outputs of human-machine control devices to modify the VE. The control devices now in use include position trackers, mice, keyboards, joysticks, and speech recognition systems. As a rule, tasks in VEs are performed using a number of control techniques in combination because none is adequate by itself. The interaction software takes all such control signals, scans them for obvious errors resulting from equipment or user malfunction, and then transfers the resulting information to those portions of the system involved in generation of the appropriate VE.
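The error-scanning step described above might look like the following sketch for position-tracker samples. The checks shown (non-finite values, positions outside the workspace, implausibly fast jumps) and all threshold values are assumptions made for illustration; the text does not specify which errors are screened or how.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def plausible_sample(new_pos: Vec3,
                     prev_pos: Optional[Vec3],
                     dt_s: float,
                     workspace_m: float = 3.0,          # hypothetical half-width of the tracked volume
                     max_speed_m_per_s: float = 10.0    # hypothetical limit on hand or head speed
                     ) -> bool:
    """Return False for samples that look like equipment or user malfunction."""
    # Reject non-finite readings (e.g., sensor dropouts reported as NaN or infinity)
    if any(math.isnan(c) or math.isinf(c) for c in new_pos):
        return False
    # Reject positions outside the assumed tracked volume
    if any(abs(c) > workspace_m for c in new_pos):
        return False
    # Reject jumps that would require an implausible speed
    if prev_pos is not None and dt_s > 0:
        if math.dist(new_pos, prev_pos) / dt_s > max_speed_m_per_s:
            return False
    return True
```

Samples that pass such a filter would then be handed to the portions of the system that generate the VE; rejected samples could be dropped or replaced by the last accepted value.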
NAVIGATION
Visual navigation software controls what the user sees as he or she moves through the VE and turns his or her head. More specifically, what a user sees is determined by two parameters: the user's location in the VE and the gaze direction. Typically, a head tracker is used to sense the position and orientation of the user's head. Changes in the user's location within the VE can be effected by a virtual vehicle (e.g., a human-operated treadmill, bicycle, or joystick that moves the viewpoint of the human user), by the specification of a new vantage point within the environment and the execution of a logical command to fly to that point, or by simulated teleportals in the distance at which the user can suddenly appear without moving through the intervening space.
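As an illustration of the "fly to" style of navigation, the sketch below moves the user's viewpoint toward a chosen vantage point in equal steps along a straight line. The function name `fly_to` and the choice of linear interpolation are assumptions made for the example; an actual system might ease in and out or follow a path around obstacles.

```python
from typing import Iterator, Tuple

Vec3 = Tuple[float, float, float]


def fly_to(start: Vec3, target: Vec3, steps: int) -> Iterator[Vec3]:
    """Yield successive viewpoint locations from start to target (straight line)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the path covered after step i
        yield tuple(s + t * (g - s) for s, g in zip(start, target))


# Hypothetical move to a vantage point in four frames
path = list(fly_to((0.0, 0.0, 0.0), (4.0, 0.0, 2.0), steps=4))
```

A teleportal, by contrast, would correspond to setting the viewpoint to the target in a single step.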
Two important sets of software issues that arise when such user movements are taking place concern the mapping of the user's control actions into specifications of how the visual scene should be changed (an interaction issue) and the navigational aids provided to the user to prevent the user from getting lost in the VE.
A further very important software issue concerns the need to minimize the load on the graphics processors. Even if the contents of a VE remain static (as in the case of a virtual building depicting an architectural design), the display to the user changes as his or her point of view changes. A number of techniques have been developed to reduce the polygon flow to the graphics processor, but no general solution is yet available. Current solutions are generally application-specific, and some work well only if the underlying environment does not change dynamically. Two general techniques for minimizing polygon flow are the partitioning of the polygon-defined world into volumes that can be readily checked for visibility by the viewer and the low-resolution rendering of objects that are small in the user's visual field (e.g., as the result of being very far away).
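The second technique mentioned above, rendering objects at lower resolution when they are small in the visual field, can be sketched as a level-of-detail selection based on apparent angular size. The threshold values, the bounding-sphere approximation, and the function names are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def angular_size_rad(object_radius_m: float, distance_m: float) -> float:
    """Approximate visual angle subtended by an object's bounding sphere."""
    return 2.0 * math.atan2(object_radius_m, distance_m)


def choose_lod(object_radius_m: float,
               distance_m: float,
               thresholds_rad: Sequence[float] = (0.2, 0.05, 0.01)) -> int:
    """Return a level of detail: 0 is the most detailed model, higher is coarser."""
    size = angular_size_rad(object_radius_m, distance_m)
    for level, threshold in enumerate(thresholds_rad):
        if size >= threshold:
            return level
    return len(thresholds_rad)  # coarsest model for very small or distant objects


print(choose_lod(1.0, 2.0), choose_lod(1.0, 30.0), choose_lod(1.0, 1000.0))
```

Each level would correspond to a separately prepared model with fewer polygons, so distant objects contribute less to the polygon flow.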
MODELING
Models that define the form, appearance, and behavior of virtual objects are the core of any VE. Today, geometric models constructed for VEs are developed, for the most part, using commercially available computer-aided design (CAD) systems. The tools provided by these systems aid designers in specifying object shapes and sizes; however, these objects are often difficult to use in situations that were not considered by the original CAD designer. As a result, a substantial amount of manual manipulation may be required to use an object specified with a CAD system for a VE. When a VE application requires a replica of a real environment, it is generally considered preferable to map the real environment rather than build a model of it. Active mapping techniques, such as scanning laser range finders and light stripes, are used to make three-dimensional measurements directly. The drawback of these methods is that they capture only information visible from a particular viewpoint. To achieve a complete map can require taking multiple views and combining them into a coherent picture. Some passive techniques, such as stereoscopic methods, are also in use. However, none of the stereo algorithms is robust enough to compete with active methods. For many purposes, far more is required of an environmental model than just a map of object surface geometry. If, in applications that attempt to model real-world behaviors, the objects are to be manipulated, the physics of the objects is needed: how they behave, their composition, etc. Even when the relevant physics is well understood, simulations based on this understanding can be tedious and time-consuming (both to construct and to run). In a VE, these simulations must run reliably and automatically; any situation that might arise must be adequately anticipated and handled correctly in real time.
The need for autonomous agents may arise in many VE applications, such as entertainment, training, manufacturing, and education. Although the ability to create fully credible simulated humans is well beyond our grasp in the foreseeable future, we do have the capability to develop simple agents. The agent's body in a VE is a physical object to be controlled to achieve coordinated motion. A computer model of a human figure that can move and function in a VE is called a virtual actor. A guided virtual actor is one whose movement is directed and controlled by the motions of a real human being. An autonomous virtual actor operates under program control and is capable of independent behavior that is responsive to the VE, including both human participants and simulated objects and events. An autonomous actor may touch and manipulate objects, make contact with various surfaces, or make contact with other humans directly (e.g., shaking hands) or indirectly (e.g., two people lifting a heavy object). Autonomous agents need not be literal representations of human beings but may be represented by various abstractions.
AUGMENTED REALITY
In an augmented-reality system, virtual and real objects appear in the user's view simultaneously; the artificial or virtual image is overlaid on the real-world image. Creating adequate software for augmented-reality applications is a difficult task that requires a complete model of the real environment as well as of the synthesized environment. Automatic generation of effective augmented reality is still at the research stage. A major issue is the ability to create and maintain accurate registration between the real and synthetic environments, particularly when they are both rapidly varying.
HYPERMEDIA INTEGRATION
Hypermedia is multimedia data composed of audio, compressed video, and text that is linked together in a nonlinear manner. A hypermedia system provides an individual with the opportunity to explore a topic by moving through a series of logically linked information nodes. When a node is reached, the individual may obtain all of the information available at that node in a variety of forms. One example is a virtual museum containing hypermedia nodes that provide significantly expanded information about particular artifacts. Once at a node, an individual can pursue a particular object in depth or explore its relationship to other objects in the exhibit. Hypermedia integration software (which involves the blending of computer graphics, video, sound, and, in the future, haptic images) is used to combine hypermedia software with VE. Embedding hypermedia nodes into a VE system allows a participant in a VE to go to a node and gain additional information about a particular experience. Work in this area is currently at the test bed stage.
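The idea of logically linked information nodes can be illustrated with a small graph structure. The sketch below is hypothetical throughout: the class name, field names, museum entries, and file names are invented for the example and do not describe any existing hypermedia system.

```python
from dataclasses import dataclass, field


@dataclass
class HypermediaNode:
    """One information node; the field names are illustrative assumptions."""
    title: str
    media: dict = field(default_factory=dict)  # e.g. {"text": ..., "audio": ...}
    links: list = field(default_factory=list)  # titles of related nodes


def reachable(nodes, start):
    """Return the titles reachable from `start` by following links, breadth-first."""
    seen = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)
        for target in nodes[current].links:
            if target in nodes and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


# A hypothetical three-node virtual museum exhibit.
museum = {
    "vase": HypermediaNode("vase", {"text": "Description of the vase"}, ["kiln", "trade route"]),
    "kiln": HypermediaNode("kiln", {"video": "kiln.mpg"}, ["vase"]),
    "trade route": HypermediaNode("trade route", {"audio": "narration.wav"}, []),
}
```

A participant standing at the "vase" node could follow links to either related node, whereas the "trade route" node is a dead end in this small example.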
OPERATING SYSTEM
Current operating systems (UNIX, Windows NT) are not geared to supporting the real-time multimodal requirements of VEs, and significant modifications are required if the development of VEs is to proceed efficiently. There is a major need for systems that ensure that high-priority processes (such as user tracking) receive service at short, regular intervals and that provide time-critical computing and rendering with negotiated graceful degradation algorithms that meet frame rate and lag-time guarantees. This is a new computing paradigm.
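One simple way to picture graceful degradation is a controller that lowers rendering detail whenever a frame overruns its time budget and restores detail when there is headroom. The sketch below is an assumption-laden illustration, not a description of any operating system's scheduler: the function names, the five detail levels, and the 0.7 headroom factor are all hypothetical.

```python
def next_detail_level(current_level, last_frame_ms, budget_ms, max_level=4):
    """Pick the detail level for the next frame.

    Level 0 is the coarsest rendering; max_level is the finest.
    """
    if last_frame_ms > budget_ms and current_level > 0:
        return current_level - 1  # over budget: degrade one step
    if last_frame_ms < 0.7 * budget_ms and current_level < max_level:
        return current_level + 1  # ample headroom: restore one step
    return current_level          # within budget: hold steady


def simulate(frame_costs_ms, budget_ms, start_level=4):
    """Replay measured frame times and return the level chosen after each frame."""
    levels = []
    level = start_level
    for cost in frame_costs_ms:
        level = next_detail_level(level, cost, budget_ms)
        levels.append(level)
    return levels
```

For a 30 frames-per-second target (a budget of about 33.3 ms), `simulate([40, 40, 30, 20, 20], 33.3)` steps the level down twice, holds, and then steps back up as frame times recover.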
As indicated previously, a number of important SE applications involve sensing, navigating through, and manipulating objects in real-world environments. Such applications frequently arise in the domains of hazardous operations and medicine. In all such applications, telerobotics plays an essential role; human-machine interfaces and computer-generated VEs are not sufficient.
In many ways, current research activities concerned with teleoperation are similar to those concerned with VEs. Independent of the purpose for which the system is being designed (e.g., to train someone to fly an aircraft or to actually remove some hazardous waste), and independent of whether the relevant environment is real or virtual, in both cases these activities are concerned with the design, construction, and application of multimodal immersive systems that enable the operator to interact usefully with some structured environment. Furthermore, although concern with complex electromechanical systems was previously confined to individuals working on telerobotics, now, as haptic interfaces are becoming recognized as important components of VE systems as well as of teleoperator systems, the VE community is also beginning to show interest in this area. Independent of whether the haptic master in the interface controls the behavior of a telerobot or of some computer-generated virtual entity, design and construction of the master requires consideration of electromechanical phenomena and devices.
The principal differences that currently exist relate to (1) the design and performance of the telerobots, (2) communication time delays, and (3) the demands of real-time input/output operations. Unavoidable time delays (transport delays) arise in communicating between the human-machine interface and the telerobot when these subsystems of teleoperator systems are separated by a large distance (e.g., a delay of 30 ms between Washington, D.C., and Los Angeles, a delay of a little more than 1 s between the earth and the moon). Such transport delays decrease in importance as the distance decreases, as other delays in the overall systems (e.g., resulting from inadequate computational speed) increase, and as the importance of haptic interfaces with force feedback decreases. It should also be noted, however, that transport delays will increase in importance in VE systems not only as such systems increase their use of haptic interfaces with force feedback, but also as VE networks with tightly coupled players at distant locations come into being. The approaches being used to alleviate the time delay problem, which can arise in connection with problems of both physical stability and perception, include supervisory control and predictive modeling. Both of these approaches are being actively pursued by the teleoperator research community and, more recently, by the VE research community as well.
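The transport delays quoted above can be compared with the physical lower bound set by signal propagation speed. The following sketch computes that bound. The distances are rounded approximations, and the two-thirds velocity factor for optical fiber is a typical assumed value. Practical delays also include switching and routing time, so they exceed these minimums.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum


def one_way_delay_s(distance_km, velocity_factor=1.0):
    """Minimum one-way propagation delay over the given distance.

    velocity_factor is the signal speed as a fraction of the vacuum
    speed of light (roughly 2/3 in optical fiber; an assumed typical value).
    """
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor)


# Approximate distances (assumed round figures).
EARTH_MOON_KM = 384_400  # mean earth-moon distance
DC_TO_LA_KM = 3_700      # rough great-circle distance

moon_delay = one_way_delay_s(EARTH_MOON_KM)        # about 1.28 s
fiber_delay = one_way_delay_s(DC_TO_LA_KM, 2 / 3)  # about 0.0185 s
```

The earth-moon bound agrees with the "little more than 1 s" figure, and the cross-country bound of roughly 18.5 ms leaves room for the network overhead implied by a 30 ms delay.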
In typical VE systems, only simple haptic interfaces or trackers may be used, thus placing modest real-time input/output demands on the computer. In such circumstances, the real-time performance of current operating systems may be adequate or could become adequate with some modifications. For telerobotic systems and for VE systems with complicated haptic and other human interfaces, the input/output requirements are too massive to be handled by ordinary workstation-based architectures. The approach is to use a separate microprocessor system for real-time operations, connected to a workstation or PC that acts as a front end.
Apart from the research related to time delay problems and control issues, much of the current research activity in the telerobotics area is focused on telerobotic hardware. Although substantial advances have been made in this area and a number of impressive telerobots have been developed and applied to practical problems, the limitations imposed by inadequate hardware are still substantial. For example, sensor technology (to sense object proximity, object surface properties, and applied force) remains inadequate. Similar remarks apply to actuator and transmission technology.
Other current activities in the telerobotic area concern the development and use of new materials and the exploitation of advances in microelectromechanical systems. The availability of extremely small structures (including sensors and actuators) is stimulating exciting new work on microtelerobotic hardware and the interface and control problems associated with scaling down movements and forces from human scale to micro scales.
Also, and partly as a consequence of the advances in microelectromechanics, engineers are beginning to think about the possible benefits and feasibility of creating teleoperator systems that make use of distributed telerobotics, that is, a large number of relatively simple, relatively small telerobots with relatively narrow bandwidth communication among these telerobots. The use of multiple patrol telerobots for security purposes is one example of such an application. A major challenge in this area, aside from the development of the telerobots themselves, concerns the nature of the human-machine interface and the design of a display and control system that treats the set of telerobots as an integrated system rather than as a collection of independent entities that require a separate interface for each telerobot (and perhaps a whole set of human operators rather than a single operator).
The general problem of networking telerobotic systems, either in the sense of networking telerobots or networking the human operators, has received relatively little attention.
Communication networks can transform VEs into a shared environment in which individuals, objects, and processes interact without regard to their location. These networks will allow us to use VE for such purposes as distance learning, group entertainment, distributed training, and distributed design.
Currently, the two application domains in which the most networking activity is occurring are entertainment and national defense. In the entertainment industry, VR companies are in the process of forming cooperative arrangements with cable television companies to develop multi-user games and interactive shopping.
Applications for the military have focused on large-scale simulated network training exercises, such as those offered by SIMNET. In SIMNET, as many as 300 soldiers in tank and aircraft simulators located at different military bases can engage in a realistic battle against an intelligent enemy on a common battlefield. Currently, the Defense Department is using new communication software that upgrades the SIMNET protocol. This upgrade, named Distributed Interactive Simulation, is expected to be used by the military in building future distributed training scenarios as well as in simulating the acquisition process for various pieces of planned equipment. Another example of network use can be found in the experimental program currently being pursued in telemedicine by the state of Georgia. Network applications such as these, all of which are discussed in Chapter 12, constitute only a first step.
To move the communication software development forward to a point at which it can truly support VE applications will require the development of VE-specific, applications-level network protocols. These applications protocols are required for communicating world-state-change information to the various networked participants in the operating VE.
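To make the notion of communicating world-state-change information concrete, the sketch below packs a single entity's state into a fixed-size binary message and extrapolates its position between updates. The wire layout is purely hypothetical and is not the SIMNET or Distributed Interactive Simulation format. Constant-velocity extrapolation (often called dead reckoning) is shown as one common way to reduce update traffic.

```python
import struct

# Hypothetical wire format: entity id, timestamp (s), position (m), velocity (m/s).
# "!" selects network byte order with standard sizes, giving 36 bytes per message.
STATE_FORMAT = "!Id3f3f"


def encode_state(entity_id, timestamp, position, velocity):
    """Serialize one entity's state into a fixed-size message."""
    return struct.pack(STATE_FORMAT, entity_id, timestamp, *position, *velocity)


def decode_state(payload):
    """Recover (entity_id, timestamp, position, velocity) from a message."""
    fields = struct.unpack(STATE_FORMAT, payload)
    return fields[0], fields[1], fields[2:5], fields[5:8]


def extrapolate(position, velocity, elapsed_s):
    """Estimate a position between updates by assuming constant velocity."""
    return tuple(p + v * elapsed_s for p, v in zip(position, velocity))
```

A receiver would call `decode_state` on each arriving message and use `extrapolate` to keep the entity moving smoothly until the next update arrives, so the sender needs to transmit only when the true state departs from the prediction.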
As mentioned in the previous section, the networking of telerobotic systems in diverse locations has also begun to receive some attention. For example, the universities associated with the Space Automation/Robotics Consortium (Texas A&M University, the University of Texas at Austin, the University of Texas at Arlington, and Rice University), in conjunction with the National Aeronautics and Space Administration (NASA), have developed software and protocols to use the Internet to control telerobots in their different locations. This type of effort is under consideration elsewhere and is likely to grow in scope. Advances in networking will strongly impact such developments.
Wide-area network (WAN) hardware is being developed on a variety of fronts. The national telecommunications infrastructure is being radically altered by the installation of fiber optic cabling that is capable of operating at gigabit speeds across the country. As a result, major commercial carriers are installing special switches that handle both synchronous and asynchronous signals at very high speeds. Government-supported networks are also in the process of upgrading. For example, NSFnet, the backbone of the Internet, is currently operating at T-3 speeds (45 Mbit/s) and plans to move to OC-3 (155 Mbit/s) in 1994 and to OC-12 (622 Mbit/s) by 1996. The National Research and Education Network is one of four components in the U.S. High Performance Computing and Communication program, which is supporting the installation of OC-12 networks at five regional test beds for research purposes.
Workstations with three-dimensional graphics will be connected to the WANs discussed above through local-area networks (LANs). Most of these LANs, which currently use Ethernet technologies (10 Mbit/s), do not have the capabilities to support the high-performance demands of VE and multimedia. However, work is proceeding on larger and faster local networks, such as the Fiber Distributed Data Interface. This system currently operates at 100 Mbit/s, but the follow-on, expected by the middle to late 1990s, will operate at speeds up to 1.25 Gbit/s. Moreover, the Institute of Electrical and Electronics Engineers (IEEE) has issued a draft standard, Integrated Services Local Area Network Interface, which defines a LAN that carries voice, data, and video traffic.
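A rough calculation shows why the slower links fall short for multimedia traffic. The sketch below compares the link rates quoted in the text with the bit rate of a single uncompressed video stream. The stream parameters (640 x 480 pixels, 24 bits per pixel, 30 frames per second) are assumptions chosen for illustration, and protocol overhead is ignored.

```python
# Nominal link rates in Mbit/s, as quoted in the text.
LINK_RATES_MBIT_S = {
    "Ethernet": 10,
    "T-3": 45,
    "FDDI": 100,
    "OC-3": 155,
    "OC-12": 622,
}


def stream_rate_mbit_s(width, height, bits_per_pixel, frames_per_s):
    """Bit rate of an uncompressed video stream, in Mbit/s."""
    return width * height * bits_per_pixel * frames_per_s / 1_000_000


def links_that_fit(required_mbit_s):
    """Names of the links whose nominal rate covers the required rate."""
    return [name for name, rate in LINK_RATES_MBIT_S.items() if rate >= required_mbit_s]


# Assumed stream: 640 x 480 pixels, 24 bits per pixel, 30 frames per second.
video = stream_rate_mbit_s(640, 480, 24, 30)  # about 221 Mbit/s
```

Under these assumptions only the OC-12 rate can carry the stream without compression, which is why compression and faster local networks both matter for distributed VE.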
Although networks are becoming fast enough to support the development of distributed VE, we need greater bandwidth to support the very large-scale, multimodal, and multiple-user applications that we foresee in the near future. Another problem is network host-interface slowdowns caused by the multiple layers of the operating system software. A further issue is the high cost associated with buying time on a high-speed WAN. The current estimated cost for one year exceeds the entire budget of most research groups. Finally, the efforts mentioned above to network teleoperator systems are only a beginning; as work in this area progresses, it is important to focus on teleoperator systems as well as VE systems.
Evaluation of SE Systems
SE both draws on and provides research and development challenges to several well-established disciplines, including computer science, electrical and mechanical engineering, sensory physiology, cognitive psychology, and human factors. In each discipline, the requirements associated with creating SE technology raise new questions that call for research and evaluation. Some examples include: (1) identifying the capabilities and limitations of human beings as criteria for system design, (2) developing the hardware and software that can deliver SEs in a cost-effective manner, and (3) determining areas in which SE can make an important difference in human experience or performance.
As is the case with many new technologies, SE technology has not been adequately evaluated. Moreover, evaluation with regard to ultimate effectiveness is difficult because the technology is at an early stage in its evolution and, as a result, does not provide the high-fidelity environments and the natural interfaces that are planned for the future. In addition, SE offers some particularly complex evaluation issues because of its interdisciplinary nature and the requirement to integrate several technologies to create a full SE system. Human studies are needed to generate requirements for both component and full system design. Moreover, it is desirable to have cost-effectiveness evaluations for each component as well as for prototype systems in each application area. Some evaluation questions concern the engineering reliability and efficiency of components or full systems. Other questions focus on how well the design accounts for human perceptual and cognitive features or for human responses to alterations in sensorimotor loops. In the cognitive area, research indicates that it is difficult to make generalizations about the relationship between types of tasks, task presentation features, and human performance. This indicates a need to conduct studies to explore these relationships as the technology improves and ideas for new SE applications are proposed.
Another set of questions for evaluation includes the possible medical and psychological effects of SE technology on human beings. For example, studies will be needed to ensure that the technology will not have any adverse effects over time on human visual, auditory, or haptic systems. Furthermore, there are the important concerns about induced motion sickness in SEs and the potential after-effects of adaptation to SEs when an individual moves back into the real environment.
There are many types and levels of evaluation that can be used to provide direction, understanding, and a general picture of performance effectiveness. Standard evaluation methodology offers a range of options, including: (1) empirical studies using observation techniques or experimental designs to collect data in laboratory or field settings and (2) analytic studies involving theoretical modeling, heuristic evaluations, and simulations of varying system functions. Each of these methods can be used at various points in the system's evolution.
At present, relatively little evaluation of SE systems is taking place. The extent to which the novel aspects of SE will require new evaluation tools remains to be seen.
The committee's overarching conclusion is that SE systems have great potential both for helping to satisfy various societal needs and for stimulating advances in some important areas of science and technology. In our deliberations, when reservations about the value of investment in some particular SE area were expressed, they often reflected the judgment that the importance of advances in the particular area was dwarfed by the need to modify related social and political factors rather than that the area was unimportant or inappropriate from a scientific or technical point of view.
In recommending topics and areas for concentrated research and development efforts, the committee rejected the approach of developing a small number of high-priority ("star") recommendations. The possible applications cover such a broad range of societal activities, and the advances required to realize truly cost-effective systems for these applications cover such a broad range of research and development activities, that the star approach seemed totally inappropriate.
In constructing this research agenda, the committee used three kinds of criteria. The first is concerned with advancing the state of the art.
Specifically, it is concerned with the extent to which a project under consideration can be expected to lead to improved understanding of important phenomena and/or to improved technology. We refer to criteria of this type as science and technology criteria.
The second is concerned with the likelihood that the project in question will have important practical—and positive—consequences within the not-too-distant future (i.e., the next five years). We refer to criteria of this type as practical applications criteria.
The third is concerned with factors such as leverage, cost-effectiveness, and ratio of payoff to effort. In addition to technical matters, evaluation of these factors must involve consideration of current conditions and forces in society, which go beyond those the committee was appointed to examine. In general, we refer to criteria in this category as leverage criteria.
Finally, it should be noted that the recommendations have not been prioritized in any detailed manner. This is due primarily to our judgment that successful development and application of SE systems depend on an entire matrix of interrelated factors, not on one or two isolated factors. We nevertheless feel that it is important to stress the crucial need for improved hardware technologies to enable development of improved interface devices and improved computer generation of multimodal images. Unlike the situation in the area of teleoperation, in the area of VEs there are relatively few individuals who have primary interests or backgrounds in hardware; most individuals in the VE area are involved primarily in the software end of computer science, in communication or entertainment media, and in human perception and performance. Thus, the importance of adequate hardware, without which the VE field will never come close to realizing its potential, tends to be underplayed by the VE community. A somewhat similar comment concerns the issue of user comfort. To date, a very large fraction of VE usage has occurred in the context of short demonstrations, a context in which the degree of comfort is relatively unimportant. However, if the comfort of VE systems (particularly head-mounted displays) cannot be radically improved, the practical usage of these systems will be limited to emergency situations or to very short time periods. In other words, adequate comfort, as well as technically adequate hardware, is essential to realizing the potential of the SE field.
The research agenda we propose covers four main categories:
Application domains of SE systems,
Some psychological considerations,
Development of improved SE technology, and
Evaluation of SE systems.
In this section we present recommendations for research and development in the near future in terms of these categories. We also indicate to the extent possible the role played by the various types of criteria in making these recommendations. Further details are provided in the chapters of this report. In addition, we make comments and suggestions for government policy and infrastructure based on the experience and judgment of committee members. They are suggestive of the kinds of tools and mechanisms that federal agencies might use to encourage coherence, integration, and overall development of the field.
Application Domains of SE Systems
RECOMMENDATION: The committee concludes that four application domains show the most promise for SE: (1) design, manufacturing, and marketing; (2) medicine and health care; (3) hazardous operations; and (4) training. We recommend that the research needs in these domains be used as one of the principal means to focus SE technology development and testing.
Our review shows that each of these domains includes tasks that are particularly compatible with the projected capabilities of SE. Each of these domains received high scores with respect to the science and technology and practical applications criteria. The domain of hazardous operations also received a high score with respect to the leverage criterion because of the relative lack of attention and funding given to this domain.
The committee has not assigned priority to the application domains of education; information visualization; and telecommunication and teletravel. Although committee members agreed that the education domain is exceedingly important—perhaps the most important of all the domains considered—it was not assigned priority because of our judgment that the development of improved education technology will have only a minor effect on the quality of education actually received. In other words, the main current obstacles to achieving substantial improvements in education are social, political, and economic, not technological. Thus, even though the education domain can be viewed as a high-leverage domain with respect to funding considerations, it is regarded as a low-leverage domain overall. Also, the committee did not rate this domain highly with respect to the science and technology criterion. Although further scientific research is required to determine how SE technology can best be utilized in K-12 education, it is believed that other application domains are likely to play a more important role in driving SE technology. If the relevant infrastructure undergoes changes that greatly facilitate widespread and in-depth use of technology within this area, then priority for the education domain would be indicated.
For the present, however, it is believed that efforts with the highest payoff relevant to the education domain will be those that are directed toward the general development of improved technology, toward alteration of the relevant social, political, and economic infrastructure, and toward influencing the entertainment industry to include programs, activities, systems, and facilities in the soon-to-be upgraded interactive home entertainment centers that have increased educational value. To the extent that the introduction of improved communication technology that links students and teachers (and parents) at disparate locations can help improve the current educational infrastructure as well as directly improve the quality of education, the committee would strongly support such an effort.
The information visualization and the telecommunication and teletravel domains are not assigned priority for a number of reasons. Paramount among them, and consistent with their cross-cutting characteristics, is that work in the specified priority domains will necessarily include work in these domains as well. Thus, for example, work in design, manufacturing, and marketing as well as medicine and health care will necessarily include work on information visualization. Similarly, work on hazardous operations and training will necessarily involve work on telecommunication and teletravel. An additional reason for denying the information visualization domain priority is the fact that work in the area of scientific visualization using VE is already quite active. Thus we did not feel that this domain deserved a high score on the leverage criteria.
Although equivalent remarks could be made about the cross-cutting nature of the training domain—that is, applications of SE technology to training will occur in connection with each of the other high-priority domains—training was judged to be so important and the potential for achieving substantially improved cost-effectiveness in training so great, that it retained a priority rating.
RECOMMENDATION: The committee recommends two projects for special attention: (1) modeling the human body for purposes of medical education, surgical planning, and providing explanations of procedures and outcomes to patients and (2) studying the transfer of knowledge and skill gained in training in a VE to performance in a real-world task environment.
Modeling the human body is required for many of the VE applications considered within the general domain of medicine and health care. For example, within the subdomain of surgery, physical models of various bodily organs, skeletal structures, physiological subsystems, etc., are needed for the planning of surgery, the training of surgeons, and explaining possible procedures and outcomes to patients. These models must be sufficiently accurate to serve the purpose at hand yet sufficiently simple to satisfy the computational constraints imposed by the limited power of the available VE system. In many cases, the fidelity of the virtual body parts, processes, and systems will be limited not only by inadequate VE facilities, but also by inadequate scientific knowledge (i.e., inadequate empirical data about the phenomena to be modeled or inadequate theoretical techniques for the quantitative modeling of these phenomena). Nevertheless, we recommend this project because it satisfies all three criteria and because even very crude models are likely to provide results that are substantially superior to those now achieved. Thus, for example, it should not be overwhelmingly difficult to develop physical models with enough fidelity to create VE educational experiences (for medical personnel and patients) that usefully supplement those now being realized by means of static two-dimensional illustrations, conventional videos, and spoken or written words. Not only should it be possible to develop models in which the visual components are superior to those now achieved with conventional techniques, but the addition of the auditory and haptic channels in such efforts should also greatly facilitate the education process.
Studies of training transfer, the second special project area cited, are essential to the successful development of VE training procedures for a wide variety of training tasks in a wide range of application domains. Although it is generally recognized that VE systems have great potential for cost-effective training, in order to realize this potential it is necessary to determine and understand how various differences between the situation faced by the trainee in the VE training system and the situation faced by the trainee in the real-world situation for which the individual is being trained influence the effectiveness of the training as measured by task performance in the real-world situation. Research on this transfer-of-training issue has, of course, been conducted for many years in connection with other types of training. However, the class of differences between the training and real-world performance situations that will arise when VE training is used is likely to include numerous elements that have not been studied before (e.g., the differences associated with unique points of view or instructional cues that can be generated by VE systems but not by conventional training systems). Furthermore, even when no such new elements exist, understanding of training transfer issues remains rather limited. As with the human-body modeling project, the training transfer project has high ratings on all three criteria. It is very challenging scientifically, advances will have important and immediate applications, and modest amounts of additional funding are likely to significantly increase the probability of success in VE training applications.
RECOMMENDATION: The committee recommends that support for psychological studies be organized around the following objectives:
Development of a comprehensive, coherently organized review of theory and data on human performance characteristics (including consideration of basic sensorimotor resolution, perceptual illusions, information transfer rate, and manual tracking) from the viewpoint of SE systems.
Development of a theory that facilitates quantitative predictions of human responses to alterations in sensorimotor loops for all channels, with special emphasis on: (a) degradations in performance resulting from deficiencies in SE technology (e.g., in the form of distortions, time delays, and system noise), (b) supernormal performance achievable through introduction of purposeful enhancing distortions, (c) radical sensorimotor transformations that arise, for example, in connection with the use of sensory substitution or strongly nonanthropomorphic telerobots, (d) methods of accelerating both adaptation to various types of alterations and readaptation to normal conditions, (e) channel-interaction effects that occur with multimodal interfaces, (f) factors governing the occurrence, kind, and magnitude of sopite sickness from SE exposure, and (g) factors governing the strength of subjective telepresence and its relationship to objective performance.
Development of cognitive models that will facilitate effective design of VE systems for purposes of education, training, and information visualization.
Development of improved understanding of the possible deleterious effects of spending substantial portions of time in SE systems.
Although a wide variety of research on human performance characteristics has already been performed, the results have not yet been coherently organized to reflect the viewpoint of SE systems. Such a review would not be too difficult or expensive to prepare, and would be extremely useful to a large segment of the SE research community. In addition to providing important relevant information to this community, it would help delineate the further research that is required in this area to guide design of improved SE systems.
The task of characterizing and modeling human responses to alterations in sensorimotor loops constitutes a major challenge. Knowledge in this area, which is fundamental to the design of essentially all SE systems for all types of applications, is seriously inadequate. Currently, it is difficult to predict how such alterations will influence either objective performance or subjective state, and in particular, how performance and state will change over time as the user gains experience with the alteration.
Some of the subareas in which we urge special emphasis here may be strongly application-dependent. We also recognize that past efforts in some of these subareas have not always been as fruitful as one would like. Nevertheless, because of the importance of these subareas to the progress of the SE field, the committee's recommendation to pursue this research received high ratings according to all three classes of criteria.
The development of improved cognitive models for characterizing the manner in which experience gets organized, problems get solved, world views are formed, and learning takes place is essential to a wide variety of VE uses; however, it is particularly important in the application domains of education, training, and information visualization. Without such models, and without the proper integration of such models into a general framework that includes the effects of multimodal sensorimotor experience on perception and cognition, the design of VE systems for these application areas will be seriously handicapped.
A final set of topics for research concerns the study of the possible deleterious effects of spending substantial portions of time in SE systems. To some extent, problems of this type will be addressed automatically as a consequence of the government-mandated human-use monitoring of experimental research involving human subjects; however, the precautions associated with this monitoring are not likely to adequately address possible negative effects of continued usage in everyday life. In any case, it is obviously essential to cover issues related to the effects of SE on relatively deep psychological factors such as self-image, cognitive style, affective state, and motivation.
Development of Improved SE Technology
Human-machine interfaces include all the devices used in an SE system to present information to the human or to sense the actions or responses by the human that control the machine in question.
RECOMMENDATION: The committee recommends support of research on visual displays, haptic interfaces, and locomotion interfaces, with emphases as outlined below.
These recommendations were highly rated on all three criteria: science and technology, applications, and leverage.
Visual Displays The development of adequate head-mounted displays is very important to the SE field. The main deficiencies in current HMDs involve the quality of the visual display and the ergonomics of the helmet used to mount the visual display on the user's head. (Both the quality and the ergonomics of the auditory component of HMDs are currently adequate.) Substantial effort is already being directed toward reducing these problems by the entertainment industry (whose focus is on low-end systems) and by the military. Nevertheless, because of the large payoff that would result if substantial improvements were made, the committee recommends that strong support be provided to research in this area. Of particular importance for the visual display are substantially improved combinations of resolution and field of view. Also to be considered in this area is the inclusion of see-through options for use in augmented-reality applications.
Work on the ergonomics associated with the HMD should include not only consideration of mass and center of gravity (as well as fit), but also consideration of wireless, broadband communication to eliminate the customary tether used. Integration of visual, auditory, and position tracking in one device (possibly helmet, but much preferably, sunglass-sized) should take account of detailed data and understanding of norms and variations in human heads. Of particular importance is exploring alternative materials and configurations and evaluating them in a VE context.
Because there is no guarantee that HMDs that are fully adequate for all important tasks will be available even within the next 10 years, attention must also be given to improving off-head displays. In particular, the committee recommends that OHD research be carefully monitored and support be provided to those display projects that appear promising from the performance point of view (e.g., some of those concerned with autostereoscopic displays) but are too risky from the business point of view to be supported by industry. The projects supported by the military in this area have not led to displays that are affordable, and the projects conducted by private hobbyists and inventors in this area are likely to die out because of inadequate funding.
Special attention should also be given to the study of perceptual effects associated with the merging of displays from different display sources (as in augmented-reality applications). Independent of whether the images that are combined with computer-generated VE images are derived directly from the real environment by means of see-through displays or indirectly through the artificial eyes of a telerobot, the psychophysics of such merged displays must be carefully studied. Not only is
there relatively little past work in this area, but also the use of such merged displays is likely to become increasingly common. Equivalent psychophysical studies in the auditory and haptic channels are regarded as worthwhile but less urgent.
Haptic Interfaces Many important potential SE applications cannot be realized adequately without the development of substantially improved haptic interfaces. Although the haptic devices that can be developed over the next 5 to 10 years will undoubtedly fall short of the ideal device, they can vastly improve the range and quality of haptic interactions that are now available in SE systems. Particular attention should be given to the development of tool-handle interfaces in which the possible haptic interactions are constrained by the nature of the tool.
Special support also should be given to basic haptic science relevant to the development of improved haptic interfaces. Empirical data and theoretical models describing phenomena for this channel are much less adequate than for the visual and the auditory channels, and considerable scientific work is needed to support technology development for this channel. This work should include studies confined entirely to the haptic channel (e.g., exploration of haptic illusions), as well as studies concerned with the interaction effects of this and other channels (especially vision).
Locomotion Interfaces Another technology area for which the committee recommends special support is that of locomotion interfaces. The range of possible applications would be substantially increased by the availability of improved locomotion interfaces, and there seems to be no fundamental obstacle to the creation of such interfaces.
Position Tracking and Mapping The development of improved devices for position tracking and mapping is extremely important.
RECOMMENDATION: In this area the committee recommends a multiphased approach: (1) conduct research and development on mechanical trackers and inertial trackers, (2) explore the possibility of obtaining improved cost-effectiveness in tracking by using hybrid systems, and (3) carefully monitor commercial developments in magnetic, acoustical, and optical trackers, in eye trackers, and in trackers directed toward registration problems in augmented reality. If market forces do not drive the development of these trackers, federal research support is urged.
The committee's recommendations in this area are complex because of factors relating to the leverage criteria.
Market forces in the development of mechanical trackers, inertial trackers, and hybrid systems currently appear minimal, so research in
these areas requires support. In the committee's judgment, the other devices mentioned in the recommendation will continue to improve without special support. For example, it is believed that commercial interest in the manufacturing and health care domains is sufficient to drive development of improved optical systems for registration of images from different sources in augmented-reality applications. If this should turn out to be incorrect, however, the committee would assign high ratings to research in this area.
Testing and Evaluation A further important recommendation on interface technology concerns the physical testing and evaluation of interface devices.
RECOMMENDATION: The committee recommends the establishment of a set of standards or an independent laboratory to evaluate SE interface devices.
Because of the lack of reliability in manufacturers' specifications, and because the techniques and standards employed by different laboratories in such efforts vary widely (even if the laboratory performing the work has no vested interest in the device), it is recommended that an independent laboratory be established to evaluate SE interface devices or, alternatively, that strict standards be set for such evaluations.
Other Interface Issues We have not included recommendations concerning support for auditory displays, speech communication interfaces, physiological interfaces, or other types of displays previously mentioned (related to olfaction, temperature, wind, etc.). For auditory, speech, and physiological interfaces, this determination results from our judgment that any required advances in these areas will be driven by forces outside the SE field and thus no special support is required. Although work on the other types of interfaces may not be driven by such outside forces, we do not regard the need for these other types of interfaces to be as important for most application domains. It should be noted, however, that their inclusion is likely to provide a major increase in the sense of presence or immersion in VEs and, to the extent that such subjective effects are likely to have a strong positive influence on performance, they may be important.
Computer Generation of VEs
Hardware Advances in computational and communication hardware are essential to the full realization of VEs. The hardware capabilities available today have given researchers, entrepreneurs, and consumers just a taste of virtual worlds and a promise of possible applications. Because of
the potentially wide appeal of VEs and the large variety of applications with differing performance requirements, it is important to continue hardware development at several levels, from high-end supercomputer workstations to low-end personal workstations with modest capabilities.
Extrapolating current trends, we expect that VE applications will continue to saturate available computing power and data management capabilities for some time to come (dataset size will be the dominant problem for such applications as physical modeling and scientific visualization). In the future, high-end VE platforms will require the following features: very large physical memories (> 15 Gbyte), multiple high-performance scalar processors, high-bandwidth (> 500 Mbyte/s), low-latency (< 1 millionth s) mass storage devices, and high-speed interface ports for various input and output peripherals.
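To give a feel for the scale of these figures, the following sketch computes how long it would take to fill the stated physical memory from mass storage at the stated bandwidth. It is an illustrative calculation only: decimal units (1 Gbyte = 10^9 bytes) are an assumption, and the function name is hypothetical.

```python
def seconds_to_stream(dataset_bytes, bandwidth_bytes_per_s):
    """Time needed to move a dataset at a sustained bandwidth."""
    return dataset_bytes / bandwidth_bytes_per_s

# Figures taken from the text; decimal units are assumed.
MEMORY_BYTES = 15e9            # 15 Gbyte of physical memory
STORAGE_BANDWIDTH = 500e6      # 500 Mbyte/s mass storage bandwidth

fill_time_s = seconds_to_stream(MEMORY_BYTES, STORAGE_BANDWIDTH)
print(f"Filling memory from storage takes about {fill_time_s:.0f} s")
```

Even at the stated bandwidth, loading a memory-sized dataset takes tens of seconds, which is consistent with the remark that dataset size will be the dominant problem for some applications.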
The research and development required to achieve the hardware capabilities stated above gets high ratings in terms of both the science and technology criteria and the practical applications criteria. However, we are much less clear about the leverage criteria. On one hand, current market forces do not appear to be driving adequate development of coherent and integrated architectures for multimodal modeling, representation, and rendering. Even within the confines of the visual channel, it appears that relatively little attention is being given to time-deterministic generation of images (i.e., systems that guarantee compliance with appropriate bounds on graphics update rate and lag, possibly at the expense of resolution). On the other hand, if the overall commercial market for computer hardware continues to grow at the same rate as it has in the past, or if the SE field looks very promising to the industry, then such advances will probably require no special support. Therefore:
RECOMMENDATION: The committee recommends no aggressive federal involvement in computer hardware development in the SE area at this time. Rather, we conclude that hardware development should remain largely a private-sector activity. Should serious lags in development occur, the government might then consider strategies for leveraging private-sector development efforts.
Software In the past, research on and development of software for VE has been conducted through small, independent research programs.
RECOMMENDATION: The committee recommends that a major unified research program be created that focuses on those areas of development directly related to the generation, implementation, and application of VEs. The basic topics that need to be considered in such a program include: (1) multimodal human-computer interactions, (2) rapid specification and rendering of visual, auditory,
and haptic images, (3) models and tools for representing and interacting with physical objects under multimodal conditions (including automated model acquisition from real data), (4) simulation frameworks, (5) a new time-critical, real-time operating system suitable for VEs, (6) registration of real and virtual images in augmented-reality applications, (7) navigational cues in virtual space, (8) behavior of autonomous agents, and (9) computer generation of auditory and haptic images.
There is a need to develop methods and software to interpret and respond to multimodal inputs from a wide variety of devices, including those associated with position tracking, haptic manipulation, and speech commands, often occurring in concert. Improved software must be developed to determine and filter out errors in these control signals and to provide the appropriate resulting information to the software generating the VE.
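One simple way to filter isolated errors out of a control signal is a sliding median. The sketch below is a minimal illustration under stated assumptions, not a method proposed in this report: the function name, the window size, and the sample values (a position reading in meters with one spurious spike) are all hypothetical.

```python
from statistics import median

def reject_spikes(samples, window=3):
    """Replace each sample by the median of a window centered on it.

    Isolated outliers (e.g., a single bad tracker reading) are removed,
    while slowly varying motion is left almost unchanged. At the ends of
    the sequence the window is truncated.
    """
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(median(samples[lo:hi]))
    return filtered

# Hypothetical tracker readings (meters) with one spurious spike.
readings = [0.10, 0.11, 5.00, 0.12, 0.13]
cleaned = reject_spikes(readings)
print(cleaned)
```

A centered window needs the next sample before it can emit a value, so in a live system this kind of filter adds at least one sample period of delay. That trade-off between error rejection and latency is one reason the choice of filter matters for VE input handling.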
Fast and realistic specification and rendering of visual, auditory, and haptic images is a fundamental topic for VE research. The combined requirements of realism and interactive performance are extraordinarily severe. For example, in the visual modality, it has been estimated that the creation of a totally realistic interactive environment would require 80 million polygons per picture and a minimum of 10 pictures/s—a total of 800 million polygons/s, far beyond the capacity of current graphics workstations. Across modalities, there is a need to develop representations and rendering algorithms that dramatically increase effective throughput without sacrificing realism. A key issue will be to generalize static rendering methods to effectively handle dynamic scenes, for example by exploiting temporal coherence and by automatically adjusting the level of detail to match what the user will be able to perceive. Research and development in parallel rendering will also become important.
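The throughput arithmetic above, and the idea of adjusting level of detail to fit what the hardware can deliver, can be sketched as follows. The polygon and picture-rate figures are those quoted in the text; the renderer capacity, the per-model polygon counts, and all function names are hypothetical values chosen only for illustration.

```python
def required_polygon_rate(polygons_per_picture, pictures_per_second):
    """Polygons per second needed to sustain a given picture rate."""
    return polygons_per_picture * pictures_per_second

def polygon_budget_per_picture(renderer_polygons_per_second, pictures_per_second):
    """Polygons that can be drawn in each picture at a given picture rate."""
    return renderer_polygons_per_second // pictures_per_second

def choose_level_of_detail(lod_polygon_counts, budget):
    """Pick the most detailed model version that fits the budget.

    lod_polygon_counts is ordered from most to least detailed. If nothing
    fits, the coarsest version is returned.
    """
    for index, count in enumerate(lod_polygon_counts):
        if count <= budget:
            return index
    return len(lod_polygon_counts) - 1

# Figures from the text: 80 million polygons per picture at 10 pictures/s.
required = required_polygon_rate(80_000_000, 10)

# Hypothetical renderer able to draw 1 million polygons/s.
budget = polygon_budget_per_picture(1_000_000, 10)

# Hypothetical versions of one model, most detailed first.
chosen = choose_level_of_detail([500_000, 120_000, 40_000, 5_000], budget)
print(required, budget, chosen)
```

A practical system would choose levels of detail per object, based on distance and on what the user can perceive, rather than from a single global budget; the sketch shows only the budgeting step.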
Real-time rendering for interactive systems poses additional problems when the auditory and haptic modalities are included. For each modality, the software for rendering images receives the output of the physical model and generates the commands needed to drive and control the interface devices. The major issues are: (1) the accuracy of rendering in relation to computed output of the model, the capabilities of the display device, and the sensory resolution of the human user in each of the modalities; (2) minimization of time delays in the rendering for each modality; and (3) synchronization of displays among multiple modalities. Efficient rendering requires that the capabilities of the rendering software should be commensurate with those of the physical model, the display device, and the human user. The higher the display resolution, the more time-consuming is the rendering, leading to time delays. Such delays, if
excessive, generally lead to perceivable lags and distortions. In the case of force-reflecting haptic interfaces, they may also cause mechanical instabilities. Therefore, rapid rendering software that minimizes time delay while retaining optimal display resolution is critical for each of the modalities. Under multimodal conditions, the additional condition of synchronization of the modality-specific displays needs to be satisfied. We recommend support for development of software tools for rapidly driving visual and auditory display devices, together with fast, real-time control of haptic interfaces. Such software can have both device-dependent and device-independent components.
There is a major need to develop more powerful methods for acquiring and representing realistic models of physical objects and for realistic simulation of the physical behavior of these objects. To construct realistic models of truly complex environments is all but impossible with current computer-assisted design tools. Creation of such complex models will require a combination of automated model acquisition from real data and automated model synthesis based on concise descriptions. Such models will ultimately need to capture not only the object characteristics relevant to the visual channel, but also all of the physical properties that must be specified to realistically simulate objects' appearance and behavior in the broad multimodal sense. Simulating the mechanics of the everyday world will be of central importance in giving virtual environments a sense of solidity and allowing users to effectively manipulate virtual objects through haptic interfaces. The problems that arise in generalizing standard batch-simulation methods to handle interactive VEs are analogous to those that arise in the extension of static rendering techniques.
Research into the development of environments in which object behavior as well as object appearance can rapidly be specified is an area that needs further work. We call this area simulation frameworks. Such a framework makes no assumptions about the actual behavior (just as graphics systems currently make no assumptions about the appearance of graphical objects). A good term for what a simulation framework is trying to accomplish is meta-modeling. Such frameworks would facilitate the sharing of objects between environments and allow the establishment of object libraries. Issues to be researched include the representation of object behavior and how different behaviors are to be integrated into a single system.
Because most current VE systems are built on commercial versions of UNIX, which is not designed to meet real-time performance requirements, the committee recommends that approaches to a new operating system suitable for VEs be studied. In principle this could be achieved by creating a new operating system architecture or by providing upgrades or enhancements to existing operating systems (e.g., UNIX, Windows NT).
The operating system capabilities required for VE include support of very large numbers of lightweight processes communicating by means of shared memory, support of automatic or transparent distribution of tasks to multiple computing resources, support of time-critical computation and rendering, and very high resolution time-slicing and guaranteed execution for high-priority processes (to within 0.001 s resolution). Although not specifically addressing all of these concerns, the efforts of the IEEE POSIX standards committee are starting to bring real-time capabilities to the open-system workstation environment. Supporting these capabilities in the operating system will significantly facilitate the development of many VE applications (especially larger, more ambitious efforts). The commercial sector cannot be expected to perform the necessary research and development in this area without incentives from the federal government. Specifically, we recommend that the government participate with industry in funding the upgrades and enhancements needed to provide an operating system that will meet the performance requirements for implementing VEs. Moreover, these joint funding efforts should be accompanied by a plan to move the new or upgraded systems to commercial adoption. To ensure that VE systems are written using an appropriate operating system, a financially sound transition plan must be formulated, funded, and executed.
Another important area for development is registration of real and synthetic images for augmented-reality applications. To create the illusion that synthetic and real objects exist in the same world requires highly accurate registration. For example, to make a synthetic object appear to rest on a real table, the object must move with the table as the observer moves, and accurate registration requires both a good geometric model of the scene and good measurements of observer motion. In addition to the purely geometric aspects of registration, illumination effects (casting synthetic shadows onto real objects) must be handled. Note also that significant misregistration may be disastrous in certain applications (e.g., surgery).
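The dependence of registration on accurate measurement of observer motion can be illustrated with a very small geometric sketch. It assumes an idealized pinhole camera, purely lateral observer motion, and a point at a known depth along the viewing axis; the focal length, depth, and tracking error are hypothetical values, and the function names are not drawn from any particular system.

```python
def project_x(point_x_m, point_z_m, observer_x_m, focal_px):
    """Horizontal image coordinate (pixels) of a world point.

    Idealized pinhole model: the observer looks along the z axis and has
    moved sideways by observer_x_m.
    """
    return focal_px * (point_x_m - observer_x_m) / point_z_m

def misregistration_px(point_z_m, tracking_error_m, focal_px):
    """Image-plane offset caused by an error in measured observer position."""
    return focal_px * tracking_error_m / point_z_m

# Hypothetical setup: object on a table 2 m away, focal length 1000 px.
FOCAL_PX = 1000.0
DEPTH_M = 2.0

true_position = project_x(0.0, DEPTH_M, 0.100, FOCAL_PX)      # observer really at 0.100 m
rendered_position = project_x(0.0, DEPTH_M, 0.105, FOCAL_PX)  # tracker reports 0.105 m
offset = true_position - rendered_position
print(f"Synthetic object drawn {offset:.1f} px away from where it should appear")
```

Under these assumptions a 5 mm position error displaces the synthetic object by about 2.5 pixels, and the displacement grows as the object gets closer. This sketch covers only the geometric part of the problem; illumination effects are not modeled.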
In addition to the general areas discussed above, the committee recommends that research and development on the following topics be supported: navigational cues in virtual space, the behavior of autonomous agents, and the computer generation of both auditory and haptic images for VE. Navigational cues are important because there is a great tendency in current VE systems for users to lose their way during virtual travel (or even simply during rotations of the head). Work on autonomous agents
is important because many future applications are likely to require such agents, and the task of designing appropriate psychological and physical models for "driving" these agents is an extremely difficult one. With regard to computer generation of auditory images, spatialization, synthesis of environmental sounds, and auditory scene analysis are judged to be the most critical; in the haptic channel, because so few results are currently available, a wide array of research projects should be supported. Although certain components of some of these problems relate primarily to design of human-machine interface devices, others relate primarily to software.
Recommendations in the area of telerobotics that are not already included elsewhere concern: (1) the effects of communication time delays on teleoperator performance, (2) telerobotics hardware (structures, actuators, and sensors), (3) microtelerobotics, (4) distributed telerobotics, and (5) real-time computational architectures.
RECOMMENDATION: The committee recommends that support be given to improving control algorithms, improving methods for constructing and using predictive displays, and improving methods for realizing effective supervisory control strategies.
Unless communication delays are properly handled, teleoperator performance will be severely degraded and the system may, under certain circumstances, become unstable. In order to combat the effects of such delays, continued efforts should be directed toward the development of improved control algorithms that ensure stability and yet, to the extent possible, provide reasonable gains. At the same time, continued effort should be directed toward the development of improved methods for constructing and using predictive displays and for realizing effective supervisory control strategies. Advances in combatting the delay problem are required not only in connection with hazardous operations, but also in connection with certain components of telemedicine (particularly telesurgery).
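The tension between stability and gain in the presence of delay can be seen in a toy simulation. The sketch below is a deliberately simplified model and not a description of any actual teleoperator: a position error is corrected at a rate proportional to the error as it was observed one delay period earlier. The gains, the 0.5 s delay, the time step, and the function names are all assumptions made for illustration.

```python
def simulate_delayed_loop(gain, delay_s, duration_s=20.0, dt=0.01):
    """Simulate d(error)/dt = -gain * error(t - delay) with Euler steps.

    The error is assumed to equal 1.0 for all time before the start.
    Returns the full error trajectory, including the initial history.
    """
    delay_steps = round(delay_s / dt)
    steps = round(duration_s / dt)
    error = [1.0] * (delay_steps + 1)
    for _ in range(steps):
        delayed_error = error[-1 - delay_steps]
        error.append(error[-1] - gain * delayed_error * dt)
    return error

def late_amplitude(trajectory, samples=200):
    """Largest error magnitude over the final samples of a run."""
    return max(abs(value) for value in trajectory[-samples:])

no_delay_high_gain = late_amplitude(simulate_delayed_loop(4.0, 0.0))
delayed_low_gain = late_amplitude(simulate_delayed_loop(1.0, 0.5))
delayed_high_gain = late_amplitude(simulate_delayed_loop(4.0, 0.5))

print(f"gain 4, no delay:    {no_delay_high_gain:.3g}")
print(f"gain 1, 0.5 s delay: {delayed_low_gain:.3g}")
print(f"gain 4, 0.5 s delay: {delayed_high_gain:.3g}")
```

In this model the high gain is harmless without delay, the low gain remains stable with a 0.5 s delay, and the high gain combined with the same delay produces growing oscillations. Keeping the loop stable by lowering the gain makes the response sluggish, which is the motivation for improved control algorithms, predictive displays, and supervisory control.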
RECOMMENDATION: The committee recommends work in four areas of hardware development: (1) multiaxis, high-resolution tactile sensors, (2) robot proximity sensors for local guidance prior to grasping, (3) multiaxis force sensors, and (4) improved actuator and transmission designs.
Multiaxis high-resolution tactile sensors are needed to provide the telerobot with an effective sense of touch. Robot proximity sensors are required to provide local guidance prior to grasping. Such guidance
would greatly facilitate the development of adequate supervisory control. Multiaxis force sensors are needed to measure the net force and torque exerted on end effectors. For example, miniature force sensors of this type could be mounted on finger segments to accurately control fingertip force. Improved actuator and transmission designs are required to provide high-performance joints and improved performance of telerobotics limbs.
RECOMMENDATION: The committee recommends that research be conducted on issues that arise when microtelerobots are used in teleoperation.
As the field of microelectromechanics evolves, and smaller and smaller telerobots can be constructed, the need for both basic and applied research in this area will steadily increase. For example, it will be necessary to address problems associated with the scaling of movements and forces. Because the mechanical behavior of objects in the micro domain is radically different from that in the macro domain, such scaling will require the development of new types of telerobotic controllers.
RECOMMENDATION: The committee recommends that consideration be given to the development and application of distributed telerobotic systems.
Relatively little attention has been given to teleoperator systems in which the human operator is interfaced to a distributed set of telerobots. Because many functions require sensing or acting over a region that is large relative to the size of an individual telerobot (e.g., patrolling land or structures for security reasons), such systems, if appropriately designed and developed, would have many important applications. Issues that need to be addressed in this area include the careful selection of specific applications, the design of the communication system for transmitting information among the telerobots and between the set of telerobots and the human operator, and the design of human-machine interfaces that are well matched to human sensory and control capabilities in situations involving multiple telerobots.
RECOMMENDATION: The committee recommends the establishment of intercommunication standards for point-to-point connections in coarse-grained parallel computational architectures. However, for applications with demanding input/output operations, the committee does not recommend new real-time development systems or operating systems.
The most demanding VE system will require powerful real-time input/output capabilities to handle haptic interfaces, trackers, visual displays,
and auditory displays. In robotics and telerobotics, in which the requirements are similar, there is a general movement toward coarse-grained systems based on point-to-point communications, such as transputers and C40 systems. Commercial development environments and real-time operating systems are adequate for such systems. However, there has not yet emerged a high-speed intercommunication standard for point-to-point computational architectures, which would offer users great flexibility in mixing and matching components across vendors and different processor types.
The committee anticipates that in the future most VE applications will rely heavily on network hardware and software. Although networks are now becoming fast enough for distributed VE applications, development is needed to provide the enormous bandwidth required to support multiple users, video, audio, haptics, and possibly the exchange of three-dimensional interaction primitives and models in real time. Moreover, handling the mix of data over network links will require new applications level protocols and techniques. Because of the central nature of network technology to the implementation of VEs, the committee sees network hardware and software development as critical to advancing the science of VE and its applications. However, we believe that the hardware necessary to support VE applications will be developed without intervention from the VE research community. In other words, there are forces in both the federal government and the private sector that are driving major advances in hardware. As a result, we do not recommend additional investment in network hardware development at this time.
Nevertheless, it is important to acknowledge the existence of significant infrastructure problems that could impede the use of networks for VE applications. For these problems, specific effort should be provided in support of VE requirements. One infrastructure issue is the high cost of research on large-scale networked VEs. A very limited number of universities can afford to have dedicated T-1 lines (with installation expenses of $40,000 and operating costs of $140,000 per year, as for the Defense Simulation Internet currently) needed to support these activities. Various approaches, such as an open VE network and the necessary VE applications protocol, should be considered for providing research universities with access to the needed facilities. Unless costs are significantly reduced, it will not be possible to initiate a concerted effort to develop software solutions for networked VE.
Perhaps our greatest infrastructure concern is the need for the development of network standards that will be compatible with the long-range
needs of distributed VEs. One danger is that the entertainment industry, with its interest in interactive games for the home, will set the networking protocol standards at the low end, and the military community will set the standards at the high end. Therefore:
RECOMMENDATION: The committee recommends that the federal government provide funding for a program (to be conducted with industry and academia in collaboration) aimed at developing network standards that support the requirements for implementing distributed VEs on a large scale. Furthermore, we recommend funding of an open VE network that can be used by researchers, at a reasonable cost, to experiment with various VE network software developments and applications.
Evaluation of SE Systems
RECOMMENDATION: The committee recommends that the federal government encourage the SE system developers it supports to include a comprehensive evaluation plan in the early design stages of their research projects. It also recommends that the federal government help coordinate the development of standardized testing procedures for use across studies, systems, and laboratories, particularly in those areas in which the private sector has not acted.
SE technology is in the early stages of development, is growing rapidly, and is the subject of highly optimistic projections about its usefulness. In contrast, the extent to which its usefulness has actually been seriously evaluated is vanishingly small.
In general, evaluations are required not only to compare overall cost effectiveness of SE approaches with other approaches addressed to the same goals, but also to provide insights to guide modifications and new design directions. To be optimally effective, such evaluations must take place at the overall system level, at the component level, and at all stages of the development process. Although many of the specific questions to be addressed in an evaluation effort are likely to depend to some degree on the structure and purpose of the system or component in question, it should be possible to determine a common framework for a substantial portion of the evaluation needs.
In order to help ensure adequate SE evaluation, the federal government should encourage individuals involved in federally supported research and development to include serious evaluation plans in the design of their projects. Such plans should address questions about engineering performance, user needs and acceptance, dependence of human performance and safety on various system or component features, and costs of
development, implementation, and marketing. Furthermore, in order to facilitate consistency across SE projects, the federal government should help coordinate the development of standardized testing procedures for use across studies, systems, and laboratories, particularly in those areas in which the private sector has not acted. These procedures should include methods for identifying key system dimensions that affect task performance, developing special metrics uniquely suited to evaluating SEs, and comparing SE system performance to performance of other systems intended to meet the same or similar goals.
Suggestions for Government Policy and Infrastructure
The magnitude, quality, and effect of the SE-oriented research and development that is accomplished will clearly depend on the role played by the federal government. The current status of the SE field is sufficiently embryonic, compared with what is likely to develop over the next 10 years, that the federal government now has a rare opportunity to foster coherent planning in this area. Furthermore, the recently established National Science and Technology Council at the White House would appear to be an appropriate organization to provide oversight for such a planning effort. Also, in conducting such a planning effort, substantial benefits would be gained by attending carefully to the developments that are already taking place in the other areas of the administration's planning effort—for example, the Advanced Technology Program of the National Institute of Standards and Technology, the High Performance Computing and Communications program, and the programs associated with defense conversion.
In this section, we discuss a number of mechanisms that illustrate the kind of leadership role that the government could play. We see that role as both informing and complementing the federal agencies' strategic planning for their support of research and development programs.
Establish an Effective Information Infrastructure
A national information system that provides comprehensive coverage of research activities and results in the SE field in a user-friendly way to a wide variety of users could be a useful tool for promoting cross-fertilization and integration of the research and development efforts. The free flow of ideas and information among researchers, users, and individuals in government, academia, and industry who require information for SE planning and decision making is crucial to the development of this new field. Also, in order to diminish the increasing threat of a major societal division between the technologically advantaged and the technologically deprived (as well as to counter the current hype about virtual reality), the public should have information of the appropriate type in an easily available form. Although information by itself cannot prevent such a division, it is a necessary ingredient of any program that could.
We suggest that the federal government consider establishing a national information system in order to promote these vital communication goals. To reduce costs and to realize potential benefits as soon as possible, consideration could be given to integrating the SE information system with other public information systems currently being developed. For example, such a system might be an ideal component of the national digital library based on high-speed networking envisioned in the National Information Infrastructure (NII) initiative of the Clinton administration. Issues of ownership and control, as well as technological issues, will be important to consider in the design of an SE information system.
To some extent, the technology, procedures, and ideas being developed within the SE field itself could be usefully exploited in the design of the SE information system. Such a system might eventually have uses well beyond those initially envisioned; for example, it might include a library of computational models. Although for many years there has been a tendency for scientists to express their understanding of various systems and phenomena in terms of computational models, this tendency is clearly being accelerated by the role such models play in the generation of VEs. Indeed, it seems possible that, in the near future, computational models will constitute one of society's primary forms of knowledge representation. Thus, for example, reading a book about Newtonian mechanics is likely to be augmented by interacting with a virtual world based on a computational model that includes Newtonian mechanics and then, perhaps, "reading" the computational model. The same kind of evolution is, of course, occurring with fiction and imaginary worlds; independent of whether a structure or a series of events is real or imaginary, much of the relevant information can be stored in the form of a computational model. In order to make such computational models available to society, the federal government might consider establishing a national system for standardizing, collecting, storing, and disseminating such models. In view of society's current concerns with health care, initial efforts in this area might be focused on computational models related to the structure and function of the human body and modifications of the human body associated with injury, trauma, disease, aging, and medical and surgical treatments.
Encourage Appropriate Organizational Structures and Behaviors
Two major factors that could inhibit advances in SE involve the ability of researchers to communicate and cooperate across disciplines and across organizations. Because the creation of effective SE systems requires contributions from many different disciplines (with many different associated cultures), special efforts are required to ensure adequate communication and cooperation across disciplines. Similarly, because of the high value placed on competition within our society, special efforts are needed to ensure adequate communication and cooperation across government agencies, military branches, industrial firms, and academic institutions. At present, the organizational barrier appears to be more debilitating than the disciplinary (or cultural) barrier. In fact, the lack of cooperation among competing organizational entities (for example, competing companies) probably constitutes the main obstacle to achieving a truly satisfactory solution to the information infrastructure problem discussed above. Consideration of explicit incentives for cooperative behavior might be very useful.
In order to reduce these problems, the committee suggests that the federal government consider establishing a small number of national research and development teams, each of which would focus on a specific application area. These teams could involve government, industry, and academia, as well as the various disciplines relevant to the given application area. Funding could be provided jointly by the federal government and the private sector.
The work to be performed by each national research and development team would include basic research, technology development, functional prototypes, technology evaluation, and technology transfer to industry. Despite the emphasis on applications that is implied by how the teams are defined, the work could be directed toward long-term as well as short-term goals, and the basic research needed to achieve these goals would then be a priority for support. Also, to the extent feasible, it might make sense to connect these collaborative teams or applications consortia not only to already existing federal activities (as has been the case, for example, with the textile partnership AMTEX that is being managed by the Department of Energy), but also to already existing professional societies. In setting up these teams, the choice of leadership for the activity will be crucial. In some cases, federal leadership may be appropriate; in others, industrial; in still others, academic. Finally, each of the envisioned teams might well find it appropriate to develop a powerful networked communication system among its members to ensure true collaboration at the working level.
Use SE Systems Within the Government
It might be useful for some federal agencies and offices to explore the use of SE to meet their own administrative and program needs. In addition to the application of SEs to the defense and space programs already under way, other application domains, such as training, telecommunication and teletravel, and information visualization, are relevant to the activities of many agencies. One way for the government to facilitate development of the SE field would be to select a few agencies to serve as test beds for synthetic environment technology in these general domains.
There are a number of reasons for suggesting that the government make use of SE systems in conducting its own activities. Government agencies (local as well as federal) are natural early users of new technology: they could help spearhead development efforts and provide feedback to the developers. Also, such use could increase the cost-effectiveness of government activities. In addition, such use could create a market for SE systems and thus stimulate private industry to become involved in the design and production of SE systems.
At present, uses within the government that are receiving the most attention are those associated with the Department of Defense; however, other entities, such as NASA and the Department of Energy, are also involved. Although military applications trail behind those associated with the entertainment industry as an economic driving force, they nevertheless constitute a force that is significant. This significance is derived not only from the overall magnitude of the associated economic activity, but also from the special role played by defense agencies in stimulating the development of relatively high-quality systems for military applications. Also of interest in this connection are the current efforts to explore the use of SE systems in the Department of Defense for education and training. If the results of these studies are positive, they could play a significant role in stimulating the use of SE systems for education and training throughout the nation as a whole.
The use of SE systems in NASA appears to hold great potential not only with respect to training people for operations in hostile environments, but also with respect to performing the operations themselves. The enormous expense associated with manned space flights and space stations may well serve as a strong stimulus to the use of teleoperation in space activities.
Develop National Standards and Regulations
Although it is probably too early in the development of SE systems to establish national standards and regulations, it is not too early to begin to evaluate the work already under way in connection with the formulation of standards and regulations for the telecommunications and entertainment industries. Problems that are already of concern but are likely to become of even greater concern as the SE field develops relate to technological compatibility and interoperability issues, enforcement and control issues, and social and ethical issues. For example, in the technological area, problems related to the timing of information flow in SE networks merit special consideration. Similarly, in the social and ethical area, the potential of SE for providing participants with powerful emotional experiences (including those related to sex and violence) needs to be addressed.
In general, it appears that SE, because of its mass entertainment potential, is likely to become one of the largest uses of high-speed communication networks, and its use should have an early and continuing part in the development of standards, regulatory principles, and tariff-setting models for such networks. The recent congressional attention that has been given to the kinds of material that are appropriate for the media to present is but a mild precursor to the public debate that is likely to arise when advanced VE technology becomes widely available.
It will be critical for the federal government to consider VEs in the formulation of national standards and regulations. Studies could be undertaken to illuminate issues related to technological compatibility and interoperability, enforcement and control, and social and ethical problems raised by the use of VEs in society.
Analyze and Evaluate Market Forces and Societal Impact
The extent to which government funds will be directed toward specific SE research depends, at least in part, on the likelihood that such projects will be funded independently, i.e., by industry. Estimating this likelihood requires not only an analysis of current market forces, but also predictions of how market forces will evolve in the future. Although such predictions are notoriously difficult to make with accuracy, and market forces are as likely to be shaped by the results of the research and development as they are to shape the research and development that is performed, failure to consider market forces in making funding decisions is likely to seriously reduce the extent to which the funding is effective in advancing the field. For these reasons, it would be prudent for the federal government to monitor market forces as part of developing its strategic plan for the allocation of scarce resources.
As with most other technologies, the effects of the advances in SE are likely to be mixed; some effects will be positive and others negative. And as with the predictions of market forces, although accurate predictions of societal impact are difficult to derive, serious attempts to consider such factors would be decidedly worthwhile. It cannot be assumed that all technological advances, even those that are likely to have substantial practical applications, will necessarily be beneficial.