Some Psychological Considerations
Because human beings are an essential component of all synthetic environment (SE) systems, there are very few areas of psychology that are not relevant to the design, use, and evaluation of SE systems. For example, if the system under consideration is a virtual environment (VE) system that is intended to provide realistic simulations, then all the issues relevant to the identification of the effective stimulus in real environments, as well as the issues that focus on how equivalent perceptions or responses can be achieved with more simply synthesized artificial stimuli, must be examined. If it is a VE system that is intended to maximize information transfer to the user and incorporates special distortions for this purpose, or if it is a teleoperator system that incorporates a nonanthropomorphic telerobot, then all the issues relevant to the perception of, adaptation to, and learning about altered perceptual cue systems must be considered. In addition, to the extent that the system can be thought of as an extension of traditional manual control systems, many of the concepts and findings relevant to such systems are likely to be applicable. Further issues arise in connection with higher-level processes related to learning and the formation of problem-solving strategies and cognitive models, as well as with the effects of SE experience on affect, motivation, personality, etc. A similarly broad range of issues is generated when one scans across the various application areas of SE systems.
The topics covered in this chapter, which represent only a minute sample of all relevant topics, were chosen to illustrate some of the types of issues that need to be considered. Although the topic of discomfort obviously contains elements outside the domain of psychology, it is included here for convenience.
RESOLUTION, ILLUSIONS, AND INFORMATION TRANSFER
Perhaps the most obvious kinds of knowledge about human perception and performance that are needed to design cost-effective SE systems concern the resolution of the human's input and output systems and the way in which effective resolving power is changed as these systems are integrated with SE interface systems having various kinds of displays and controls. (The term resolution refers here to the ability to separate out and independently sense different signals as well as to detect small changes in isolated signals.) Given such knowledge, one can then examine implications for task performance for various types of tasks, and the cost-performance trade-offs for these tasks.
Knowledge of normal human resolving power on the input side, i.e., the sensory side, allows one to predict the display resolution beyond which finer resolution could not be perceived and would therefore be wasted. A similar statement holds for the output, i.e., control, side. Although knowledge of human resolving power in vision and audition is incomplete, it is sufficiently advanced to provide designers of SE systems with a solid background for design choices. Areas in which current knowledge is considerably less adequate include both the input (sensory) side and the output (motor) side of the haptic system, as well as the ways in which performance is degraded when displays and controls (in any of the modalities) with less-than-human resolution are used. Information on resolution for specific modalities (e.g., vision) is provided in the chapters concerned with these modalities.
A further and related set of issues that is important to consider in the design of SE systems concerns perceptual illusions. Generally speaking, a given perception is thought of as illusory to the extent that it appears to be generated by a stimulus configuration that is different from the actual one. VEs themselves can be regarded as integrated sets of illusions. Detailed study of both intrasensory and intersensory illusions is important because, in many cases, the existence of illusions enables SE system design to be simplified and its cost-effectiveness thereby increased. At the opposite end of the spectrum, the occurrence of unexpected illusions can seriously interfere with the expected performance of the system. Elicitation of motion sickness often involves the occurrence of illusions concerning the position, orientation, and movements of various portions of the body.
It is possible to regard certain types of illusions, such as the illusion of continuous motion that can be generated by sequences of static images at rates of 30 Hz, merely as reflections of imperfect sensory resolution and therefore to assume that studies of resolution will automatically include studies of illusions. However, other types of illusions, such as the Müller-Lyer illusion, are more appropriately characterized in terms of response bias and therefore cannot be regarded in this manner. Thus, it is necessary to consider illusions as a separate issue from resolution.
Much of the past work on illusions has focused on the visual channel and on the implications of these illusions for theories of visual perception and cognition. However, some results, such as the continuous motion illusion just cited, clearly have direct implications for SE design. Other illustrative results relevant to SE design include those on the dominance of vision over audition and haptics in cases of intermodality conflict (e.g., as evidenced in the ventriloquist effect) and on the use of auditory stimuli to improve the perception of events that are represented primarily in the visual or haptic domains (as in the use of sound effects). Material on illusory effects for vision can be found in Howard and Templeton (1966); for audition in Bregman (1990); and for haptics in Loomis and Lederman (1986), Hogan et al. (1990), and Fasse et al. (1990).
It should also be noted that relatively little work has been done on sensorimotor illusions associated with whole-body movements. The factors involved in these illusions, which usually involve the perception of body movement, support surface stability, and visual field stability, are likely to be of considerable importance in SE designs that include voluntary locomotion through virtual space. Further material on these kinds of illusions can be found in Chapter 6.
Finally, it should be noted that the merging of data from different sources in augmented-reality systems is likely to lead to a whole new set of illusory effects that will require study. Relatively little is known about the effects of different merging techniques (even if one restricts one's attention solely to the merging of visual images).
Issues related to information transfer rates tend to be very complex because such rates depend not only on basic resolving power, but also on factors related to learning, memory, and perceptual organization. With respect to information transfer rate, an SE system can be thought of as consisting of a human operator, an artificial machine (a computer or telerobot), and a two-way communication link consisting of displays that send information from the machine to the human operator and controls that send information from the human operator to the machine. One of the main goals in such systems is to optimize the efficiency of communication in both directions. For many purposes, it is useful to characterize the imperfections in the communication channels in terms of statistical variability (noise), to include in this noise both channel noise and noise internal to the human and/or the machine, and to measure the efficiency of the communication by the information transfer rate. Crudely speaking, the information transfer is defined as the information gain resulting from the communication, which in turn can be defined as "how much more the receiving system knows about the state of the transmitting system after the communication signal is received than before it is received." The information transfer rate is then defined simply as the rate at which information is transferred. Within this context, a good human-machine interaction technique is one in which the information transfer rate is high and the amount of training required to achieve this high rate is low. Extensive background on the use of information theory concepts in characterizing human performance is available in Quastler (1955), Garner (1962, 1974), Sheridan and Ferrell (1974), Stelmach (1978), and Rasmussen (1986).
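In identification experiments, the "information gain" described above is usually estimated as the mutual information between stimulus and response, computed from a stimulus-response count (confusion) matrix. The sketch below assumes such a matrix and uses the simple plug-in estimate, which is biased upward for small samples; the function names and the example matrices are hypothetical. The rate is then the bits per response multiplied by the number of responses per second.

```python
import math

def information_transfer(counts):
    """Plug-in estimate of transmitted information (bits per stimulus).

    counts[i][j] = number of times stimulus i produced response j.
    T = sum_ij p_ij * log2(p_ij / (p_i * p_j)); biased upward for small samples.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    bits = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # 0 * log 0 is taken as 0
            p_ij = n / total
            # p_ij / (p_i * p_j) simplifies to n * total / (row_sum * col_sum)
            bits += p_ij * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return bits

def transfer_rate(counts, responses_per_second):
    """Information transfer rate in bits/s."""
    return information_transfer(counts) * responses_per_second

# Hypothetical data: 4 equally likely stimuli, 25 presentations each.
perfect = [[25 if i == j else 0 for j in range(4)] for i in range(4)]
guessing = [[25] * 4 for _ in range(4)]  # responses unrelated to stimuli
```

With perfect identification of four alternatives the estimate is 2 bits per stimulus, and with responses that are unrelated to the stimuli it is 0 bits, which matches the informal definition of information gain given in the text.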
In general, in order to get high information rates into the human operator via displays or out of the human operator via controls, the human operator must be very familiar with the information coding scheme employed. Perhaps some of the coding schemes with which individuals are most familiar are those related to language. Estimates of maximum information transfer rates involving language reception and transmission in various modalities have been presented by Reed and Durlach (1994). The results indicate that maximum rates for reading English (vision), listening to spoken English (hearing), and observing the signs in American Sign Language are all roughly the same and lie in the range 60-70 bits/s. Reception of language in the haptic domain (by means of Grade 2 Braille, the feeling of signs in sign language, or the Tadoma method, in which certain deaf and blind individuals receive speech tactually by placing a hand on the face of the talker and monitoring the mechanical actions of the speech articulation system) shows maximum rates of roughly one-half those obtained via the visual and auditory channels.
The maximum output rate for the motor actions of the speech articulators in speech production is estimated to be roughly the same as the maximum rate for listening to speech (60 bits/s), and the maximum rate for the motor actions of the hands in typing is estimated to be roughly 20 bits/s. To the extent that (1) the assumptions that underlie these results are valid (in particular, that the estimate of 1.4 bits of information per letter for sentence-length segments of speech is reasonable) and (2) these results provide an upper bound on the information transfer rates that can be achieved in communicating between human operator and machine in SE systems, the amount by which the results achieved with a given SE system fall below these rates provides a measure of the room for improvement. It should also be noted that the figure of 20 bits/s appears to provide an upper bound on the rate that can be achieved in simple discrete spatial tracking tasks (e.g., one in which the transmitted signal consists of lighting up a randomly selected square in a checkerboard array presented on a visual display and the correct response consists of touching the lit square with a finger or directing one's gaze at the lit square).
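The rate figures above translate directly into bounds on events per second. The arithmetic below is a sketch that assumes error-free responses and equally likely alternatives, so that each checkerboard response carries log2(N) bits; the 8 x 8 board is a hypothetical example, while the 60 bits/s, 1.4 bits/letter, and 20 bits/s figures are those quoted in the text.

```python
import math

def checkerboard_bits(n_squares):
    """Bits carried by one error-free response when the lit square is chosen
    uniformly at random from n_squares (log2 of the number of alternatives)."""
    return math.log2(n_squares)

def max_events_per_second(rate_bits_per_s, bits_per_event):
    """Upper bound on events (letters, targets) per second implied by a rate cap."""
    return rate_bits_per_s / bits_per_event

# Figures quoted in the text.
LANGUAGE_RATE = 60.0    # bits/s, reading or listening
BITS_PER_LETTER = 1.4   # bits/letter, sentence-length segments
TRACKING_RATE = 20.0    # bits/s, simple discrete spatial tracking

# About 43 letters/s implied by the language figures.
letters_per_s = max_events_per_second(LANGUAGE_RATE, BITS_PER_LETTER)
# About 3.3 targets/s for a hypothetical 8 x 8 board (6 bits per target).
targets_per_s = max_events_per_second(TRACKING_RATE, checkerboard_bits(64))
```

Doubling the number of squares adds only one bit per response, so under the 20 bits/s cap a larger board lowers the achievable target rate only slowly.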
Unfortunately, although there are a number of general statements that can be made about the properties of a good coding system (e.g., it should be well matched to the properties of the sensory or motor system involved, it should make use of perceptually high-dimensional stimuli or responses, it should have cognitive properties that make it easy to learn, etc.), there is no theory available that enables one to make reliable predictions of performance as a function of the detailed coding scheme and detailed training procedures employed.
These problems become even more challenging when one replaces the individual human operator by a collaborative team. In such cases, the system designer must also consider how best to divide the input information and output control among a number of individuals.
MANUAL CONTROL, TRACKING, AND HUMAN OPERATOR MODELS
In SE systems, motor outputs of the operator are used to control the behavior of a simulation or a telerobot. Furthermore, these outputs usually take account of feedback and occur in real time. In an important sense, therefore, such systems can be regarded as descendants of traditional manual control systems, with their emphasis on tracking paradigms and human operator models. The term manual control is taken to mean the receipt of sensory information about the desired state of a system and its current state by a human operator and the use of that information by the operator to command inputs to the system through mechanical devices (hand controllers, pedals, etc.) so as to minimize some function of the error between those two states (Sheridan and Ferrell, 1974). The principal differences between current SE systems and traditional manual control systems consist of the increased variety and complexity of the displays and controls, as well as of the constraints imposed on the relevant transfer functions.
Research in manual control was most active in the 1950s, 1960s, and 1970s due to the interest in operator control of aircraft, automobiles, and other vehicles (a comprehensive overview is found in Sheridan and Ferrell, 1974). Additional useful references include Jex (1971), Kleinman et al. (1974), and McRuer et al. (1965). In manual control systems, there is a closed loop that includes the behavior of the human operator (human operator dynamics), the system or plant being controlled, and feedback to the operator regarding the performance goals and plant state. Classification of manual control systems according to the type of input to the human operator is as follows.
Compensatory. The operator controls the system to reduce the error between the state of the system and a fixed reference state.
Pursuit. The operator controls the system to reduce the error between the state of the system and a changing reference state.
Preview. The operator, having knowledge of future values of the reference state, controls the system to reduce the error between the state of the system and a changing reference state.
Precognitive. The operator, having foreknowledge of input in terms other than a direct view (for example, statistics on the input), controls the system to reduce the error between the state of the system and a changing reference state.
Human operator dynamics influence the closed-loop performance in manual control systems as much as do the plant dynamics. For example, the control of aircraft cannot be predicted unless a model of the pilot is factored in as well (Sheridan, 1992c). Initially it was thought that a human operator model independent of the manual control task could be formulated; however, research shows that, depending on the controlled process, the behavior of the human operator is modified to achieve satisfactory performance. A number of models of the human operator of varying complexity have been proposed; the better-known of these are listed below.
Linear, quasilinear models. The simplest is a linear model of the operator with adjustable gain and a remnant (noise). These models have been further extended to handle reaction-time delay (Sheridan and Ferrell, 1974).
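A quasilinear operator of this kind can be simulated in a few lines. The sketch below places a hypothetical operator (gain, reaction-time delay, and Gaussian remnant) in a compensatory loop: the operator sees only the error e = r - y and drives a pure-integrator plant. The plant, the sum-of-sines forcing function, and all parameter values are illustrative assumptions, not values from the studies cited.

```python
import math
import random
from collections import deque

def simulate_compensatory(gain, delay_s=0.2, remnant_sd=0.05,
                          duration_s=30.0, dt=0.01, seed=0):
    """Hypothetical compensatory tracking run; returns the RMS tracking error.

    Operator: u(t) = gain * e(t - delay) + remnant noise.
    Plant:    dy/dt = u (pure integrator), integrated with Euler steps.
    """
    rng = random.Random(seed)
    n_delay = round(delay_s / dt)
    # Oldest entry is the error the operator is acting on now.
    history = deque([0.0] * (n_delay + 1), maxlen=n_delay + 1)
    y = 0.0
    sq_err = 0.0
    steps = int(duration_s / dt)
    for k in range(steps):
        t = k * dt
        r = math.sin(0.5 * t) + 0.5 * math.sin(1.3 * t)  # hypothetical forcing function
        e = r - y                      # operator sees only the error
        history.append(e)              # maxlen drops the oldest sample
        delayed_e = history[0]         # error observed n_delay steps ago
        u = gain * delayed_e + rng.gauss(0.0, remnant_sd)
        y += u * dt
        sq_err += e * e
    return math.sqrt(sq_err / steps)

no_control = simulate_compensatory(0.0)
moderate_gain = simulate_compensatory(3.0)
excessive_gain = simulate_compensatory(10.0)  # gain * delay too large: loop goes unstable
```

Raising the gain improves tracking only up to a point: with a 0.2 s delay, a gain of 3 reduces the error well below the uncontrolled case, whereas a gain of 10 drives the loop unstable. This is one reason an operator's effective gain depends on the delay and on the controlled element.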
Crossover model. Experiments by Elkind (1956) and McRuer et al. (1965) showed that operator behavior depended on the error signal; the operator was found to adjust his or her dynamics so that the combined system had good servo behavior at the crossover frequency. These results were valid for compensatory tracking tasks but not for high-bandwidth tasks or for the control of high-order linear or highly nonlinear systems.
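In the form usually associated with McRuer and colleagues, the crossover model states that, near the crossover frequency, the operator describing function $Y_p$ adapts so that its product with the controlled-element dynamics $Y_c$ approximates an integrator with an effective time delay. The symbols $\omega_c$ (crossover frequency) and $\tau_e$ (effective delay) are standard notation introduced here rather than taken from the text. The phase-margin expression follows in one step from the approximation:

```latex
Y_p(j\omega)\,Y_c(j\omega) \approx \frac{\omega_c\, e^{-j\omega\tau_e}}{j\omega}
  \quad (\omega \text{ near } \omega_c),
\qquad
\phi_m = \pi - \left(\frac{\pi}{2} + \omega_c\tau_e\right) = \frac{\pi}{2} - \omega_c\tau_e .
```

At $\omega = \omega_c$ the approximation has unit magnitude and phase $-\pi/2 - \omega_c\tau_e$, so a stable closed loop requires $\omega_c\tau_e < \pi/2$. The effective delay therefore limits how high the operator can push the crossover frequency.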
Optimal control model. These models, another class of models of the human operator in manual control tasks, are based on results obtained from compensatory tracking experiments; the operator is modeled as an optimal controller operating within the limits of internal constraints and knowledge of task objectives (Bryson and Ho, 1975).
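At the core of such models is an optimal regulator that trades off tracking error against control effort. The sketch below shows only that core: a discrete-time linear-quadratic regulator computed by iterating the Riccati difference equation. The full optimal control model of the human operator typically adds observation noise, an estimator, compensation for delay, and neuromotor lag, none of which appear here. The double-integrator plant and the weights are hypothetical.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iterations=2000):
    """Discrete-time LQR gain K (control law u = -K x), obtained by iterating
    P <- Q + A'P(A - BK) with K = (R + B'PB)^-1 B'PA."""
    P = Q.copy()
    for _ in range(iterations):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical task: hold a double-integrator "cursor" (position, velocity) at zero.
dt = 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([1.0, 0.0])   # penalize position error only
R = np.array([[0.1]])     # penalize control effort

K = lqr_gain(A, B, Q, R)
closed_loop = A - B @ K

x = np.array([1.0, 0.0])  # start with a unit position error
for _ in range(200):
    u = -K @ x
    x = A @ x + B @ u
```

Raising `R` relative to `Q` gives a more sluggish, lower-effort "operator". The ratio of these weights is the main lever such models use to represent task objectives.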
As human manual control behavior became better characterized and the role of the human operator in teleoperation changed to performing complex manual and decision-making tasks in unstructured environments, research shifted to modeling the higher-level functions performed by the operator (Baron et al., 1982; Kok and Van Wijk, 1977; Rasmussen, 1983; Sheridan, 1976, 1992c). Current related work has focused on incorporating manual control models of the operator into the characterization of the human operator as a supervisor and decision maker.
Subsystem Decomposition of the Human Operator
The human operator models mentioned above are black box models based on control theory formulations. The original ideal of developing human operator models independent of the task is still a good one; however, this requires a detailed understanding of cognitive, perceptual, and motor functions that is still far from complete. A subsystem decomposition of the human operator for pursuit tracking based on physiology would include the following elements (Jones and Hunter, 1990):
Sensory processing, in which the central nervous system (CNS) measures tracking error (actual minus desired target position) either visually or kinaesthetically;
Cognitive processing, in which the CNS generates appropriate central commands to the motor neuron pool in the spinal cord;
Excitation/contraction coupling (muscle), which reflects the propagation of action potentials through muscle as a result of efferent nerve input and the initiation of contractile mechanisms (cross-bridges), which produce muscle force;
Limb mechanics, in which muscle force generates limb displacement; and
Reflexes and nerve delays, in which displacements of the limb excite muscle spindles that feed back to the motor neuron pool in the spinal cord over afferent nerves and sum with the CNS-generated motor neuron input to produce muscle activation over efferent nerves.
Each subsystem contributes its own dynamics to the overall human operator dynamics. Delays associated with each subsystem are different. Whereas the transformations are linear and time-invariant for some subsystems, they are nonlinear and time-varying for others. For the neuromuscular subsystems, recent advances in stochastic system identification techniques now make it possible to determine the dynamics of each subsystem for every individual operator (Kearney and Hunter, 1987).
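For the parts of the chain that can be treated as linear and time-invariant, the way subsystem dynamics combine can be sketched in the frequency domain. Stages connected in series multiply their frequency responses, so magnitudes multiply and pure delays add. The sketch below models each stage as a hypothetical first-order lag with a pure delay. It ignores the reflex feedback loop and the nonlinear, time-varying subsystems mentioned above, and its numbers are placeholders rather than physiological measurements.

```python
import cmath

def lag_with_delay(gain, time_constant_s, delay_s):
    """Hypothetical stage: H(jw) = gain * exp(-jw*delay) / (1 + jw*time_constant)."""
    def response(w_rad_s):
        return (gain * cmath.exp(-1j * w_rad_s * delay_s)
                / (1 + 1j * w_rad_s * time_constant_s))
    return response

def cascade(*stages):
    """Series connection: frequency responses multiply, so pure delays add."""
    def response(w_rad_s):
        h = 1 + 0j
        for stage in stages:
            h *= stage(w_rad_s)
        return h
    return response

# Placeholder stages (illustrative numbers only).
sensory = lag_with_delay(1.0, 0.0, 0.10)
cognitive = lag_with_delay(2.0, 0.05, 0.08)
muscle = lag_with_delay(1.0, 0.04, 0.01)
limb = lag_with_delay(0.5, 0.10, 0.0)
operator = cascade(sensory, cognitive, muscle, limb)
```

Evaluating `operator(w)` over a range of frequencies yields the overall gain and phase. Each stage's delay contributes a phase lag that grows linearly with frequency.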
Methodologically, the dynamics of the sensory and cognitive subsystems are what remain when the dynamics of all the other subsystems have been subtracted from overall human operator performance. Of course, the cognitive aspect is the most difficult to model, and its modeling is the least complete. Certain aspects of the modeling of the sensory attributes are discussed in Chapter 1 for vision, Chapter 2 for audition, and Chapter 3 for mechanical interface variables. In particular, knowledge of the sensory resolution limits for length, force, stiffness, viscosity, and mass is important for understanding human operator performance and for design of haptic interfaces (Jones and Hunter, 1992; see also Durlach et al., 1989; Pang et al., 1991; Tan et al., 1992, 1993). Increasing the stiffness of a manipulandum decreases the response time but also the accuracy (Jones and Hunter, 1990), while increasing the viscosity decreases the delay and the natural frequency of the human operator (Jones and Hunter, 1993). This enhancement of performance with the addition of stiffness and viscosity to the manipulandum may be due to an increase in proprioceptive feedback from the periphery.
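The "subtraction" of peripheral dynamics described at the start of this paragraph can be made concrete for the linear case. If the stages are in series, the central (sensory plus cognitive) response at a given frequency is the measured overall response divided by the identified peripheral responses. In decibels and phase, this division becomes a subtraction. The sketch below assumes a linear cascade at a single frequency; the complex values are hypothetical, and the approach breaks down for the nonlinear or time-varying stages noted earlier.

```python
import cmath
import math

def residual_response(overall, *known_stages):
    """Divide a measured overall frequency response (complex value at one
    frequency) by the responses of stages already identified."""
    h = complex(overall)
    for stage in known_stages:
        h /= stage
    return h

def gain_db_and_phase(h):
    """Gain in dB and phase in radians of a complex frequency response."""
    return 20.0 * math.log10(abs(h)), cmath.phase(h)

# Hypothetical values at one test frequency.
central_true = 3.0 * cmath.exp(-0.2j)   # unknown in practice
peripheral = 0.5 / (1 + 0.4j)           # e.g., identified muscle and limb dynamics
overall = central_true * peripheral     # what the tracking experiment measures

central_est = residual_response(overall, peripheral)
```

The estimate recovers the central response exactly here only because the example is noise-free and strictly linear. With real data, measurement noise in both the overall and the peripheral estimates propagates into the residual.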
An ongoing area of research is the substitution of feedback to alternative sensory channels of the operator, that is, presentation of sensor information to the operator in a sensory channel other than that in which it was sensed by the teleoperation system. There are a number of reasons for choosing sensory substitution:
Shielding the operator from hazards but still conveying information on the conditions of the environment, for example, in chemical spill, high temperature, and radioactive environments.
Presenting sensor information in visual form, for example, dials, gauges, etc., for lack of other suitable choices.
Using the higher sensitivity available in the alternative operator sensory channel; for example, temperature may be represented to tenths of a degree on a visual display, a resolution far finer than the operator can discriminate when sensing actual temperature.
Reducing the cognitive load on the operator.
Overcoming the drawbacks of force reflection due to instabilities when operating with time delay. Additional reasons are to reduce the size of hand controllers and eliminate reaction forces on the operator.
Reports of research in this field have concentrated on sensory substitution of force reflection. Some early work in this area was that of Bertsche et al. (1977) and Bejczy (1982) on the use of visual displays of force feedback. More recently, Massimino and Sheridan (1993) documented the use of tactile and auditory displays for force feedback. There are many more applications yet to be tested, particularly in situations in which data from different types of sensors are to be fused into a single measure of interest. Further results on sensory substitution are available from studies concerned with aiding individuals who are deaf or blind or deaf and blind (e.g., Bach-y-Rita, 1972, 1992; Reed et al., 1982; Warren and Strelow, 1985; Reed et al., 1989).
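As an illustration of what an auditory substitute for force feedback involves, the sketch below maps force magnitude onto tone frequency on a logarithmic scale, so that equal force increments produce equal pitch ratios. It is not the mapping used in any of the studies cited. The force range, frequency range, and function name are all hypothetical.

```python
def force_to_tone_hz(force_n, max_force_n=10.0, f_min_hz=200.0, f_max_hz=2000.0):
    """Hypothetical auditory display of force: map |force| in [0, max_force_n]
    onto a logarithmic pitch scale between f_min_hz and f_max_hz."""
    fraction = min(abs(force_n) / max_force_n, 1.0)  # clip forces beyond the range
    # Equal force steps give equal frequency ratios (equal musical intervals).
    return f_min_hz * (f_max_hz / f_min_hz) ** fraction
```

A logarithmic scale is chosen because pitch discrimination is roughly proportional to frequency. A practical design would still need to set the range and step size against measured discrimination thresholds.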
THE SUBJECTIVE SENSE OF TELEPRESENCE
One major feature of a user's experience when operating an SE system concerns the extent to which the user is immersed in, and actually feels present in, the remote or synthesized environment, i.e., the extent to which subjective telepresence occurs. Whereas objective telepresence refers to the use of teleoperator technology for sensing and manipulating remote entities, subjective telepresence refers to the sensations and perceptions experienced by the user. For simplicity, in the following remarks on subjective telepresence, the modifier subjective is dropped and we use simply the term telepresence. Also, because essentially all the issues with which we are concerned here are independent of the distinction between VE and teleoperation, in the following remarks we ignore this distinction; the term telepresence is used in connection with all types of SE systems. (Those wishing to make such a distinction might use the term virtual presence for VEs, as suggested by Sheridan, 1992a.)
At present, there appear to be three main questions of interest in relation to the concept of telepresence: How should telepresence be defined operationally? How can one create telepresence? What is telepresence good for?
Although there have been a number of discussions of telepresence in the literature (e.g., Akin et al., 1983; Fontaine, 1992; Heeter, 1992; Held and Durlach, 1991, 1992; Loomis, 1992; Pepper and Hightower, 1984; Schloerb, 1994; Sheridan, 1992b; Steuer, 1992; Zeltzer, 1992), there is still no generally agreed-on operational definition of telepresence. It should be noted, however, that a serious effort in this direction has recently been initiated by Schloerb (1994) and that a number of individuals are now attempting to conduct empirical research on telepresence.
Almost all of the articles just cited attempt not only to define telepresence, but also to identify the factors that play a role in the creation of telepresence. Although the particular set of factors considered or emphasized varies with the author's viewpoint, there are a number of factors that are relatively obvious. One set of such factors concerns the exclusion of stimuli originating in the immediate environment. In other words, the sense of telepresence is likely to be reduced if the operator is constantly reminded of his or her presence in the real environment by stimulation originating in this environment. Such stimulation can arise from sources outside the system (e.g., the auditory component of the human-machine interface provides inadequate attenuation to prevent the operator of the SE system from hearing a door slam in the room that houses the SE system) or from the system itself (e.g., the helmet used for the visual display is too intrusive to be ignored).
A second set of such factors concerns the existence of user-predictable interactivity. Telepresence is likely to increase when the user's actions, and the consequences of these actions as represented by the subsequent stimuli sensed by the user (i.e., the feedback), constitute a rich and easily perceived and influenced interaction pattern. When the synthetic world is highly realistic, such conditions will be satisfied automatically. When it is unrealistic, the extent to which they are satisfied will depend on the extent to which the user can adapt to the new world. As the user adapts, the degree of telepresence (and the transparency of the interface) will generally tend to increase. The extent to which the user can adapt to the new world, however, will depend strongly on both the nature of this world and the nature of the user's exposure to it. If the world is incomprehensible (either because the relations between the user's actions and the effects of these actions are random or simply because they are so complicated that they appear random), adaptation will not occur. Further discussion of adaptation is presented in the next subsection.
A third set of such factors relates to higher-level, more cognitive features of telepresence and to similar experiences that occur outside the domain of SE technology. In general, the ability of humans to be transported into unreal worlds is the basis of most art, literature, theater, and entertainment, not to mention hypnosis and the use of hallucinogenic drugs. The extent to which consideration of these other forms of transportation will prove useful in the study of SE telepresence is not yet clear. It does appear likely, however, that the variation among individuals in their susceptibility to transportation by these other methods will also occur with SE telepresence.
The question "What is telepresence good for?" has not yet been adequately answered. The interest in the concept of telepresence is due in part to intrinsic philosophical and scientific interest in issues concerned with reality and illusion. It is also due in part, however, to an implicit assumption that a high degree of telepresence is positively correlated with good performance. That this is not generally the case, however, can be easily demonstrated merely by noting that one of the primary motivations for the use of teleoperator systems in hazardous environments is to prevent the operator from experiencing noxious stimuli present in the real environment (i.e., reducing the sense of presence in the real environment). In general, the relationship between telepresence and performance has not yet been determined. Furthermore, even if telepresence were adequately defined in an operational sense, and even if it were determined using this operational definition that in most situations telepresence and performance were highly correlated, it still would not be clear that the concept of telepresence has practical significance. In particular, it is not clear that it would enable one to design better SE systems. In order for this to be the case, it would be necessary to show that models and measurements of telepresence can be usefully substituted for models and measurements of performance, or at the very least, that models and measurements of telepresence provide significant added value to the results that can be achieved solely through the use of models and measurements of performance.
ALTERATIONS OF SENSORIMOTOR LOOPS
In practically all SE systems, the human operator's normal sensorimotor loops will be altered by the presence of distortions, time delays, and noise (statistical variability) in the system. In many cases, such alterations will be introduced unintentionally and will degrade performance. For example, time delays may result from the need to communicate over long distances, and time delays, noise, and unwanted distortions may result from the inclusion of imperfect system components. In some cases, however, these alterations (specifically distortions) may be introduced intentionally in an attempt to achieve performance that is better than normal—for example, in a teleoperator system that incorporates a telerobot that is intentionally nonanthropomorphic. Because of the lack of isomorphism (i.e., structural and functional similarity) between the operator and the telerobot in such systems, the mapping between the human operator and the telerobot will necessarily result in altered sensorimotor loops for the operator. Similar conditions will exist in all virtual environment systems in which special features of the environment are artificially emphasized and unrealistic methods for interacting with this environment are employed in an effort to achieve superior task performance. Attempts to achieve improved resolution by magnification of perceptual cues represent only one line of investigation in this area.
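The three kinds of alteration named in this paragraph (distortion, time delay, and noise) can be composed in a simple way when simulating what a user sees. The sketch below passes a stream of 2-D hand positions through a hypothetical rotational distortion, a fixed delay of a whole number of display frames, and additive Gaussian noise. The rotation angle, delay, noise level, and function name are illustrative assumptions.

```python
import math
import random
from collections import deque

def altered_feedback(hand_positions, rotation_deg=30.0, delay_steps=5,
                     noise_sd=0.0, seed=0):
    """Return the cursor positions shown to the user when each (x, y) hand
    position is rotated, corrupted by Gaussian noise, and delayed by
    delay_steps frames. Before the first delayed sample arrives, (0, 0) is shown."""
    rng = random.Random(seed)
    c = math.cos(math.radians(rotation_deg))
    s = math.sin(math.radians(rotation_deg))
    pending = deque()
    last_shown = (0.0, 0.0)
    shown = []
    for x, y in hand_positions:
        distorted = (c * x - s * y + rng.gauss(0.0, noise_sd),
                     s * x + c * y + rng.gauss(0.0, noise_sd))
        pending.append(distorted)
        if len(pending) > delay_steps:  # sample from delay_steps frames ago is due
            last_shown = pending.popleft()
        shown.append(last_shown)
    return shown

# Hand held at (1, 0); 90-degree rotation; 2-frame delay.
trace = altered_feedback([(1.0, 0.0)] * 4, rotation_deg=90.0, delay_steps=2)
```

Because each alteration is a separate parameter, a sketch of this kind can be used to vary distortions, delays, and noise together. Those interactions are what the following paragraphs describe as largely unstudied.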
Independent of the nature and origin of the alteration, in order for a system designer to predict the performance of a candidate system, theoretical models must be available for characterizing human responses to the alterations associated with the use of the system. Such models should be able to predict the effect of the alterations on such variables as simulator sickness (and also, perhaps, telepresence), as well as on objective task performance, and to describe how the various response components change over time due to sensorimotor adaptation and learning.
Although considerable work has been performed in this area (e.g., see the extensive review by Welch, 1978), there are as yet no adequate models available for predicting performance. For example, no adequate models are available for specifying the amount of sensorimotor adaptation that is achievable with different kinds of distortions using different types of training procedures. Similarly, research on adaptation has given no attention to changes in resolution; attention has been focused almost exclusively on changes in response bias (i.e., the deviation between the mean response and the correct response). Furthermore, with only minor exceptions, interactions among different kinds of alterations (distortions, time delays, and noise), many of which are likely to be present simultaneously in SE systems, have been ignored. A few modality-specific comments on sensorimotor adaptation are available in Chapters 2, 3, and 4. Extensive further discussion of sensorimotor adaptation in the context of whole-body motion is available in Chapter 6.
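To show what even a minimal predictive model looks like, and how narrow it is, the sketch below uses a single-rate, trial-by-trial state-space model of the kind often used in the motor-adaptation literature. An internal compensation x is updated from the error on each trial. The model describes only adaptation of response bias to a constant distortion, which mirrors the limitation noted above. The retention and learning-rate values, and the 30-degree perturbation, are hypothetical.

```python
def simulate_adaptation(perturbation, trials, retention=0.99, learning_rate=0.2):
    """Single-rate adaptation model (illustrative, not from the text).

    Error on trial n:  e_n = perturbation - x_n
    Update:            x_{n+1} = retention * x_n + learning_rate * e_n
    Returns the list of per-trial errors (response bias).
    """
    x = 0.0
    errors = []
    for _ in range(trials):
        e = perturbation - x
        errors.append(e)
        x = retention * x + learning_rate * e
    return errors

def asymptotic_error(perturbation, retention, learning_rate):
    """Residual bias at the fixed point x* = b*p / (1 - a + b),
    i.e. p - x* = p * (1 - a) / (1 - a + b)."""
    return perturbation * (1 - retention) / (1 - retention + learning_rate)

errors = simulate_adaptation(30.0, 500)  # e.g., a 30-degree visual rotation
```

With imperfect retention (a < 1), the model predicts incomplete adaptation: a residual bias of about 1.4 units here. It says nothing about changes in resolution, about time delays or noise, or about how training procedures alter the two rates.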
DISCOMFORT
An important prerequisite for widespread use of SE systems is that they be comfortable for people to use. Independent of whether the discomfort caused by the system is most appropriately considered under the heading of "motion sickness," "poor ergonomics," or the "sopite syndrome" (see Chapter 6 for a definition of this syndrome), such discomfort must be reduced sufficiently to permit individuals to make effective use of the system over extended periods of time. Despite significant previous research on some components of this problem, substantial further research in this area is warranted for a number of reasons.
First, as the situation now stands, discomfort is a real threat to the effective use of SEs. For example, quite apart from the deficiencies in currently available helmet-mounted displays with respect to the visual information provided, they tend to cause such a high degree of discomfort that daily long-term use seems almost out of the question. In fact, the combination of relatively limited visual information and relatively high discomfort is leading some individuals to seriously consider using off-head displays in their SE systems (discussion of both helmet-mounted displays and off-head displays is presented in Chapter 2).
Second, past work on the sources and effects of discomfort has not yet resulted in adequate understanding of the phenomena involved. More specifically, we do not yet know how the magnitude of each discomfort component depends on the characteristics of the system (the properties of the visual and auditory displays, the weight of the devices mounted on the head, the method by which movement through space is simulated, etc.) or on the characteristics of the individual user (including the user's prior experience with the system). Although some progress has been made in related areas (e.g., studies of motion sickness conducted in connection with flight simulators, sophisticated use of anthropometric measurements for cockpit design), this progress has not yet led to adequate comfort for SE users.
Third, many of the situations created by evolving SE technology are new; the stimulus-response configurations to which individuals are exposed in SE systems are often not covered by those previously studied. In other words, the increased flexibility associated with the new technology provides us with opportunities not only to perform certain tasks with increased cost-effectiveness, but also to expose ourselves to situations that exhibit new kinds or magnitudes of discomfort.
Fourth—and illustrative of the point made elsewhere in this report about the use of SE systems as basic laboratory facilities for psychological research—evolving SE technology provides us with new tools to study some of these issues. Not only is work in this area essential to the realization of practical applications, but it should also advance our understanding of the human organism.
Further material related to the discomfort issue is available in later chapters. In particular, extensive discussion of motion sickness and the sopite syndrome appears in Chapter 6.
LEARNING AND PROBLEM SOLVING
Understanding how humans learn and solve problems is critical to the development of educational and training systems regardless of the nature of the instructional tools employed. VE is one such tool, and the more we know about how the mind works, the better able we will be to create VE experiences that facilitate learning. Thus, in general, an appreciation of human cognition is an important element in using synthetic environment technology to alter human behavior.
Work in the development of cognitive models has a long history. As early as 1932, Bartlett developed the notion of a schema as a large knowledge structure, the basic unit of memory and thought. Schemata are conceived to exist at all levels of abstraction and to be hierarchically organized and interrelated; they are used to comprehend new and complex situations and are, in turn, modified by experience. This concept was revived in the late 1960s by cognitive psychologists and computer scientists who used it as a paradigm to test hypotheses about human mental processes.
A similar concept was brought into use by computer scientists who were also concerned with modeling cognitive processes. They used the term frames to describe the organization, structure, and developmental process of human memory (Minsky, 1975; Kuipers, 1975). Much of their work involved creating computer programs that acquired knowledge, followed procedures as described in scripts, and solved problems in ways that were hypothesized to mimic human problem-solving strategies (Newell and Simon, 1972). The results of these studies led to new hypotheses and more fully elaborated theories.
More recently, researchers have used cognitive models to develop intelligent tutors. One of the more long-standing research efforts in this area is the work of John Anderson and his colleagues at Carnegie Mellon University, in which the ACT (Adaptive Control of Thought) theory of learning and problem solving was used to build intelligent tutors in algebra, geometry, and the LISP programming language. The ACT theory makes a distinction between factual or declarative knowledge and procedural knowledge. Declarative knowledge involves the acquisition of facts (the content of a theorem); procedural knowledge, in the form of production rules, relates to the development of cognitive skill (the ability to apply a theorem). The early stages of learning are dominated by declarative knowledge, the later stages by procedural knowledge. According to a recent review by Anderson et al. (1993), the 10-year effort has led to further understanding of human cognition as well as to an appreciation of how to implement the system in the classroom. One important finding was that the original conception of tutoring as a process of emulating a human tutor gave way to the notion of a tutor as a learning environment in which helpful information can be provided and useful problems can be selected.
Another important line of research in cognitive science is modeling the knowledge structures and judgments of experts and novices and comparing the two as a basis for understanding the nature of expertise and for training novices to become experts. For example, Chi et al. (1981) have examined the differences in the knowledge structures and problem approaches of expert and novice physicists as a way to better understand how the acquisition of knowledge and rules changes problem-solving strategies. The schemata, algorithms, and heuristics used by experts were explicated by using such methods as cognitive task analysis or think-aloud protocols (Newell and Simon, 1972). According to Glaser et al. (1991), several knowledge models representing various stages in moving from novice to expert would be useful in guiding the learning process.
Using cognitive task analysis, Lesgold and his colleagues have described various stages of expertise in electronic troubleshooting as a basis for developing dynamic assessments of learner competence. The resulting computer system, known as Sherlock, uses information on the stages of expertise to track a learner's performance, to diagnose strengths and weaknesses in both knowledge and process, and to provide corrective feedback (Lesgold et al., 1990; Lajoie and Lesgold, 1992).
Although cognitive researchers have made considerable progress in understanding how the mind works, there are still many issues to be examined. Specifically, we need to know much more about what aspects of synthetic environments will facilitate learning. Research is needed to further understand the relationships among content, types of tasks, the individual's knowledge state, and preferred information presentation features. Moreover, it will be important to compare the special contributions of the immersive environments made feasible by VE with those of other formats for enhancing education and training. Of particular interest here is the opportunity for extensive sensorimotor involvement provided by VE systems.
A question that cuts across the objective-subjective boundary is whether immersive environments contain intrinsic advantages with respect to motivation or incentives to participate in entertainment and educational experiences. Again, it is too early for definitive answers. However, anecdotal reports suggest that young people can become so immersed in computer-based group games that they neglect such basic activities as eating and sleeping. In education, Neuman (1989) has shown that young people with learning difficulties have taken to computer-based instruction with an enthusiasm that lasted longer than could be attributed to a novelty effect. Observations suggested that the children were transforming the otherwise ordinary lessons into competitive games. For young people, computers and game playing seem to be conceptually linked. Perhaps this link can be constructively exploited by giving VE instructional experiences a suitable gamelike quality to strengthen student engagement.
In more recent studies, another explanation for student enthusiasm has surfaced (Coombs, 1993; Morrison et al., 1993; Coldevin et al., 1993; Dubriel, 1993). According to this explanation, the computer appears to empower students who are otherwise educationally disadvantaged. The promise of VE is that its effects should be stronger than those of other computer-based instructional facilities. For example, students with physical handicaps could be given experiences via VE that they otherwise could not have. Likewise, students in remote settings could participate in experiences otherwise impossible to obtain—such as a stroll through a big city. Finally, VE might be an ideal means for simulating certain types of experiences for which there are emotional, political, or economic barriers. For example, animal rights advocates have made a political issue of the dissection of animals for biology instruction. VE could provide a means for learning anatomical details that would be free of negative emotional overtones and might be more economical.
Although some skeptics have argued that most of the observed advantages attributed to all forms of computer-based instruction derive from novelty effects, there is some additional evidence to the contrary. Introspective reports from adult learners confronting difficult material (Vasu and Vasu, 1993) and from teachers for whom the novelty had certainly worn off after repeated use of the same courseware (Novelli, 1993) indicate sustained interest in using computer-based resources.
ATTITUDE, OPINION FORMATION, AND PERSONALITY EFFECTS
From both scientific and social perspectives, problems exist with respect to the possibility that VE might change individuals' attitudes or self-perceptions. The most credible hypotheses are that such powers are limited by both technical and psychological factors. With regard to the psychological aspect, changes in attitudes and opinions are seen to be most effectively brought about by person-to-person interactions. For example, it has long been known that personal contact plays a major role in expediting changes in the attitudes that control acceptance or rejection of innovations. Likewise, the development of a sense of self appears to depend strongly on social transactions (e.g., Mead, 1934). In other words, one's idea of self, including self-regard or self-esteem, depends on what messages are sent day after day by a set of significant others. We do not yet know the extent to which and manner in which such social transactions change as the world in which the person is operating becomes more imaginary.
This emphasis on social processes, in both attitude formation and self-concept development, highlights specific technical properties of the VE situation. That is, how cost-effectively can VE generate images of humans who can interact realistically with the subject? In particular, can VE generate images of specific significant others, such as family members? And how rigidly can or must such images be programmed in advance? Although social interaction might not be absolutely necessary to engender attitude change, such change may be greatly facilitated by social persuasion.
In the domain of personality development, there are also possible changes that might be situationally but not socially induced. A good example is risk tolerance. A subject could be exposed via VE to situations in which normal physical laws do not operate, as in the mode of Wile E. Coyote in a Road Runner cartoon. After experiencing in a VE situation any number of events in which the action of gravity was significantly delayed and the ultimate consequence of being hammered down by two tons of rock was only a slight dizziness, would one tend to act less cautiously in the real world? The answer, based again on arguments by analogy, appears to be that some such shifts in perception are possible, but only if the conditions are tightly controlled. One crucial parameter to be controlled is task ambiguity. Research on the effects of social influence (Kidd, 1958) shows that some effects can be induced very quickly if the situation is ambiguous and there are no serious consequences for making the wrong choice.
Another potentially crucial parameter is the initial attitudes of participants. If participants are young, inexperienced, and uncertain about underlying event probabilities, or vulnerable to certain forms of peer pressure, change can be readily induced. Such subjects could possibly be more influenced by elaborately contrived experiences in a VE situation, particularly if the VE system fabricated specific images of other people, such as important peers (Sjoberg and Torell, 1993; Benthin et al., 1993).
Finally, there is some anecdotal evidence that computer role-playing games and video games that portray violence may have some influence on individual attitudes and behavior (much of this preliminary research grows out of efforts to demonstrate the influence of television on attitudes and behavior). For example, there are cases of individuals becoming so involved in playing Multiple User Dungeons (a computer fantasy game) that they leave little or no time for other activities. A recent article in the Washington Post (Schwartz, 1994) described a student who stayed up almost all night, every night, to play a fantasy character and interact with other fantasy characters in the Multiple User Dungeons game, and who eventually flunked out of college. In effect, this young man lost his real identity to a character in a game.
With regard to violent video games, there are some recent studies that suggest that children who play these games are more aggressive as a result. Some preliminary studies in this area include Fling et al. (1992), Funk (1992), and Cesarone (1994). These studies are based on questionnaire and survey results rather than empirical evidence of changes in performance. Nevertheless, they suggest that further attention should be given to the potential effects of such games, particularly as these games become more realistic and more interactive.
In summary, VE could probably be used to engender substantial changes in the psychic structure of participants. The magnitude of such changes will depend on the quality and types of images that the VE system can generate, the inclusion of some form of controlled message content in the VE programming, the initial susceptibility of participants, and the ongoing willingness of such participants to accept messages or situations that appear to contradict or deviate from their other, non-VE experiences.
Many of the needs in the area of psychological research, implicitly outlined in the above discussion, will be automatically satisfied as the field of experimental psychology follows its normal evolutionary course of development. Without special effort, however, much of the information and understanding required to guide evolution of the SE field will be available too late to be useful. In order to significantly increase the cost-effectiveness of the SE research and development work, as well as to determine the likely psychological effects of heavy SE usage before these effects are prevalent throughout the society, substantial work must be done within the next few years.
As indicated previously, the goals of this work should be to develop (1) a comprehensive, coherently organized review of human performance characteristics from the viewpoint of SE systems; (2) a theory that facilitates quantitative predictions of human responses to alterations in sensorimotor loops that are likely to occur in SE systems; (3) cognitive models that will facilitate effective design of VE systems for purposes of education, training, and information visualization; and (4) increased understanding of the possible deleterious effects of spending substantial portions of time in SE systems. The important issue of user comfort is partially addressed by item 2 in that feelings of discomfort such as those associated with simulation sickness constitute a particular type of response to alterations of sensorimotor loops. Many other aspects of discomfort, such as those related to poorly fitting helmets for visual displays, are best thought of purely in terms of physical effects.