Position Tracking and Mapping
Position tracking and mapping of the human body and environments is a basic requirement that permeates virtual environment (VE) systems: (1) head and eye tracking for visual displays; (2) hand and arm tracking for haptic interfaces; (3) body tracking for locomotion and visual displays; (4) body surface mapping for facial expression recognizers, virtual clothiers, and medical telerobots; and (5) environment mapping to build a digitized geometrical model for simulation.
This review is approached from a standpoint of basic sensory systems for position tracking and mapping: mechanical linkages, magnetic sensors, optical sensors, acoustic sensors, and inertial sensors. These systems vary in such aspects as accuracy, resolution, sampling rate, latency, range, workspace, cost, encumbrance, convenience, susceptibility to obscuration, ease of calibration, the number of simultaneous measurements, and orientation versus position tracking. VE systems are likely to include a mix of these basic sensory systems, because each system has particular strengths and weaknesses and the requirements on position tracking depend on the particular application. We now attempt to state some requirements on position tracking for particular applications.
For normal arm movements during reaching, a fast motion is accomplished in about 0.5 s, and wrist tangential velocities are about 3 m/s (Atkeson and Hollerbach, 1985) and accelerations are about 5-6 g. For the fastest arm motions such as throwing a baseball, good pitchers release the ball at 37 m/s and accelerate their hands at more than 25 g. Motion bandwidths of normal arm movements are around 2 Hz (Neilson, 1972); the fastest hand motions including handwriting are at around 5-6 Hz (Brooks, 1990a, 1990b). In teleoperation, it is commonly presumed that 10 Hz is the maximum frequency of position commands from the human operator (Fischer et al., 1990).
If hand motion is being used to drive a telerobot or a dynamic simulation of an arm, then the general rule of thumb is that the sampling rate should be around 20 times the bandwidth (Franklin et al., 1990) in consideration of such factors as sensor noise. Taking 5 Hz as defining the frequency content of normal arm motion, then a sampling rate of roughly 100 Hz is called for. Andersson (1993) has proposed a virtual batting cage, in which a batter swings at virtual pitches shown through a head-mounted display (HMD). The bat must be tracked to simulate the hit (or miss). Andersson proposes that sampling rates of about 1 kHz are required to track such motions.
Latency requirements are determined by the psychophysical requirements of the application and are harder to define. For force feedback applications, the hand-tracking latencies must be very low, because the human arm is part of the control loop. For non-force-feedback applications, the hand-motion-to-visual-feedback lag can probably be much longer.
Eye movements can be as fast as 600 deg/s. The smallest time constant for saccades is around 50 ms; the smallest saccades can be finished in 60 ms. The power spectral densities can have significant power up to 50 Hz for position and 74 Hz for velocity (Bahill et al., 1981). Given again the engineering rule of thumb that the sampling rate should be 20 times the bandwidth for noisy measurements, it has been recommended that eye movements be sampled at 1 kHz (Inchingolo and Spanio, 1985). This should allow sufficiently precise tracking of the eye trajectory to characterize the movement time and end-point. As mentioned in Durlach et al. (1992), the eye sees continuous images when display temporal frequency is 60 Hz and above. With 1 kHz sampling rates for eye movement, display targets can be well chosen every 1/60th of a second.
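The rule of thumb applied above to arm and eye motion is simple arithmetic, and a minimal sketch makes the numbers easy to check. The function name and the default factor argument are our own illustrative choices; the bandwidth figures are those quoted in the text.

```python
def recommended_sampling_rate_hz(bandwidth_hz, factor=20.0):
    """Rule-of-thumb sampling rate: a fixed multiple of the signal bandwidth."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth_hz must be positive")
    return factor * bandwidth_hz


# Bandwidth figures quoted in the text.
arm_rate_hz = recommended_sampling_rate_hz(5.0)    # normal arm motion
eye_rate_hz = recommended_sampling_rate_hz(50.0)   # eye position spectrum

print(arm_rate_hz, eye_rate_hz)
```

The factor of 20 is a margin for sensor noise, not a hard limit; a cleaner sensor might tolerate a smaller multiple.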
Head movements can be as fast as 1000 deg/s in yaw, although more usual peak velocities are about 600 deg/s for yaw and 300 deg/s for pitch and roll (Foxlin, 1993). The frequency content of volitional head motion falls off approximately as 1/f², with most of the energy contained below 8 Hz and nothing detectable above about 15 Hz. Tracker-to-host reporting rates must therefore be at least 30 Hz. Delays of 60 ms or more between head motion and visual feedback are known to impair adaptation and the illusion of presence (Held and Durlach, 1987), and much smaller delays may cause simulator sickness. Head-trackers should therefore contribute as little as possible to system latency, no more than 10 ms for high-performance systems. Accuracy requirements are very application dependent. To maintain apparently perfect registration between the real and virtual worlds with infinite-resolution see-through HMDs, the absolute accuracy must be 0.01 deg for yaw, pitch, and roll and about 0.03 mm for translation. For purely virtual opaque HMD applications, large offset errors are tolerable, although tilt error must not exceed about 15 deg, because larger tilt errors would cause vestibular conflict. However, the resolution needs to be 0.03 deg for orientation and 0.1 mm for translation to achieve perfectly smooth, jitter-free motion.
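Two of the figures in this paragraph can be related by elementary calculations. The sketch below assumes that the 30 Hz minimum follows from the Nyquist criterion applied to the 15 Hz upper limit, and that the head turns at constant velocity during the tracker delay; both function names are illustrative.

```python
def min_reporting_rate_hz(highest_frequency_hz):
    """Nyquist bound: report at least twice the highest frequency present."""
    return 2.0 * highest_frequency_hz


def lag_error_deg(angular_velocity_deg_s, latency_s):
    """Angle the head turns during the latency, assuming constant velocity."""
    return angular_velocity_deg_s * latency_s


rate_hz = min_reporting_rate_hz(15.0)       # head motion content up to 15 Hz
error_deg = lag_error_deg(600.0, 0.010)     # 600 deg/s yaw, 10 ms tracker delay

print(rate_hz, error_deg)
```

Under these assumptions, even a 10 ms delay corresponds to several degrees of uncorrected rotation at usual peak yaw velocities, which is one way to see why tracker latency matters.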
BODY MAPPING AND TRACKING
For the body-part tracking applications above, it is sufficient to track a few points or landmarks on limb segments. If the position tracker does not directly yield orientation, then multiple-position measurements on a limb segment can be employed to derive orientation. Body motion can be considered to have similar measurement requirements to hand motion tracking.
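Deriving orientation from multiple position measurements can be sketched as follows, assuming three non-collinear markers rigidly fixed to one limb segment. The choice of axes (x along the first marker pair, z normal to the marker plane) is an arbitrary convention adopted here for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("markers are coincident or collinear")
    return tuple(x / norm for x in v)


def segment_frame(p0, p1, p2):
    """Return unit axes (x, y, z) of a frame fixed to three markers.

    x points from p0 to p1, z is normal to the marker plane, and y
    completes the right-handed set.
    """
    x_axis = _unit(_sub(p1, p0))
    z_axis = _unit(_cross(x_axis, _sub(p2, p0)))
    y_axis = _cross(z_axis, x_axis)
    return x_axis, y_axis, z_axis


# Markers placed along the world axes give back the identity orientation.
axes = segment_frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(axes)
```

The three returned axes are the columns of the segment's rotation matrix relative to the tracker frame.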
By contrast, for body surface mapping or real-environment sensing, the position tracker must be able to scan a volume or surface to yield a dense array of points. The real-time requirements are typically absent when environmental reconstruction is the goal, although the image must be captured sufficiently rapidly if the environment (e.g., a body surface) can move. Accuracy requirements depend on the application; for example, in medical imaging systems, accuracies of 1 mm or better are desirable; for environmental mapping, accuracies of a few mm might be acceptable.
Additional information may be found in Durlach et al. (1992) and Meyer et al. (1992). Various implementations of the basic sensory system types in the context of head tracking are well covered in Meyer et al. (1992), and additional information on eye trackers appears in Durlach et al. (1992).
Mechanical trackers can be an inexpensive, relatively accurate means of tracking head or body-segment positions. Mechanical trackers can measure up to full body motion and do not have intrinsic latencies. Force reflection is readily incorporated by mounting actuators at the linkage joints. We distinguish two types, depending on whether the mechanical linkages are entirely worn (body-based) or are partly attached to the ground (ground-based).
Body-based linkages—exoskeletons or goniometers—have been frequently used in biomechanics for joint angle measurement. For haptic interfaces, they form the basis for master gloves and force-reflecting exoskeletons. Because they are worn, goniometers are portable and facilitate mobility; however, if there is body motion, some other tracking method is also required. They typically have the same workspace as the natural motion of the attached limbs and hence permit the full range of normal motion to be measured.
We may distinguish two ways to use the goniometric data: to infer joint angles and to infer end-point positions. The latter could be employed, for example, to track hand position relative to the body or fingertip position relative to the hand. Then the goniometer may be viewed like a hand controller that is manipulated by the operator; the output is based on calculations from the goniometer angles, and there is no concern as to how this maps to limb joint angles.
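The end-point calculation can be illustrated with a planar serial linkage, assuming ideal hinge joints and known link lengths. The link lengths below are hypothetical values for a three-segment finger linkage, not measurements from any cited device.

```python
import math


def fingertip_position(joint_angles_rad, link_lengths_m):
    """Planar forward kinematics: end-point (x, y) of a chain of hinged links.

    Each angle is measured relative to the previous link.
    """
    if len(joint_angles_rad) != len(link_lengths_m):
        raise ValueError("need one angle per link")
    x = y = 0.0
    heading = 0.0
    for angle, length in zip(joint_angles_rad, link_lengths_m):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


links_m = [0.05, 0.03, 0.02]                       # hypothetical link lengths
tip_straight = fingertip_position([0.0, 0.0, 0.0], links_m)
tip_bent = fingertip_position([math.pi / 2, 0.0, 0.0], links_m)
print(tip_straight, tip_bent)
```

Only the linkage geometry enters this calculation, which is why the mapping to anatomical joint angles is of no concern for end-point tracking.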
For inferring joint angles, there are significant difficulties. Attachment is problematic, as it is for other tracking technologies, because of the soft tissue and potential for relative motion between the goniometer and limb. How best to clamp perturbation devices to limbs has been a concern for the biomechanics community; for the arm, tight clamping to the wrist is possible because of its bony features (Jones et al., 1991; Xu et al., 1991).
It is difficult to align goniometers with joints, especially for multiple-degree-of-freedom (DOF) joints such as the shoulder. Since goniometers are exoskeletons outside the limb, the centers of rotation of the goniometer differ from the joint rotation centers. Due to this kinematic mismatch, there has to be slippage between the goniometer attachments and the limb during motion. One solution in the context of hand masters is to employ four-bar linkages to project the measured centers of rotation to the finger joints (Rohling et al., 1993); because distances between joint centers are lost with this method, a finger calibration scheme for each individual is required (Rohling and Hollerbach, 1993).
Accentuating this problem is that human joints are not perfect hinge joints or spherical joints: the axes of rotation move with the joint angles. This problem is shared by any tracking method that seeks to determine joint angles. According to Rohling and Hollerbach (1993), a master glove in conjunction with an Optotrak 3D Motion Tracking System (Northern Digital, Waterloo, Ontario) was employed to calibrate the human hand geometry for fingertip control during teleoperation. It was found that the resulting fingertip accuracy of a few millimeters was primarily limited by assumptions of ideal joint structures (e.g., hinge joints) and by the accuracy of joint angle measurements by the master glove due to relative movement.
More information on goniometers can be found in Chapter 4 on haptic interfaces.
Ground-based linkages are primarily used for 6-DOF end-point tracking such as the position and orientation of the head or of the hand. Hence the issues surrounding joint angle measurement do not typically arise. It is assumed that the human operator can grasp the manipulandum rigidly, or that the head-mounted system is rigidly attached to the head; both of these assumptions may be problematic. Although the head is bony, it is nevertheless difficult to design helmets for visual displays that are fitted for each individual and that do not experience relative movement during fast head motion. Ground-based linkages are easier to actuate for force reflection than are body-based linkages, because actuators do not have to be placed and carried on the body.
One drawback of ground-based linkages is that the operator is tied to the ground and hence the workspace is limited. Even if one is willing to increase inertia and lower the mechanical resonance frequency in order to have longer arms, the range of a two-segment arm is ultimately limited to about 2 m. Nevertheless, they are a good option when operators are seated or not moving very much. Another drawback is the restricted number of degrees of freedom that can be measured, usually six. This is a problem for simultaneously measuring multiple limbs or for measuring redundant linkages. For example, multiple-finger motion is not conveniently measured with a ground-based linkage; a goniometer is better. Moreover, the human arm is a redundant 7-DOF mechanism, and a ground-based linkage cannot by itself resolve the redundancy.
For hand tracking, there are many examples of hand controllers, joysticks, and other haptic interfaces mentioned in Chapter 4. For head tracking, a commercial example is the ADL-1 six-degree-of-freedom tracking system by Shooting Star Technology. A related example is the BOOM viewer from Fake Space Labs, in which the visual display is not worn but supported on a pedestal through the BOOM linkage. Based on experiences in robotics, accuracies of 0.1 mm and high sampling rates should be achievable with such systems, when properly and routinely calibrated (Mooring et al., 1991).
For linkages attached to a helmet, one possibility is to use counterbalancing to reduce the gravity load of a head-mounted display on the wearer (Maeda and Tachi, 1992). One potential drawback is the inertia of the counterbalancing arms, which might impede head movement. One possible solution (not yet implemented as far as we know) is to use an active linkage servo-controlled to remove all dynamic loads on the head; this solution would obviously be complicated and expensive.
An inexpensive method for position measurement is to use three wire potentiometers. This method has been employed in robot calibration (Fohanno, 1982; Payannet et al., 1985), in which submillimeter accuracy has been reported. A cable connected between a torque motor and the human wrist has been employed for movement perturbations (Soechting and Lacquaniti, 1988). Force-reflecting hand controllers have been built using multiple cables attached to motors with pulleys at the edge of a frame (Agronin, 1987; Atkinson et al., 1977; Kawamura and Ito, 1993; Sato, 1991); however, the handle in the middle of this frame has a restricted workspace. Similarly, actuated strings have been employed by Iwata (1991, 1992) to apply force to the feet for walkthrough simulation.
The most popular position trackers are magnetic because of low cost, modest but reasonable accuracy, and convenience of use. Magnetic trackers do not suffer obscuration problems, although they are sensitive to environmental magnetic fields and ferromagnetic materials in the workspace. Multiple trackers can be employed to map whole-body motion and to increase the range of tracked motion to a small room (Badler et al., 1993).
The commercial trackers from Polhemus and Ascension Technology are currently the most frequently used. The Polhemus Fastrak is a recent introduction and improvement over their original sensor: the commercial brochure states static accuracies of 0.03 in and 0.15 deg and an update rate of 120 Hz. The update rate decreases with the number of sensors, which need to be multiplexed. Although latencies are stated as 4 ms, an independent source has determined a 20-30 ms latency range. The useful range is 1 m. The Ascension Bird sensor has a stated accuracy of 0.1 in and 0.5 deg, and an update rate of 120 Hz; the latencies are 30 ms.
In general, the advertised performance specifications of commercial magnetic sensors should be treated cautiously, as they do not meet their specs in realistic situations. The accuracies for both sensors depend on how close the transmitter and receiver are to each other. In the case of the Fastrak, which employs AC magnetic fields, neighboring metal surfaces will degrade the accuracy because of induced eddy currents; there is no a priori way to gauge the effect other than individual environmental testing.
The Ascension Bird sensor employs DC magnetic fields and hence is less sensitive to surrounding metal, although a reluctance effect might alter the magnetic path. In Hollerbach et al. (1993), the Bird sensor was compared with the Optotrak in the context of robot calibration; the relative accuracy was found to be roughly proportional to price.
There are a variety of approaches to optical sensing for position tracking and mapping. Distance may be measured by triangulation (e.g., stereo vision), by time of flight (laser radar or ladar), or by interferometry. The passive light of the environment may be employed (stereo vision systems), structured light may be projected (laser scanning), light may be pulsed (ladar), or active markers (infrared light emitting diodes or IREDs) or passive markers may be placed on a moving body. Cameras or detectors may employ linear or planar charge-coupled device (CCD) arrays, position-sensing detectors (PSDs), or photodiodes. Below we discuss some major approaches to position tracking and mapping according to the following categories: passive stereo vision systems, marker systems, structured light systems, laser radar systems, and laser interferometric systems. A more complete review of active range imaging sensors may be found in Besl (1988).
Passive Stereo Vision Systems
Substantial effort is being expended by the computer vision community on developing artificial human-like vision capabilities. Passive stereo vision systems employ ambient light and square-array CCD cameras, which have a typical resolution of roughly 600 × 400 pixels. A key issue is to solve the correspondence problem: relating the same points in two different images. Although the vision community is far from solving the general stereo vision problem, substantial progress is being made. Advances in algorithms and hardware are resulting in real-time (30 frames/s) three-dimensional imaging at moderate resolutions (Inoue, 1993; Kanade, 1993). Passive stereo vision systems are unlikely to be useful in VE in the near term, as robustness and accuracy are not yet comparable to active ranging systems. In the long term, as the computer vision community continues to advance, the use of passive vision for mapping and tracking is likely to become quite prevalent in VE.
The stereo correspondence problem is solved in marker systems because a few, easily identifiable fiducial points are tracked on a moving body. The simplest and most accurate approach is to use a number of IREDs, which create very bright spots in the image. The IREDs may be pulsed in sequence with camera detection to uniquely identify each marker. For detection, one frequently employed approach is PSDs (also called lateral effect photodiodes), which are available as roughly 1 cm² squares. Incident light induces a current that is measured at each edge of the square to yield the XY location of the centroid of the incident light.
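The way edge currents yield a centroid can be sketched with the idealized relation commonly given for lateral-effect detectors: along each axis, the normalized difference of the two opposing edge currents scales with the offset of the spot from the center. We assume a perfectly linear square detector with one electrode per edge; real devices need calibration, and the function and parameter names are illustrative.

```python
def psd_centroid(i_left, i_right, i_bottom, i_top, side_length):
    """Idealized lateral-effect PSD: spot centroid from four edge currents.

    Returns (x, y) measured from the detector center, in the units of
    side_length.
    """
    sum_x = i_left + i_right
    sum_y = i_bottom + i_top
    if sum_x <= 0 or sum_y <= 0:
        raise ValueError("no photocurrent detected")
    x = 0.5 * side_length * (i_right - i_left) / sum_x
    y = 0.5 * side_length * (i_top - i_bottom) / sum_y
    return x, y


centered = psd_centroid(1.0, 1.0, 1.0, 1.0, side_length=10.0)
offset = psd_centroid(1.0, 3.0, 1.0, 1.0, side_length=10.0)
print(centered, offset)
```

Because the output is a centroid of all incident light, any stray light adds to the currents and shifts the result, which no later processing can undo.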
Commercial examples include the Selspot II system developed by Selcom and the Watsmart system developed by Northern Digital. IREDs are multiplexed at high rates (3,000 Hz) and sensed by two cameras, which triangulate the markers to an accuracy of about 5 mm at a distance of 2.5 m. The cameras are mounted on tripods; a calibration cube mounted with precisely known IREDs is employed to calibrate camera poses for triangulation calculation. Multiple markers can be tracked to yield orientation and to follow multiple bodies simultaneously. Workspace is distance-dependent: at 2.5 m, it is about 1 m³. Researchers at the University of North Carolina have demonstrated a high-performance tracker using four CCD cameras mounted on a helmet pointing up toward a custom ceiling grid with IRED markers mounted at known positions in the tiles (Ward et al., 1992). The inside-out arrangement allows for much better accuracy in orientation and a large, scalable work area.
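Two-camera triangulation of a marker can be sketched as finding the point closest to both lines of sight. We assume that calibration has already supplied each camera's center and that each image measurement has been converted to a ray direction in a common frame; the midpoint method below is one standard choice, not necessarily what the commercial systems use.

```python
def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines of sight are parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * v for p, v in zip(c1, d1))
    q2 = tuple(p + t * v for p, v in zip(c2, d2))
    return tuple(0.5 * (u + v) for u, v in zip(q1, q2))


# Hypothetical setup: cameras 2 m apart, marker 2.5 m in front of the baseline.
marker = triangulate((-1.0, 0.0, 0.0), (1.0, 0.0, 2.5),
                     (1.0, 0.0, 0.0), (-1.0, 0.0, 2.5))
print(marker)
```

With noisy measurements the two rays do not intersect exactly, and the length of the connecting segment gives a rough indication of measurement quality.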
A fundamental problem with the use of PSDs is reflections of IRED light from environmental surfaces that move the apparent centroid of the sensed light; the amount of reflected light is high, about 25 percent of the total. The result is that it is very difficult to get camera resolutions beyond 1 part in 4,000; for this reason, Northern Digital has abandoned the Watsmart in favor of the Optotrak, which employs multiple cameras with 2,048-element linear CCD arrays. An IRED beam is transformed by a cylindrical lens into a line of light projected onto a linear CCD array. Because of the Gaussian spread of light intensity, an area of the CCD array is illuminated, which allows subpixel localization by area fitting. The result is a camera resolution of about 1 part in 200,000. Reflections are removed by image processing and thresholding: if there is another peak, it is detected and simply removed. This is a substantial advantage over the use of PSDs. At a viewing distance of about 2.5 m, the accuracy is about 0.1 mm and the resolution is about 0.01 mm. According to Marc Rioux (National Research Council of Canada, Ottawa) the limiting factor is air turbulence, whose effect is about 0.01 mm at the 2.5 m distance. The Optotrak comes in two forms. The series 2000 employs two housings, each with two cameras, that can be mounted on tripods; larger viewing distances are possible, at the cost of the use of a calibration cube. The series 3000 is a single housing, with three cameras embedded in an aluminum block; the camera ensemble is calibrated at the factory and hence there is no need for field calibration as for the series 2000 (at the cost of a smaller workspace). The Optotrak is capable of tracking full poses of two moving bodies at 100 Hz through multiple marker placement.
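The idea of locating a peak to a fraction of a pixel while discarding a secondary reflection peak can be illustrated with an intensity-weighted centroid. This is a simplified stand-in for the area fitting mentioned above, whose details are not given in the sources cited here; the threshold value and the pixel data are invented for the example.

```python
def subpixel_peak(intensities, threshold):
    """Intensity-weighted centroid of the brightest blob on a linear array.

    Only the contiguous run of pixels above `threshold` around the global
    maximum is used, so a weaker secondary peak elsewhere is ignored.
    """
    peak = max(range(len(intensities)), key=lambda i: intensities[i])
    if intensities[peak] <= threshold:
        raise ValueError("no pixel exceeds the threshold")
    left = peak
    while left > 0 and intensities[left - 1] > threshold:
        left -= 1
    right = peak
    while right < len(intensities) - 1 and intensities[right + 1] > threshold:
        right += 1
    window = range(left, right + 1)
    total = sum(intensities[i] for i in window)
    return sum(i * intensities[i] for i in window) / total


# Main peak centered on pixel 2, weaker reflection at pixel 6.
with_reflection = subpixel_peak([0, 1, 4, 1, 0, 0, 2, 0], threshold=0.5)
# Peak straddling pixels 2 and 3.
between_pixels = subpixel_peak([0, 1, 3, 3, 1, 0], threshold=0.5)
print(with_reflection, between_pixels)
```

A PSD, by contrast, reports a single centroid of all incident light, so the reflection at pixel 6 would have pulled the estimate away from the true peak.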
There are several commercial stereo vision systems that employ passive rather than active markers. Accuracies and sampling rates are not as good as those for the commercial active marker systems, although the absence of wires on the body is an attraction. Position and orientation tracking is also possible with a single square-array camera instead of two; Meyer et al. (1992) refer to this method as pattern recognition. When several points on a plane with precisely known relative locations can be recognized, then reasonably accurate estimates can be obtained. This alternative is especially attractive for head tracking because only one camera is used.
Structured Light Systems
Another approach to solving the correspondence problem in stereo vision is to employ structured light, usually a precisely known ray or plane of light. In one common configuration, a plane of light, created by passing a laser beam through a cylindrical lens, is swept across a scene using a galvanometer-driven mirror. At each position of the plane, a light stripe is created, which is sensed by a two-dimensional camera. The intersection of the known plane and the line of sight from the camera determines the three-dimensional coordinates. To reduce the cost of such systems and improve the frame rate, Kanade (1993) has developed a cell-parallel light stripe range sensor, based on custom very large scale integrated (VLSI) design. The camera employs "smart" photosensitive cells, which can detect the time at which peak incident light falls. Resolution is currently 32 × 32 pixels, acquisition time is 1 ms, and accuracies are about 0.1 percent of the range.
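The intersection of the known light plane with the camera's line of sight is a ray-plane calculation. The sketch assumes that the camera center is the origin of the working frame and that the plane is given as a normal vector and offset for the current mirror position; these representations are our choice for illustration.

```python
def stripe_point(ray_dir, plane_normal, plane_offset):
    """Intersect a camera ray through the origin with the plane n . x = offset."""
    dot = sum(n * d for n, d in zip(plane_normal, ray_dir))
    if abs(dot) < 1e-12:
        raise ValueError("line of sight is parallel to the light plane")
    t = plane_offset / dot
    return tuple(t * d for d in ray_dir)


# Light plane x = 1 (in meters); pixel whose line of sight is (0.5, 0, 1).
point = stripe_point((0.5, 0.0, 1.0), (1.0, 0.0, 0.0), 1.0)
print(point)
```

Repeating this for every illuminated pixel at every mirror position yields the dense range image.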
Another common configuration is laser spot scanning, using a variety of different movable mirror arrangements. A common misconception is that the baseline in a laser scanning system (distance between light source and detector) must be large for accurate ranging. Instead, Rioux (1984) developed a synchronized scanner in which the horizontal position detector and beam projector are located nearly collinearly and oppositely. A first scanned mirror between the two directs light on one of its surfaces from the source to the scene via a fixed mirror to a second scanned mirror. Reflected light is redirected by another fixed mirror to the opposite surface of the first scanned mirror to the detector. This reduces the shadow effect over other laser spot scanners and yields a more compact system.
One version of the synchronized scanner is a random access laser scanner, in which the first scanned mirror is a simple two-sided mirror, and the laser spot can be arbitrarily directed to any point in a scene (Beraldin et al., 1993). The advantage of random access is the ability to scan just a portion of the scene, for example to track an object using a Lissajous scan pattern, or to scan a full scene coarsely and then scan a selected scene portion more finely. The working range is from 0.5 to 100 m, and the field of view is 40 × 50 deg. In a digitization mode, the sampling rate is 20 kHz (sample points per second) with CCDs and 10 MHz with PSDs. In a tracking mode, single objects can be tracked at 130 Hz; this rate is divided by the number of objects to be tracked. When the range is less than 10 m, the accuracy is typically 0.1 to 0.2 mm. At the far range of 100 m, retroreflectors must be used. Reflectance can also be inferred from light intensity. A VE application of this scanner involved digitizing for simulation the cargo bay and experimental setups of the Space Shuttle Orbiter (Maclean et al., 1990). This simulation was used by the National Aeronautics and Space Administration (NASA) during a mission. Because reflectance is measured, the graphical depiction was reported as being very realistic; during blackout one could hardly tell that a simulation was being used.
A second version is a raster scanner that employs as the first scanned mirror a multifacet pyramidal mirror, which is rotated continuously at a high rate. Linear CCD arrays or PSDs may be used for detection. Frames of 480 lines by 512 pixels are collected at video rates (30 Hz). In an early prototype, the working volume was a 50 mm cube (Beraldin et al., 1992); Hymarc is developing a version with a 1 m cube field of view. Resolution in the field of view is 9 bits (1 part in 512).
Laser radar, or ladar for short, involves calculating the time required for a light beam to travel from the source, reflect off an object, and travel back to a detector; this principle is similar to ultrasonic ranging. The time of flight (in the picosecond range) can be directly measured for laser pulses emitted in rapid succession and scanned. Ladar is more appropriate for long distances than triangulation systems; for example, Maclean et al. (1990) employed a triangulation laser scanner for short range (0.5 to 10 m) and a ladar scanner for long range (up to 2 km).
In one commercial example, IBEO Lasertechnik obtained accuracies of 2 cm at 4,600 point measurements per second. Two other methods for calculating this time are: (1) the phase shift between outgoing and incoming amplitude modulated (AM) light beams and (2) the beat frequency of a frequency modulated (FM) beam (Besl, 1988; Blais et al., 1991).
For the AM method, the diffused reflected beam is typically six orders of magnitude less than the outgoing beam and is detected by a telescope near the outgoing beam (Chen et al., 1993). Mirrors, as in Rioux (1984), are employed to scan the beam in a point-to-point fashion. Because phase is being detected, there is an ambiguity interval of one-half the modulation wavelength, typically around 1 m. Two notable examples of such ladar systems are the Environmental Research Institute of Michigan (ERIM) system (Hebert and Kanade, 1989) and the Odetics 3D Laser Imaging System. The Odetics system has a resolution of 9 bits, a frame of 128 × 128 pixels, a frame period of 835 ms/frame, and a field of view of 60 × 60 deg. By use of advanced calibration procedures, accuracies of 0.15 in have been reported (Chen et al., 1993).
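The pulsed and AM ranging principles reduce to short formulas, sketched below. The 150 MHz modulation frequency is a hypothetical value chosen only because it gives an ambiguity interval near 1 m; it is not a specification of the ERIM or Odetics systems.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_time_of_flight(round_trip_s):
    """Pulsed ladar: the light covers the range twice."""
    return 0.5 * SPEED_OF_LIGHT_M_S * round_trip_s


def ambiguity_interval_m(modulation_hz):
    """AM ladar: ranges repeat every half modulation wavelength."""
    return SPEED_OF_LIGHT_M_S / (2.0 * modulation_hz)


def range_from_phase(phase_rad, modulation_hz):
    """Range within one ambiguity interval, from the measured phase shift."""
    fraction = (phase_rad % (2.0 * math.pi)) / (2.0 * math.pi)
    return fraction * ambiguity_interval_m(modulation_hz)


pulse_range_m = range_from_time_of_flight(2.0 * 10.0 / SPEED_OF_LIGHT_M_S)
interval_m = ambiguity_interval_m(150e6)
half_interval_m = range_from_phase(math.pi, 150e6)
print(pulse_range_m, interval_m, half_interval_m)
```

A target at one and a half intervals produces the same phase as one at half an interval, which is why the interval bounds the unambiguous range of a single-frequency AM system.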
Further information on ladar sensors, as well as on millimeter-wave radar, may be found in the review of remote vehicles in Chapter 9.
Laser tracking systems employing interferometry have been used in robot calibration and tracking in the past decade; good surveys are presented in Jiang et al. (1988), Kyle (1993), and Mooring et al. (1991). The precision and accuracy of interferometry is very high, although the cost is currently too great for routine use.
A laser beam is steered to a retroreflector on a robot end effector or moving body by a servo-controlled mirror on a two-axis, galvanometer-driven scanner. Either a mirror retroreflector (an open corner) or a solid glass retroreflector (referred to as a cat's eye) is used (Kyle, 1993). The mirror retroreflector has a smaller working range than the cat's eye (±20 deg versus ±60 deg), but can be made smaller. The retroreflector reflects the beam back to the scanner, where a beam splitter directs the beam to a photodetector for interference fringe counting and to a PSD. Based on the PSD output, the outgoing beam is deflected to the center of the retroreflector for tracking. The calculation of the three-dimensional position of the retroreflector is done in one of two ways:
(1) The two mirror angles on the scanner are measured (Lau et al., 1985). Then one scanner is sufficient, as the spherical coordinates are provided. A commercial version is the Smart 310 system by Leica.
(2) The two mirror angles are not employed, but three laser scanners are required for three distances. Chesapeake Laser Systems, Inc., has developed a commercial system.
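For the single-scanner case, the position follows from a spherical-to-Cartesian conversion. The sketch assumes that the two mirror angles have already been converted to the azimuth and elevation of the outgoing beam and that the absolute distance is known; the axis convention is our own.

```python
import math


def retroreflector_position(azimuth_rad, elevation_rad, distance_m):
    """Cartesian position from beam azimuth, elevation, and distance.

    Convention assumed here: x forward at zero azimuth and elevation,
    y to the left, z up.
    """
    horizontal = distance_m * math.cos(elevation_rad)
    return (horizontal * math.cos(azimuth_rad),
            horizontal * math.sin(azimuth_rad),
            distance_m * math.sin(elevation_rad))


ahead = retroreflector_position(0.0, 0.0, 2.0)
left = retroreflector_position(math.pi / 2, 0.0, 2.0)
print(ahead, left)
```

In this arrangement the lateral accuracy depends on the angle measurement and grows with distance, whereas the three-distance arrangement relies on the interferometric distances alone.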
A problem with laser interferometers is that only incremental displacement is provided. To obtain absolute distance, some calibration procedure must be followed. For the Chesapeake Laser Systems device, it has been shown that by adding a fourth scanner the system can self-calibrate (Zhuang et al., 1992).
Another problem is to provide orientation as well as position. One solution is to use a steerable mirror instead of a retroreflector (Lau et al., 1985), which provides two orientation measurements. More recently, Prenninger et al. (1993a, 1993b) have determined all orientation components by imaging of the diffraction patterns of the edges of a modified mirror retroreflector. Orientation resolutions of 1 arcsec are stated, and motions can be tracked that accelerate at 100 m/s².
Acoustic trackers employ at least three microphones to triangulate an emitter on the moving body. They have been employed in robotics for calibration (Stone, 1987) and in biomechanics for motion tracking (Soechting and Flanders, 1989). Commercial implementations for the VE market include the GP8-3D developed by Science Accessories, the Logitech 3D/6D Mouse, and the Mattel Power Glove. For point tracking at modest accuracies and speeds, ultrasonic trackers are a reasonable and very inexpensive alternative to magnetic sensors: the ranges are larger, and magnetic interference is not a problem. However, a clear line of sight must be maintained and the latency is proportional to the largest distance being measured. Because at least three points are required to infer body orientation, it is difficult to measure full pose at adequate rates.
Most such systems measure the time of flight of ultrasonic pulses. There are a number of technical problems that make it difficult to achieve good accuracy, speed, and range with this technique. The first factor is the frequency of the ultrasonic carrier wave. A shorter wavelength makes it possible to resolve smaller distances, but atmospheric attenuation increases rapidly with frequency starting at about 50-60 kHz. Most current systems use 40 kHz tone pulses with a wavelength of about 8.6 mm in room-temperature air. Metallic sources such as jingling keys produce enormous quantities of energy in this frequency band, making it extremely difficult to achieve immunity to acoustic interference. The use of higher frequencies could avoid some of the interference and increase the resolution, but atmospheric attenuation would limit the range. Furthermore, at high ultrasonic frequencies it is difficult to find an omnidirectional radiator, and microphones are expensive (over $1,000 each) and require high voltage.
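Locating an emitter from three time-of-flight measurements can be sketched as range conversion followed by trilateration. The speed of sound, the microphone layout (one at the origin, one on the x axis, one in the xy plane), and the assumption that the emitter lies on the positive-z side are all simplifications made for this example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air near 20 degrees C


def acoustic_range_m(time_of_flight_s):
    """One-way distance travelled by an ultrasonic pulse."""
    return SPEED_OF_SOUND_M_S * time_of_flight_s


def locate_emitter(r1, r2, r3, d, i, j):
    """Trilaterate an emitter from ranges to three microphones.

    Assumed layout: microphone 1 at the origin, microphone 2 at (d, 0, 0),
    microphone 3 at (i, j, 0); the emitter is taken to lie at z >= 0.
    """
    x = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j) / (2.0 * j) - (i / j) * x
    z_squared = r1 * r1 - x * x - y * y
    if z_squared < 0.0:
        raise ValueError("ranges are inconsistent with the microphone layout")
    return x, y, math.sqrt(z_squared)


# Simulated pulse arrival times for an emitter at (0.3, 0.4, 1.2) m.
emitter = (0.3, 0.4, 1.2)
microphones = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
times_s = [math.dist(emitter, m) / SPEED_OF_SOUND_M_S for m in microphones]
ranges_m = [acoustic_range_m(t) for t in times_s]
estimate = locate_emitter(*ranges_m, d=1.0, i=0.0, j=1.0)
print(estimate)
```

Each measurement must wait for the pulse to arrive, a few milliseconds per meter at this speed, which is the source of the distance-dependent latency.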
Another major problem is echoes from hard surfaces, which can have as much as 90 percent reflectivity to ultrasonic waves. At the SIGGRAPH '93 Convention, Bauer of Acoustic Positioning Research, Inc., demonstrated an ultrasonic tracker that uses patented algorithms to achieve robust noise and echo rejection while tracking over a 25 ft range with 1 in resolution. This system, called GAMS, also employs an unusual inverted strategy, with the sound sources mounted on the ceiling and the microphones on the users, so that any number of users may be tracked simultaneously without having to reduce the 30 Hz update rate. By contrast, the GP8-3D has an update rate of 150 Hz divided by the number of emitters being tracked. An advantage of the traditional one-emitter-at-a-time approach is that echo cancellation is easier: the first arrival is always the direct path unless the line of sight is blocked.
Celesco Transducer Products, Inc., is advertising a new wireless 40 kHz time-of-flight system called the V-scope, with 0.1 mm resolution over a 16 ft range.
Instead of time of flight, phase coherence (Meyer et al., 1992) is an incremental motion technique (like interferometry) for which absolute distance must initially be calibrated by some other means. Phase-coherent tracking also has problems with reflections and some environmental noises. In order to overcome the drift problem, Applewhite (1994) has developed a variation called modulated phase-coherence, which can achieve submillimeter accuracy.
The use of accelerometers or of angular rate sensors for motion tracking is becoming increasingly attractive because of advances in sensor design. For example, silicon micromachining has begun to produce very small inertial sensors. IC Sensors markets solid-state piezoresistive accelerometers, which employ a micromachined silicon mass suspended by multiple beams from a silicon frame. The GyroChip developed by Systron Donner employs a pair of micromachined tuning forks, which sense angular velocity through the Coriolis force. The GyroEngine developed by Gyration is a miniaturized spinning wheel gyroscope that is even smaller than the GyroChip.
To derive position or orientation, the output of these sensors must be integrated. The result is sensitive to the drift and bias of the sensors. Another serious problem is that any inertial sensor based on beam bending will suffer inaccuracies due to nonlinear effects. Force-balance accelerometers, such as those from Sundstrand, avoid nonlinear beam-bending problems; they are remarkably stable and have been employed in kinematic calibration after double integration (Canepa et al., 1994).
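The sensitivity to bias can be made concrete with a small numerical sketch. A constant accelerometer bias b, integrated twice, produces a position error of about b t^2 / 2, so the error grows with the square of elapsed time. The bias and sampling interval below are hypothetical values chosen only for illustration, and simple Euler integration is assumed.

```python
def integrate_twice(accel_samples_m_s2, dt_s):
    """Euler-integrate acceleration to velocity and then to position."""
    velocity = 0.0
    position = 0.0
    for accel in accel_samples_m_s2:
        velocity += accel * dt_s
        position += velocity * dt_s
    return position


BIAS_M_S2 = 0.01  # hypothetical constant accelerometer bias
DT_S = 0.001      # hypothetical 1 kHz sampling interval

# The sensor is actually at rest, so the measured signal is pure bias.
error_after_1s = integrate_twice([BIAS_M_S2] * 1000, DT_S)    # about 5 mm
error_after_10s = integrate_twice([BIAS_M_S2] * 10000, DT_S)  # about 0.5 m
```

Ten times the duration gives roughly one hundred times the error, which is why even a stable sensor needs periodic correction.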
Nevertheless, some drift is inevitable. Either an inertial package must periodically be returned to some home position for offset correction, or it must be used in conjunction with some other (possibly coarse) position sensor and an appropriate method of data fusion. The latter option could be quite attractive. An inertial orientation tracker has been built at MIT using triaxial angular rate sensors with gravimetric tilt sensors and a fluxgate compass for drift compensation (Foxlin and Durlach, 1994). It achieves 1 ms latency, unlimited tracking volume, 0.008 deg resolution, and 0.5 deg absolute accuracy (no drift). The system is now being extended to track translation as well as orientation.
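One simple form of data fusion is a complementary filter, sketched below for a single axis. It is offered as a generic illustration of combining a drifting rate sensor with a drift-free reference, not as the algorithm used in the MIT tracker; the blending weight `alpha`, the sample rate, and the 1 deg/s gyro bias are all assumed values.

```python
def complementary_filter(rates_deg_s, tilt_deg, dt_s, alpha=0.98, start_deg=0.0):
    """Fuse a drifting gyro rate with a drift-free tilt reading.

    alpha close to 1 trusts the integrated gyro over short time scales;
    the (1 - alpha) share of the tilt sensor slowly pulls the estimate back.
    """
    estimate = start_deg
    estimates = []
    for rate, tilt in zip(rates_deg_s, tilt_deg):
        predicted = estimate + rate * dt_s  # integrate the gyro one step
        estimate = alpha * predicted + (1.0 - alpha) * tilt
        estimates.append(estimate)
    return estimates


# Stationary sensor: true angle is 0 deg, but the gyro reports a 1 deg/s bias.
N, DT_S = 5000, 0.01
gyro_only_deg = 1.0 * DT_S * N  # pure integration drifts to 50 deg in 50 s
fused_deg = complementary_filter([1.0] * N, [0.0] * N, DT_S)[-1]
```

With these numbers the fused estimate settles at a bounded offset of about 0.49 deg instead of drifting without limit, at the price of a small steady-state error that depends on `alpha`.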
Inertial sensors are also useful in conjunction with other position tracking systems for lead prediction. In the high-end HMD system from CAE Electronics, the outputs of a fast optical head-tracker are combined with angular velocity measurements to predict future head orientation. In this manner, the 100 ms graphics rendering latency is effectively shortened to under 60 ms (Ron Kruk, CAE Electronics, personal communication, 1994). Lead prediction has also been implemented using Kalman filters (Friedmann et al., 1992; Liang et al., 1991). Additional information on the use of inertial sensors is presented in Chapter 9.
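The simplest form of lead prediction extrapolates the current orientation using the measured angular velocity. The sketch below assumes constant angular velocity over the prediction interval and a single rotation axis; it is not the CAE Electronics implementation, and the numbers are illustrative.

```python
def predict_angle_deg(angle_deg, rate_deg_s, lead_time_s):
    """Extrapolate orientation assuming constant angular velocity."""
    return angle_deg + rate_deg_s * lead_time_s


# Head measured at 10 deg while turning at 100 deg/s, predicted 40 ms ahead.
predicted = predict_angle_deg(10.0, 100.0, 0.040)  # 14 deg
```

A Kalman filter replaces the fixed constant-velocity assumption with a motion model and noise estimates, but the purpose is the same: rendering for where the head will be rather than where it was last measured.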
Eye movement trackers have been generally surveyed by Durlach et al. (1992). The general types are electroocular (measurement of the corneoretinal potential with skin electrodes), electromagnetic (measurement of magnetically induced voltage on a coil attached to a lens on the eye), and optical (reflections from the eye's surface). Only the optical-tracking methods seem particularly suited for general use, because they are less invasive and can be reasonably accurate.
The RK-416 Pupil Tracking System developed by ISCAN Inc. employs a video camera to track the pupil of the eye. The camera coordinates are converted to eye rotation angle through a calibration procedure involving fixation at known targets. Accuracy is stated as 1 deg and bandwidth as 60 Hz. Compensation for minor head movements is accomplished by relative tracking of the first Purkinje image or by head spot tracking.
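A calibration of this kind can be sketched as a least-squares fit between measured pupil positions and the known target angles. The example below assumes a linear relationship along one axis, which is a simplification; the data values and function names are hypothetical and do not describe the ISCAN procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = gain * x + offset."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matched samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("pupil positions must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


# Hypothetical calibration data: horizontal pupil position (pixels) recorded
# while the subject fixates targets at known horizontal angles (degrees).
pupil_px = [200.0, 260.0, 320.0, 380.0, 440.0]
target_deg = [-20.0, -10.0, 0.0, 10.0, 20.0]
gain, offset = fit_line(pupil_px, target_deg)


def pixel_to_angle_deg(px):
    """Convert a pupil position to an eye rotation angle using the fit."""
    return gain * px + offset


angle = pixel_to_angle_deg(350.0)  # 5 deg for this synthetic data
```

Real eye data are not perfectly linear, so a practical calibration may need more fixation targets and a higher-order or two-dimensional mapping.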
The Series 1000 Infrared Eye Movement Spectacles developed by Microguide, Inc., employ differential reflections of infrared light from two sides of the iris, detected by photodiodes. Sensitivity is stated as 0.1 deg and bandwidth as 100 Hz, but linearity is only 10 percent. Truong and Feldon (1987) considered sources of artifact in such systems caused by inaccurate positioning of the sensors relative to the changes in reflectivity among the sclera, iris, and pupil. The sensors are mounted on a post projecting down from a band wrapped around the forehead. The ober2 developed by Permobil Meditech, which operates on a similar principle, employs goggles instead.
In general, what is needed are position trackers that offer adequate performance at reasonable cost. For limb segment tracking, some forms of optical sensing are already accurate and fast enough but are too expensive for routine use (in the $50,000-$100,000 range), especially when multiple placements to overcome obscuration or to increase workspace are considered. Whatever sensory system is employed for limb tracking, there will be difficulties in identifying reliable fiducial points because of the softness of tissue and clothes. To infer joint angles, some calibration procedure must be applied to set up coordinate systems in limb segments; subsequent joint angle inferences will only approximate the true biomechanical angles because of less than ideal joints and measurements.
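As a much-simplified illustration of joint angle inference, the sketch below computes the included angle at a joint from three tracked marker positions. It assumes that one marker sits at the joint center and that the segments are rigid, which are exactly the idealizations that soft tissue and imperfect joints violate; the marker coordinates are hypothetical.

```python
import math


def included_angle_deg(proximal, joint, distal):
    """Angle at `joint` between the segments toward `proximal` and `distal`."""
    u = [p - j for p, j in zip(proximal, joint)]
    v = [d - j for d, j in zip(distal, joint)]
    norm_u = math.sqrt(sum(c * c for c in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("markers must not coincide with the joint")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Hypothetical marker positions in meters: shoulder, elbow, wrist.
shoulder = (0.0, 0.0, 0.0)
elbow = (0.30, 0.0, 0.0)
wrist = (0.30, 0.25, 0.0)
elbow_angle = included_angle_deg(shoulder, elbow, wrist)  # 90 deg
```

A full treatment would attach a coordinate frame to each segment and express the joint rotation between frames, which is the calibration step described in the text.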
For whole-body tracking, the size of the workspace is an important issue. Ideally, one should be able to track a moving person in a sufficiently large space without loss of resolution or worries about obscuration. People should even be able to move from room to room in a building without loss of tracking. If all of the major body segments are to be tracked, then sensory systems that require mounting something on the body (reflectors, markers, goniometers) are less attractive than systems that scan the body as is. For body and environmental surface mapping, some forms of laser trackers are already accurate and fast enough to be generally useful but are still expensive.
The long-term goal for VE purposes is a high-accuracy, real-time optical mapping system that tracks the human body as required. This could be a stereo vision system with natural light or active laser scanners. One step in this direction has been taken by Mulligan et al. (1989), who employ a model-based vision system to track a moving excavator arm that is controlled in Cartesian coordinates by a hand controller. The excavator joints are not sensed but are calculated from the locations of the boom, stick, and bucket of the excavator. In another project, a human hand is observed in a grasping task, then the grasp is duplicated by a robot hand (Kang and Ikeuchi, 1993). For HMD applications, the user has to wear something anyway, so head-tracking, which has much more demanding requirements than body-tracking, can probably be more accurately accomplished by sensors on the HMD.
Below we comment on particular needs for each basic type of sensory system.
Mechanical trackers. For goniometers, research is needed on difficult issues of fit and measurement, such as adjustments to different individuals, alignment with joints, sufficiently rigid attachments, and calibration of the linkage plus human limb. For body-based linkages, the ability to track multiple limb segments and limb redundancies needs to be addressed. Hybrid ground-body based systems are likely to be required; for example, finger motion should be tracked as well as hand motion.
Magnetic trackers. Significant current disadvantages include modest accuracy and high latency (20-30 ms). The high latencies are particularly troubling because they limit the usefulness of these trackers in real-time interaction. The accuracies are not competitive with most other tracking technologies. Furthermore, the influence of extraneous magnetic fields in the case of AC sensors makes it difficult to know the accuracy one is getting; there is no simple way of determining and compensating for interfering magnetic fields. It is an open question to what extent the accuracies and latencies can be improved.
Optical sensing. Optical sensing is one of the most convenient methods, but has drawbacks due to visibility constraints. These drawbacks can be partially ameliorated by using multiple camera placement or target placement. The ability of passive stereo vision systems to process arbitrary environments is a long-range goal of the computer vision community; when eventually developed, stereo vision would represent an extremely attractive method for position tracking and mapping. In the meantime, developments in laser scanning and laser radar are promising, as sampling rates, fields of view, and accuracies are becoming quite reasonable.
Laser interferometers are capable of the highest accuracy, which does not change with viewing distance. If the cost could be brought down, they would represent an attractive method of end-point or head-tracking. Because of the need for retroreflectors, it will be relatively difficult to track multiple limbs. Visibility constraints are a problem; if beams are ever interrupted, the absolute reference is lost. Relatively robust methods for establishing absolute references are required, perhaps through redundant sensing.
In general, costs will have to be brought down for optical trackers to be more widely used.
Acoustic trackers. These trackers have a definite role to play in VEs, because the costs are relatively modest and the accuracies are often sufficient. If in situ calibration of the speed of sound in air could be performed, or if ambient measurements could be taken that feed into a model, then the accuracy of acoustic trackers could be improved. Improvements in detection methods could probably reduce the effect of echoes. By using multiple frequencies, it should be possible to track multiple markers simultaneously. Drift problems with phase-coherent systems might be resolved by dead reckoning with time-of-flight measurements.
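Both calibration options mentioned above can be sketched briefly. The first function uses a common linear approximation for the speed of sound in dry air as a function of temperature; the second derives the speed directly from a pulse timed over a known reference distance. The function names, the 3 m range, and the temperatures are assumptions chosen for illustration.

```python
def speed_from_temperature_m_s(temp_c):
    """Common linear approximation for the speed of sound in dry air."""
    return 331.3 + 0.606 * temp_c


def speed_from_reference_m_s(reference_distance_m, measured_tof_s):
    """In situ calibration: time a pulse over a known fixed distance."""
    if measured_tof_s <= 0:
        raise ValueError("time of flight must be positive")
    return reference_distance_m / measured_tof_s


# The tracker assumes 20 deg C, but the room is actually at 30 deg C.
assumed_speed = speed_from_temperature_m_s(20.0)
actual_speed = speed_from_temperature_m_s(30.0)

true_range_m = 3.0
tof_s = true_range_m / actual_speed

uncorrected_m = assumed_speed * tof_s          # underestimates the range
range_error_m = true_range_m - uncorrected_m   # about 0.052 m

# Timing a pulse over a known 1 m reference path recovers the actual speed.
calibrated_speed = speed_from_reference_m_s(1.0, 1.0 / actual_speed)
corrected_m = calibrated_speed * tof_s
```

Under these assumptions a 10 deg C error in the assumed temperature costs about 5 cm at a 3 m range, far more than the resolution such systems advertise, which suggests why in situ calibration is worth pursuing.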
Inertial trackers. Further reductions in sensor size and cost are needed to make inertial trackers a convenient and economical alternative to magnetic trackers. Hybrid systems combining inertial sensors with other technologies need to be developed for high-performance HMD-tracking applications requiring both accurate registration and fast dynamics. A particularly promising combination is an all-inertial orientation tracker combined with a hybrid inertial-acoustic position tracker.
Eye-movement trackers. An ideal eye tracker would satisfy three requirements: linear response over a large range (roughly 50 deg), high bandwidth (1 kHz), and tolerance to relative motion of the head. To date, no devices satisfy all three requirements. The CCD imaging systems such as ISCAN have a reasonably linear response, but the sampling rates are too low and range is limited to about 20 deg. The infrared reflection devices have the bandwidth, but are linear only in a small range; calibration to overcome their nonlinear responses is difficult.
Characterization methods. One of the most confusing issues in choosing a position-tracking solution is the lack of agreement on what the performance specifications mean and how they should be measured. Standards need to be set defining how to measure accuracy, resolution, latency, bandwidth, sensitivity to interference, and jitter. Equipment for making in-house measurements should be made commercially available, and the establishment of an independent testing laboratory would also be beneficial so that consumers would not be forced to rely on manufacturers' specifications, which usually have little relation to actual performance.