Suggested Citation:"Visual Perception of Real and Represented Objects and Events." National Research Council. 1986. Behavioral and Social Science: 50 Years of Discovery. Washington, DC: The National Academies Press. doi: 10.17226/611.

Visual Perception of Real and Represented Objects and Events

JULIAN HOCHBERG

INTRODUCTION

Experimental psychology started with the study of how we perceive pictures and of the conditions under which one object is an effective surrogate for another (that is, the two objects elicit the same effect). Such study has served the purposes of other disciplines as well, and remains inherently interdisciplinary.

Prior to 1850 the problem was primarily pursued by artists and philosophers, and the conceptual tools were essentially those of physics and geometry. In the classical period, roughly from 1850 to 1950, the primary theoretical concerns were those of neurophysiologists and psychologists. Major applications in visual prosthesis (e.g., optometry and ophthalmology), the visual media (e.g., photography, print, and eventually television), and the interface between human and machine currently called human factors motivated much of the research that provided a rich base of technical data.

The present period of tremendous ferment started around 1950. The problems of perception continue to engage all the disciplines already mentioned; in addition, computer science is now a major presence in the field, providing tools and motivation in several distinct but closely related ways: as a source of techniques for research, theory testing, and modeling; as a source of analogies and metaphors; as an overlapping enterprise, seeking to devise machines that will "perceive" in the same way that people do; and in the context of learning how to generate and display computer images that humans can readily and accurately comprehend.

THE PRE-1850s: ARTISTS, PHILOSOPHERS, AND PHYSICISTS

Artists have known for centuries that one way to produce a picture is to make a surrogate object that (ideally) offers the eye the same pattern of light as that offered by the scene itself. The most famous example of this is Leonardo's window (Figure 1A): By tracing the outlines of objects on a plane of glass interposed between his eye and the scene, the artist discovers the characteristics of a two-dimensional projection of a three-dimensional scene. Of course, the method could be used to provide pictures of existing scenes with no need for the artist to learn anything: the scene could be traced directly on the glass (Figure 1B) or, with the growth of technology, by photographic or video media.

FIGURE 1 Surrogates and their preparation. A: One of the optical aids that artists have used for centuries (Dürer) to help in preparing a surrogate that provides the eye with much of the same stimulus information as the object or scene being represented. B: By studying the tracings made of scenes viewed through a glass pane, Leonardo advised, artists could learn the characteristic two-dimensional projections of three-dimensional layouts and could then construct pictures of imagined scenes.

Some traditional features that result from projecting normal three-dimensional scenes on two dimensions appear in Figure 2: these include linear perspective, familiar size, relative size, and interposition. Note that even a perfect picture produced in this way is inherently ambiguous, in that both the flat surrogate and the very different three-dimensional layout it represents offer the same light to the eye. This is the aspect of pictures that made them, and visual perception, of interest to the philosophers: the epistemological issue of how we can know what is true. Philosophical concerns aside, the ambiguity is inherent as a matter of simple mathematics, and provides both the opportunity for pictorial communication and a tool for psychological and physiological inquiry. The artist who learns to use signs of depth, as in Figure 2, can produce surrogates of scenes that do not and perhaps could not exist: virtual scenes of grottos, unicorns, and biblical and extraterrestrial events. Indeed, we shall see that in the interest of visual comprehensibility it is necessary to depart from pure projection, and most pictures are therefore to some extent surrogates of virtual rather than actual scenes.

FIGURE 2 The major pictorial (monocular) depth cues: the tracing of the scene in Figure 1B. Linear Perspective: parallel lines 6-8, 7-9, etc., converge in the picture plane. Interposition: the nearer object 4 occludes part of the farther object 5. Relative Size: the tracing of boy 1 is larger than that of boy 2. Texture-Density Gradient: the evenly spaced bars on the field 6-7-8-9 project an image whose density increases with distance. Familiar Size: if man 3 is known to be larger than boy 1, and they are the same size in the picture plane, then the man must be proportionally farther away in the represented scene.

Today computers provide an increasing proportion of the still and moving pictures that humans confront. For them to do so, programmers must learn how to project three-dimensional layouts in two-dimensional arrays and to generate the play of light and shade by which different surface textures are perceived (Figure 3). The study of such rules, traditionally called depth cues (Woodworth, 1938) and lately called "ecological optics" (Gibson, 1979), is fundamentally a branch of physics, but one that must be pursued with the psychological and neurophysiological limitations and contributions of the human viewer firmly in mind.

FIGURE 3 Computer-drawn image. A picture programmed directly from blueprints of a building, using a polygon facet approach with a simple lighting model that simulates direct sun and diffuse sky illumination. Paul Roberts, Computer Vision Lab, Columbia University.

Surrogates are therefore more than means of pictorial communication: they tell us about the limits of the information that the sense organ can pick up and about how the brain organizes that information. Perhaps the earliest major instance of that point was in Newton's (1672) famous experiment in visual sensation, showing that an appropriate mix of three narrow wavelengths of light (bands of color taken out of the spectrum, such as red, green, and blue) can serve as a surrogate for any and all colors in the spectrum, and thus match any scene (Figure 4). This is not a fact about photic energy; the light itself remains unchanged by the mixture. It is instead a strong clue about our sensory nervous systems, and it provided the background for the classical theory of perception and the nervous system, which we consider next.

PSYCHOLOGY AND PHYSIOLOGY FROM 1850-1950

Given the facts of color mixture, the most parsimonious model of visual perception was the Young-Helmholtz theory (Helmholtz, 1866): that color perception is mediated by three kinds of specialized receptor neurons, the cones, each responsive to most of the spectrum, but each with a different sensitivity function. The three types were thought to be most sensitive to light that looks red, green, and blue, respectively, and their response to photic energy was thought to underlie the experience of those colors. The retina was envisioned as a mosaic of independent triads of the three cones (Figure 5), and the light provided to the eye by any scene was thought to be analyzed into the point-responses of the three component colors.

FIGURE 4 Color surrogates. To the visual system G, a suitable mixture F of just three wavelengths selected by slits E from the visible spectrum D can, in principle, be a surrogate for any hue C, that is, any set of wavelengths in the spectrum. According to the traditional Young-Helmholtz theory, the physiological explanation involves three types of retinal cone cells with the three sensitivity functions shown in H. From these we can see, for example, that a mix of equally effective intensities at 650 and 530 is indistinguishable from 580 and could serve as a surrogate for the latter.

The research most directly relevant to this theory was the attempt to map the sensitivity of each type of cone to the wavelengths of the visible spectrum and to map the spatial resolution of the retinal mosaic (what detail the eye could be expected to resolve). Such information as the limits of resolution and the bases and specification of colors provided the first goals for what has become visual science and its applications, which now run from the prescription of spectacles to the design of television characteristics.

It was also the foundation of the classical view of the perceptual process in general, diagrammed in Figure 6: at left, the object in the world, with its physical properties of distance, size, shape, and reflectance (surface color).
These do not affect the sense organs directly, of course, but only by means of the light they reflect to the sensitive cells. All things that cause the cells to respond in some specific way elicit the same sensory experience: the light coming from the object itself, the light produced by some surrogate of that object, the effects of mechanical or electrical stimulation of the eye, etc. Insofar as different objects and events produce the same responses, information about the world is lost in this encoding process. This is what makes surrogates possible. And the fact that very different objects, and indeed different patterns of light, have the same effects on the nervous system provides a tool with which to study that system's structure and function.

FIGURE 5 The sensory mosaic. In the simplest view, the retina of the eye contains a mosaic of light-sensitive cells. The spacing of the mosaic determines what detail can be seen: e.g., to distinguish a "C" from an "O," at least one cell (x) must go unstimulated. The visible portion of the electromagnetic radiation incident at each point in the retina that is capable of full color vision is coded into the output of each of three cones according to its sensitivity curve (Figure 4H). This is, of course, essentially the way in which video cameras analyze the light they receive from [...]

FIGURE 6 The classical theory (1850-1950). The distal stimulus, D.S. (an object or layout of objects), with such physical properties as size, reflectance, position in space, etc., impinges on the sensory surface by way of the proximal stimulus pattern, P.S., consisting of regions that vary in their spatial extent (θ) and spectral distribution [luminance (L), wavelengths (λ)]. Sensory responses to each region (sensations) were thought to vary correspondingly in brightness (L') and hue (the mix of Red, Green, and Blue) over some extent (θ'). Because of the regularities of the world and its geometry, the proximal stimulation will generally contain patterns (e.g., the cues in Figure 2) that are characteristic of and therefore provide information about the distal properties. The perception of such properties (objects' sizes, surface reflectances, spatial location, etc.) was thought to derive from the underlying sensations by associative learning and by computational processes.

The visual system thus conceived is a mosaic of receptors (the retina) on which the eye's optical system projects a focused image of the light provided by the object. The receptors (the three types of cone, supplemented by rods, which do not differentiate color) analyze each small region of that image into points of red, green, and blue. This conception of the visual system has now been embodied in the television camera: Television, like the Helmholtzian visual system, reduces the countless objects and events of the world to the different combinations of a set of three colors in a spatial mosaic. It is important both for the Helmholtzian theory and for television as a medium that such a simple set will suffice. In both cases, all of the remaining properties of the objects that we perceive in the world (their sizes, forms, and reflectances, i.e., surface colors; their distances and movements) are lost in the encoding process and must be supplied by the viewer.

The simplest theory about such nonsensory processes was inherited from centuries of philosophical analyses of perception: the theory that we have learned the perceptual properties of objects from our experiences with the world. It runs as follows: The sense organs analyze the world into fundamental sensations. Those sensations are, in the case of vision, the sets of points that differ in hue (R, G, B in Figure 6, signifying red, green, and blue sensory experiences) and brightness (L') over some effective extent, θ'.
These packets of sensations normally come in characteristic patterns that are imposed by the regularities of the physical world, patterns such as the depth cues in Figure 2. By learning these regularities and their meanings, we learn to perceive the physical world and its properties.

The theory seems to be economical and elegant. The principles of learning appeared to be at hand. For almost two centuries (from Hobbes in 1651 to James Mill in 1829), the British empiricist philosophers had discussed how the "laws of association," offered in essence by Aristotle, could serve to build our perceptions and ideas about the objects and events of the world. And a plausible neurophysiological explanation of association readily offered itself in terms of increased readiness of nerve cells that had been repeatedly stimulated simultaneously to fire together.

This outline of how we perceive objects and their pictures fitted nicely into a general theory of knowledge and of science, spanning from neurophysiology to sociology and political science. With respect to the last, for example, the view that all our ideas about the world derive from our experiences with it leads readily (but not ineluctably) to the belief that human intelligence and character are generally perfectible through education, and to the advocacy of egalitarianism and individualism over a wide range of social and political issues. Although formulated by Helmholtz only one academic generation after his teacher, Johannes Mueller (1838), first undertook the scientific analysis of experience, what I am calling the classical theory of perception thus had wide and deep connections with the mainstream of Western thought, and it remained the dominant theory in neurophysiology and psychology until the 1950s.

It was not without opposition, however. Some opposition was based on a cluster of purely psychological flaws. Although, for example, the theory tells us which different stimuli will act as mutual surrogates (that is, which different objects will produce the same perceptual experience), it does not tell us what that experience will be like. It does not predict the attributes of the experience itself, i.e., it tells us that light composed of a mixture of 650 nanometers (red) and 540 nanometers (green) is indistinguishable in appearance from light of 580 nanometers (nm) (yellow), but it gives us no basis for predicting how that appearance is similar to and different from other colors. As we will see, alternative theories, almost as old as the Helmholtzian one, offer much more in the way of accounting for appearance.
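The point that a three-receptor code makes physically different lights into mutual surrogates can be illustrated numerically. The following is a minimal sketch, not the chapter's own model: the Gaussian cone sensitivity curves (peaks and widths in `CONES`) and the function names are assumptions made only for illustration, and the primaries 650, 530, and 460 are taken from the labels of Figure 4. It solves for the primary intensities whose summed cone responses equal those produced by a single 580 nm light.

```python
import math

# ASSUMPTION: hypothetical cone sensitivity curves, modeled as Gaussians
# with made-up (peak_nm, width_nm) values; real cone curves differ.
CONES = {"S": (440.0, 30.0), "M": (530.0, 50.0), "L": (570.0, 50.0)}


def cone_response(wavelength_nm, intensity=1.0):
    """Return the (S, M, L) responses to one narrow-band light."""
    return tuple(
        intensity * math.exp(-((wavelength_nm - peak) ** 2) / (2.0 * width ** 2))
        for peak, width in CONES.values()
    )


def mixture_response(lights):
    """Sum cone responses over (wavelength_nm, intensity) pairs."""
    total = [0.0, 0.0, 0.0]
    for wavelength_nm, intensity in lights:
        for i, r in enumerate(cone_response(wavelength_nm, intensity)):
            total[i] += r
    return tuple(total)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def match_with_primaries(target_nm, primaries_nm):
    """Solve (Cramer's rule) for primary intensities matching the target."""
    target = cone_response(target_nm)
    cols = [cone_response(p) for p in primaries_nm]
    # Row i holds cone i's response to each unit-intensity primary.
    matrix = [[cols[j][i] for j in range(3)] for i in range(3)]
    d = det3(matrix)
    weights = []
    for j in range(3):
        replaced = [
            [target[i] if k == j else matrix[i][k] for k in range(3)]
            for i in range(3)
        ]
        weights.append(det3(replaced) / d)
    return weights


PRIMARIES = (650.0, 530.0, 460.0)
weights = match_with_primaries(580.0, PRIMARIES)
target_resp = cone_response(580.0)
mix_resp = mixture_response(list(zip(PRIMARIES, weights)))
```

Under these assumed curves, `mix_resp` and `target_resp` agree to within rounding error even though the two lights share no wavelength. A weight may come out slightly negative; in matching experiments that is conventionally read as adding that primary to the target side instead of the mixture. As the text notes, nothing in this computation says what the matched color looks like.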
Notable among these proposals based on phenomenology (the study of appearances as such) were the following: Hering (1878) argued that perceived colors comprise red-green, yellow-blue, and black-white opponent systems; that connections between cells of the two retinas provide for an innate sense of depth; and that lateral inhibition between adjacent regions of the visual system makes their appearances mutually dependent. Mach (1886) proposed (among other things) that such lateral connections provide networks that are sensitive to contours and not merely to incident energy.

A related problem is illustrated in Figure 7. In most situations in the real world, the local stimulation that is projected to the eye is not by itself information about object properties. Even if the two gray target disks on the cube are of identical lightness or reflectance (Rt), the luminance or photic energy each provides the eye is different (L1, L2) because the illumination falling on each is different (E1, E2). Again, even if the two vertical rods on the right are of the same physical size (S), the size of the retinal image each provides (θ1, θ2) differs because the rods lie at different distances (D1, D2). Nevertheless, we tend to perceive such object properties correctly, despite changing retinal stimulation. The classical theory held that this object constancy, as it is now known, is achieved when the viewer takes the conditions of seeing into account: in effect, by using the depth cues to perceive depths D1 and D2, and then using the latter to infer the object sizes from the retinal sizes (θ1, θ2); similarly, by using cues to perceive the illuminations E1 and E2, and, using the latter, to infer the reflectances of the parts of the scene from their luminances.

FIGURE 7 Object constancy (figure labels: Rt = L1/E1 = L2/E2; S = D × tan θ). Although both target disks on the cube have the same reflectance (Rt), the luminances (L1, L2) differ to the eye because the illuminations (E1, E2) differ. Similarly, objects of the same size (S) provide images of different extent (θ1, θ2) depending on their distances (D1, D2). We tend to see objects' relatively permanent qualities, such as their reflectance and size, as constant even though the proximal stimulation they provide is in flux. In the classical theory we do this by learning to process visual information according to the formulae R (reflectance) = L (luminance)/E (illumination), and S (size) = D (distance) × tangent of θ (visual extent).

This explanation is now commonly called "unconscious inference." Its operation assumes that the viewer has learned the constraints in the physical world (e.g., that L = R × E, that S = kD tan θ, etc.). These constraints, once learned, provide a mental structure that mirrors the physical relationship between the attributes of the object and those of sensory stimulation, permitting the viewer to infer or compute the former from the latter. A general form of this explanation is that we perceive just that state of affairs in the world that would, under normal conditions, be most likely to produce the pattern of sensory responses we receive.

The learning processes that might underlie such computations have never been formally and explicitly worked out. What we would now call "lookup tables" (for example, with grouped entries for S, θ, and D) would be compatible with theories about associative learning. Helmholtz and others often wrote, however, as though we learn to apply the rules that mirror those of the physical world; they did not say explicitly how such abstract principles, as distinguished from lookup tables listing the elements of sense data, are learned.
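The two constancy formulae of the classical account, R = L/E and S = D × tan θ, and the contrast between a learned rule and a lookup table, can be made concrete with a small sketch. All numerical values and names below are hypothetical, chosen only for illustration; the code does not model how a viewer would actually obtain E or D from cues.

```python
import math


def infer_reflectance(luminance, illumination):
    """R = L / E: recover surface reflectance from luminance and illumination."""
    return luminance / illumination


def infer_size(distance, visual_angle_rad):
    """S = D * tan(theta): recover object size from distance and visual extent."""
    return distance * math.tan(visual_angle_rad)


# Two disks of equal reflectance under different illumination (made-up values).
true_reflectance = 0.4
illuminations = (100.0, 25.0)
luminances = [true_reflectance * e for e in illuminations]
recovered_reflectances = [
    infer_reflectance(l, e) for l, e in zip(luminances, illuminations)
]

# Two rods of equal size at different distances (made-up values).
true_size = 2.0
distances = (5.0, 20.0)
visual_angles = [math.atan(true_size / d) for d in distances]
recovered_sizes = [infer_size(d, a) for d, a in zip(distances, visual_angles)]

# A lookup table covers only the (theta, D) combinations already experienced.
lookup = {(round(a, 3), d): true_size for a, d in zip(visual_angles, distances)}
unseen_distance = 10.0
unseen_angle = math.atan(true_size / unseen_distance)
table_answer = lookup.get((round(unseen_angle, 3), unseen_distance))  # no entry
rule_answer = infer_size(unseen_distance, unseen_angle)
```

The luminances and visual angles differ between the paired objects, yet the inferred reflectances and sizes come out equal. The table has no entry for the unseen combination while the rule still yields a size, which is one way to state the difference between associative lookup and an abstract principle that the text says was never explained.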

The Helmholtzian idea that our perceptions of objects rest on computational or inferential processes was, like the classical theory's failure to predict appearances, roundly criticized over the years as being uneconomical, mentalistic, and unparsimonious. Gestalt theory, which had a significant impact in psychology and art theory between the two world wars, was particularly vocal in this regard. But the criticisms of the classical theory did not amount to much until the end of World War II. Then the needs of new technology (flight training, radar and sonar displays, etc.), the development of new instrumentation (notably, direct amplifiers that made the measurement of very small bioelectrical tissue responses common and reliable), and the effects of grants that made the research career a viable occupation, all combined to turn the tables. As we will see, Helmholtz was right about the three cones and in some sense about the existence of mental structure and computation. But most of the rest of what lay between those points was wrong, and most of the alternative proposals that had been made by the critics of that dominant approach, especially those of Hering and Mach, were quite remarkably vindicated within a period of a very few years, after having been largely ignored for many decades.

THE 1950s AND AFTER: "DIRECT" SENSITIVITY TO OBJECT ATTRIBUTES

The two main arguments on which the classical theory rested were, first, that it was the simplest answer to the problem of analyzing the world of sensory stimulation, and second, that it was in accord with neurophysiological observation. In the 1950s both of these supports were withdrawn.

Technically, as is widely recognized, the most important single advance in instrumentation was the microelectrode, which made it possible to record the activity of individual nerve cells in the visual system and brain of an essentially intact animal that is exposed to various sensory displays.
It quickly became evident that most of the cells observed in this way respond not to individual points of local stimulus energy but to extended spatial and temporal patterns: to adjacent differences in intensity, specific features, and movements in one rather than another direction (Figure 8). They appear to do so by means of networks of lateral connections, which is very much what Hering and Mach had argued.

FIGURE 8 Pattern-sensitive neural network. Microelectrode recordings from individual cells in the visual system (Hartline, 1949; Hubel and Wiesel, 1962) reveal far more complex organization than the simple individual punctate analysis of Figures 5 and 6. For example, receptors in the retina 1 send both excitatory (+) and inhibitory (-) connections to more proximal cells 2; those connections are arranged in networks so that cells at level 2 are stimulated by light falling in a center region and inhibited by light falling in its surround. Other cells 3 still deeper in the system are so connected as to be more highly stimulated by a bar or edge falling on the line of 2 than by bars of other orientation 4. Cells farther in the nervous system are sensitive to a bar of specific orientation moving in a specific direction 5.

In the 1950s Hurvich and Jameson (1957) offered sensitivity curves for the red-green and yellow-blue opponent process cells that Hering had proposed, using procedures based on colors' appearances and not just on their discriminabilities (Figure 9A). They "titrated" the response that each of these hypothetical red-green and blue-yellow opponent pairs makes to wavelengths throughout the spectrum by determining how much of each pure hue was needed to cancel all traces of its opponent. That is, how much pure red was needed to cancel the greenishness at each point between approximately 480 and 580 nm, thus indicating the strength of the response labeled G; how much pure green would cancel the reddishness of wavelengths above and below this region, thus indicating the strength of the response labeled R; etc. Wavelengths that appear as blue, green, yellow, orange, and red are shown at 1-5, respectively, and what they look like can be read off the graph.
In explanation of these functions, Hurvich and Jameson (1974) proposed the following networks (see Figure 9B): Given three cones with the sensitivity functions they are now known to have (only approximately those of Figure 4H) and the network of excitatory connections (solid lines) and inhibitory connections (dotted lines) that is shown, each of the rightmost cells would serve as one or another of the yoked opponents by firing above or below its baseline activity.

FIGURE 9 Accounting for color appearance by opponent-process networks. The functions of the Young-Helmholtz theory in Figure 4H explain why three wavelengths suffice to match any color, but do not explain color appearance. Hering had proposed two kinds of units, one that responds with a red sensation to some parts of the spectrum and with a green sensation to others, and a second that responds either blue or yellow. These units, by their combined activity, would account for the appearances of all hues. Hurvich and Jameson (1957) charted the amount of these components in the appearance of each section of the spectrum (see text), suggesting the functions in (A) as the response curves of the two kinds of unit, and suggesting a simple network (B) to encode the responses of the three kinds of cone (α, β, γ) into the opponent process hues plus black and white (Hurvich and Jameson, 1974). Anticipated and guided by these analyses of perceptual experience, opponent process cells have been identified and studied by neurophysiological means (Svaetichin, 1956; DeValois, 1968).

Informed by the opponent-process theory, microelectrode research identified cells in the visual system of the goldfish (Svaetichin, 1956) and in the rhesus macaque (DeValois and Jacobs, 1968) that responded to wavelength in just these ways.

Moreover, cells have been found that respond to lines and edges, at particular orientations, moving and stationary (Hubel and Wiesel, 1962), to what may be thought of as sine-wave gratings of a particular frequency (Blakemore and Campbell, 1969), to disparities in the two eyes' views (Barlow et al., 1967), etc. Even though the Helmholtzian model (Figures 4-6) may be the simplest, we must conclude that it does not accord with the neurophysiological facts.

These new neurophysiological structures raise two questions: How do they themselves work, and what perceptual functions do they serve?
With respect to how these structures work, they are widely believed to result from the activities of suitably interconnected networks of lateral inhibition and excitation (von Bekesy, 1960; Ratliff, 1965), like the sketches in Figures 8 and 9; this was very much what Mach and Hering had speculated to be the case.

With respect to their possible perceptual functions, such pattern-sensitive networks open the way to very different kinds of explanation of the perceptual process. One of these is that the visual stimulus is analyzed into fundamental elements that do a great deal of what had been considered the task of learning and of unconscious inference. Three examples that have been given a great deal of attention will be mentioned and must stand for a larger number of such proposals. The first is that our visual world might be assembled out of such fragments as edges and corners, providing a sort of feature list of which all scenes must be composed. The physiological mechanisms for such analyses could be provided by receptive fields in the striate cortex of the brain that are responsive to lines of a particular orientation anywhere within a local region of the retina (Hubel and Wiesel, 1962, 1968), with cells in the inferotemporal cortex responding to primitive shapes or even faces as stimuli, independent of position or orientation (Gross and Mishkin, 1977; Perrett et al., 1982).

A second class of alternatives is to find neural structures that respond directly to specific properties in sensory stimulation that are themselves directly correlated with the distal, physical properties of objects in the world. Thus, cells that are sensitive to a disparity in the two eyes' images might provide a visual mechanism (Barlow et al., 1967) that is directly sensitive to an object's distance, as Hering originally argued. This possibility can be entertained, however, only to the very limited degree (see Gogel, 1984) that binocular space can be considered in such point-by-point fashion; in general, we must deal with extended patterns of stimulation and therefore with spatially organized and extended neural mechanisms.

Spatially organized and extended neural structures are exemplified in a third class of alternatives that is based on the following idea of spatial-frequency channels: A sine-wave grating is a set of dark and light bars in which the intensity of the light varies in a sine wave.
The width of the bars in such a grating defines its spatial frequency (i.e., the number of bars or cycles per degree of visual angle): high spatial frequencies mean fine detail, and the light-to-dark ratio (or contrast sensitivity) needed to discern the bars of each frequency characterizes the acuity of the visual system in terms that are compatible with those used to evaluate television transmissions and displays (Schade, 1956).

But such spatial frequencies are more than just a useful engineering measure. Because the rings of lateral inhibition that surround each stimulated point in the peripheral visual system come in different sizes, cells in the visual system are differentially responsive to spatial frequency. Cells have been found in the cortex that respond electrophysiologically to a particular range of frequencies within a restricted range of orientations in the retinal image (Movshon et al., 1978; DeValois et al., 1976). Moreover, in rough correspondence to these facts, viewers' abilities to detect combinations of sinusoidal gratings (Campbell and Robson, 1964; Graham and Nachmias, 1971), and the aftereffects of exposure to a particular grating (Blakemore and Campbell, 1969; Pantle and Sekuler, 1968), both suggest that different spatial frequencies are being processed by separate channels.

The relationship between such channels and the physiological finding of specialized response is not clear, nor is it clear what perceptual function, if any, they serve. They have been proposed as the fundamental units of analysis of the patterned retinal image: Analogous to the channels of color in Figures 4 and 9, channels of differing spatial frequency and orientation might perform what amounts to a two-dimensional Fourier analysis on the retinal image (Campbell and Robson, 1964; Ginsburg, 1971; Kabrisky et al., 1970; see Graham, 1981). They have also been invoked by many researchers to explain a variety of phenomena in form and motion perception, but their actual role in the perception of objects and events remains in question (see recent reviews by Braddick et al., 1978; Cavanagh, 1984; Foster, 1984; and Graham, 1981).

Many of the present studies searching for the mechanisms of sensory analysis depend on the use of microelectrodes, but units of sensory analysis much like these had been investigated long before the microelectrode was developed. For example, by showing that prolonged exposure to a particular stimulus event provides the kind of aftereffect that one would expect to find if a receptor were depleted or "fatigued" by that exposure, an argument could be made for the sensory nature of the response to the event. Thus, after exposure of the receptor to a set of horizontal stripes moving continuously downward, a stationary set of such stripes appears to move upward, supporting the argument that the perception of movement rests on a direct sensory response to motion (Wohlegemuth, 1911). This method has proliferated in recent years (see Graham, 1981; Harris, 1980), but such findings can be interpreted in other ways, and the search for new sensory units received greater legitimacy from the neurophysiological findings.

If we change what we take to be the units of sensory analysis, then what we attribute to more central processes must in general change as well.
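The claim that inhibitory surrounds of different sizes make units differentially responsive to spatial frequency can be sketched with a one-dimensional center-surround unit. This is an illustrative assumption, not a model from the chapter: the unit is a difference of two Gaussians, positions are in arbitrary retinal samples, not degrees of visual angle, and the names `dog_kernel`, `grating_response`, and `preferred_frequency` are hypothetical.

```python
import math


def dog_kernel(sigma_center, surround_ratio=2.0):
    """Center-surround weights: narrow excitatory minus broad inhibitory Gaussian.

    ASSUMPTION: each Gaussian is normalized to sum to 1, so excitation and
    inhibition cancel exactly under uniform illumination.
    """
    sigma_surround = surround_ratio * sigma_center
    half = int(math.ceil(4 * sigma_surround))
    xs = range(-half, half + 1)
    center = [math.exp(-x * x / (2 * sigma_center ** 2)) for x in xs]
    surround = [math.exp(-x * x / (2 * sigma_surround ** 2)) for x in xs]
    c_sum, s_sum = sum(center), sum(surround)
    return [(x, c / c_sum - s / s_sum) for x, c, s in zip(xs, center, surround)]


def grating_response(kernel, cycles_per_sample):
    """Response magnitude to a sine-wave grating with a bright bar on the center."""
    return abs(
        sum(w * math.cos(2 * math.pi * cycles_per_sample * x) for x, w in kernel)
    )


def preferred_frequency(kernel, freqs):
    """Frequency, among those tested, that drives the unit most strongly."""
    return max(freqs, key=lambda f: grating_response(kernel, f))


FREQS = [i / 200 for i in range(1, 51)]  # 0.005 to 0.25 cycles per sample
small_unit = dog_kernel(2.0)
large_unit = dog_kernel(4.0)
small_pref = preferred_frequency(small_unit, FREQS)
large_pref = preferred_frequency(large_unit, FREQS)
uniform_field = grating_response(small_unit, 0.0)
```

In this sketch the smaller unit prefers a higher spatial frequency (finer bars) than the larger one, and neither responds to a uniform field. A set of such units of graded sizes would behave like a bank of band-pass channels, which is the sense in which the text speaks of something amounting to a Fourier analysis; whether the visual system uses them that way is, as the text says, an open question.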
Of greatest theoretical significance are those sensory mechanisms whose output remains invariant even though the local stimulation at each point on the retina may vary, i.e., mechanisms that respond to aspects of the stimulation that covary directly with the physical properties of objects and events. For example, the frog's retina contains cells that respond not to the intensity of light in some part of the retinal image, but to the ratio of intensities of surrounded and surrounding regions (Campbell et al., 1978). As has been realized since Hering and even Helmholtz, that ratio remains invariant regardless of changes in illumination as long as both regions are equally illuminated, so that, as sketched in Figure 10, equal ratios of luminance in the proximal stimulation (P.S.) signify equal ratios of reflectances between the object and its background as distal stimuli (D.S.). It has been argued therefore that our perceptions of lightness are responses to adjacent ratios of luminance (Wallach, 1948). Such mechanisms might explain the constancies directly, that is, no additional process of computation or inference need be postulated. They therefore make possible very different explanations of how a given visual attribute of objects (color, size, form, distance, velocity) is perceived, explanations that need not draw on speculations either about learning or about computation. Indeed, because such proposals are useful as perceptual theories only insofar as they identify some aspect of stimulation that "specifies" (i.e., that is highly correlated with) some object property, they need not even be concerned with neurophysiology. The search for such directly informative variables of stimulation therefore actually antedates the neurophysiological discoveries (Gibson, 1950) and remains an influential approach today.

FIGURE 10 A direct response to an object's reflectances. Given that visual neurons are organized into networks, alternative explanations of perception, very different from the classical theory, become possible. For example, if there are networks directly responsive to adjacent ratios of luminance (L1/L2), then direct response to an object and its background whose reflectances stand in the ratio R1/R2 would remain constant regardless of changes in illumination, E. [Diagram: distal stimuli (D.S.), object and background with reflectances R1 and R2; proximal stimulation (P.S.), luminances L1 and L2; R1/R2 = L1/L2.]

The most sweeping and radical proposal of this kind is a direct theory for all of perception (Gibson, 1966, 1979): Our nervous systems "resonate" to stimulus properties that remain invariant when the light at the eye undergoes transformations (e.g., the optical flow patterns and motion parallax, Figure 11) due to relative motion between the viewer and the objects being viewed. This is of course very different from the traditional approach. The latter posed the original perceptual problem as this: How are we to account for the objects and layouts we do in fact perceive, given that the light at the eye is ambiguous and can be provided by very different surrogates?
And it solved that problem by appealing to associations and computations that the individual perceiver has learned from experiences with the world. According to the earlier direct theories that opposed this answer (including those of Hering and Mach) and that aimed at explaining particular perceptual abilities, evolution has provided specific mechanisms that so constrain the viewer's responses that they will usually be the correct solution. Some of the newer direct theories seek a much more general principle and are therefore not to be identified with some specific physiological mechanism. The "invariance" principle is the most general explanation of this kind to have been offered: Most objects and parts of the environment do not themselves change in form (as smoke or fog do), i.e., are rigid. When applied to these cases, the invariance principle means that we perceive those unchanging, rigid shapes and layouts in the world that project the changing, nonrigid two-dimensional patterns of light to the eye. This assumes that our nervous systems perform the required "reverse projective geometry" (Johansson, 1980), and that wherever the projected light at the eye permits a rigid source to be perceived, it will be. Because such theories can only account for perception obtained by moving observers, they take the perception of still objects and pictures to be a special case, governed by special and unknown principles (Gibson, 1951, 1979; Johansson, 1980). In this view, normal perception occurs only when an observer moves about in a natural environment; research done in other situations is artificial and therefore misleading about the nature of our perceptual systems as they have evolved.

FIGURE 11 Information about layout provided by motion. The views that an observer moving from point 1 to point 3 in A would have of three fixed posts i, ii, iii are shown in B. The motion parallax in those views provides information about the objects' spatial layout and sizes. For example, although the same objects at different distances provide images of unequal size, and are displaced by different amounts (vectors iv, v in B3), the ratio of image size to parallactic displacement should be invariant. Gibson (1951, 1966) has emphasized several ways in which the changing pattern of light to the moving observer, such as the optical expansion patterns in C, provide potentially usable information about spatial layout and offer invariants that, if responded to directly, might explain the perception of distal object properties.

Both within and outside of this approach, this rigidity principle has recently become quite popular. Directly or indirectly (in the form of the assertion that we perceive the invariant), objectwide or more locally, a rigidity principle has been adopted by many psychologists (Gibson, 1966, 1979; Johansson, 1977, 1980; Rock, 1983; Shepard, 1981; Todd, 1982) and computer scientists (Marr, 1982; Ullman, 1979). There are at least three reasons why this principle is theoretically attractive. Exploring those reasons, and why the rule must nevertheless be rejected in any strong form, will provide a convenient survey of a critical part of the present landscape of perceptual inquiry.

The Evidence for Perceptual Rules Rather Than Lookup Tables

It is easy to see how learning by association might invest specific patterns of stimulation with specific perceptual meanings, and to speculate about a neurophysiological basis for such associative learning, but it is harder to be specific about a learning process through which abstract rules might be learned. (This is the distinction, made earlier, between "lookup tables" and an inference or computational process that solves some internalized formula.) Criticisms of the classical theory are often simply demonstrations intended to show that perception is determined by rules rather than by familiar associations, rules that operate without, or even against, familiar patterns.
This was the central thrust of Gestalt theory, which mounted a serious challenge to Helmholtzian theory between the two world wars; its aim was to find such rules, and from them to deduce the nature of the underlying brain processes. These rules, called the "laws of organization," were held to determine whether we will perceive some object at all (Koffka, 1935; Kohler, 1929). Figure 12A is a demonstration of the "law of good continuation": a familiar number is concealed in i but not in ii and iii because the configuration in i requires us to break the unfamiliar but smoothly continuing shape in order to see the number. These rules were also held to determine whether flatness or tridimensionality is perceived (Kopfermann, 1935). In Figure 12C, the pattern looks flat because the good continuation must be broken to perceive (1) and (2) as dihedrals at different distances, but Figure 12D looks tridimensional because the dihedrals would have to be broken at (1), (2), etc., for the pattern to look like a set of closed polyhedra.

FIGURE 12 Organization and its limits. Before microelectrodes showed extensive cross-connections to exist, similar interaction had been postulated by Gestalt theorists to explain "laws of organization" as demonstrated in (A) and (B). (A) Good continuation: a number is concealed in (i) by the smoothly continuing lines that embed it (ii), but not by mere clutter (iii). (B) Gestalt factors in conflict. In (i), we perceive a sine wave crossing a square wave, against the factor of closedness, which would otherwise yield the perception of closed shapes (ii). (C,D) By the minimum principle, that we see the simplest organization, (C) looks flat and (D) looks tridimensional because (C) is simpler as a flat pattern than (D). (E,F) The evidence is against such global organization. While you gaze at intersection (1) in (E), the vertical line soon appears nearer than the horizontal, which is inconsistent with the simple figure fixed by intersection (2), and does so even with a moving, tridimensional cube (Peterson and Hochberg, 1983). (F) An impossible, yet apparently tridimensional, picture.

On a practical level, such demonstrations remind us that we cannot assume that a picture will be comprehensible if only it is an accurate projective surrogate (as in Figure 1) and as long as the object represented is itself a recognizable and familiar one. Every amateur photographer learns that flowerpots and lampposts lurk in the background, ready to appear in the picture looking very much as though they are growing out of the sitter's head. And any text on protective coloration shows the striped tiger or zebra disappearing into its cluttered background. On a theoretical level, such demonstrations have been used to argue that associative learning does not determine perception: In Figure 12Ai specific familiarity is overcome by what seems to be an abstract configurational rule.

The literature contains a large number of Gestalt rules, but each is supported only by a few unquantified and untested demonstrations. Nor have the rules been used to explore brain processes. But they do appear to be of the utmost importance inasmuch as they seem to determine what shape or object will be perceived. Because several Gestalt rules usually apply in any real case, however, and because they will as likely as not work against each other, they are not of much use in their present state, lacking quantitative measurement and with no combinatorial rules of any kind. It is not true, as some computer scientists and neurophysiologists have claimed, that these rules have been abandoned because they were inherently subjective and unverifiable (e.g., Marr, 1982). They stand neglected rather than abandoned.
The fact is that, until recently, only a handful of scientists were concerned with the problem of organization, and they were deflected by two more promising lines of attack on that problem which seemed to offer themselves in the 1950s.

The Promise of a Minimum Principle

To make the insights of Gestalt psychology scientifically or practically useful we need either a great deal of quantitative and objective measurement of the strengths of the different rules, along with an appropriate combinatorial principle, or some equally quantitative and objective overarching rule that supplants the set of individual rules. For the latter purpose Gestalt psychologists offered a minimum principle, i.e., that we perceive the simplest organization (the simplest alternative object or arrangement) that fits the stimulus pattern (Koffka, 1935). Attempts were initiated in the 1950s (Attneave, 1954, 1959; Hochberg and MacAlister, 1953) to formulate an objective minimum principle, one that would require no intuitive judgments in order to apply it. It would rest instead on measuring each of the alternative objects that could fit the stimulus, to decide which alternative is simpler (e.g., number of dihedrals or edges, number of inflection points, etc. [Hochberg and Brooks, 1960]). With an objective and quantitative rule of this sort, a computer could in principle assess any picture before displaying it, and then select for display only those views for which the object to be represented is in fact the simplest alternative (e.g., Figure 12D rather than 12C).

Although no computer programs that would apply these principles to image generation have actually been attempted to my knowledge, development of this approach continues today (Buffart et al., 1981; Butler, 1982; Leeuwenberg, 1971), and it has recently been applied as well to the perception of simple ambiguous patterns of moving dots (Restle, 1979). Such research would be theoretically important if a minimum principle were in prospect and practically important even if all it did was contribute to solving the problems of object representation. But both its theoretical and practical meaning must be questioned in view of facts that have been known to perceptual psychologists for decades. These facts tell us that stimulus measures alone cannot provide a general explanation or prediction of object perception. This will receive increasing stress in the balance of this paper. Here we note that in Figure 12E (p. 266), the place that one attends determines how the object is perceived: when one attends intersection (2), the cube is so perceived that the vertical edge is the nearer, in accordance with both the rule of good continuation and with any simplicity principle; when one attends intersection (1), the perspective soon reverses, against the good continuation at the other intersection and against overall simplicity (Hochberg, 1981). Both real and pictured objects exhibit this phenomenon (Gillam, 1972; Peterson and Hochberg, 1983).
These demonstrations introduce us to the fact that the viewer's attention, and not merely the measurable pattern of stimulation, helps determine what is perceived. (We will return to this point shortly.) With respect to the minimum principle, Figure 12E is completely incompatible with any rule based on the entire object. On the other hand, it is not evident how a minimum principle based on separate parts of an object can even be formulated and tested. In any case, no advocate of the application of the minimum principle to entire figures has yet attempted to deal with this problem, despite the fact that it was clearly implied by discovery of the famous "impossible figures" by Penrose and Penrose in 1958 and by their popularization in the graphic art of Maurits Escher. The object in Figure 12F (Hochberg, 1968) appears tridimensional and continuous, even though careful inspection of the two sides shows them to be inconsistent. If the distance between left and right sides is made very short, the figure then becomes flat, and the inconsistency more evident, although the minimum principle is then no more or less applicable.
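A whole-figure objective minimum principle of the kind discussed above might be sketched as follows. This is a hypothetical illustration, not the procedure of any of the cited studies: the feature counts are invented placeholders, the unweighted sum is an assumed complexity measure, and the names `Interpretation`, `simplest`, and `views_to_display` are introduced here only for the example. The sketch scores each alternative object that could fit a view and keeps only those views for which the intended object is the simplest alternative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One alternative object that could fit a given view (hypothetical)."""
    label: str
    edges: int   # number of edges or dihedrals counted for this alternative
    angles: int  # number of distinct angles counted for this alternative

    def complexity(self) -> int:
        # Assumed measure: an unweighted sum of the counted features.
        return self.edges + self.angles


def simplest(alternatives):
    """Return the alternative with the lowest complexity score."""
    return min(alternatives, key=lambda alt: alt.complexity())


def views_to_display(views, intended_label):
    """Keep the views whose simplest alternative is the intended object."""
    return [name for name, alternatives in views.items()
            if simplest(alternatives).label == intended_label]


# Invented counts for two views of the same wire cube.
views = {
    "view_C": [Interpretation("flat pattern", edges=6, angles=2),
               Interpretation("cube", edges=12, angles=1)],
    "view_D": [Interpretation("flat pattern", edges=16, angles=9),
               Interpretation("cube", edges=12, angles=1)],
}

print(views_to_display(views, "cube"))
```

Because the sketch scores each view as a whole, it has no way to represent the dependence on the attended intersection that Figure 12E demonstrates, which is the limitation the text raises against rules based on the entire object.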

Let us next consider the other factor that deflected attention from the objective study of organizational principles, the assumption that they applied only to stationary drawings.

The Doctrine That Event Perception Is Both Fundamental and Veridical

As Leonardo noted in the fifteenth century, a two-dimensional picture cannot provide a moving viewer with the motion parallax that would be provided by the three-dimensional scene it represents. As the viewer moves, nearer objects in a three-dimensional scene are displaced more in the field of view than are farther ones (Figure 13). Because the spatial relationships between the parts of the flat picture all remain fixed, the picture is no longer a surrogate for the scene. The relative motions produced by a given displacement are (with certain constraints or assumptions) specific to the layout of the points and surfaces of the scene in space. The differential motions within the stimulus pattern offered by the scene provide the moving observer with rich information about the structure of the world. A critical question being explored today is how much of that information is used, and in what form.

FIGURE 13 As Leonardo knew, pictures are not surrogates for a moving viewer. In A, a viewer moves from x to y. If the display is a picture, all parts are displaced equally in the field of view (B), but if it is a window, objects at different distances undergo different parallax (C). Those who take the invariants of the moving stimulus array to be fundamental to perception (see Figure 11) have yet to explain how it is that we perceive pictures.
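The contrast between a picture and a window in Figure 13 can be checked with elementary trigonometry. The sketch below is an illustration under simplifying assumptions: a top-down, two-dimensional layout, an eye that moves sideways without turning, and a picture painted on a flat plane by central projection toward the original viewpoint. All names and distances (`direction_deg`, `paint_on_picture`, `separation_deg`, the posts at 2 and 8 units) are hypothetical. It compares the angular separation of a near and a far post, initially aligned straight ahead, after the eye has moved.

```python
import math


def direction_deg(eye_x, point):
    """Visual direction of a point (x, z), in degrees from straight ahead,
    for an eye located at (eye_x, 0)."""
    x, z = point
    return math.degrees(math.atan2(x - eye_x, z))


def paint_on_picture(point, picture_z):
    """Location on a picture plane at depth picture_z where the point is
    painted, using central projection toward an eye at the origin."""
    x, z = point
    return (x * picture_z / z, picture_z)


def separation_deg(near, far, eye_shift, picture_z=None):
    """Angular separation of two points after the eye shifts sideways.

    With picture_z given, the painted images of the points are viewed
    instead of the points themselves.
    """
    if picture_z is not None:
        near = paint_on_picture(near, picture_z)
        far = paint_on_picture(far, picture_z)
    return abs(direction_deg(eye_shift, near) - direction_deg(eye_shift, far))


# Assumed layout: two posts straight ahead, one near and one far.
near_post = (0.0, 2.0)
far_post = (0.0, 8.0)
eye_shift = 0.5  # sideways movement of the viewer, same units

window = separation_deg(near_post, far_post, eye_shift)
picture = separation_deg(near_post, far_post, eye_shift, picture_z=1.0)
print(round(window, 2), picture)
```

With these values the two posts seen through the window separate by roughly ten degrees, whereas their painted images remain superimposed, which is the sense in which the picture ceases to be a surrogate for the scene once the viewer moves.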

The precision and ease with which we can study viewers' responses to moving patterns, and to other stimuli that change over time, depend of course on the equipment with which such stimuli can be produced and presented. Until the early 1950s only simple mechanical and electrical devices were in general use. Since then the dissemination of relatively cheap 16-millimeter motion picture cameras capable of producing controlled motion through animation, the advent of even cheaper and more convenient video equipment, and, above all, the availability of computer-generated displays, have progressively revolutionized the study of patterns that change with time. We are now in the midst of an explosion of research on the topic, done as much by computer scientists, physicists, and neurophysiologists as by perceptual psychologists.

Even the earlier and more primitive apparatus contributed a wide array of facts, some of which have been neglected in the recent interdisciplinary renaissance. Much of the earlier research was not directly addressed to questions of object perception but was intended instead to explore basic processes, e.g., the study of the time constants of the visual system's responses to flicker (Kelly, 1961), or the study of the conditions that yield apparent movement with successive simple static stimuli (Braddick, 1980; Kolers, 1972; Korte, 1915; Morgan, 1980). Some of the facts obtained in such research address the question of whether (and how well) our nervous systems respond to the stimulus changes that carry information about depth and motion (Figure 14). We know, for example, that our visual systems are extremely sensitive to motion parallax: even a very slight difference in distance between two aligned or nearby rods (Berry, 1948) and a very small head movement on the part of the viewer will provide a displacement in the retinal image that should be detectable (Helmholtz, 1866; Wheatstone, 1839).
If two objects at different distances happen to line up from a particular view, therefore, and good continuation then provides a misperception of the object (as in Figures 12Ai [see p. 266], 14Bi), even a slight head movement should provide a detectable break in the good continuation. Moreover, the two-dimensional shadows or projections of irregular spatial arrangements of rods, or of dots distributed in space, or of unfamiliar objects (Figures 15A, B, C, respectively), lacking other depth cues so that they are perceived as flat arrangements when stationary, are perceived as three-dimensional layouts when they are set into motion. Even more than the static Gestalt demonstrations, these phenomena seem difficult to explain as the use of a lookup table, learned by association, that the viewer can consult to determine the meaning of some previously encountered set of sensory events: How plausible is it that the viewer has encountered the particular pattern of moving randomly arrayed dots, shown in Figure 15B, so often that by familiarity it has become a recognizable tridimensional arrangement?


FIGURE 15 Structure through motion: Precomputer methods of studying motion perception. Unfamiliar patterns that look quite flat when static appear tridimensional when put in motion, encouraging the formulation that we perceive the rigid (or invariant) object that would provide the changing stimulus pattern. Some well-studied older examples are illustrated. A: The shadows of rods on a rotating turntable i, or the rods themselves, are viewed through an aperture that occludes their ends (Metzger, 1934; White and Mueser, 1960). B: The shadow of a set of dots on a moving glass plane i is projected on a screen ii (Gibson and Gibson, 1957). Such displays were initially the easiest to program and study in computer-generated form (Green, 1961). C: Simple unfamiliar wire forms mounted on a turntable provide a "kinetic depth effect" (Wallach and O'Connell, 1953).

It seems far more plausible that the phenomenon is the expression of a perceptual rule. We have seen that we can in fact use relative displacement to discern spatial structure. But that still leaves open the question of what the rule is by which we fit three-dimensional space to the two-dimensional but moving stimulus pattern. As we have seen, the simplest and most general solution is that we extract that invariant object or layout that will fit the moving stimulation (Gibson, 1979; Johansson, 1980). This rule would account for the perception of rigid objects and surfaces without additional rules or constraints. Moreover, it includes the perception of motion pictures, and the phenomena represented in Figures 13 through 15, under the same general explanation.
As computers have made it easier to generate pictures of points moving in space, and as more research is done with such patterns, the point first made by the Gestalt demonstrations, that perception is governed by rules rather than lookup tables, has taken hold. And although Helmholtz and the earlier psychologists to whom perception is the result of learning often talked of what amounts to perceptual rules, no formal account has been offered of specific mechanisms or principles by which such perceptual learning might occur. However, once a rule is explicated with precision, it becomes relatively easy to imagine neural circuitry that might underlie its working. There is therefore added incentive today to take a "nativist" stance, that is, to propose explanations of perception that depend on innate prewiring rather than on learning processes. As we will see, the design of explanations in terms of such neural circuitry is a very active enterprise today, particularly in the field of computer science. But if that effort is to apply to human perception, it must start with perceptual rules that indeed are used in the human perceptual process. We still must decide what those rules are.

The perceptual rule that is most readily and explicitly defined in physical terms is the currently popular rigidity principle. In fact, however, the strong forms of the rigidity principle will not work, for the perception either of objects in space or of their representations. Evidence that decisively refutes the strong form of the principle includes findings obtained many years ago, although the implications of these facts have not been adequately taken into account in most recent discussions. The same facts make the other overarching principles, as they are presently conceived, equally unworkable. We survey some of that evidence next. Some of the points that follow have recently been made as well by Braunstein (1983), by Gillam (1972), and by Schwartz and Sperling (1983).

Why the Strong Forms of Various General Perceptual Principles Must Be Rejected

Although the overall case against these general rules cannot be reviewed here in detail, the strongest argument is simple and sufficient: Even when rigid moving shapes are in full view, we do not necessarily see them. In some cases we perceive instead quite different shapes undergoing nonrigid deformation.
This has been known in a general way at least since 1922 (e.g., von Hornbostel found that a real, rotating wire cube reverses perspective even though it must then appear to stretch and bend), and a remarkably robust illusion known as the Ames "window" has been widely disseminated since 1951: A trapezoid (often with shadows painted on it to "suggest" the perspective view of a window) rotates continuously in one direction (e.g., arrow vii as seen from above in mirror, M) either clockwise or counterclockwise, in full view, as shown in Figure 16A. It is not seen as such. Instead, it is perceived as oscillating (arrow viii), reversing direction twice each cycle so that the larger end (i) (or iv in the mirror) always appears the nearer. It is as though a process of unconscious inference were at work, assigning depth on the basis of the static depth cue of linear perspective (Figure 2, see p. 251) and inferring direction of movement from relative depth. I hasten to add that although it is widely offered (e.g., Ames, 1951; Gibson, 1979; Graham, 1963; Hochberg, 1978b) there is no experimental support for such an explanation of the phenomenon; indeed, there are features of the changing retinal image that might be direct, if misleading, bases of the illusory response (Braunstein, 1976; Hochberg, 1984b). (For example, even when the larger end swings away from the viewer, as shown by arrow vii in Figure 16B, a vector of expansion, ix, will generally be provided as the large end swings in toward the axis of rotation, and expansion is normally a correlate of approach; cf. Figure 11C, p. 264.)

FIGURE 16 A classic illusion with a moving object. In A, a flat trapezoid, with markings painted on its surface to "suggest" depth, is seen from in front, with i and iii equidistant from the viewer, and from above in the mirror M. At B, the trapezoid has rotated so that the small edge iii is nearer the viewer. Ames (1951) found that though rotating continuously (arrow vii in the top view) it appears to oscillate back and forth (arrow viii); see text. Although the rod, ii, is rigidly fixed to the trapezoid, it is correctly seen to rotate, passing through the substance of the trapezoid! The trapezoid cannot both appear to oscillate and yet remain rigid in appearance. The solid and dotted outlines in C are its shape as presented to the eye when edge iii or i is respectively the nearer. If seen to oscillate, the trapezoid must also appear to deform between these shapes, as shown by the arrows, although this nonrigidity is not normally very noticeable.

This illusion is extremely strong and difficult to overcome even when the viewer is confronting a real object (although in that case, one eye must be kept covered at close distances; at really close distances the true shape and movement may be seen monocularly as well). When the viewer confronts a moving picture, rather than the object itself, the illusion is almost irresistible. The virtual shape that then fits the perceived illusory movement to the changing pattern of light at the eye must then be nonrigid (Figure 16C), and the perceived path must follow a complex and changing radius. Moreover, the illusory 180-degree oscillations of the trapezoid are perceived even if a rod is rigidly affixed to the trapezoid, as in Figure 16A and B; the rod does not appear rigidly fixed, but pursues its 360-degree rotation, apparently passing through the trapezoid like a phantom when the trapezoid reverses its apparent course. (This is true even if the viewer is simultaneously shown the setup from above in a mirror; the rod and trapezoid are seen to rotate 360 degrees as a rigid unit in the mirror, and, at the same time, to move in separate parts in direct view.) Thus, a truly rigid and invariant object, moving in a simple and invariant orbit, is not perceived, and instead a nonrigidly deforming and quite illusory object is seen, moving in a complex and variable path.

This phenomenon, although widely popularized since 1951, has been virtually ignored by those who propose that our perceptions are determined by invariance, rigidity, or simplicity principles. I can find only brief mention of the phenomenon (Gibson, 1979), claiming that it occurs only when the motion-provided information is below threshold, and that then the illusion rests on unconscious inference.
This highlights the question of thresholds, which must surely be considered before we can say that any of the motion-produced information discussed in connection with Figure 11 (see p. 264) provides anything useful to the viewer, and which has yet to be addressed in any systematic evaluation of the direct theory (Cutting, 1983; Hochberg, 1982). Moreover, by invoking unconscious inference, this way of dealing with the phenomenon spoils the direct theory's claim to parsimony. But in any case, that answer is wrong. Even when the changes provided by the moving object are clearly above the detection threshold and the illusion is therefore accompanied by clearly perceived nonrigidities, the latter is what we see, and not the veridical rigid motion (Hochberg, 1984b; Hochberg et al., 1984). Subtle arguments are not needed, however: Given the lessons of Figure 16, we can readily devise new illusions in which rigid simply moving objects, freely viewed (with monocular vision), are seen to bend and deform nonrigidly, as in Figure 17. Motion is not enough to ensure veridicality, therefore, and what is perceived may be perceived against any effects that simplicity, invariance, and rigidity principles might exert.

FIGURE 17 Apparent bending in a rigid, moving object. A flat, rigid octagonal cutout, with markings painted on its surface to "suggest" depth, is shown in front view at A and from above at B. To monocular vision, when it moves as shown in Bi, it tends to appear instead to hinge in the middle, and to "flap" away from the viewer as shown in Bii. Similar but less compelling effects occur without the markings, and with oval shapes as well (Hochberg and Spiron, 1985).

COMPUTERS AND PERCEPTUAL PSYCHOLOGY

The microelectrode was a major technological watershed, and its effects were quickly manifest. The introduction of the computer has had far greater effects, but they have been more diffuse, are slower in being realized, and are still growing, as computer science and technology change. There are six main ways in which the computer has affected perceptual psychology; although these ways are closely intertwined, they are also very different, and it is important to separate them if one is to understand the relationship between the two disciplines. The first two uses are contributions that computers now offer every branch of science: obtaining and analyzing data, and modeling theories and explanations.

Obtaining Data

The computer has of course radically changed the methodology of measurement and analysis. For example, the direction and changes in the subject's gaze can be monitored and even used to control the display that confronts the eye (McConkie and Rayner, 1975), permitting the detailed study of how the integration of successive glances occurs in the process of reading text and perceiving pictures.
Such research would simply have been impossible without high-speed and powerful computers; we will see that the problem to which this method is addressed is of central importance. By handling large quantities of numbers and rapidly executing operations that were once prohibitively time-consuming and expensive, the computer makes available data that were in essence unobtainable. This is true both of physiological data and of judgments intended to tap perceptual experience. Thus, physiological signals that are normally far too weak to be distinguished from accompanying bioelectrical background noise can be adumbrated by computer methods that cumulate them over many occurrences, making it possible to measure the electrical potentials (Donchin et al., 1978; Sutton et al., 1965) and magnetic fields (Kaufman and Williamson, 1982; Reite and Zimmerman, 1978) at the scalp. Such averaged transients reflect neural responses that the brain makes to sensory stimulation and that accompany perceptual processing.

Modeling Theories and Explanations

The second use of the computer, common to all science, is to model theoretical proposals and explanations for which it would otherwise be impossible or too laborious to say whether and how they would work. Whether some hypothetical neural network would respond as designed (Hebb, 1949; Marr, 1982; Rashevsky, 1948; Rosenblatt, 1962), whether a particularly defined set of flow patterns (like Figure 16C) would specify uniquely a set of surface forms in the world (Ullman, 1979), whether a particular history of strengthened associations would even theoretically result in perceptual learning (Hebb, 1949; Minsky and Papert, 1969; Rosenblatt, 1962): these are questions that cannot be answered simply by considering them in verbal form but that can often be answered once the functions are stated specifically enough to run as a computer program.

Embodying Perceptual Functions

A second branch of computer science aims at embodying perceptual functions, similar in effect to those of humans, in computer hardware and software. We must distinguish two distinct purposes that guide this enterprise.
One is to design and provide devices that can serve instead of humans. Some of these functions are readily achieved (the sensors that open supermarket doors, the bar-code scanner that identifies and prices items at the checkout counters), and some are probably unachievable in the foreseeable future (e.g., machines that respond to or translate free and normal human discourse); but in general there is no compulsion to serve each function in the same ways that humans do. Human perceptual functions here serve only as "existence" proofs that assure the computer scientist that at least one way of solving the problem exists and is embodied in human neuroanatomy. Once we start to consider the means by which modern electronic computers might perform such tasks, however, we develop new ways of thinking about how the human nervous system performs its perceptual tasks. The computer then serves as an analogy or even a model for the study of human perceptual processes. That may turn out to be the most important relationship of all between computer science and perceptual psychology, and we consider that next.

The Computer as an Analogy to Perception

Perhaps the greatest effect of the computer has been its influence as an analogy: Inherently vulnerable to entrapment in the mind-body problem of philosophers and metaphysicists, and self-conscious about the need to be scientific, psychology is always tempted to confine its attentions to variables that are conceived and measured in physical terms. Indeed, almost since J.B. Watson's behaviorist manifesto in 1913, physical measurement and physical (or at least physiological) conceptions have enjoyed intellectual hegemony in this country.

There was of course continuous opposition, both on scientific and metaphysical grounds, and the field of perceptual psychology by its very subject matter was less constrained by behaviorism than other fields of psychology, but for that very reason it was almost abandoned as a discipline for some two decades. It was not until the late 1950s that what can only be called "mental" conceptions and measures once again became scientifically respectable to the rank and file of the profession. I am convinced that the main factor in this change was the obvious fact that computer programs are in principle transportable to very different physical machines. They can therefore be analyzed and discussed in abstract functional terms without reference to the specific hardware in which they must be embodied to perform.
Familiarity with computer functions, terminology, and flow charts made it possible to describe what the mind might be doing in a way that could, in principle, be instantiated in a program and then embodied in a machine (Miller et al., 1960; Rosenblatt, 1962; Selfridge, 1959). Something like this had already been done repeatedly, long before computers were developed, from Descartes' design in 1650 of a hydraulic model underlying neural function, to Tolman's analysis of purposive learning by a "schematic sowbug" in 1938; but there was never any real likelihood that the analyses might be put to the test by building the machines. The general-purpose computer and transportable programs have made the point much more powerfully. The language of cognitive psychology is now very close to the language of computer science. There is usually no guarantee that any given flow chart with which the cognitive psychologist offers to explain some phenomenon can in fact be translated into an executable program, but if not, it is inadequate because it is vague or inconsistent and not because it is mentalistic.

Computer Science in Perceptual Psychology Research

The attempt to design machines that embody human perceptual functions (or to design programs that model such machines) rests on the belief that only in this way can we be sure that we have achieved a scientific understanding of those functions. This is an old belief and undertaking, but the advent of the modern computer makes the venture seem more plausible. Given its purpose, this undertaking must start with scientific empirical knowledge of how humans perceive. That is of course precisely what the task of perceptual psychology has been. In consequence, the two disciplines now overlap greatly, and an increasing amount of perceptual (and cognitive) psychology is currently being done in computer science departments. This is a very promising development, and some of the work has received wide attention as a "breakthrough"; but it would be unwise to overestimate what has been accomplished at this early stage (see Braunstein, 1983; Haber, 1983). The approach ensures precise modeling of theories but does not by itself provide either new theories or new facts about human perception. The point is worth spelling out in a brief examination of the field.

Because it is far easier to make initial progress at formulating specific models of direct neural response to stimulus information than at formulating specific models of central processes of learning and inference, most of the work in this field has concentrated on the former (see Haber, 1983). As a first stage, any perceiving machine must be able to separate objects from their cluttered surroundings; this problem is very difficult to deal with in still pictures (Oately, 1978; Roberts, 1965).
We have seen above that the problem is mathematically less refractory, given the multiple views provided by motion parallax and binocular parallax (see Figure 14A and C on p. 271), in that fewer constraints are needed to specify the three-dimensional layout that would produce the stimulation at the eye. It is understandable, therefore, that computer scientists have recently turned to models of binocular stereopsis (Marr, 1982; Marr and Poggio, 1979) and of the perception of structure through motion (Marr, 1982; Ullman, 1979).

These "computational" models are totally within the mainstream of perceptual psychology (although that is not always clear from their presentation, nor from their reception). For example, the computational model of binocular stereopsis devised and tested by Marr and his colleagues was a relatively slight variation of a detailed theory published in 1970 by Sperling, a psychologist; Sperling's theory is itself well within that class of psychologists' explanations of stereopsis (see Kaufman, 1974) that have taken Johannes Kepler's (1611) geometrical analysis of the binocular lines of sight that obtain when viewing objects in space as the model of an internal "binocular neural field" that merely reflects that geometry. And Ullman's computational analysis of the information that moving stimuli give about their layout in space takes its place within the long tradition of such analyses and research. Neither of these computational perceptual theories can claim to be more than partial accounts of the phenomena in the domains they address. For example, even with unimpeded binocular parallax, we perceive the concave mold of a human face lit from below as a convex face lit from above; even with unimpeded motion information, as we have seen (Figures 16-17), at least some rigid objects are perceived as nonrigid and in wrong slant and motion. These computational theories do not differ from other attempts at sensory explanation of object perception in their inability to deal with such problems. They differ only in that they are restricted to models that can be successfully run as computer programs, and that is not necessarily an unalloyed virtue.

Although computer simulation and "computer perception" have received considerable praise in recent years, there are grounds for criticism as well. The need to devise perceiving machines that work as humans do is certainly not a valid economic argument; one can usually find far more direct means of performing specific tasks. Nor is computability a necessary criterion for assessing any theory, regardless of how desirable that quality may be (and despite the stress on simulation studies currently evident in many quarters). But these arguments are moot.
Regardless of the intrinsic merits of computer simulation and of the quest for perceiving machines, and without appeals to metatheory or philosophy of science, there remains a present and growing need to develop theories of human perception to the point that they can be embodied in computer programs. That is the relationship between the computer and perceptual psychology that we consider next.

Why Models of Human Perception Are Needed

Computers communicate to their human users through pictures as well as through words and numbers. But more than that, they are increasingly used specifically to generate pictures: as interfaces between the viewer and some part of the world that would otherwise be difficult or impossible to see; as means of visualizing designs of buildings, machine parts, molecules, chromosomes, or cellular processes; as substitutes for human artists and animators in creating graphic displays for advertising and entertainment; as simulators in flight training. The use of such devices is already great, and growing rapidly. In many cases, the pictures (or pictorial sequences) that are displayed were not themselves programmed and were never previewed in any sense, but are generated in response to the question that the user asks or in reaction to something that the user does. An example of the former would be what an architectural layout or a machine part will look like from some chosen location, or what the state of a flow chart would be under some specified conditions; an example of the latter would be a low-altitude flight simulator display, which depends, of course, both on the terrain being simulated and on the individual pilot's actions.

Without a human editor intervening between computer and user, there is some unknown likelihood that the viewer will be shown misleading or incomprehensible pictures. Where that likelihood must be minimized, the computer must avoid certain classes of pictures, or must be prepared to enrich or enhance those pictures. This means that we must be able to specify, in terms acceptable to a computer, how humans will perceive a picture or a sequence of pictures. The study of the rules of representation is now a vigorous and growing field in which perceptual research finds practical application (Cutting and Millard, 1984; Haber and Wilkinson, 1982; Stevens, 1983; Todd and Mingola, 1983). Although this task shares much of what we must learn through exploring analogies between computers and human perception, it is also significantly different. It cannot ignore as mere embarrassments the cases in which we misperceive, the exceptions to proposed generalizations; indeed, it is just those cases that must be the focus of inquiry. And that is fortunate for psychology, because those are the cases that test the generality of any perceptual theory.
A superficial answer to the question of how we can ensure comprehensible pictures is to increase the fidelity of the surrogate, i.e., to make the light to the eye more like that provided by the object or scene that is being represented. That means improving the resolution and the color balance, avoiding distortions, etc. Indeed, if other things are equal, an improvement in these engineering factors will usually improve picture comprehensibility. But we have seen that even perfect fidelity (i.e., the moving object itself) may result in misperceptions (Figures 16, 17). The constraints on mental structure (on the structure of perceived objects) are not the same as the normal constraints on physical objects, and we must know the former as well as the latter if we are to be able to predict how pictures are perceived, even with the best picture quality possible.

There are practical limits, moreover, to the pictorial information that we can count on. One can see detail in a closeup, or an entire object or scene in a long shot, but not both. Picture quality is limited, and the techniques that motion picture and television filmmakers have developed to cope with those limits (surveying or scanning an object or scene by successive partial views or closeups) require the viewer to go beyond the momentary sensory input, and to enter and store the successive partial views in some mental or perceptual structure of the object.

Perceptual structure refers to the relationships within what one perceives (Garner et al., 1956; Hochberg, 1956), that is, to information about the object that can be retrieved from the viewer: for example, that the sides of a cube look equal and parallel, that the vertical edge at 1 in Figure 12E (see p. 266) looks farther than the horizontal edge it intersects when the vertical edge looks the nearer at 2. To the degree that perceptual structure reflects the structure of the physical stimulus, physical analyses of optical information will serve to model the perceptual process; obviously, as long as we stipulate that some object, say, a wire cube, is perceived correctly, the layout of the physical cube itself must serve to predict the relative apparent nearnesses of its parts. The simplicity of this task is of course what makes the more extreme direct theories so attractive. To the degree that perceptual structure reflects known (or hypothetical) neurological structure, however, the latter must also modify any attempts to relate what the viewer perceives, on the one hand, to the optical structure of the object or scene that confronts the perceiver, on the other. Thus, what we know about the distribution of acuity over the retina or what we think we know about spatial frequency channels must be used in attempting to predict the effects of the information that could otherwise be provided by the optical structure. Computer models of perception can incorporate both kinds of structure with very little input from psychological research.
To the degree that perceptual structure reflects none of these (that is, to the degree that it expresses what we may call mental structure), perceptual research must provide the facts that are needed for any theories, whether or not those theories are embodied in computer models. Such facts are obtainable but sparse; for this reason, computer science has as yet very little to say about the modeling of mental structure. Some terms have been offered (e.g., Minsky's "frames" [1975], roughly equivalent to an expectation or a schema), but terms or even models are not needed here so much as facts, and more attention paid to what facts we do have. We next consider very briefly the current state of research on mental structure in real and represented objects.

MENTAL STRUCTURE IN OBJECT PERCEPTION AND REPRESENTATION

Where perception can be predicted from the pattern of stimulation that falls on the sense organs, we are free to argue that the stimulus pattern itself (as transformed and limited by the sensory system) determines what we perceive. Of course, we are also free to hold that there are other factors at work as well. The fact is that we have a long history of demonstrations that sensory stimulus information is neither necessary nor sufficient to determine what we perceive.

One source of such evidence is provided by completion phenomena, examples of which are shown in Figures 18 and 19. These should not be dismissed as strained: In a normally cluttered world, such interrupted and fragmented shapes must be the rule rather than the exception. Nor can our perceptions of these shapes be profitably ascribed either to complex sensory structures (such as receptive fields and frequency channels) or to invariant stimulus information. The perception of a single object rather than of separate fragments often depends on the viewer's having specific knowledge of what that object normally looks like, and on being ready to perceive it. That is very much what Mill and Helmholtz meant by perception. A nice demonstration to which Dallenbach called attention in 1951 is shown in Figure 19A, in which few viewers can discern any clear object. After looking at Figure 19B, however, it is remarkably difficult not to see that same object when looking at Figure 19A.

FIGURE 18 Completion phenomena. A selection of simple geometric shapes: at A, a square; at B, a circle; at C, a circle, triangle, and square; at D, a cube. In fact, the fragments that are shown do not by themselves define or specify any shape. It might be, for example, that C consists of the block letters CAT, partially occluded.

Completion phenomena have been known to psychologists for more than a century (and to artists, of course, for much longer). While the classical theory prevailed (see Figure 6, p. 254), such demonstrations were taken for granted and assumed to reveal the pattern of associations (or sensory expectations) that each viewer has learned from experience with any object. Although viewers would differ in their individual perceptual histories, the structure of their sensory expectations or associations should nevertheless reflect at least grossly the covariations or contingencies of the physical world as filtered through their limited sensory systems. That is, mental structure should be predictable from measures of physical structure (e.g., Brunswik, 1956), once sensory limits are taken into account.

In some cases, mental structure does indeed seem to be at least approximately that of physical structure (e.g., the "constancies" described in connection with Figure 7), but we also have had dramatic counterexamples for the past 30 years (Figures 12F, 16A). Some psychologists still adhere to "unconscious inference" explanations today (Gregory, 1970; Rock, 1977), but such counterexamples make that proposal as it now stands an empty one. To mean anything at all, the premises of such supposed inferences must be investigated and not simply taken to be the same as the structure of the physical world. Moreover, as we have discussed at length, the last 30 years have also shown that much of perceptual structure may be given directly by complex neurophysiological circuitry; if that is at all true, such prewired perceptual structure must surely affect the nature and use of whatever mental structure does exist in addition. For example, for all we know at present the Ames trapezoid phenomenon (Figure 16) may result not from unconscious inference but from some direct sensory mechanism that provides a salient illusion only in certain conditions (see Figure 16C, p. 274; Hochberg, 1984b).

We need, therefore, to study mental structure and to measure its characteristics. The very topic has an aura of insubstantiality, until recently anathema to many psychologists eager to avoid subjectivity and mentalism. To study how a person perceives some object we must in one way or another ask him or her questions about that object, that is, retrieve information from the subject about the object. In cases like the completion phenomena, we must ask the viewer questions about an object that is not in fact present and for which only that absent virtual object, and the few stimulus fragments actually shown to the viewer, can be confidently described in physical terms.

That fact is a challenge but not an insuperable obstacle. There is actually a considerable body of research with a much more extreme experimental situation: Since Galton (1883) first undertook to study individual differences in mental imagery, methods have existed for studying how well individuals can retrieve information about objects for which no stimulus information whatsoever is present. Such "objective tests of imagery" (see Woodworth, 1938) have seen increasing use in recent years, but rather than being used to probe individual differences in imagery, most present research is directed to examining the nature of the imagery process itself (Kosslyn, 1980), a task that faces many of the same challenges as does the task of studying mental structure in perception. Whether such imagery studies have substantial implications for perception is unclear. We do not know whether imagery, studied with no stimuli present, is related in any simple way to the mental structure that is involved in the perception of partially present objects. That can only be answered by research on the process by which mental structure informs and accepts sensory information.

The need to fit fragments of sensory information into some mental structure is pervasive in normal perception. The perception of objects that are partially obscured in normally cluttered environments must often draw on a process of fitting fragmentary sensory information into a previously provided mental structure (Figure 19). In addition, our perceptions of any scene or moderately large object must be assembled over time by means of successive glances, each of which provides only a partial view of the world. Finally, as objects are temporarily obscured by nearer ones (as viewer, object or both move through the world), we must be able to keep track of their motions even while they are out of sight, and to recognize them when they reappear. Both of these functions are drawn upon in our perceptions of real objects in the world and also in film and video, as cameras cut from one scene to another (both successfully and unsuccessfully). And both functions suggest methods by which mental structure may be studied. We consider these in turn.

FIGURE 19A A completion figure. The mysterious object shown in this high contrast photograph is more readily apparent in Figure 19B on page 286 (reprinted, with permission, from Dallenbach, 1951).

FIGURE 19B The same cow shown in 19A (reprinted, with permission, from Dallenbach, 1951). Once the object has been seen in Figure 19B, it is remarkably difficult to avoid seeing it in 19A as well.

In the normal process of directing our gaze at different parts of some object, each glimpse offers detailed vision only in a small central part of the retina. The information gained by the successive fragmentary glimpses (as many as four per second) must therefore be integrated by some nonsensory process into a single perception. Similarly, in virtually all motion picture or video sequences, successive closeup views or shots each provide a partial view of some scene that may never be shown in its entirety (which would be a long shot) and may in fact not exist at all save in the mind's eye of the viewer. This is a kind of completion over time, of central importance to perceptual theory and application, that could not be studied at all until the last decade, when motion pictures and high-speed computer graphics became generally available as laboratory tools.

There has as yet been little more research on this aspect of object perception than to show that such research is possible. The row of circles in Figure 20 represents a sequence of successive views that simulate a stationary circular aperture through which the individual corners of some object that is being moved about behind the screen (in this case, a cross) are visible. If the motions of the corners were themselves visible, the viewer could construct the entire object behind the screen in his mind's eye, detecting, for example, that a specific arm of the cross has been skipped within the sequence. If the motions are not visible, as they are not in this experiment, then the sequence of static views is indecipherable and in fact cannot be kept in mind; if a long shot of the object is presented first, however (as in row B), providing a mental structure within which the successive views can take their place, the subject can again perceive the object that is moving behind the aperture (Hochberg, 1978a). It is the mental structure of the object that makes the stimulus sequence comprehensible. Given the long shot and the structure it provides, two sequences that are different are perceived as such; without the structure, the viewer cannot distinguish one sequence from another.

FIGURE 20 Other completion figures. A sequence of right angles, presented at rates of 333 to 2000 msec per view, is shown at A. Subjects who are shown two such sequences, which may or may not differ in one or more views near the middle of the sequence, cannot tell better than chance (the baseline in C) whether the sequences are the same or different, because each sequence of views, considered as independent items, far exceeds their memory span. In each pair of sequences, at least one sequence is in fact a systematic succession of closeups of the corners of a cross. If each sequence is introduced by a long shot and a medium shot, as at B, which establishes the overall object and the starting point of the sequence, then each view of the unchanged sequence takes its place in turn within the structure that the viewer has in mind, whereas the altered sequence does not, and the difference between the two sequences (which are no longer strings of independent events) becomes evident, within the time limits indicated at C.
When a pedestrian you are watching is lost from view while he passes behind a parked truck, or while you divert your gaze to the traffic light, you must still be able to tell approximately when he will return to view from behind the truck, or where he will have gotten to when you look back from the traffic light. Such predictive functions, for which we can surely find ample evolutionary demands, imply that something that corresponds to motion through space occurs in the mind's eye of the viewer. The filmmaker or graphics programmer who cuts away from one event to another, and then returns to the first one, must make some assumptions about how well the viewer keeps track of any motion that is going on in the first event. The following research shows that discussing such mental motion is more than just a poetic metaphor.

Shepard and his colleagues had shown in a wide range of experiments that the time subjects need to judge whether two objects are the same or different (Figure 21A) is proportional to the angle between them, as though one object were being mentally rotated at some constant rate to bring it to the same orientation as the other in making the comparison (Shepard and Cooper, 1982; Shepard and Metzler, 1971). Using that paradigm, Cooper (1976) first determined each subject's characteristic "mental rotation" rate, ω, and then, after having had subjects memorize the figures, displayed the comparison figure at some variable angle (φ) and delayed after a starting signal by a variable interval (t). She found that if the product (ω) × (t) = φ, judgment times no longer increased with angle φ: they were now independent of the angle between the two objects being compared (Figure 21B). The results are what one would expect if the object had in fact been rotated at angular velocity ω through an angle ω × (t) between presentations i and ii, and if both objects had come to the same orientation by the time the comparison was called for. Given these findings, "mental rotation" seems more than a metaphor that summarizes the fact that judgment time is a function of angle (φ) in Figure 21A. It implies a usable and consistent relationship between time and distance in a mental structure that cannot be attributed to physical stimulus information.
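The relationship between rotation rate, delay, and angle described above can be sketched numerically. What follows is only an illustration under stated assumptions, not Cooper's own analysis: it assumes that judgment time is a fixed base time plus the time needed to rotate through whatever angular discrepancy remains after the delay, that is, |φ − ωt| / ω. The function name `predicted_rt`, the base time of 0.5 s, and the rate of 400 degrees per second are hypothetical values chosen for the example.

```python
def predicted_rt(angle_deg, delay_s, rate_deg_per_s, base_s=0.5):
    """Predicted judgment time (s) under a constant-rate mental rotation model.

    Illustrative assumption: the stored figure rotates at rate_deg_per_s
    during the delay, and any remaining angular discrepancy is rotated away
    at the same rate once the comparison figure appears.
    """
    if rate_deg_per_s <= 0:
        raise ValueError("rotation rate must be positive")
    rotated_during_delay = rate_deg_per_s * delay_s       # omega * t
    remaining = abs(angle_deg - rotated_during_delay)      # |phi - omega * t|
    return base_s + remaining / rate_deg_per_s


if __name__ == "__main__":
    rate = 400.0  # hypothetical rotation rate, degrees per second
    for angle in (60.0, 120.0, 180.0):
        simultaneous = predicted_rt(angle, 0.0, rate)       # t = 0
        matched = predicted_rt(angle, angle / rate, rate)   # t = phi / omega
        print(f"{angle:5.0f} deg   t=0: {simultaneous:.3f} s   "
              f"t=phi/omega: {matched:.3f} s")
```

With t = 0 the predicted time grows linearly with the angle, as in the simultaneous-presentation case; with t = φ/ω the remaining discrepancy is essentially zero for every angle, so the predicted time stays at the base value, which is the pattern of independence from angle reported in the text.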
A third and quite different paradigm, which appears in a recent technical report by Cooper (1984) on work in progress, may tell us something more general about the form in which perceived objects are manipulated and stored and may also eventually provide a tool with which to compare how well different methods of representation accord with the ways in which objects are perceived and remembered. Subjects had been given two or- thographic projections of an object (a and b in Figure 22) and were to judge whether a third orthographic projection (c) was of the same or of a different object. No isometric projections (e.g., c) of any objects were shown to the subjects at this time. Subsequently the subjects were shown a set of isometric projections (e.g., c, f), some of which represented the objects used in the previous tasks and some of which did not. Subjects tended to report that they had seen the former before, even though no isometric pictures at all had been shown. Although this research is still in progress, and various

VISUAL PERCEPTION OF REAL AND REPRESENTED OBJECTS AND EVENTS 289 . i '1 is; I. _ .. ~ 11 1 B . E11- 1 ...1, 111 ~ ... . Ill,lV,, ~ . . . 1,11 i' i' a' ANG LE A . ~1 - ~' it, ,,' 1' A' t=0 cot= 0 FIGURE 21 Mental rotation. A: Given two objects at different orientations, the time that it takes to judge whether they are the same or different is a function of the angle between their orientations, whether in the picture plane i, ii or in depth iii, iv. It is as though the subject must rotate one object into the orientation of the other before the two can be compared (Shepard and Metzler, 1971~. B: If the two shapes to be compared i, ii are presented simultaneously (i.e., separated in time by an interval t = 0.0) their reaction time R.T. increases with angle, ¢, between their orientations, as above. But if the comparison figure is presented after an interval t = ¢/m, where ~ is the subject's characteristic rotation rate (obtained from the slope of the function at t = 0.0), then the R.T. does not increase with increasing angle ~ (Cooper, 1976~. This is just what one would mean by saying that the subjects had rotated the object before making the comparison.

FIGURE 22 The structure of perceived objects: orthographic and isometric projections. Two different pictorial systems are shown here: Pictures a and b and pictures d and e are orthographic projections of objects whose respective isometric projections are c and f. Isometric projections are easier to grasp, at least for these objects. Cooper (1984) presents preliminary evidence that even when subjects have been presented only with orthographic projections of objects, they tend to report later that they have seen isometric projections of those objects.

controls are needed, the preliminary results will, I feel certain, survive the necessary replication and controls: Orthographic and isometric projections can both specify the form of a three-dimensional object, but the isometric projections are in some sense closer to the way in which we extract and store the information, closer to the mental structure involved in perceiving and comparing the objects. Although I know of no research to the point, it hardly needs an experiment to discover that isometric pictures are more rapidly and accurately comprehended than orthographic ones. What experiments can do is give us a better understanding of why that is so, and of the sense in which the isometric picture is more like the structure that underlies our perception of the object.

These three research procedures that I have described in connection with Figures 20 through 22 are interesting more as examples of a field of experimental and quantitative inquiry than as demonstrations that mental processes can be studied and are in that sense real; the latter is not a new conception. It has repeatedly come into and gone out of scientific fashion, and merely showing that mental structure "exists," in some sense, will not add much to its history. Fortunately, this time there are vested interests in obtaining and systematizing the knowledge, and technical facilities for doing so, that should keep research and theory centered on these problems of object perception and representation for some time to come.

REFERENCES

Ames, A.
1951 Visual perception and the rotating trapezoidal window. Psychological Monographs, No. 324.
Attneave, F.
1954 Some informational aspects of visual perception. Psychological Review 61:183-193.
1959 Applications of Information Theory to Psychology. New York: Holt, Rinehart and Winston.
Barlow, H., Blakemore, C., and Pettigrew, J.
1967 The neural mechanism of binocular depth discrimination. Journal of Physiology 193:327-342.
Békésy, G., von
1960 Neural inhibitory units of eye and skin. Quantitative description of contrast phenomena. Journal of the Optical Society of America 50:1060-1070.
Berry, R.N.
1948 Quantitative relations among vernier, real depth, and stereoscopic acuities. Journal of Experimental Psychology 38:708-721.
Blakemore, C., and Campbell, F.W.
1969 On the existence of neurons in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology 203:237-260.
Braddick, O.J.
1980 Low-level and high-level processes in apparent motion. In H.C. Longuet-Higgins and N.S. Sutherland, eds., The Psychology of Vision. London: The Royal Society.
Braddick, O.J., Campbell, F.W., and Atkinson, J.
1978 Channels in vision: basic aspects. In R. Held, H.W. Leibowitz, and H.L. Teuber, eds., Handbook of Sensory Physiology, Vol. 8. Heidelberg: Springer.
Braunstein, M.L.
1976 Depth Perception Through Motion. New York: Academic Press.
1983 Contrasts between human and machine vision: should technology recapitulate phylogeny? In J. Beck, B. Hope, and A. Rosenfeld, eds., Human and Machine Vision. New York: Academic Press.
Brunswik, E.
1956 Perception and the Representative Design of Psychological Experiments, 2nd ed. Berkeley: University of California Press.
Buffart, H., Leeuwenberg, E.L.J., and Restle, F.
1981 Coding theory of visual pattern completion. Journal of Experimental Psychology: Human Perception and Performance 7:241-274.

Butler, D.L.
1982 Predicting the perception of three-dimensional objects from the geometrical information in drawings. Journal of Experimental Psychology: Human Perception and Performance 8:674-692.
Campbell, A.G., Hartwell, R., and Hood, D.
1978 Lightness constancy at the level of the frog's optic nerve fiber. Proceedings of the Eastern Psychological Association 49:47 (Abstract).
Campbell, F.W., and Robson, J.G.
1964 Application of Fourier analysis to the modulation response of the eye. Journal of the Optical Society of America 54:518A (Abstract).
Cavanagh, P.
1984 Image transforms in the visual system. In P.C. Dodwell and T. Caelli, eds., Figural Synthesis. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Cooper, L.A.
1976 Demonstration of a mental analog of an external rotation. Perception and Psychophysics 19:296-302.
1984 Strategic Factors in Complex Spatial Problem Solving. Invited paper presented at the annual meeting of the Midwestern Psychological Association, Chicago, Illinois.
Cutting, J.E.
1983 Four assumptions about invariance in perception. Journal of Experimental Psychology: Human Perception and Performance 9:310-317.
Cutting, J.E., and Millard, R.T.
1984 Three gradients and the perception of flat and curved surfaces. Journal of Experimental Psychology: General 113:198-216.
Dallenbach, K.M.
1951 A puzzle picture with a new principle of concealment. American Journal of Psychology 54:431-433.
Descartes, R.
1650/1931 Les passions de l'âme. In E.S. Haldane and G.R.T. Ross, trans., The Philosophical Works of Descartes. Cambridge, England: University Press.
DeValois, R., and Jacobs, G.
1968 Primate color vision. Science 162:533-540.
DeValois, R.L., Albrecht, D.G., and Thorell, L.G.
1976 Spatial tuning of LGN and cortical cells in monkey visual systems. Pp. 60-63 in H. Spekreijse and H. van der Tweel, eds., Spatial Contrast. Amsterdam: North-Holland.
Donchin, E., Ritter, W., and McCallum, W.C.
1978 Cognitive psychophysiology: the endogenous components of the ERP. Pp. 349-412 in E. Callaway, P. Tueting, and S.H. Koslow, eds., Event-Related Potentials in Man. New York: Academic Press.
Foster, D.H.
1984 Local and global computational factors in visual pattern recognition. In P.C. Dodwell and T. Caelli, eds., Figural Synthesis. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Galton, F.
1883 Inquiries into Human Faculty and its Development. London: Macmillan.
Garner, W.R., Hake, H.W., and Eriksen, C.W.
1956 Operationism and the concept of perception. Psychological Review 63:149-159.
Gibson, J.J.
1950 The Perception of the Visual World. Boston: Houghton Mifflin.
1951 What is a form? Psychological Review 58:403-412.
1966 The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
1979 The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.

Gillam, B.
1972 Perceived common rotary motion of ambiguous stimuli as a criterion for perceptual grouping. Perception and Psychophysics 11:99-101.
Ginsburg, A.
1971 Psychological Correlates of a Model of the Human Visual System. Master's thesis, Air Force Institute of Technology, Dayton, Ohio.
Gogel, W.C.
1984 The role of perceptual interrelations in figural synthesis. In P.C. Dodwell and T. Caelli, eds., Figural Synthesis. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Graham, C.H.
1963 On some aspects of real and apparent visual movement. Journal of the Optical Society of America 53:1019-1025.
Graham, N.
1981 Psychophysics of spatial-frequency channels. In M. Kubovy and J. Pomerantz, eds., Perceptual Organization. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Graham, N., and Nachmias, J.
1971 Detection of grating patterns containing two spatial frequencies: a comparison of single channel and multiple-channel models. Vision Research 11:251-259.
Green, B.
1961 Figure coherence in the kinetic depth effect. Journal of Experimental Psychology 62:272-282.
Gregory, R.L.
1970 The Intelligent Eye. London: Weidenfeld.
Gross, C.G., and Mishkin, M.
1977 The neural basis of stimulus equivalence across retinal translation. Pp. 109-122 in S. Harnad et al., eds., Lateralization in the Nervous System. New York: Academic Press.
Haber, R.N.
1983 Stimulus information and processing mechanisms in visual space perception. In J. Beck, B. Hope, and A. Rosenfeld, eds., Human and Machine Vision. New York: Academic Press.
Haber, R.N., and Wilkinson, L.
1982 The perceptual components of computer graphic displays. Computer Graphics and Applications 2(3):23-25.
Harris, C.S., ed.
1980 Visual Coding and Adaptability. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Hartline, H.K.
1949 Inhibition of activity of visual receptors by illuminating nearby retinal elements in the Limulus eye. Federation Proceedings 8:69.
Hebb, D.O.
1949 The Organization of Behavior. New York: John Wiley and Sons.
Helmholtz, H.L.F., von
1866/1911 Treatise on Physiological Optics. Vols. ii and iii (translated from the 3rd German edition, 1909-1911). J.P.C. Southall, ed. and trans. Rochester, N.Y.: Optical Society of America.
Hering, E.
1878/1964 Outlines of a Theory of the Light Sense (originally published in 1878). L. Hurvich and D. Jameson, trans. Cambridge: Harvard University Press.
Hobbes, T.
1651/1948 Human Nature (originally published in 1651). In W. Dennis, ed., Readings in the History of Psychology. New York: Appleton-Century-Crofts.

Hochberg, J.
1956 Perception: toward the recovery of a definition. Psychological Review 63:400-405.
1962 The psychophysics of pictorial perception. Audio-Visual Communication Review 10:22-54.
1968 In the mind's eye. In R.N. Haber, ed., Contemporary Theory and Research in Visual Perception. New York: Appleton-Century-Crofts.
1978a Motion Pictures of Mental Structures. Presidential address to the Eastern Psychological Association. Washington, D.C., April.
1978b Perception. Englewood Cliffs, N.J.: Prentice-Hall.
1981 Levels of perceptual organization. In M. Kubovy and J. Pomerantz, eds., Perceptual Organization. Hillsdale, N.J.: Lawrence Erlbaum Associates.
1982 How big is a stimulus? In J. Beck, ed., Organization and Representation in Perception. Hillsdale, N.J.: Lawrence Erlbaum Associates.
1984a Form perception: experience and explanations. In P.C. Dodwell and T. Caelli, eds., Figural Synthesis. Hillsdale, N.J.: Lawrence Erlbaum Associates.
1984b Visual Worlds in Collision: Invariances and Premises, Theories versus Facts. Presidential address, Division of Experimental Psychology, annual meeting of the American Psychological Association, Toronto.
Hochberg, J., and Brooks, V.
1960 The psychophysics of form: reversible-perspective drawings of spatial objects. American Journal of Psychology 73:337-354.
Hochberg, J., and McAlister, E.
1953 A quantitative approach to figural "goodness." Journal of Experimental Psychology 46:361-364.
Hochberg, J., and Spiron, J.
1985 The Ames window: unveridical "direct perception" and not perceptual inference? Proceedings and Abstracts of the Annual Meeting of the Eastern Psychological Association 56:38.
Hochberg, J., Amira, L., and Peterson, M.
1984 Extensions of the Schwartz/Sperling phenomenon: invariance under transformation fails in the perception of objects' moving pictures. Proceedings and Abstracts of the Annual Meeting of the Eastern Psychological Association 55:17 (Abstract).
von Hornbostel, E.M.
1922 Über optische Inversion. Psychologische Forschung 1:130-156.
Hubel, D.H., and Wiesel, T.N.
1962 Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. Journal of Physiology 160:106-154.
1968 Receptive fields and functional architecture of the monkey cortex. Journal of Physiology 195:215-243.
Hurvich, L., and Jameson, D.
1957 An opponent-process theory of color vision. Psychological Review 64:384-404.
1974 Opponent processes as a model of neural organization. American Psychologist 29:88-102.
Johansson, G.
1977 Spatial constancy and motion in visual perception. In W. Epstein, ed., Stability and Constancy in Visual Perception. New York: Wiley and Sons.
1980 About Perspective Transformations and the Theory of Visual Space Perception. Uppsala Psychological Reports, No. 278. Department of Psychology, University of Uppsala, Sweden.

Kabrisky, M., Tallman, T., Day, C.H., and Radoy, C.M.
1970 A theory of pattern perception based on human physiology. In A.T. Welford and L. Houssiadas, eds., Contemporary Problems in Perception. London: NATO Advanced Study Institute, Taylor and Francis.
Kaufman, L.
1974 Sight and Mind. New York: Oxford University Press.
Kaufman, L., and Williamson, S.J.
1982 Magnetic location of cortical activity. Annals of the New York Academy of Science 388:197-213.
Kelly, D.H.
1961 Visual responses to time-dependent stimuli. I. Journal of the Optical Society of America 51:422-429.
Kepler, J.
1611 Dioptrice. In W. van Dyk and M. Caspar, eds., Gesammelte Werke 4:1937-1963. Augsburg, Germany: Frank.
Koffka, K.
1935 Principles of Gestalt Psychology. New York: Harcourt, Brace.
Köhler, W.
1929 Gestalt Psychology. New York: Liveright.
Kolers, P.
1972 Aspects of Motion Perception. New York: Pergamon.
Kopfermann, H.
1935 Psychologische Untersuchungen über die Wirkung zweidimensionaler Darstellungen körperlicher Gebilde. Psychologische Forschung 13:293-364.
Korte, A.
1915 Kinematoskopische Untersuchungen. Zeitschrift für Psychologie 72:194-296.
Kosslyn, S.M.
1980 Image and Mind. Cambridge: Harvard University Press.
Leeuwenberg, E.L.J.
1971 A perceptual coding language for visual and auditory patterns. American Journal of Psychology 84:307-349.
Mach, E.
1886/1959 The Analysis of Sensations and the Relation of the Physical to the Psychical (translated by S. Waterlow from the 5th German edition, 1886). New York: Dover.
Marr, D.
1982 Vision. San Francisco: Freeman.
Marr, D., and Poggio, T.
1979 A computational theory of human stereo vision. Proceedings of the Royal Society of London B204:302-328.
McConkie, G.W., and Rayner, K.
1975 The span of effective stimulus during a fixation in reading. Perception and Psychophysics 17:578-586.
Metzger, W.
1934 Tiefenerscheinungen in optischen Bewegungsfeldern. Psychologische Forschung 20:195-260.
Mill, J.
1965 Analysis of the phenomena of the human mind. In R.J. Herrnstein and E.G. Boring, eds., A Source Book in the History of Psychology. Cambridge, Mass.: Harvard University Press.

Miller, G.A., Galanter, E., and Pribram, K.
1960 Plans and the Structure of Behavior. New York: Holt, Rinehart and Winston.
Minsky, M.
1975 A framework for representing knowledge. In P.H. Winston, ed., The Psychology of Computer Vision. New York: McGraw-Hill.
Minsky, M., and Papert, S.
1969 Perceptrons. Cambridge: MIT Press.
Movshon, J.A., Thompson, I.D., and Tolhurst, D.J.
1978 Spatial and temporal contrast sensitivity of neurones in areas 17 and 18 of the cat's visual cortex. Journal of Physiology 283:101-120.
Mueller, J.
1838/1965 Handbuch der Physiologie des Menschen, bks. V and VI. Coblenz, 1838 and 1840. Translated in 1848 by W. Baly and excerpted in R.J. Herrnstein and E.G. Boring, eds., A Source Book in the History of Psychology. Cambridge: Harvard University Press.
Newton, I.
1672/1948 A new theory of light and colors. Philosophical Transactions of the Royal Society. Reprinted in W. Dennis, ed., Readings in the History of Psychology. New York: Appleton-Century-Crofts.
Oatley, K.
1978 Perceptions and Representations. New York: Free Press.
Pantle, A., and Sekuler, R.
1968 Size-detecting mechanism in human vision. Science 162:1146-1148.
Penrose, L., and Penrose, R.
1958 Impossible objects: a special type of visual illusion. British Journal of Psychology 49:31-33.
Perrett, D.I., Rolls, E.T., and Caan, W.
1982 Visual neurones responsive to faces in the monkey inferotemporal cortex. Experimental Brain Research 47:329-342.
Peterson, M.A., and Hochberg, J.
1983 Opposed-set measurement procedure: a quantitative analysis of the role of local cues and intention in form perception. Journal of Experimental Psychology: Human Perception and Performance 9:183-193.
Rashevsky, N.
1948 Mathematical Biophysics. Chicago: University of Chicago Press.
Ratliff, F.
1965 Mach Bands: Quantitative Studies on Neural Networks in the Retina. San Francisco: Holden-Day.
Reite, M., and Zimmerman, J.
1978 Magnetic phenomena of the central nervous system. Annual Review of Biophysics and Bioengineering 7:167-188.
Restle, F.
1979 Coding theory of the perception of motion configurations. Psychological Review 86:1-24.
Roberts, L.G.
1965 Machine perception of three-dimensional solids. In J.T. Tippett et al., eds., Optical and Electro-Optical Information Processing. Cambridge: MIT Press.
Rock, I.
1977 In defense of unconscious inference. In W. Epstein, ed., Stability and Constancy in Visual Perception. New York: John Wiley and Sons.
1983 The Logic of Perception. Cambridge: MIT Press.

Rosenblatt, F.
1962 Principles of Neurodynamics. New York: Spartan Books.
Schade, O.H.
1956 Optical and photoelectric analog of the eye. Journal of the Optical Society of America 46:721-739.
Schuck, J., and Leahy, W.R.
1966 A comparison of verbal and non-verbal reports of fragmenting visual images. Perception and Psychophysics 1:191-192.
Schwartz, B.J., and Sperling, G.
1983 Luminance controls the perceived 3-D structure of dynamic 2-D displays. Bulletin of the Psychonomic Society 21(6):456-458.
Selfridge, O.G.
1959 Pandemonium: a paradigm for learning. In The Mechanization of Thought Processes. London: H.M. Stationery Office.
Shepard, R.N.
1981 Psychophysical complementarity. In M. Kubovy and J.R. Pomerantz, eds., Perceptual Organization. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Shepard, R.N., and Cooper, L.
1982 Mental Images and Their Transformations. Cambridge: MIT-Bradford Books.
Shepard, R.N., and Metzler, J.
1971 Mental rotation of three-dimensional objects. Science 171:701-703.
Sperling, G.
1970 Binocular vision: a physical and a neural theory. American Journal of Psychology 83:461-534.
Stevens, K.A.
1983 The visual interpretation of surface contours. Artificial Intelligence 17:47-73.
Sutton, S., Braren, M., Zubin, J., and John, E.R.
1965 Evoked potential correlates of stimulus uncertainty. Science 150:1187-1188.
Svaetichin, G.
1956 Spectral response curves from single cones. Acta Physiologica Scandinavica 39(Suppl. 134):17-46.
Todd, J.
1982 Visual information about rigid and nonrigid motion: a geometric analysis. Journal of Experimental Psychology: Human Perception and Performance 8:238-252.
Todd, J.T., and Mingolla, E.
1983 Perception of surface curvature and direction of illumination from patterns of shading. Journal of Experimental Psychology: Human Perception and Performance 9:583-595.
Tolman, E.C.
1938 Schematic "sowbug" and discrimination learning. Psychological Bulletin 35:524.
Ullman, S.
1979 The Interpretation of Visual Motion. Cambridge: MIT Press.
Wallach, H.
1948 Brightness constancy and the nature of achromatic colors. Journal of Experimental Psychology 38:310-324.
Wallach, H., and O'Connell, D.N.
1953 The kinetic depth effect. Journal of Experimental Psychology 45:205-217.
Watson, J.B.
1913 Psychology as the behaviorist views it. Psychological Review 20:158-177.
Wheatstone, C.
1839 On some remarkable and hitherto unobserved phenomena of binocular vision. Part 2. Philosophical Magazine 4:504-523.

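White, B.W., and Mueser, G.
1960 Accuracy in reconstructing the arrangement of elements generating kinetic depth displays. Journal of Experimental Psychology 60:1-11.
Wohlgemuth, A.
1911 On the after-effect of seen movement. British Journal of Psychology Monograph Supplement, 1.
Woodworth, R.S.
1938 Experimental Psychology. New York: Holt, Rinehart and Winston.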

