Suggested Citation:"Visual Coding of Features and Objects: Some Evidence from Behavioral Studies." National Research Council. 1990. Advances in the Modularity of Vision: Selections From a Symposium on Frontiers of Visual Science. Washington, DC: The National Academies Press. doi: 10.17226/9557.

Visual Coding of Features and Objects: Some Evidence from Behavioral Studies

ANNE TREISMAN

I am going to talk this afternoon about some particular aspects of perception that I have been exploring using behavioral tasks rather than brain studies. The question I will discuss is what we can find out about the early stages of visual processing by using purely behavioral data. Like many other psychologists, we compare response latencies and error rates in different visual tasks. From these, we obtain a measure of relative difficulty and some indication of which operations are carried out in parallel and which sequentially. We infer the use of different operations from increases or decreases in total response times as we either complicate or simplify the task, and we look at different kinds of errors that may suggest ways in which the system breaks down. No one result will ever provide compelling support for a hypothesis, so we try to marshal as much converging evidence as we can to support the same underlying hypothetical mechanism. If we get consistent results, we gain confidence that our theory is on the right track.

One immediate observation is that perception feels effortless and automatic. The minute we open our eyes, we seem to be aware of an organized scene containing meaningful objects. We are not normally conscious of color patches, movements, edges, and textures that we then assemble, object by object. It might be the case, however, that this apparently effortless achievement is actually the result of complex preprocessing stages, involving many operations to which we have no conscious access. In fact, the ease of introspection seems to be inversely related to the order of processing, at least from what we can infer. That makes sense, since what we need to react to are tigers, footballs, or motor cars, not color patches. If there are extensive preprocessing operations, we need to probe them through indirect behavioral evidence; we cannot expect people to

introspect. One approach is to ask what functions need to be carried out early in the task of perceiving the real world, and then see which factors make those tasks easy or difficult.

FIGURE 1 Salient boundary between groups defined by shape or by color. (Striped areas should be green and white areas should be red.) Source: Adapted from Beck (1966).

EARLY GROUPING OF PERCEPTUAL ELEMENTS

Certainly, an early step must be to locate and define the boundaries of what might be candidate objects. We need to group areas that are likely to belong together and to separate the scene into potential objects to be identified. One approach, then, would be to ask what kinds of discrimination mediate the early grouping phenomena. A long time ago, the Gestalt psychologists suggested a number of different principles that seem to be important in understanding this process. Elements are grouped by proximity, by similarity, by common movement, and by good continuation. Now it turns out that those are all good guides to what might be parts of the same object. If you see a cow behind a tree, its front and rear are likely to be the same color, they are likely to move together, and so on.

But maybe we could say a little more about what kinds of similarity are important in mediating grouping. Here we find a fairly sharp dichotomy: differences in simple aspects of shapes, like curved or straight lines and edges, will produce a good boundary between groups of elements; so will differences in colors and in brightness. In both cases (in Figure 1) the division down the middle is immediately salient. But if we ask people to find a boundary between green circles and red triangles on one side and red circles and green triangles on the other side (see Figure 2), they find it much more difficult. Similarly, we can look at the arrangements of parts of shapes.
Figure 3 is taken from Beck (1966), who showed that we get a very good boundary

between elements defined by their orientation. So T's and tilted T's segregate well, but T's and L's, with the same horizontal and vertical lines in different spatial arrangements, do not. The finding is interesting because, as Beck showed, similarity judgments go the other way: if you show somebody a tilted T and a normal T and get them to rate how similar they are, then show them a T and an L, they will say that the T and the tilted T are more similar than the T and the L. For the earlier preattentive level of processing, however, grouping is based on different principles from those that mediate consciously judged similarity for single attended figures.

Segregation and boundary formation offer one possible diagnostic for what happens early in visual processing. They suggest that simple properties like straight versus curved, tilted versus vertical, and color and brightness, all of which mediate good grouping, are likely to be distinguished early and in parallel across the visual field. But if we have to put parts or properties together to define a boundary, then we are not so good at it. The visual system just does not work that way.

FIGURE 2 Poor segregation between groups defined by conjunction of color and shape.

EXPECTANCY AND ATTENTION

What else might we look at? Another possible diagnostic that might indicate early processing would be independence from central control, from voluntary decisions, expectancy, and attention. We can look to see what kinds of things are spontaneously salient or "pop out" of a display: what catches our attention when we look at a scene with a single black sheep among hundreds of white ones, for example. In visual search tasks, we ask subjects to find a target in displays in which it differs either in color, in

FIGURE 3 Good segregation between groups differing in line orientation, but not between groups differing only in line arrangement. Source: Adapted from Beck (1966).

line orientation, or in size. Targets defined by simple features are available immediately and effortlessly.

Can we say any more than just that feature detection tasks are easy? We can bring in another argument about the probable function of early visual processing, independent of attention: we would expect it to be spatially parallel. If the goal of early visual stages is to establish figure-ground relations and to monitor the field for any salient stimuli, there would be an advantage to doing it across the whole scene at once, rather than relying on a sequential scan. This allows us to make a prediction about the effect of varying the number of items in the field. We can ask subjects to find a target when there is only 1 item in the display, or when there are 6, or when there are 60. If the target can be found at an early level of visual processing, at which detection is spatially parallel, we would expect search times to be independent of the number of items in the display. That is in fact what we find, for quite a number of different kinds of stimuli. A target that is green against a background of not green, or filled against open stimuli, or a bullseye pattern against circles with dots outside the boundary (Figure 4) will be found without attention or effort. Latencies to detect these targets show no effect of added nontarget items (distracters). Performance seems to reflect spatially parallel processing; these targets show what I will call a pop-out effect.

The search diagnostic may throw more light on the early stages of processing if we look at the effects of varying the background stimuli (the distracters). We can make the distracters vary in size, orientation, gap versus completion, and so on, and see whether this makes a target defined by color any harder to find. Similarly, we can vary background colors and other features in a task requiring search for targets defined by orientation.
We have found that background heterogeneity has little or no effect on search, provided that the variation is only on irrelevant dimensions and not on the relevant dimension that defines the target (Treisman, 1988). The apparent independence of visual processing on each of these separate dimensions suggests a modular organization. The idea is that there may be a number of relatively independent modules, each of which computes its own property, one specializing in color, one in orientation, one in stereoscopic depth, one in motion, and so on. These modules need not necessarily be anatomically separate, although some specialization into different anatomical channels has been described (Livingstone and Hubel, 1988; Van Essen, 1985); but I am suggesting they may be functionally separate.

If features are analyzed in functionally separate, specialized modules, we might make the converse prediction about heterogeneity when we vary the nature of the target. In this case, it should be important to know that you are looking for a target that is blue rather than large or horizontal. You

can then check just the appropriate module for evidence of its presence. In an experiment to test this prediction, we compared how fast subjects could detect a blue target or a larger target or a horizontal target when they did not know whether it would be blue or large or horizontal, and when they did know which it would be. The target always appeared against a background of small green vertical bars. The results suggest that checking several different properties takes longer than checking a single property. Although search remains spatially parallel, the latency to detect the target was greater when its nature was not specified in advance, as if subjects checked separately within each of the different modules until they found it.

FIGURE 4 Easy detection (pop-out) of targets with a unique feature not shared by the nontargets. Panels: (a) color target (green versus not green); (b) filled versus outline (red or black); (c) inside versus outside. Horizontal axis: number of items in display (1, 6, 12); target-present and target-absent trials are plotted separately.

LOCALIZATION

So far, I have given you some evidence for two kinds of information that are available from these early feature modules, if they exist. The first is the presence of global discontinuities or boundaries dividing one area from another. The second is the presence of a unique item in a display. Do these early representations contain any precise information about where things are, that is, about their localization?

Suppose we set up a display that has a locally unique item, for example a red circle amongst some green ones, or an X amongst O's; the unique item is very salient: it pops out. Suppose now we embed the group in which the target is locally unique in a larger display that has the same locally unique property present elsewhere. Figure 5 illustrates the more complex display. The locally unique item is now much harder to find when its defining feature is present elsewhere in the display, even though it may be some distance away (Treisman, 1982). The difficulty is not due simply to the larger or more complex display, because, if the target is unique not just locally but also in the whole display, it remains about as easy to find in the larger display as in the smaller local context.

What is going on here? It seems as if we can hide an object perceptually. Just by embedding an item in a display that has its locally unique property elsewhere, we can make it preattentively invisible. This suggests that the early representation automatically makes available some kind of pooled response that tells you, "Yes, there is some red there," or "Yes, there is a diagonal line." But the same process cannot tell you where the red item or the diagonal line is located.

What must the visual system do to locate the item? Performance in tasks that force subjects to create a unique identity for an item defined only by a conjunction of properties may give us some clue. We can, for example, look at a task in which subjects search for a green T amongst other green shapes mixed with other colored T's (Treisman and Gelade, 1980). As Figure 6 illustrates, the search time for this type of conjunction target increases linearly as a function of the number of distracters in the display. This pattern of performance suggests that each item was serially checked, adding about 60 milliseconds for each extra nontarget item that had to be rejected.
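
The linear pattern can be written down as a serial self-terminating scan. The sketch below is illustrative only: the function name and the 400 ms base time are assumptions made for the example, and only the figure of about 60 ms per item comes from the text. It assumes that items are checked one at a time in random order, so that a target, when present, is reached on average after (n + 1) / 2 checks, while a target-absent response requires all n checks.

```python
def predicted_search_time(n_items, target_present, base_ms=400.0, ms_per_item=60.0):
    """Mean latency (ms) predicted by a serial self-terminating scan.

    base_ms is a hypothetical constant for everything other than the scan
    itself; ms_per_item is the cost of checking one item (about 60 ms in
    the conjunction search described in the text).
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if target_present:
        # The target is equally likely to occupy any position in the scan
        # order, so on average (n + 1) / 2 items are checked before stopping.
        items_checked = (n_items + 1) / 2
    else:
        # Every item must be rejected before responding "absent".
        items_checked = n_items
    return base_ms + ms_per_item * items_checked


if __name__ == "__main__":
    for n in (1, 5, 15, 30):
        present = predicted_search_time(n, target_present=True)
        absent = predicted_search_time(n, target_present=False)
        print(n, present, absent)
```

Under these assumptions the target-present function rises half as steeply as the target-absent function, which is the relationship between the two slopes that a serial self-terminating scan predicts.
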
If the target was present in the display, it would be found on average halfway through. It looks like the kind of pattern you would get if you were focusing attention on each item in turn and stopping when you found the target.

I should mention at this point that Nakayama (1988) has found some versions of search for conjunction targets that give faster search latencies than the ones I have reported, although none of them are completely flat. If the features whose conjunction defines the targets are highly discriminable, search can be considerably faster than 60 ms per item. I have confirmed that there are, in fact, clear differences in difficulty between different conjunctions of the same four dimensions. To test this, I presented displays containing bars in highly discriminable colors (pink and green), highly discriminable orientations (45 degrees left and right), moving in highly discriminable directions (up-down oscillation versus left-right), and in highly discriminable sizes (ratio of 1.8 to 1). Figure 7 presents the search latencies. Conjunctions of color and size are found very quickly, whereas

conjunctions of motion and orientation are quite slow; the other conjunctions are intermediate between them.

FIGURE 5 (a) A locally unique item is hard to find when items elsewhere share its locally unique property. (b) and (c) When the property is not present elsewhere in the display, the targets become salient.

What is intriguing is that these findings do not seem to link very closely to what is known so far about the physiological and anatomical segregation. Many single units respond

to combinations of size or spatial frequency or motion with orientation, whereas color and motion seem to be segregated into different pathways. Yet color-motion conjunctions are relatively easy to find, and conjunctions with orientation are difficult.

FIGURE 6 Search times for a conjunction target (a green T among green H's and brown T's). Both functions increase linearly with the number of items in the display, and the slope for the positives (target present) is about half the slope for the negative trials (target absent). Vertical axis: search time; horizontal axis: number of items in display.

What seems to happen, according to both Ken Nakayama and me, is that subjects get very good segregation between the two sets of distracters when their features are as discriminable as these. It seems possible to attend, for example, to the items that are moving up and down, even

FIGURE 7 Search times for each conjunction of color, size, motion, and orientation and for each feature on its own. M = motion; C = color; S = size; O = orientation. Horizontal axis: display size; target-present and target-absent trials are plotted separately.

though they are interspersed with items moving left and right. Take, for example, a display containing a green target moving up and down among green distracters moving left and right and red distracters moving up and down. Perhaps subjects can reject all distracters that are moving left and right (for example) without conjoining their features. Any remaining green item must be moving up and down and must therefore be the target.

ROLE OF ATTENTION

To get some further evidence for the idea that attention is involved in conjoining features, we have tried a number of different tasks. Perhaps the most dramatic result came when we prevented subjects from focusing attention on each item in turn (Treisman and Schmidt, 1982). We showed them brief displays with more items than they could attend to. For example, the display shown in Figure 8 might be flashed up briefly (for about 200 msec) and the subjects would be asked to report first the two digits and then any colored letters they had seen, giving both the color and the letter for each item whenever possible. Their responses included a large number of illusory conjunctions, as I call them. That is, the subject put together a color and a shape in the wrong combination, for example a green T in Figure 8. They reported illusory conjunctions on about one-third of trials, which is nearly as often as they reported correct conjunctions. So, when subjects are forced to divide attention (in this case to make sure they would get the digits correct), they seem unable to conjoin the shapes and the colors correctly.

In further experiments, we obtained similar illusory recombinations with parts of shapes (Treisman and Paterson, 1984). For example, when we showed displays like those in Figure 9 and asked subjects to look for a dollar sign, they frequently reported illusory dollar signs in displays in which none was present.
The illusory targets resulted from combining the diagonal lines with S's when both were present, since far fewer were reported when only the S's or the lines were present on their own. Surprisingly, subjects saw as many illusory dollar signs with the triangle displays (Figure 9c) as with the displays with separate lines (Figure 9b). This suggests that at the preattentive level, the triangle is analyzed into three separate lines. Unless these lines can receive focused attention, they seem to be free to recombine with the S's to form illusory dollar signs. An interesting finding was recently reported by Kolinsky. When she tested young children with displays of this kind, the children also saw illusory dollar signs with the separate line displays, but they did not with the triangles. Perhaps young children perceive more holistically and do not separately detect each line of the triangles at the preattentive level.

FIGURE 8 Example of display that gave rise to illusory conjunctions. The filled area represents blue, the white area red, and the dotted area green.

Are there any constraints on illusory conjunctions, in terms of the overall similarity of the items? There seemed to be absolutely no effect of similarity in my experiments. Subjects were just as willing to take the blue color from a small outline circle, and use it to fill a large triangle, as they were to take it from another large triangle. This, again, seems to be quite strong evidence for modularity, in the sense that the presence of the color is separable from its spatial layout. Without attention, apparently we code the separate features, such as blue, triangle, outline, but not their interrelations.

To recap so far, I have suggested that early vision simply registers the presence of separate features in the scene. It does so within a number of separate modules that can be related to each other only once we focus attention on them. This locates the features that we are currently attending to and ties them together through their shared location. The evidence suggests at least a functional separation between a set of color maps, a set of orientation maps, a set of directions of motion, and so on. When we are involved in a visual search task with a target defined by a single feature, we can simply check: Is there activity in the red map? Is there activity in the horizontal map? We can then respond regardless of what is going on in all the other maps.

The diagram in Figure 10 outlines in functional terms what might happen when attention is focused on a particular location. We suggest

that attention selects particular stimuli through a kind of master map of locations to which the different feature maps in separate modules are all connected. Attention retrieves information about the different features present in a particular restricted area of the field. When attention is focused on a particular location, it pulls out the features, for example, "red" and "horizontal," that are currently present in that same location. In this way, the attended color and orientation are conjoined to form a single unitary perceptual object. If attention is divided over the whole area, we can know from the separate feature maps which features are present, but not how they are spatially related to each other.

FIGURE 9 Examples of displays used to demonstrate illusory conjunctions of parts of shapes. (a) Display containing a real target (dollar sign). (b) and (c) Displays that gave rise to approximately equal numbers of illusory dollar signs.

FIGURE 10 Schematic framework to explain the results described. Labeled components: stimuli; colour maps (red, yellow, blue); orientation maps; map of locations; attention; temporary object representation (time t, place x, properties, relations); recognition network (stored descriptions of objects, with names); identity, name, etc.

The hypothesis seemed a little far-fetched, and we felt it would certainly be nice to get more evidence to support it. We therefore devised a couple more experiments, in which we tried to test some further predictions. In one study we asked: Is it possible to detect which feature is present, without knowing where it is? It should be, if the model I outlined is correct. When presented with brief displays of multiple objects, subjects should be able to check the map for "red" and to see whether there is activity there, without necessarily linking it to any particular location in the master map of locations. In the other experiment, we tested the prediction that the presence of a feature could be detected when its absence could not. I will come back to that experiment in a moment.
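
The framework can be made concrete with a small sketch. It is an illustration under stated assumptions, not an implementation of the model from the studies: a display is a hypothetical dictionary from grid locations to sets of feature names, each feature map is simply the set of locations at which that feature occurs, and all function names are invented here.

```python
from collections import defaultdict


def build_feature_maps(display):
    """Turn {location: {feature, ...}} into {feature: {location, ...}}."""
    maps = defaultdict(set)
    for location, features in display.items():
        for feature in features:
            maps[feature].add(location)
    return maps


def feature_present(maps, feature):
    """Pooled check: is there any activity in this feature map?

    The answer says nothing about where the feature is.
    """
    return len(maps.get(feature, ())) > 0


def attend(maps, location):
    """Focused attention: collect the features active at one location."""
    return {feature for feature, locations in maps.items() if location in locations}


def serial_conjunction_search(maps, locations, wanted):
    """Attend to locations in turn until one holds all wanted features.

    Returns (location or None, number of locations checked).
    """
    for checked, location in enumerate(locations, start=1):
        if wanted <= attend(maps, location):
            return location, checked
    return None, len(locations)


# Hypothetical display: a red X among red O's and blue X's.
display = {
    (0, 0): {"red", "O"},
    (0, 1): {"blue", "X"},
    (1, 0): {"red", "X"},
    (1, 1): {"blue", "X"},
}
maps = build_feature_maps(display)

if __name__ == "__main__":
    print(feature_present(maps, "red"), feature_present(maps, "green"))
    print(serial_conjunction_search(maps, sorted(display), {"red", "X"}))
```

In this sketch the pooled check answers yes for both "red" and "X" whether or not any single item is a red X, so a conjunction can be confirmed only by attending to locations one at a time, and doing so necessarily yields the location as well.
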

FIGURE 11 (a) Example of display used to investigate the dependence of identification on correct localization. (b) Same for conjunction identification.

"WHAT WITHOUT WHERE"

We did an experiment in which we asked subjects both to identify a target and to say where it was. We flashed up a display of red O's and blue X's like that in Figure 11a (Treisman and Gelade, 1980). The subject's task was to report whether there was an orange letter or an H. Each of those targets is defined by a unique feature. We were interested to see whether they sometimes got the identity correct when they got the location wrong. Is it possible to know "what" without knowing "where"? We measured the conditional probability of getting the identity correct, given that the location was wrong, and found that it was quite high. On around 70 percent of the trials in which the subjects mislocated the target by more than one square in the matrix, they were nevertheless correct in choosing whether it was orange or an H.

In another condition (Figure 11b) we replaced the "orange or H" feature targets by two conjunction targets. Subjects had to do the same two tasks: decide both the identity of the target and also its location. They were asked: Was there a red X or a blue O, and also where was it in the display? In this case, we found that if subjects got the location wrong, they were at chance on getting the identity of the target. The theory claims that to identify a conjunction target, you must attend to it, and therefore you will know where it is, because attention is spatially controlled. So that was

one piece of supporting evidence: it seems that we can identify features without necessarily locating them, but we cannot conjoin them correctly without also knowing where they are. When attention is overloaded, it seems that we have some free-floating feature information for which the location is indeterminate. We can know, "Yes, there is orange there, but I do not know where." Obviously, if the display remains present for long, the subject will home in on the target very quickly; but our results suggest that it is possible to cut off processing at a time at which the subject knows what the target is but not where it is.

THE ABSENCE OF A FEATURE

If the story is correct, then there should also be other tasks, besides search for conjunction targets, that require attention. An interesting one is search for a target defined by the absence of a feature, when that feature is present in all distracters. The pop-out strategy should not work here if it in fact depends on detecting activity in a feature map that is unique to the target. Suppose that we look in Figure 12a for the one circle that does not have an intersecting line. We cannot check a map for any of its features (vertical or straight or intersecting) because each of these feature maps would be swamped with activity. All the background items have the line and the target is the only one that does not have it. However, when we look for the only circle that does have an intersecting line, as in Figure 12b, we can presumably just check the map for vertical (or whatever feature defines the line), and we will find it automatically. This is exactly what the results suggest (Treisman and Souther, 1985). Search for the circle without the line gives fairly steep linearly increasing functions, which suggest serial scanning. Search for the circle with the line gives flat functions with no effect of the number of background items.
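
One way to apply this diagnostic to a set of mean latencies is to fit a straight line to latency as a function of display size and look at the slope. The sketch below uses ordinary least squares in plain Python; the function name and the example latencies are invented for illustration and are not data from these experiments.

```python
def search_slope(display_sizes, mean_latencies_ms):
    """Least-squares fit of latency on display size.

    Returns (slope in ms per item, intercept in ms).
    """
    n = len(display_sizes)
    if n < 2 or n != len(mean_latencies_ms):
        raise ValueError("need two or more matched (size, latency) pairs")
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_latencies_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    if sxx == 0:
        raise ValueError("display sizes must not all be equal")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(display_sizes, mean_latencies_ms)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical mean latencies (ms) at display sizes of 1, 6, and 12 items.
sizes = [1, 6, 12]
presence_latencies = [505, 498, 510]   # made-up "search for presence" data
absence_latencies = [540, 700, 890]    # made-up "search for absence" data

if __name__ == "__main__":
    print("presence:", search_slope(sizes, presence_latencies))
    print("absence:", search_slope(sizes, absence_latencies))
```

A slope near zero corresponds to the flat functions associated with pop-out; a slope of tens of milliseconds per item corresponds to the steep linear functions attributed to serial scanning.
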
So there does seem to be a difference between "search for presence" and "search for absence." This finding is surprising because exactly the same discrimination is involved in the two tasks. We test the same pair of stimuli; it is just that one plays the role of target in one case, and of distracter in the other.

FEATURE ANALYSIS AND THE ASYMMETRY OF CODING

If I am right that search is parallel when the target is signalled by activity in the relevant map for a feature that is unique to the target, this might give us a diagnostic to discover what other features are analyzed early in the visual system. We cannot assume that the brain analyzes visual displays in the same way as physicists might. Perceptual properties might not map directly and simply onto physical properties. We need some empirical evidence to tell us what features function as natural elements or

"primitives" in the language of early vision. We used the search task to look for possible asymmetries in the coding of a number of other simple properties (Treisman and Gormican, 1988). For example, we asked subjects to find a curved line amongst straight lines or a straight line amongst curved lines and looked to see whether there was any asymmetry in the difficulty of the two tasks. What could this tell us? Suppose straightness is a primitive feature, detected early in visual processing. Then its presence in the target should mediate pop-out; it should be detected in parallel, just as the added line was among circles without lines. Similarly, if curvature is a primitive feature, a single curved target line should pop out of a display of straight lines. Its presence would be signalled by the presence of activity in the map for curvature. It might also be the case that only one of these two features is coded positively, as the presence of activity, while the other is coded simply as the absence of its opposite.

In fact, we found a very large asymmetry that was clearest when the lines and curves were least discriminable (see Figure 13a). The asymmetry suggests that the curved line functions as a feature in the visual system, while the straight line does not. It is as if we code curvature as the presence of something, and we code straightness by default, as the absence of curvature. If we take seriously the analogy to the circle and line experiment, curvature may be coded as the addition of a feature; a curved line, then, would be represented as a line, plus a deviation from the standard or reference value of straightness, just as the circle with an intersecting line could be represented as a basic circle with an added feature.

We looked next at some other features of simple lines, for instance, orientation. Is there any asymmetry there? We can ask subjects to look for a tilted line amongst vertical lines or a vertical line amongst tilted lines.
Again, we found a large asymmetry: this time it was the tilted line that was easy to find against a background of vertical lines, and gave flat functions relating latency of search to number of distracter lines. When the target was a vertical line on a background of tilted lines, search was slower and latencies increased with the number of distracter lines. Again, by analogy with the circles and lines, we might infer that the tilted line is coded as the presence of an added feature (perhaps tilt), and the vertical is coded simply as the standard orientation with no added deviation.

Even colors seem to show a similar pattern. Colors tend to give flat search functions unless the target and the distracter are very similar and hard to discriminate, but we did find some asymmetry in search even here. We looked at search for deviating colors like magenta and lime and turquoise against standard colors like red, green, and blue, and found faster, more parallel search than with the reverse arrangement. The colors that were harder to find as targets were the "good" colors, the red, the

green and the blue, and the ones that were easier to find were the deviating colors, magenta, lime and turquoise. The same asymmetries recur with some other properties: for instance, converging lines against parallel lines. A pair of converging lines pops out, while a pair of parallel lines in a background of converging lines is found more slowly. Similarly, a circle with a gap pops out of a display of complete circles, but not the reverse. The results of these search tasks are shown in Figure 13.

We seem to have stumbled on quite a general principle of perceptual coding. Perhaps we can generalize and say that the visual system is tuned to signal departures from a normal or standard value. If this is correct, we may be able to use it to explore some even less obvious cases, such as the perceptual coding of "inside" versus "outside." Would a dot inside a closed shape be easier or harder to find than a dot outside a shape? It turns out that inside is harder to find, suggesting that this is the standard, and outside is the deviating value. The asymmetry of coding appears to be quite pervasive and may prove a useful tool to throw light on the nature of the features extracted by the visual system at the early preattentive levels.

The experiments I have described so far all tested stimuli defined by luminance contrasts. It may be of interest to ask whether the same principles of coding would also extend to other media. How general and abstract is the analysis? Patrick Cavanagh (1987) has been exploring the properties of shapes defined by other kinds of boundaries: for example, color boundaries at isoluminance, or texture boundaries defined by motion, by the size of the texture elements, or by stereoscopic depth. He and I have recently looked at search performance when the stimuli (bars or discs) are defined by discontinuities in these other media.
For example, we can create vertical or tilted bars from stationary random dot textures against otherwise identical moving backgrounds. We can then ask subjects to look for a target bar that is tilted among vertical distracter bars, or for a vertical target bar among tilted distracters. We find results that are very similar to those obtained with bars defined by luminance (i.e., darker or lighter than the background). The same pop-out for a tilted target and serial search for a vertical target appear with bars created by color, or motion, or texture, or stereoscopic disparity. The coding language used by the visual system seems to be quite general across these different channels or media.

PERCEPTION OF OBJECTS

My speculations at present are that vision initially forms spatially parallel maps in functionally separate specialized modules. These modules signal the presence of positively coded features that code deviations from a standard or a norm. In order to access their locations, or to specify that

they are not present in any particular stimulus, or to tie them correctly to other features of the same object, we have to focus attention serially on each location in turn. The currently attended features can then be selected and entered into some temporary representation of the attended object. Once the features are assembled, their conjunction can be compared to memories, to stored descriptions in a long-term recognition network, and the appropriate identification can be made. Other research (Treisman, 1988) suggests that anomalous conjunctions that we might otherwise make in everyday life get weeded out at this comparison stage and not before. Top-down constraints from expectations and prior knowledge seem not to influence which features are entered into each object representation; the only constraints at this level appear to come from spatial attention. Thus subjects who were expecting to see a carrot, for example, were no more likely to recombine the orange from another object with the shape of a blue carrot than they were to imagine its orange color when no other orange object was present in the display.

These temporary object representations may also be important in maintaining the perceptual continuity of objects as they move and change. Once a set of features is conjoined and a perceptual unit is established, it can be updated as the object moves or changes. In some recent experiments with Daniel Kahneman and Brian Gibbs, we have found evidence that new stimulus information gets integrated with the previously perceived object that is best linked to it by spatio-temporal continuity. For example, a letter is named faster if the same letter was previously presented within the same outline shape, even when the shape has moved to a new location in the interval between the two letters (Figure 14).
The naming latency is unaffected if the same letter had appeared in a different outline shape, even though the time interval and the distance between the pairs of letters were equated. When the matching letter appeared in the same shape as the first, the motion of the frame was sufficient to link the two letters as parts of the same continuous object. If the features of an object change, we simply update the temporary representation. The perceptual unity and continuity of the object is maintained so long as the spatio-temporal parameters are consistent with the continued presence of a single object. If we were ever to see a frog turn into a fairy tale prince, we would perceive it as a single character transformed, just one perceptual entity, even though everything about it has changed: its properties, its identity, its label, and so on. That continuity, we suggest, would be mediated by a single object representation.

If my story is correct, we may have no introspective access to the earlier stages of processing. These object-specific representations may be the basis of conscious experience. In fact, they would be our subjective windows into the mind.
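The idea of a temporary object representation that is updated rather than replaced can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a model taken from the experiments. The class `ObjectFile`, the function `naming_latency`, the letters used, and the 500 ms baseline are all hypothetical; the 30 ms benefit is the approximate value quoted in the Figure 14 example.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Temporary representation of one perceived object (hypothetical)."""
    position: tuple
    features: dict = field(default_factory=dict)

    def move_to(self, new_position):
        # Spatio-temporal continuity: the same file follows the object,
        # carrying its previously registered features with it.
        self.position = new_position

    def update(self, **new_features):
        # Changed features overwrite old ones; the file itself persists,
        # so the object is still experienced as the same entity.
        self.features.update(new_features)


def naming_latency(letter, frame, baseline_ms=500.0, benefit_ms=30.0):
    """Latency (ms) to name `letter` when it appears in `frame`.

    Assumption: naming is faster only when the same letter was
    previously registered in this object's file. The baseline is an
    arbitrary illustrative value.
    """
    if frame.features.get("letter") == letter:
        return baseline_ms - benefit_ms
    return baseline_ms


# First display: two frames, each briefly containing a letter.
left = ObjectFile(position=(0, 0), features={"letter": "N"})
right = ObjectFile(position=(4, 0), features={"letter": "S"})

# The empty frames then move to new locations.
left.move_to((1, 3))
right.move_to((3, 3))

# Second display: the letter N appears in one of the two frames.
same_object = naming_latency("N", left)
different_object = naming_latency("N", right)
print(same_object, different_object)
```

The design choice that matters here is that the benefit is looked up through the object's file rather than through its location, so it survives the movement of the frame.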

FIGURE 14 Example of displays used to demonstrate the integration of information in object-specific representations. (a) The two squares appear first; two letters are briefly flashed in the squares, which then move (empty) to two new locations. (b) A single letter then appears in one of the squares, and subjects are asked to name it as quickly as possible. In this example, the latency would be about 30 milliseconds shorter than it would have been if the letter N had appeared in the left-hand square in the second display.

REFERENCES

Beck, J. 1966 Effects of orientation and of shape similarity on perceptual grouping. Perception and Psychophysics 1:300
Cavanagh, P. 1987 Reconstructing the third dimension: Interactions between color, texture, motion, binocular disparity, and shape. Computer Vision, Graphics, and Image Processing 37:171-195.
Livingstone, M.S., and D.H. Hubel 1987 Psychophysical evidence for separate channels for the perception of form, color, movement and depth. Journal of Neuroscience 7:3416-3468.
Nakayama, K. 1988 The iconic bottleneck and the tenuous link between early visual processing and perception. In C. Blakemore, ed., Vision: Coding and Efficiency. New York: Cambridge University Press.
Treisman, A. 1982 Perceptual grouping and attention in visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance 8:194-214.
Treisman, A. 1985 Preattentive processing in vision. Computer Vision, Graphics, and Image Processing 31:156-177.
Treisman, A. 1988 Features and objects: The fourteenth Bartlett Memorial Lecture. Quarterly Journal of Experimental Psychology 40A:201-237.
Treisman, A., and G. Gelade 1980 A feature integration theory of attention. Cognitive Psychology 12:97-136.
Treisman, A., and S. Gormican 1988 Feature analysis in early vision: Evidence from search asymmetries. Psychological Review 95(1):15-48.
Treisman, A., and R. Paterson 1984 Emergent features, attention and object perception. Journal of Experimental Psychology: Human Perception and Performance 10:12-31.
Treisman, A., and H. Schmidt 1982 Illusory conjunctions in the perception of objects. Cognitive Psychology 14:107-141.
Treisman, A., and J. Souther 1985 Search asymmetry: A diagnostic for preattentive processing of separable features. Journal of Experimental Psychology: General 114:285-310.
Van Essen, D.C. 1985 Functional organization of primate visual cortex. In A. Peters and E.G. Jones, eds., Cerebral Cortex. Vol. 3, Visual Cortex. New York: Plenum Press.

