Visual Coding of Features and Objects:
Some Evidence from Behavioral Studies
ANNE TREISMAN
I am going to talk this afternoon about some particular aspects of
perception that I have been exploring using behavioral tasks rather than
brain studies. The question I will discuss is what can we find out about the
early stages of visual processing by using purely behavioral data.
Like many other psychologists, we compare response latencies and
error rates in different visual tasks. From these, we obtain a measure of
relative difficulty and some indication of which operations are carried out
in parallel and which sequentially. We infer the use of different operations
from increases or decreases in total response times as we either complicate
or simplify the task, and we look at different kinds of errors that may
suggest ways in which the system breaks down. No one result will ever
provide compelling support for a hypothesis, so we try to marshal as much
converging evidence as we can to support the same underlying hypothetical
mechanism. If we get consistent results, we gain confidence that our theory
is on the right approach.
One immediate observation is that perception feels effortless and auto-
matic. The minute we open our eyes, we seem to be aware of an organized
scene containing meaningful objects. We are not normally conscious of
color patches, movements, edges, and textures that we then assemble, ob-
ject by object. It might be the case, however, that this apparently effortless
achievement is actually the result of complex preprocessing stages, involving
many operations to which we have no conscious access. In fact, the ease
of introspection seems to be inversely related to the order of processing,
at least from what we can infer. That makes sense, since what we need to
react to are tigers, footballs, or motor cars, not color patches.
If there are extensive preprocessing operations, we need to probe
them through indirect behavioral evidence; we cannot expect people to
introspect. One approach is to ask what functions need to be carried out
early in the task of perceiving the real world, and then see which factors
make those tasks easy or difficult.
FIGURE 1 Salient boundary between groups defined by shape or by color. (Striped areas
should be green and white areas should be red.) Source: Adapted from Beck (1966).
EARLY GROUPING OF PERCEPTUAL ELEMENTS
Certainly, an early step must be to locate and define the boundaries
of what might be candidate objects. We need to group areas that are
likely to belong together and to separate the scene into potential objects
to be identified. One approach, then, would be to ask what kinds of
discrimination mediate the early grouping phenomena.
A long time ago, the Gestalt psychologists suggested a number of
different principles that seem to be important in understanding this process.
Elements are grouped by proximity, by similarity, by common movement,
and by good continuation. Now it turns out that those are all good guides
to what might be parts of the same object. If you see a cow behind a tree,
its front and rear are likely to be the same color, they are likely to move
together, and so on. But, maybe we could say a little more about what
kinds of similarity are important in mediating grouping.
Here we find a fairly sharp dichotomy: differences in simple aspects
of shapes, like curved or straight lines and edges, will produce a good
boundary between groups of elements; so will differences in colors and
in brightness. In both cases (in Figure 1) the division down the middle is
immediately salient. But if we ask people to find a boundary between green
circles and red triangles on one side and red circles and green triangles on
the other side (see Figure 2) they find it much more difficult.
FIGURE 2 Poor segregation between groups defined by conjunction of color and shape.
Similarly, we can look at the arrangements of parts of shapes. Figure
3 is taken from Beck (1966), who showed that we get a very good boundary
between elements defined by their orientation. So T's and tilted T's
segregate well, but T's and L's with the same horizontal and vertical lines in
different spatial arrangements do not. The finding is interesting because, as
Beck showed, similarity judgments go the other way: if you show somebody
a tilted T and a normal T and get them to rate how similar they are, then
show them a T and an L, they will say that the "T" and the tilted T are
more similar than the T and the L. For the earlier preattentive level of
processing, however, grouping is based on different principles from those
that mediate consciously judged similarity for single attended figures.
Segregation and boundary formation offer one possible diagnostic for
what happens early in visual processing. They suggest that simple properties
like straight versus curved, tilted versus vertical, and color and brightness,
all of which mediate good grouping, are likely to be distinguished early and
in parallel across the visual field. But if we have to put parts or properties
together to define a boundary, then we are not so good at it. The visual
system just does not work that way.
EXPECTANCY AND ATTENTION
What else might we look at? Another possible diagnostic that might
indicate early processing would be independence from central control, from
voluntary decisions, expectancy, and attention. We can look to see what
kinds of things are spontaneously salient or "pop out" of a display: what
catches our attention when we look at a scene with a single black sheep
among hundreds of white ones, for example. In visual search tasks, we ask
subjects to find a target in displays in which it differs either in color, in
line orientation, or in size. Targets defined by simple features are available
immediately and effortlessly.
FIGURE 3 Good segregation between groups differing in line orientation, but not between
groups differing only in line arrangement. Source: Adapted from Beck (1966).
Can we say any more than just that feature detection tasks are easy? We
can bring in another argument about the probable function of early visual
processing, independent of attention: We would expect it to be spatially
parallel. If the goal of early visual stages is to establish figure-ground
relations and to monitor the field for any salient stimuli, there would be an
advantage to doing it across the whole scene at once, rather than relying
on a sequential scan. This allows us to make a prediction about the effect
of varying the number of items in the field. We can ask subjects to find a
target when there is only 1 item in the display, or when there are 6 or when
there are 60. If the target can be found at an early level of visual processing,
at which detection is spatially parallel, we would expect search times to be
independent of the number of items in the display. That is in fact what
we find, for quite a number of different kinds of stimuli. A target that is
green against a background of not green, or filled against open stimuli, or
a bullseye pattern against circles with dots outside the boundary (Figure 4)
will be found without attention or effort. Latencies to detect these targets
show no effect of added nontarget items (distractors). Performance seems
to reflect spatially parallel processing; these targets show what I will call a
pop-out effect.
The search diagnostic may throw more light on the early stages of
processing if we look at the effects of varying the background stimuli (the
distractors). We can make the distracters vary in size, orientation, gap vs.
completion and so on and see whether this makes a target defined by color
any harder to find. Similarly, we can vary background colors and other
features in a task requiring search for targets defined by orientation. We
have found that background heterogeneity has little or no effect on search,
provided that the variation is only on irrelevant dimensions and not on the
relevant dimension that defines the target (Treisman, 1988).
The apparent independence of visual processing on each of these
separate dimensions suggests a modular organization. The idea is that
there may be a number of relatively independent modules, each of which
computes its own property, one specializing in color, one in orientation,
one in stereoscopic depth, one in motion, and so on. These modules need
not necessarily be anatomically separate, although some specialization into
different anatomical channels has been described (Livingstone and Hubel,
1988; Van Essen, 1985); but I am suggesting they may be functionally
separate.
FIGURE 4 Easy detection (pop-out) of targets with a unique feature not shared by the
nontargets. Panels: (a) color (target green or not green); (b) filled/outline (red or black);
(c) inside/outside. Horizontal axis: number of items in display (1, 6, 12); separate
functions for target present and target absent.
If features are analyzed in functionally separate, specialized modules,
we might make the converse prediction about heterogeneity when we vary
the nature of the target. In this case, it should be important to know that
you are looking for a target that is blue rather than large or horizontal. You
can then check just the appropriate module for evidence of its presence. In
an experiment to test this prediction, we compared how fast subjects could
detect a blue target or a larger target or a horizontal target when they did
not know whether it would be blue or large or horizontal, and when they
did know which it would be. The target always appeared against a
background of small green vertical bars. The results suggest that checking
several different properties takes longer than checking a single property.
Although search remains spatially parallel, the latency to detect the target
was greater when its nature was not specified in advance, as if subjects
checked separately within each of the different modules until they found it.
LOCALIZATION
So far, I have given you some evidence for two kinds of information
that are available from these early feature modules, if they exist. The first is
the presence of global discontinuities or boundaries dividing one area from
another. The second is the presence of a unique item in a display. Do
these early representations contain any precise information about where
things are, that is, about their localization?
Suppose we set up a display that has a locally unique item, for example
a red circle amongst some green ones, or an X amongst O's; the unique
item is very salient: it pops out. Suppose now we embed the group in
which the target is locally unique in a larger display that has the same
locally unique property present elsewhere. Figure 5 illustrates the more
complex display. The locally unique item is now much harder to find when
its defining feature is present elsewhere in the display, even though it may
be some distance away (Treisman, 1982). The difficulty is not due simply
to the larger or more complex display, because, if the target is unique not
just locally but also in the whole display, it remains about as easy to find in
the larger display as in the smaller local context.
What is going on here? It seems as if we can hide an object percep-
tually. Just by embedding an item in a display that has its locally unique
property elsewhere, we can make it preattentively invisible. This suggests
that the early representation automatically makes available some kind of
pooled response that tells you, "Yes, there is some red there," or "Yes,
there is a diagonal line." But the same process cannot tell you where the
red item or the diagonal are located.
What must the visual system do to locate the item? Performance in
tasks that force subjects to create a unique identity for an item defined
only by a conjunction of properties may give us some clue. We can, for
example, look at a task in which subjects search for a green T amongst
other green shapes mixed with other colored T's (Treisman and Gelade,
1980). As Figure 6 illustrates, the search time for this type of conjunction
target increases linearly as a function of the number of the distracters in the
display. This pattern of performance suggests that each item was serially
checked, adding about 60 milliseconds for each extra nontarget item that
had to be rejected. If the target was present in the display, it would be
found on average halfway through. It looks like the kind of pattern you
would get if you were focusing attention on each item in turn and stopping
when you found the target.
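The serial account above can be put into a small numerical sketch. This is only an illustration of the reasoning, not the analysis used in the experiments: the function names and the 400 ms base time are assumptions introduced here, while the 60 ms cost per rejected item and the "halfway through on average" rule come from the text.

```python
def expected_search_time(n_items, target_present, base_ms=400.0, per_item_ms=60.0):
    """Expected latency for a serial, self-terminating scan.

    base_ms is a hypothetical constant for everything other than scanning;
    per_item_ms is the roughly 60 ms per item mentioned in the text.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    if target_present:
        # The target is equally likely to be at any position in the scan,
        # so on average (n + 1) / 2 items are checked before stopping.
        items_checked = (n_items + 1) / 2
    else:
        # Every item has to be rejected before answering "absent".
        items_checked = n_items
    return base_ms + per_item_ms * items_checked


def search_slope(target_present, n_small=5, n_large=30):
    """Slope in ms per item between two display sizes."""
    rise = (expected_search_time(n_large, target_present)
            - expected_search_time(n_small, target_present))
    return rise / (n_large - n_small)


if __name__ == "__main__":
    print("slope, target absent :", search_slope(False))
    print("slope, target present:", search_slope(True))
```

Under these assumptions the slope for target-present trials comes out at half the slope for target-absent trials, which is the pattern that suggests a scan that stops when the target is found.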
I should mention at this point that Nakayama (1988) has found some
versions of search for conjunction targets that give faster search latencies
than the ones I have reported, although none of them are completely flat.
If the features whose conjunction defines the targets are highly discrim-
inable, search can be considerably faster than 60 ms per item. I have
confirmed that there are, in fact, clear differences in difficulty between different
conjunctions of the same four dimensions. To test this, I presented
displays containing bars in highly discriminable colors (pink and green),
highly discriminable orientations (45 degrees left and right), moving in
highly discriminable directions (up-down oscillation versus left-right), and
in highly discriminable sizes (ratio of 1.8 to 1). Figure 7 presents the search
latencies. Conjunctions of color and size are found very quickly, whereas
conjunctions of motion and orientation are quite slow; the other conjunctions
are intermediate between them. What is intriguing is that these
findings do not seem to link very closely to what is known so far about
the physiological and anatomical segregation. Many single units respond
to combinations of size or spatial frequency or motion with orientation,
whereas color and motion seem to be segregated into different pathways.
Yet color-motion conjunctions are relatively easy to find, and conjunctions
with orientation are difficult.
FIGURE 5 (a) A locally unique item is hard to find when items elsewhere share its locally
unique property. (b) and (c) When the property is not present elsewhere in the display,
the targets become salient.
FIGURE 6 Search times for a conjunction target (a green T among green H's and brown
T's). Both functions increase linearly with the number of items in the display and the
slope for the positives (target present) is about half the slope for the negative trials (target
absent). (Axes: search time against number of items in display.)
FIGURE 7 Search times for each conjunction of color, size, motion, and orientation and
for each feature on its own. M = motion; C = color; S = size; O = orientation.
(Horizontal axis: display size.)
What seems to happen, according to both Ken Nakayama and me, is
that subjects get very good segregation between the two sets of distracters
when their features are as discriminable as these. It seems possible to
attend, for example, to the items that are moving up and down, even
though they are interspersed with items moving left and right. Take, for
example, a display containing a green target moving up and down among
green distracters moving left and right and red distracters moving up and
down. Perhaps subjects can reject all distracters that are moving left and
right (for example) without conjoining their features. Any remaining green
item must be moving up and down and must therefore be the target.
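The segregation strategy just described can be sketched in a few lines. This is a toy illustration under assumptions made here, not a model from the studies: the dictionary keys, the function name, and the example display are all hypothetical. It shows that once one set of distracters is rejected on a single feature, a single-feature check on the remainder is enough to find the target.

```python
def find_target_by_segregation(items, reject_motion="left-right", target_color="green"):
    """Reject one distracter set by motion alone, then check color alone."""
    # Step 1: discard every item sharing the rejected motion (a single-feature test).
    remaining = [item for item in items if item["motion"] != reject_motion]
    # Step 2: among the remainder, a single-feature color test identifies the target.
    for item in remaining:
        if item["color"] == target_color:
            return item
    return None


# Hypothetical display: green target moving up-down, among green left-right
# and red up-down distracters.
display = [
    {"id": 1, "color": "green", "motion": "left-right"},
    {"id": 2, "color": "red", "motion": "up-down"},
    {"id": 3, "color": "green", "motion": "up-down"},
    {"id": 4, "color": "red", "motion": "up-down"},
]

if __name__ == "__main__":
    print(find_target_by_segregation(display))
```

Neither step tests a conjunction directly, which is the point of the suggestion in the text.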
ROLE OF ATTENTION
To get some further evidence for the idea that attention is involved
in conjoining features, we have tried a number of different tasks. Perhaps
the most dramatic result came when we prevented subjects from focusing
attention on each item in turn (Treisman and Schmidt, 1982). We showed
them brief displays with more items than they could attend to. For example,
the display shown in Figure 8 might be flashed up briefly (for about 200
msec) and the subjects would be asked to report first the two digits and
then any colored letters they had seen, giving both the color and the letter
for each item whenever possible. Their responses included a large number
of illusory conjunctions, as I call them. That is, the subject put together
a color and a shape in the wrong combination, for example a green T in
Figure 8. They reported illusory conjunctions on about one-third of trials,
which is nearly as often as they reported correct conjunctions. So, when
subjects are forced to divide attention (in this case to make sure they would
get the digits correct), they seem unable to conjoin the shapes and the
colors correctly.
In further experiments, we obtained similar illusory recombinations
with parts of shapes (Treisman and Paterson, 1984). For example, when we
showed displays like those in Figure 9 and asked subjects to look for a dollar
sign, they frequently reported illusory dollar signs in displays in which none
was present. The illusory targets resulted from combining the diagonal
lines with S's when both were present, since far fewer were reported when
only the S's or the lines were present on their own. Surprisingly, subjects
saw as many illusory dollar signs with the triangle displays (Figure 9c) as
with the displays with separate lines (Figure 9b). This suggests that at
the preattentive level, the triangle is analyzed into three separate lines.
Unless these lines can receive focused attention, they seem to be free to
recombine with the S's to form illusory dollar signs. An interesting finding
was recently reported by Kolinsky. When she tested young children with
displays of this kind, the children also saw illusory dollar signs with the
separate line displays, but they did not with the triangles. Perhaps young
children perceive more holistically and do not separately detect each line
of the triangles at the preattentive level.
FIGURE 9 Examples of displays used to demonstrate illusory conjunctions of parts of
shapes. (a) Display containing a real target (dollar sign). (b) and (c) Displays that gave
rise to approximately equal numbers of illusory dollar signs.
that attention selects particular stimuli through a kind of master map of
locations to which the different feature maps in separate modules are all
connected. Attention retrieves information about the different features
present in a particular restricted area of the field. When attention is
focused on a particular location, it pulls out the features, for example,
"red" and "horizontal," that are currently present in that same location. In
this way, the attended color and orientation are conjoined to form a single
unitary perceptual object.
If attention is divided over the whole area, we can know from the
separate feature maps which features are present, but not how they are
spatially related to each other.
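One way to make this framework concrete is a small data-structure sketch. It is an illustration only, built on assumptions introduced here: the class name, method names, and the use of grid coordinates as locations are hypothetical. Each feature map records where its feature is active; focused attention reads out every feature at one location, while divided attention can report only which maps are active.

```python
from collections import defaultdict


class FeatureMapModel:
    """Toy version of separate feature maps linked through a map of locations."""

    def __init__(self):
        # feature name -> set of locations at which that feature is active
        self.feature_maps = defaultdict(set)

    def add_item(self, location, features):
        """Register an object at a location with its set of features."""
        for feature in features:
            self.feature_maps[feature].add(location)

    def features_present(self):
        """Divided attention: which features are active anywhere, with no locations."""
        return {f for f, locations in self.feature_maps.items() if locations}

    def attend(self, location):
        """Focused attention: the features currently present at one location."""
        return {f for f, locations in self.feature_maps.items() if location in locations}


if __name__ == "__main__":
    model = FeatureMapModel()
    model.add_item((0, 0), {"red", "horizontal"})
    model.add_item((1, 0), {"green", "vertical"})
    print(model.attend((0, 0)))
    print(model.features_present())
```

In this sketch `features_present` cannot say whether "red" goes with "horizontal" or with "vertical"; only `attend` conjoins them, which mirrors the claim in the text.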
FIGURE 10 Schematic framework to explain the results described. Components shown:
stimuli; colour maps (red, yellow, blue); orientation maps; map of locations; attention;
temporary object representation (time t, place x, properties, relations; identity, name, etc.);
recognition network (stored descriptions of objects, with names).
The hypothesis seemed a little far-fetched, and we felt it would certainly
be nice to get more evidence to support it. We therefore devised a couple
more experiments, in which we tried to test some further predictions. In
one study we asked: Is it possible to detect which feature is present, without
knowing where it is? It should be, if the model I outlined is correct. When
presented with brief displays of multiple objects, subjects should be able
to check the map for "red" and to see whether there is activity there,
without necessarily linking it to any particular location in the master map
of locations. In the other experiment, we tested the prediction that the
presence of a feature could be detected when its absence could not. I will
come back to that experiment in a moment.
FIGURE 11 (a) Example of display used to investigate the dependence of
identification on correct localization. (b) Same for conjunction identification.
"WHAT WITHOUT WHERE"
We did an experiment in which we asked subjects both to identify a
target and to say where it was. We flashed up a display of red O's and blue
X's like that in Figure 11a (Treisman and Gelade, 1980). The subject's task
was to report whether there was an orange letter or an H. Each of those
targets is defined by a unique feature. We were interested to see whether
they sometimes got the identity correct when they got the location wrong.
Is it possible to know "what" without knowing "where"? We measured
the conditional probability of getting the identity correct, given that the
location was wrong and found that it was quite high. On around 70 percent
of the trials in which the subjects mislocated the target by more than one
square in the matrix, they were nevertheless correct in choosing whether it
was orange or an H.
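The measure used here, the probability of a correct identity given a wrong location, is simple to compute from trial records. The sketch below shows the calculation only; the record format, the function name, and the example trials are invented for illustration and are not data from the experiment.

```python
def p_identity_given_mislocated(trials):
    """P(identity correct | location wrong), or None if no trial was mislocated."""
    mislocated = [t for t in trials if not t["location_correct"]]
    if not mislocated:
        return None
    correct = sum(1 for t in mislocated if t["identity_correct"])
    return correct / len(mislocated)


# Invented example records: 10 mislocated trials, 7 of them with correct identity,
# plus 5 correctly located trials that do not enter the conditional probability.
example_trials = (
    [{"location_correct": False, "identity_correct": True}] * 7
    + [{"location_correct": False, "identity_correct": False}] * 3
    + [{"location_correct": True, "identity_correct": True}] * 5
)

if __name__ == "__main__":
    print(p_identity_given_mislocated(example_trials))
```

For a two-alternative identity judgment, a value near 0.5 would indicate chance performance, so a value well above 0.5 on mislocated trials is what suggests "what" without "where".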
In another condition (Figure 11b) we replaced the "orange or H"
feature targets by two conjunction targets. Subjects had to do the same
two tasks: decide both the identity of the target and also its location. They
were asked: Was there a red X or a blue O, and also where was it in the
display? In this case, we found that if subjects got the location wrong, they
were at chance on getting the identity of the target. The theory claims that
to identify a conjunction target, you must attend to it, and therefore you
will know where it is, because attention is spatially controlled. So that was
one piece of supporting evidence: it seems that we can identify features
without necessarily locating them, but we cannot conjoin them correctly
without also knowing where they are. When attention is overloaded, it
seems that we have some free-floating feature information for which the
location is indeterminate. We can know, "Yes, there is orange there, but
I do not know where." Obviously, if the display remains present for long,
the subject will home in on the target very quickly; but our results suggest
that it is possible to cut off processing at a time at which the subject knows
what the target is but not where it is.
THE ABSENCE OF A FEATURE
If the story is correct, then there should also be other tasks besides
search for conjunction targets, that require attention. An interesting one is
search for a target defined by the absence of a feature, when that feature
is present in all distracters. The pop-out strategy should not work here
if it in fact depends on detecting activity in a feature map that is unique
to the target. Suppose that we look in Figure 12a for the one circle that
does not have an intersecting line. We cannot check a map for any of its
features (vertical or straight or intersecting) because each of these feature
maps would be swamped with activity. All the background items have the
lines and the target is the only one that does not have it. However, when
we look for the only circle that does have an intersecting line, as in Figure
12b, we can presumably just check the map for vertical (or whatever feature
defines the line), and we will find it automatically. This is exactly what the
results suggest (Treisman and Souther, 1985). Search for the circle without
the line gives fairly steep linearly increasing functions which suggest serial
scanning. Search for the circle with the line gives flat functions with no
effect of the number of background items. So there does seem to be a
difference between "search for presence" and "search for absence."
This finding is surprising because exactly the same discrimination is
involved in the two tasks. We test the same pair of stimuli; it is just that
one plays the role of target in one case, and of distracter in the other.
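The asymmetry follows directly if pop-out depends on activity in a map that only the target drives. The sketch below illustrates that logic under assumptions made here: stimuli are represented as sets of feature labels, and the function names and labels are hypothetical.

```python
def unique_target_activity(target, distracters):
    """Features whose map is activated by the target and by no distracter."""
    distracter_features = set().union(*distracters)
    return set(target) - distracter_features


def pops_out(target, distracters):
    """Parallel detection is possible only if some map is unique to the target."""
    return bool(unique_target_activity(target, distracters))


# The same pair of stimuli, swapping the roles of target and distracter.
circle = {"circle"}
circle_with_line = {"circle", "line"}

if __name__ == "__main__":
    # Search for presence: the "line" map is active only for the target.
    print(pops_out(circle_with_line, [circle] * 11))
    # Search for absence: every map the target activates is swamped by distracters.
    print(pops_out(circle, [circle_with_line] * 11))
```

The discrimination is identical in the two cases; what changes is whether the target contributes any activity that the background does not.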
FEATURE ANALYSIS AND THE ASYMMETRY OF CODING
If I am right that search is parallel when the target is signalled by
activity in the relevant map for a feature that is unique to the target, this
might give us a diagnostic to discover what other features are analyzed
early in the visual system. We cannot assume that the brain analyses
visual displays in the same way as physicists might. Perceptual properties
might not map directly and simply onto physical properties. We need some
empirical evidence to tell us what features function as natural elements or
"primitives" in the language of early vision. We used the search task to
look for possible asymmetries in the coding of a number of other simple
properties (Treisman and Gormican, 1988). For example, we asked subjects
to find a curved line amongst straight lines or a straight line amongst curved
lines and looked to see whether there was any asymmetry in the difficulty
of the two tasks.
What could this tell us? Suppose straightness is a primitive feature,
detected early in visual processing. Then its presence in the target should
mediate pop-out; it should be detected in parallel, just like the added
line was among circles without lines. Similarly, if curvature is a primitive
feature, a single curved target line should pop out of a display of straight
lines. Its presence would be signalled by the presence of activity in the
map for curvature. It might also be the case that only one of these two
features is coded positively, as the presence of activity, while the other is
coded simply as the absence of its opposite.
In fact, we found a very large asymmetry that was clearest when the
lines and curves were least discriminable (see Figure 13a). The asymmetry
suggests that the curved line functions as a feature in the visual system, while
the straight line does not. It is as if we code curvature as the presence of
something, and we code straightness by default, as the absence of curvature.
If we take seriously the analogy to the circle and line experiment, curvature
may be coded as the addition of a feature; a curved line, then, would
be represented as a line, plus a deviation from the standard or reference
value of straightness, just as the circle with an intersecting line could be
represented as a basic circle with an added feature.
We looked next at some other features of simple lines, for instance,
orientation. Is there any asymmetry there? We can ask subjects to look
for a tilted line amongst vertical lines or a vertical line amongst tilted lines.
Again, we found a large asymmetry: this time it was the tilted line that was
easy to find against a background of vertical lines, and gave flat functions
relating latency of search to number of distracter lines. When the target
was a vertical line on a background of tilted lines, search was slower and
latencies increased with the number of distracter lines. Again, by analogy
with the circles and lines, we might infer that the tilted line is coded as
the presence of an added feature, perhaps tilt, and the vertical is coded
simply as the standard orientation with no added deviation.
Even colors seem to show a similar pattern. Colors tend to give
flat search functions unless the target and the distracter are very similar
and hard to discriminate, but we did find some asymmetry in search even
here. We looked at search for deviating colors like magenta and lime
and turquoise against standard colors like red, green, and blue, and found
faster, more parallel search than with the reverse arrangement. The colors
that were harder to find as targets were the "good" colors, the red, the
green and the blue, and the ones that were easier to find were the deviating
colors, magenta, lime and turquoise.
The same asymmetries recur with some other properties: for instance,
converging lines against parallel lines. A pair of converging lines pop out,
while a pair of parallel lines in a background of converging lines are found
more slowly. Similarly, a circle with a gap pops out of a display of complete
circles, but not the reverse. The results of these search tasks are shown
in Figure 13. We seem to have stumbled on quite a general principle of
perceptual coding.
Perhaps we can generalize and say that the visual system is tuned to
signal departures from a normal or standard value. If this is correct, we
may be able to use it to explore some even less obvious cases, such as
the perceptual coding of "inside" versus "outside." Would a dot inside a
closed shape be easier or harder to find than a dot outside a shape? It
turns out that inside is harder to find, suggesting that this is the standard,
and outside is the deviating value. The asymmetry of coding appears to be
quite pervasive and may prove a useful tool to throw light on the nature of
the features extracted by the visual system at the early preattentive levels.
The experiments I have described so far all tested stimuli defined
by luminance contrasts. It may be of interest to ask whether the same
principles of coding would also extend to other media. How general and
abstract is the analysis? Patrick Cavanagh (1987) has been exploring the
properties of shapes defined by other kinds of boundaries; for example
color boundaries at isoluminance, texture boundaries defined by motion,
or by the size of the texture elements, or by stereoscopic depth. He and I
have recently looked at search performance when the stimuli (bars or discs)
are defined by discontinuities in these other media. For example, we can
create vertical or tilted bars from stationary random dot textures against
otherwise identical moving backgrounds. We can then ask subjects to look
for a target bar that is tilted among vertical distracter bars, or for a vertical
target bar among tilted distracters. We find results that are very similar to
those obtained with bars defined by luminance (i.e., darker or lighter than
the background). The same pop-out for a tilted target and serial search for
a vertical target appear with bars created by color, or motion, or texture,
or stereoscopic disparity. The coding language used by the visual system
seems to be quite general across these different channels or media.
PERCEPTION OF OBJECTS
My speculations at present are that vision initially forms spatially
parallel maps in functionally separate specialized modules. These modules
signal the presence of positively coded features that code deviations from
a standard or a norm. In order to access their locations, or to specify that
they are not present in any particular stimulus, or to tie them correctly to
other features of the same object, we have to focus attention serially on
each location in turn. The currently attended features can then be selected
and entered into some temporary representation of the attended object.
Once the features are assembled, their conjunction can be compared to
memories, to stored descriptions in a long-term recognition network, and
the appropriate identification can be made.
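The sequence just described can be sketched in a few lines of code. This is a deliberately simplified illustration under stated assumptions, not an implementation of the theory: each feature map is reduced to a set of locations, attention is a loop over locations, and the recognition network is a lookup table. All names (attend, identify, scan_display, demo_maps, demo_memory) and the example features are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Location = Tuple[int, int]
FeatureMaps = Dict[str, Set[Location]]


def attend(feature_maps: FeatureMaps, location: Location) -> FrozenSet[str]:
    """Collect the features present at one attended location.

    The result stands in for the temporary object representation.
    """
    return frozenset(
        name for name, where in feature_maps.items() if location in where
    )


def identify(conjunction: FrozenSet[str],
             stored_descriptions: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Compare a feature conjunction with stored descriptions (exact match)."""
    for label, description in stored_descriptions.items():
        if description == conjunction:
            return label
    return None  # no stored description matches this conjunction


def scan_display(feature_maps: FeatureMaps,
                 stored_descriptions: Dict[str, FrozenSet[str]],
                 locations: Iterable[Location]) -> Dict[Location, Optional[str]]:
    """Visit locations one at a time and try to identify what is there."""
    percepts: Dict[Location, Optional[str]] = {}
    for location in locations:  # serial focus of attention
        conjunction = attend(feature_maps, location)
        if conjunction:
            percepts[location] = identify(conjunction, stored_descriptions)
    return percepts


# Hypothetical display: an orange round object and a blue tapered object.
demo_maps = {
    "orange": {(0, 0)},
    "round": {(0, 0)},
    "blue": {(1, 0)},
    "tapered": {(1, 0)},
}
demo_memory = {
    "orange fruit": frozenset({"orange", "round"}),
    "carrot": frozenset({"orange", "tapered"}),
}

if __name__ == "__main__":
    print(scan_display(demo_maps, demo_memory, [(0, 0), (1, 0)]))
```

In this sketch the features at a location are conjoined only when attention visits that location, and the stored descriptions are consulted only afterward, so a blue tapered object is assembled as it is and simply finds no match.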
Other research (Treisman, 1988) suggests that anomalous conjunctions
that we might otherwise make in everyday life get weeded out at this
comparison stage and not before. Top-down constraints from expectations
and prior knowledge seem not to influence which features are entered into
each object representation; the only constraints at this level appear to come
from spatial attention. Thus subjects who were expecting to see a carrot,
for example, were no more likely to recombine the orange from another
object with the shape of a blue carrot than they were to imagine its orange
color when no other orange object was present in the display.
These temporary object representations may also be important in
maintaining the perceptual continuity of objects as they move and change.
Once a set of features are conjoined and a perceptual unit is established, it
can be updated as the object moves or changes. In some recent experiments
with Daniel Kahneman and Brian Gibbs, we have found evidence that new
stimulus information gets integrated with the previously perceived object
that is best linked to it by spatio-temporal continuity. For example, a
letter is named faster if the same letter was previously presented within
the same outline shape, even when the shape has moved to a new location
in the interval between the two letters (Figure 14). The naming latency is
unaffected if the same letter had appeared in a different outline shape, even
though the time interval and the distance between the pairs of letters were
equated. When the matching letter appeared in the same shape as the first,
the motion of the frame was sufficient to link the two letters as parts of
the same continuous object. If the features of an object change, we simply
update the temporary representation. The perceptual unity and continuity
of the object is maintained so long as the spatial-temporal parameters are
consistent with the continued presence of a single object. If we were ever
to see a frog turn into a fairy tale prince, we would perceive it as a single
character transformed, just one perceptual entity, even though everything
about it has changed: its properties, its identity, its label, and so on. That
continuity, we suggest, would be mediated by a single object representation.
If my story is correct, we may have no introspective access to the
earlier stages of processing. These object-specific representations may be
the basis of conscious experience. In fact, they would be our subjective
windows into the mind.
FIGURE 14 Example of displays used to demonstrate the integration of information in
object-specific representations. (a) The two squares appear first; two letters are briefly
flashed in the squares, which then move (empty) to two new locations. (b) A single letter
then appears in one of the squares, and subjects are asked to name it as quickly as possible.
In this example, the latency would be about 30 milliseconds shorter than it would have
been if the letter N had appeared in the left-hand square in the second display.
REFERENCES
Beck, J.
1966 Effects of orientation and of shape similarity on perceptual grouping.
Perception and Psychophysics 1:300
Cavanagh, P.
1987 Reconstructing the third dimension: Interactions between color, texture,
motion, binocular disparity, and shape. Computer Vision, Graphics, and
Image Processing 37:171-195.
Livingstone, M.S., and D.H. Hubel
1987 Psychophysical evidence for separate channels for the perception of form,
color, movement and depth. Journal of Neuroscience 7:3416-3468.
Nakayama, K.
1988 The iconic bottleneck and the tenuous link between early visual processing
and perception. In C. Blakemore, ed., Vision: Coding and Efficiency.
New York: Cambridge University Press.
Treisman, A.
1982 Perceptual grouping and attention in visual search for features and
for objects. Journal of Experimental Psychology: Human Percepti