Instead it is broken up into two pieces, each having the shape of the letter U, one right side up, the other inverted. Yet we have little trouble recognizing the separated fragments in Fig. 1b as the letter C, despite the change in the image. Somehow our visual system ignores the boundary between the letter and the rectangle and treats the C as continuing behind the rectangle. We do not see it as two separate letters (as in Fig. 1c).
This single example suggests that a three-dimensional interpretation may be needed even before 2D information can be fully evaluated. Two problems seem most apparent. First, it is necessary to distinguish between the true boundaries of 2D surfaces and those arbitrary or spurious boundaries occasioned by occlusion. Second, the visual system needs a method to determine whether separate image patches should be joined together or whether they should be regarded as parts of different surfaces.
First, let us consider the spurious boundary problem in relation to this example. Intuitively, from the standpoint of the image patches corresponding to the letter C, the boundary between the C and the rectangle is arbitrary. It exists mainly as a consequence of the properties and position of the occluding rectangle. The border between the rectangle and the C does not “belong” to the C but to the rectangle. Determining border ownership is therefore a necessary intermediate step in building a surface representation. How is border ownership to be determined? We hypothesize that it is dictated by the surface patch that is seen in front. This means that the regions corresponding to the U-shaped fragments do not have a border where they meet the rectangle. In terms of representing these image patches as surfaces, these fragments are locally unbounded.
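The ownership hypothesis can be written as a simple rule. The sketch below is a toy illustration, not a model proposed in this paper: the `Region` class, the numeric `depth` value (smaller meaning nearer to the viewer) and the helper names are all hypothetical. A shared border is assigned to whichever region is seen in front, and a region counts as bounded along that border only if it owns it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An image region with a hypothetical perceived depth (smaller = nearer)."""
    name: str
    depth: float


def border_owner(a: Region, b: Region) -> Region:
    """Assign the border shared by a and b to the region seen in front."""
    if a.depth == b.depth:
        # Equal depths give this toy rule no cue for ownership.
        raise ValueError("equal depths: border ownership is ambiguous")
    return a if a.depth < b.depth else b


def is_bounded_at(region: Region, neighbor: Region) -> bool:
    """A region is bounded along a shared border only if it owns that border."""
    return border_owner(region, neighbor) is region


# A fragment of the letter C lying behind an occluding rectangle.
rectangle = Region("rectangle", depth=1.0)
fragment = Region("U-shaped fragment", depth=2.0)
print(border_owner(rectangle, fragment).name)  # rectangle
print(is_bounded_at(fragment, rectangle))      # False: locally unbounded
```

Ownership here depends only on relative depth, which is the point of the hypothesis: the same image border changes owner if the depth order is reversed.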
We also need to consider the second problem posed by occlusion. How can the visual system determine which fragments are part of the same surface and which are on separate surfaces? Should the two U-shaped pieces be linked together, or should they be considered separate? We have hypothesized elsewhere that when such unbounded regions face each other, they can be part of a single surface that is completed behind an occluder. That stereoscopic depth plays a decisive role in dictating both border ownership and surface linkage can be appreciated by fusing the stereoscopic images shown in Fig. 2. [Fusion can be accomplished with or without optical aids. For instructional guidance see Nakayama et al. (8).] When the small rectangle is seen in back, the two U-shaped fragments remain perceptually separated; they do not link to form a single extended surface. When the rectangle is seen in front, however, there is a large qualitative difference. Now the two fragments join easily, enabling us to see the letter C.
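The linkage hypothesis can likewise be sketched as a predicate, again as a toy illustration rather than the authors' model. Everything in the code is assumed for the example: the `Patch` class, numeric depths (smaller meaning nearer), and a `side` label that crudely encodes "facing each other" for a horizontal occluder with one fragment above it and one below. Two fragments are linked only if the occluder is in front of both (so both are unbounded at the shared borders) and those unbounded borders face each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str
    depth: float    # hypothetical perceived depth; smaller = nearer
    side: str = ""  # assumed layout: side of the occluder a fragment abuts ("above"/"below")


def can_link(f1: Patch, f2: Patch, occluder: Patch) -> bool:
    """Toy rule: join two fragments into one surface completed behind the occluder."""
    # A fragment is unbounded at its shared border only if the occluder
    # is in front of it and therefore owns that border.
    both_unbounded = occluder.depth < f1.depth and occluder.depth < f2.depth
    # The unbounded borders must face each other across the occluder.
    facing = {f1.side, f2.side} == {"above", "below"}
    return both_unbounded and facing


upper = Patch("upper U-shaped fragment", depth=2.0, side="above")
lower = Patch("lower U-shaped fragment", depth=2.0, side="below")
print(can_link(upper, lower, Patch("rectangle", depth=1.0)))  # True: rectangle in front
print(can_link(upper, lower, Patch("rectangle", depth=3.0)))  # False: rectangle in back
```

The two calls mirror the two fused views of Fig. 2: only the depth of the rectangle changes, yet the predicate flips between linking and separating the fragments.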
A similar situation arises in more complex perceptual tasks such as the recognition of faces. It is often presumed that an internal template of the face is stored in visual memory and compared to the image of the face. Our concern with occlusion forces us to consider an even more elementary problem: which portions of an image should the visual system use for recognition, and which should it ignore? Consider the cartoon face shown in Fig. 3, which is only partially visible through an aperture. If one considers only the outer boundaries of the face region, these might reasonably conform to the contour labeled x, indicating a narrow face. We suggest that the recognition system must discount this edge because it belongs to the occluder in front. Thus, before recognition occurs, the visual system must first distinguish the edges belonging to the object to be recognized from all other edges.
This is illustrated in the stereogram presented as Fig. 4, where we present identical information in the right- and left-eye views. The only difference is a tiny horizontal shift of the face fragments in each monocular image, such that when fused, the fragments are seen either in front of or in back of the interposed strips. When viewing the two possible stereoscopic displays (face-in-front vs. face-in-back), there is a dramatic difference in the clarity of the whole face. When the face strips are seen in front, each strip stands alone, isolated against the background. The face fragments are visible, but they do not cohere. It is very different when the face fragments are seen