Visual Control

A second interesting area of research is visual control, in which visual feedback is used for obstacle avoidance and locomotion guidance. In the mid-1990s, a computer system developed at Carnegie Mellon University successfully controlled a car driving cross-country for more than 90 percent of the travel time, although not in urban or congested areas. The control operates on a hierarchy of levels: the system must make low-level decisions, such as which lane to travel in, while taking into account upcoming decisions, such as when to make the next turn, and all of these decisions must be moderated by a high-level plan that navigates the vehicle toward its final destination (Problem 13).

Problem 13. If we control a navigation system by making use of visual information, there will inevitably be delays in the feedback loop due to processing time. Thus we need to design control laws involving look-ahead to avoid instabilities in control.
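To make the delay problem concrete, here is a minimal Python sketch under assumed, simplified dynamics: a hypothetical vehicle whose lateral offset x from the lane center obeys x[k+1] = x[k] + dt·u[k], steered by a proportional law u = -gain·x, where the camera-based measurement arrives `delay` steps late. The look-ahead used here is one simple choice, a model-based predictor in the style of a Smith predictor: it adds the effect of commands already issued, but not yet visible in the image, to the stale measurement. The function `simulate` and all of its parameters are illustrative assumptions, not details of the Carnegie Mellon system.

```python
from collections import deque

def simulate(steps=200, delay=5, gain=1.0, dt=0.5, predict=False, x0=1.0):
    """Hypothetical toy lane-keeping loop whose measurement is `delay` steps old.

    Assumed model: x[k+1] = x[k] + dt * u[k], with control u = -gain * estimate.
    Assumes no steering before k = 0. Returns the offsets x[0], ..., x[steps].
    """
    x = x0
    history = [x0]  # true offsets; the controller only sees delayed entries
    # Commands issued during the last `delay` steps (not yet visible in the image)
    recent_u = deque([0.0] * delay, maxlen=delay)
    for k in range(steps):
        measured = history[max(0, k - delay)]  # stale measurement
        if predict:
            # Look ahead: add the effect of commands the measurement has not seen yet
            estimate = measured + dt * sum(recent_u)
        else:
            estimate = measured
        u = -gain * estimate
        x = x + dt * u
        history.append(x)
        recent_u.append(u)
    return history

naive = simulate(predict=False)
predicted = simulate(predict=True)
print("naive, largest |x| in last 50 steps:", max(abs(v) for v in naive[-50:]))
print("predicted, final |x|:", abs(predicted[-1]))
```

With gain·dt = 0.5, the predictor-based loop halves the offset at every step, because with an exact model the estimate equals the current offset; the loop that acts on the raw delayed measurement oscillates with growing amplitude. A real vehicle model is never exact, so prediction errors limit how much of the delay can be compensated this way.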

Image Reconstruction

Segmentation is a basic task in image processing: it partitions a collection of pixels into objects. Object boundaries are cued by differences in brightness, color, texture, and/or stereoscopic depth among neighboring pixels, and their detection can be aided by motion and by the recognition of certain common objects (e.g., a chair). This processing is usually performed bottom-up (processing pixels to determine objects), but a top-down approach can also be useful: for instance, if the first processing steps suggest that a human figure is present, a top-down model might then be invoked to list the body parts that ought to be in the scene and perhaps to suggest which pixels in the image might constitute a face. Segmentation can also be performed by surface fitting or by probabilistic inference with a Markov random field model.
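As one illustration of the Markov random field approach, the sketch below uses iterated conditional modes (ICM), a simple greedy inference method chosen here for brevity, not because it is the method the text has in mind. It assumes that the number of regions and their mean brightness are known, that pixel noise is Gaussian with standard deviation `sigma`, and that the prior is a Potts model charging `beta` for each pair of 4-neighbors with different labels. The name `icm_segment` and its parameters are hypothetical.

```python
import numpy as np

def icm_segment(image, means, sigma=0.2, beta=1.0, max_iters=10):
    """Hypothetical helper: MRF segmentation by iterated conditional modes.

    Assumes known class means, Gaussian pixel noise with std `sigma`,
    and a Potts prior that charges `beta` per disagreeing 4-neighbor.
    """
    means = np.asarray(means, dtype=float)
    k = len(means)
    # Data term: negative log-likelihood (up to a constant), shape (H, W, K)
    data_cost = (image[..., None] - means) ** 2 / (2 * sigma**2)
    labels = np.argmin(data_cost, axis=-1)  # start from the pixelwise best fit
    h, w = image.shape
    for _ in range(max_iters):
        changed = 0
        for i in range(h):
            for j in range(w):
                cost = data_cost[i, j].copy()
                # Prior term: add beta for every neighbor whose label differs
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        cost += beta * (np.arange(k) != labels[ni, nj])
                best = int(np.argmin(cost))
                if best != labels[i, j]:
                    labels[i, j] = best  # update in place (sequential sweep)
                    changed += 1
        if changed == 0:  # no pixel changed: a local optimum has been reached
            break
    return labels

# Usage on a synthetic image: dark left half, bright right half, plus noise.
rng = np.random.default_rng(0)
truth = np.zeros((20, 20), dtype=int)
truth[:, 10:] = 1
image = np.where(truth == 1, 0.8, 0.2) + rng.normal(0.0, 0.2, truth.shape)
pixelwise = np.argmin((image[..., None] - np.array([0.2, 0.8])) ** 2, axis=-1)
smoothed = icm_segment(image, [0.2, 0.8], sigma=0.2, beta=1.0)
print("pixelwise errors:", int((pixelwise != truth).sum()))
print("ICM errors:", int((smoothed != truth).sum()))
```

ICM finds only a local optimum of the posterior, and its result depends on the initial labeling; it is used here because each update is easy to follow.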

Problem 14. Perform segmentation by unifying the bottom-up and top-down approaches and making use of all of the visual cues (brightness, color, depth, texture, and so on).

More recently, graph partitioning has been applied to segmentation.3 Each pixel is a node in the graph, and the weight of the edge between two pixels is based on their similarity in features such as brightness and texture and on their distance in the image. The eigenvectors of the graph Laplacian can be used to partition the pixels into segments, but the mathematical theory is incomplete: the properties of these eigenvectors are not well understood. Related graph-based methods, such as graph cuts and maximum-flow formulations, have also been applied to other computer vision problems, including stereo correspondence.4
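The following is a minimal sketch of spectral bisection with the normalized graph Laplacian L = I - D^(-1/2) W D^(-1/2), where W is the matrix of edge weights and D is the diagonal matrix of node degrees. The Gaussian weight function, which combines brightness difference and spatial distance, and its parameters `sigma_i`, `sigma_x`, and `radius` are assumptions for illustration, not details taken from the cited paper; `spectral_bisect` is a hypothetical name. The eigenvector of the second-smallest eigenvalue, rescaled by D^(-1/2), is thresholded at zero to split the pixels into two segments.

```python
import numpy as np

def spectral_bisect(image, sigma_i=0.3, sigma_x=4.0, radius=5.0):
    """Hypothetical helper: split pixels in two using the normalized graph Laplacian."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    vals = image.ravel().astype(float)
    # Squared brightness differences and squared pixel distances for every pair
    d_int = (vals[:, None] - vals[None, :]) ** 2
    d_pos = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Edge weight: high for nearby pixels of similar brightness
    W = np.exp(-d_int / sigma_i**2) * np.exp(-d_pos / sigma_x**2)
    W[d_pos > radius**2] = 0.0  # connect only pixels within `radius`
    np.fill_diagonal(W, 0.0)    # no self-loops
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(deg)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Second-smallest eigenvector, mapped back by D^{-1/2}
    y = d_inv_sqrt * eigvecs[:, 1]
    return (y > 0).reshape(h, w)

truth = np.zeros((10, 10), dtype=bool)
truth[:, 5:] = True
image = np.where(truth, 0.8, 0.2)
mask = spectral_bisect(image)
# The sign of an eigenvector is arbitrary, so compare up to swapping the two labels.
print("matches truth:", bool((mask == truth).all() or (mask == ~truth).all()))
```

Because the dense weight matrix has one row and one column per pixel, this version is practical only for tiny images; larger images would call for sparse matrices and an iterative eigensolver.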


3 Shi, J., and Malik, J., Self-inducing relational distance and its application to image segmentation, Proc. of Fifth European Conference on Computer Vision, H. Burkhardt and B. Neumann, eds., Springer-Verlag, Berlin, 1998, pp. 528-543.


4 See, for instance, Boykov, Y., Veksler, O., and Zabih, R., Fast approximate energy minimization via graph cuts, Proc. of Seventh IEEE International Conference on Computer Vision, IEEE Comput. Soc., Los Alamitos, Calif., 1999, pp. 377-384; Ishikawa, H., and Geiger, D., Occlusions, discontinuities, and epipolar lines in stereo, Proc. of Fifth European Conference on Computer Vision, H. Burkhardt and B. Neumann, eds., Springer-Verlag, Berlin, 1998, pp. 232-248; and Roy, S., and Cox, I.J., A maximum-flow formulation of the N-camera stereo correspondence problem, Proc. of Sixth International Conference on Computer Vision, Narosa Publishing House, New Delhi, 1998, pp. 492-499.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.