The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



2
Image Analysis and Computer Vision

Donald Geman, University of Massachusetts, and Basilis Gidas, Brown University

2.1 Introduction to Image Analysis

The goal of computer vision is to build machines that reconstruct and interpret a three-dimensional environment based on measurements of radiant energy, thereby automating the corresponding processes performed by biological vision systems, and perhaps eventually extending some abilities of these systems. This goal is far from being attained, and indeed most of the fundamental problems remain largely unsolved. The field is generally immature by comparison with standard scientific disciplines, the present methodology being a hybrid of those from artificial intelligence, classical signal processing, pattern theory, and various branches of mathematics, including geometry and statistics. Still, important advances have been made that are beginning to substantially affect such areas as industrial automation, earth science, medical diagnosis, and digital astronomy.

Computer vision tasks are generally divided into "low-level" and "high-level" problems to differentiate between those that (apparently) are largely data-driven ("early vision") and those that (apparently) rely heavily on stored knowledge and symbolic reasoning. More specifically, low-level vision includes such problems as coding and compressing data for storage and transmission; synthesizing natural and man-made patterns; restoring images degraded by blur, noise, digitization, and other sensor effects; reconstructing images from sparse data or indirect measurements (e.g., computed tomography); computing optical flow from motion sequences; and reconstructing

three-dimensional surfaces from shading, motion, or multiple views (e.g., binocular stereo) or multiple wavelengths (e.g., multichannel satellite data).

In contrast, high-level vision tends to be driven by more specific goals and generally involves object recognition and scene interpretation. These are perceived as the most difficult of the "natural" processes to replicate; in particular, one of the factors that inhibits the introduction of industrial and exploratory robots is their inability to "see," in particular, to infer enough information about objects to navigate in complex, cluttered environments. Indeed, invariant object recognition is one of the most challenging modern scientific problems, whose resolution may require new conceptual principles, computational procedures, and computer architectures.

Biological vision systems, in particular the human eye and brain, analyze scenes in an apparently effortless way. This ability appears miraculous (especially to workers in computer vision) and is attributed to several factors, not the least of which is the large proportion of the human brain devoted to vision. Still, we actually know very little about the principles of biological vision despite the information gathered by physiological, psychophysical, and neurophysiological studies. We do know that our visual system is able to integrate cues from many sources (e.g., binocular stereo, motion, and color) and that we exploit a priori expectations, specific scene knowledge, and contextual clues to reduce ambiguities and "correctly" perceive the physical world. It also appears that low-level analysis of retinal information and high-level cognition are done interactively, sometimes referred to as the integration of bottom-up and top-down processing. Finally, there is little doubt that our recognition and interpretation system is largely scale invariant and at least partially rotation invariant.
Computer vision systems are usually quite inferior to biological ones. This may be due in part to the lack of raw processing power or suitably parallel computation, but also, and perhaps more important, to the inability of synthetic systems to integrate sources of information and place appropriate global constraints. At the moment, automated visual systems rarely make "interpretation-guided" or "knowledge-driven" decisions, due probably to a lack of sufficiently invariant representations and of feedback mechanisms between these representations and the raw data. It appears inefficient, if not fruitless, to attempt to represent a given object in all its possible presentations. Instead, object representations and recognition algorithms should possess certain invariances with respect to scale, location, and rotation.

Despite these shortcomings, important successes have been achieved in actual applications. In some areas, the sheer amount of data leaves no choice

except automated analysis. For instance, a single Landsat Multispectral Scanner image may contain 30 Mbytes of data. In fact, some of the earliest and most publicized successes of computer vision occurred during the 1960s and 1970s, when images received from orbiting satellites and space probes were substantially improved with linear signal processing techniques such as the Wiener filter. More recently, significant advances have been made in the classification of satellite data for weather and crop yield prediction, geologic mapping, and pollution assessment, to name but three other areas of remote sensing. Another domain in which automated systems are achieving success is the enhancement and interpretation of medical images obtained by computed tomography, nuclear magnetic resonance, and ultrasonics. Other applications include those to optical astronomy, electron microscopy, silicon wafer inspection, optical character recognition, robot navigation, and robot manipulation of machine parts and toxic material.

By and large, the algorithms used in machine vision systems are specifically dedicated to single applications and tend to be ad hoc and unstable. From a practical viewpoint, problems arise when algorithms are so critically "tuned" to the particulars of the environment that small perturbations in the output of the sensors, or ordinary day-to-day variations, will significantly reduce their level of performance. Obviously, there is a need for algorithms that are more robust and that are based on solid theoretical foundations. But these are ambitious goals; the problems are very difficult, ranging, in mathematical terms, from ill-conditioned (unstable) to ill-posed (underdetermined), the latter due to the loss of information in passing from the continuous physical world to sampled and quantized two-dimensional arrays. In order to reduce the ambiguity, it is necessary to reduce the set of plausible interpretations by incorporating a priori knowledge and integrating multiple cues. One then seeks a mathematical representation for structure and regularity.

Standard regularization theory, as applied, for example, to inverse problems in particle scattering, is deterministic and lacks flexibility. One major current trend is toward "stochastic regularization." This is not surprising in view of the fact that many natural regularities are in fact nondeterministic: they describe correlations and likelihoods. Spatial statistics, in general, and lattice-based random field models, in particular, provide a promising framework for capturing these regularities and quantifying generic and a priori knowledge. Such models, properly conceived, impose severe but natural restrictions on the set of plausible interpretations. Thus spatial processes and Bayesian inference have provided a coherent theoretical basis for certain inverse problems in low-level vision. Moreover, this framework supports robust and feasible computational methods (e.g., Monte Carlo algorithms), measures of optimality and performance, and well-designed principles of inference. This methodology is described in more detail in §2.4. Section 2.2 contains a brief review of digital images, and §2.3 describes four specific image analysis tasks.

2.2 Digital Images

The data available to an automated decision system are one or more images acquired by one or more sensors. The most familiar sensors are optical systems, with ordinary cameras and lenses, that respond to visible light. Other sensors respond to electromagnetic radiation corresponding to other parts of the spectrum (e.g., infrared, X-rays, and microwaves), or to other forms of energy such as ionized high-energy particles (protons, electrons, alpha particles), ultrasound, and pressure (tactile sensors). Many applications employ multiple sensors; for example, navigation robots may be equipped with video camera(s), range, and tactile sensors, and the Landsat multispectral scanners collect data in bands of both visible light and infrared radiation. Sensor fusion is a current trend in many technologies and inferential procedures.

Regardless of the form of energy acquired and the specific processes of detecting, recording, and digitizing, the output of all sensors has a common structure: it is a finite collection of measurements, y = {y_t : t ∈ T}, indexed by a finite set T. With few exceptions (e.g., photon emission tomography), y is a two-dimensional array, i.e., T is a grid of points (pixels) in the two-dimensional image plane. Each y_t is integer-valued or, as in the case of multispectral satellite and color images, a vector with integer-valued components.
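This common structure, a finite array of integer-valued measurements on a pixel grid, can be illustrated with a short sketch; the continuous pattern and the 8-bit gray-level range below are illustrative choices, not taken from the text:

```python
import numpy as np

# A continuous "ideal" pattern f(u), u in R^2 (an illustrative choice)
f = lambda u1, u2: 0.5 * (1 + np.sin(2 * np.pi * u1) * np.cos(2 * np.pi * u2))

# Sampling: evaluate f on a finite N x N grid T of pixel sites
N = 64
u = (np.arange(N) + 0.5) / N
U1, U2 = np.meshgrid(u, u, indexing="ij")
g_sampled = f(U1, U2)                          # sampled, still real-valued

# Quantization: map each sample to an integer gray level (8 bits here)
y = np.clip(np.round(255 * g_sampled), 0, 255).astype(np.uint8)

print(y.shape, y.dtype, y.min(), y.max())
```

Digitization in the sense used below is exactly this composition of sampling (a finite grid T) and quantization (integer values y_t).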
Except in photon counting devices, the values of y may be regarded as "quantizations" of a continuum signal g = {g_t : t ∈ T}, which, in turn, is a discrete approximation or sampled version of a function g(u), u ∈ R², defined on the two-dimensional image plane (or some domain S₀ ⊂ R²). In addition to the errors introduced by digitization (= sampling + quantization), g involves various types of degradation (e.g., blur and noise; see below). In the absence of these degradations and other artifacts, i.e., in an "ideal" system, we would observe an ideal energy pattern f(u), u ∈ R². The data y or {g_t : t ∈ T} may be thought of as a representation of the physical scene being imaged. The task of computer vision is to estimate or infer properties ("attributes") of the scene on the basis of the data and a

priori knowledge or expectations about the physical world. Attributes of interest may be the true pattern f(u) itself (as in image restoration); geometric features (e.g., orientation, depth, curvature) of objects in the scene; or semantical labels for scene entities, such as in object recognition or remote sensing, in which regions are classified as "forest," "cropland," "water," and so on. The relation of specific attributes (e.g., shape) to the true pattern f(u) is problem-specific, and some concrete cases will be treated in §§2.3, 2.4.

More specifically, the true pattern f(u) corresponds to the distribution of energy flux (radiance) emitted by objects, either because they are "illuminated" by an energy source, or because they are a primary source of energy themselves; it is often referred to as "scene intensity" or "brightness." The measured values g correspond to the energy flux (or irradiance) intercepted, detected, and recorded by the sensor, and are usually referred to as "image intensities" or again simply as "brightness." Between emission and detection, various types of distortion and artifacts occur. These are usually lumped into three major categories (before digitization): blur, which may be introduced by scattering within the medium (e.g., atmosphere, human body), de-focused camera lenses, or motion of the medium, objects, or cameras; noise, introduced by the random nature of photon propagation ("quantum noise") or by the sensing and recording devices (e.g., film grain noise or current fluctuations in electronic scanners); nonlinear transformations of the signal (referred to as "radiometric distortion") introduced again by the sensing and recording devices.

These degradations amount to a transformation from f(u) to g(u). The most general transformation encountered is

    g(u) = Φ(β[K(f)(u)], η(u)), u ∈ R².    (2.1)

Here, K accounts for blur and K(f)(u) = ∫ K(u, v, f(v)) dv. In the linear case, K(u, v, z) = K(u, v)z, and K(u, v) is called the point spread function (PSF); the function β accounts for radiometric distortions, η is a collection of noise processes, and Φ defines the noise mechanism (e.g., additive, multiplicative). These parameters have been studied extensively for many sensors (e.g., Vidicon and CCD cameras) and media (e.g., the atmosphere and human tissues). We refer to [9] and references cited there for more details.
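A minimal simulation of the degradation model (2.1) can be sketched as follows; the uniform 3 × 3 PSF for K, the gamma-type β, and the additive noise mechanism Φ(a, b) = a + b are all illustrative stand-ins, not choices made in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def blur(f, psf):
    """Linear space-invariant blur: 2-D convolution of f with a small PSF,
    i.e., (Kf)(u) = sum_v K(u - v) f(v), with zero padding at the border."""
    kh, kw = psf.shape
    pad = np.pad(f, ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros_like(f, dtype=float)
    for a in range(kh):
        for b in range(kw):
            out += psf[a, b] * pad[a:a + f.shape[0], b:b + f.shape[1]]
    return out

f = rng.random((32, 32))                     # "true" pattern (synthetic)
psf = np.ones((3, 3)) / 9.0                  # K: uniform 3x3 point spread function
beta = lambda z: z ** 0.8                    # radiometric distortion (hypothetical gamma)
eta = 0.01 * rng.standard_normal(f.shape)    # noise process
g = beta(blur(f, psf)) + eta                 # additive noise mechanism: Phi(a, b) = a + b
```

Swapping in a multiplicative mechanism (e.g., `g = beta(blur(f, psf)) * (1 + eta)`) changes only the last line, which is the sense in which Φ isolates the noise model.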

2.3 Some Problems in Image Analysis

In this section we describe four specific problems, which are representative of those in low-level computer vision: (1) image restoration; (2) boundary detection; (3) tomographic reconstruction; (4) three-dimensional shape reconstruction. These problems demonstrate the mathematical difficulties encountered in converting information that is implicit in the recorded digital image to explicit properties and descriptions of the physical world. By and large, these problems are naturally nonparametric, and, as inverse problems, range from ill-conditioned to ill-posed. Consequently, as discussed in §2.1, one seeks a priori assumptions, or a priori information about the physical world (i.e., separate from the data and imaging equations), to constrain the set of possible, or at least plausible, solutions. An approach to some of these problems based on stochastic regularization is presented in §2.4. Other approaches are briefly indicated in §§2.3 and 2.4, and a few references are cited for more details. However, these problems have been studied extensively in the computer vision and engineering literature, and we refer the reader to [9] (and other surveys) for more complete references to related approaches based on stochastic regularization and Bayesian inference. Finally, the field of mathematical morphology [26], which is not considered here, offers a quite different methodology for some of these problems.

2.3.1 Image Restoration

The classical image restoration problem for intensity data is that of recovering a true, two-dimensional distribution of radiant energy, f(u), u ∈ R², from the actual recorded image values. In a "continuous-continuous" setup, the problem is posed by equation (2.1). However, since the number of recorded values is finite (in fact, space discretization is inherent in many sensors), the continuous-discrete formulation is more realistic.
Ignoring, for the moment, quantization of the measurement values, we can assume the actual recorded data constitute a two-dimensional array g = {g_i : i ∈ S} of positive real numbers on a regular grid S = {i = (i₁, i₂) : 1 ≤ i₁, i₂ ≤ N}, in fact an N × N integer lattice. Then (2.1) is replaced by

    g_i = Φ(β[K(f)(i)], η_i), i ∈ S.    (2.2)

Assuming the degradation mechanism to be known (or previously estimated), the problem of recovering f(u) from {g_i} is nonparametric and obviously ill-posed in general. For computational purposes, the domain of f(u) is

discretized into a larger integer lattice S′, concentric to S, of dimension N′ × N′, N′ > N, and f(u) is replaced by f = {f_i : i ∈ S′}. Then K becomes a discrete representation of the point spread function, and for linear space-invariant systems we have

    (Kf)_i = Σ_{j∈S′} K(i − j) f_j, i ∈ S.

Assuming K has bounded support (a reasonable assumption in most cases), then S′ should be taken large enough so that the summation over j ∈ S′ includes all terms for which K(i − j) > 0. This is not exactly a convolution; one can imagine f convolved with K on the infinite lattice, but only observed on S and reconstructed on S′.

The problem of estimating {f_i} from {g_i} is still ill-posed. To see this, consider the simple case of a linear model, g = Kf + η, with only blur and a single noise process. By relabeling the sites of S and S′, we can regard g, η, and f as vectors of dimension N², N², and N′², and K as an N² × N′² matrix. For η = 0 and N < N′, the problem is underdetermined and K⁻¹ is not well-defined. But even if N = N′ and K were invertible (e.g., toroidal convolution), the matrix is usually nearly singular, so the existence of measurement and quantization errors then renders the problem unstable in the sense that the propagation of errors from the data to the solution is not controlled. Put differently, given g and K, two images with blurred values very close to g can be very different.

Consequently, one seeks additional information to constrain or "regularize" the problem. Traditional approaches (see section 4.2 of [9] and the references there) may be divided into linear methods, such as the traditional Wiener filter and other constrained least-squares methods, and nonlinear methods, such as the maximum entropy technique. Linear methods are typically ill-conditioned, and the reconstructions are often compromised by large oscillatory errors.
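The instability, and the effect of regularization, can be seen in a one-dimensional toy version of the linear model g = Kf + η; the circulant Gaussian blur matrix, noise level, and Tikhonov (constrained least-squares) parameter below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
i = np.arange(n)
# Circulant Gaussian blur matrix: a 1-D analogue of K (toroidal convolution)
d = np.minimum(np.abs(i[:, None] - i[None, :]), n - np.abs(i[:, None] - i[None, :]))
K = np.exp(-0.5 * (d / 3.0) ** 2)
K /= K.sum(axis=1, keepdims=True)

f_true = (i >= n // 2).astype(float)          # a scene with one discontinuity
g = K @ f_true + 1e-3 * rng.standard_normal(n)

print(f"cond(K) = {np.linalg.cond(K):.2e}")   # nearly singular

f_naive = np.linalg.solve(K, g)               # error amplification is uncontrolled
lam = 1e-3                                    # regularization weight (hand-picked)
f_reg = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ g)

err_naive = np.abs(f_naive - f_true).max()
err_reg = np.abs(f_reg - f_true).max()
print(err_naive, err_reg)                     # naive error is many orders larger
```

The regularized solve trades a small bias (the jump is smoothed, exactly the discontinuity issue raised below for linear methods) for control of the noise amplification that destroys the naive inverse.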
Maximum entropy has been extensively examined and is widely popular in certain areas, such as digital astronomy [16, 8, 19, 24]. The regularization underlying the standard constrained least-squares filter is equivalent to a Gaussian (improper) prior distribution, whereas that of maximum entropy is equivalent to a particular random field model whose variables interact only through their sum (= total energy). None of these methods addresses the crucial issue of discontinuities, which often convey much of the information in the image, and which are difficult to recover with standard methods. A general framework for nonlinear estimation based on spatial stochastic processes is outlined in §2.4.

2.3.2 Boundary Detection

The boundary detection problem is that of locating (discrete) contours in a digital image that correspond to sudden changes of physical properties of the underlying three-dimensional scene, such as surface shape, depth (occluding boundaries), surface composition (texture boundaries), and surface material. A common complication is that sharp physical boundaries may appear in an image as slow intensity transitions or may not appear at all due to noise and other degradation effects. In addition, extraneous "edges" may appear from artifacts of the imaging system, such as sensor nonlinearities and digitization, or from "nonphysical" boundaries, such as shadows. Boundary classification refers to detecting and labeling boundaries according to their physical origins, but it is rarely attempted and appears essentially impossible, at least without additional information from multiple views or sensors, temporal sequences, or specific scene knowledge.

Segmentation is a closely related problem; one seeks to partition an image into disjoint regions (pixel classes) on the basis of local properties such as color, depth, texture, and surface orientation, or on the basis of more global (or even semantical) criteria, for instance involving dichotomies such as "object-background" or "benign-malignant." Clearly, each partition induces a unique boundary "map," whereas only boundary maps that are sufficiently organized yield useful segmentations.

These problems are studied extensively in computer vision, and there are many concrete applications. For example, texture is a dominant feature in remotely sensed images, and texture segmentation is important in the analysis of satellite data for resource classification, crop assessment, and geologic mapping. Other applications include automated navigation and industrial quality control; for example, in silicon wafer inspection, low-magnification views of memory arrays appear as highly structured textures.
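A minimal deterministic edge detector of the classical kind, a discrete differential (Sobel) operator followed by thresholding of the gradient magnitude, can be sketched as follows; the operator choice, threshold, and synthetic step image are illustrative:

```python
import numpy as np

def sobel_edges(img, thresh):
    """Discrete differential operator (Sobel) plus thresholding of the
    gradient magnitude; edge-replicating padding at the border."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sy = sx.T
    pad = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    for a in range(3):
        for b in range(3):
            win = pad[a:a + img.shape[0], b:b + img.shape[1]]
            gx += sx[a, b] * win
            gy += sy[a, b] * win
    return np.hypot(gx, gy) > thresh

img = np.zeros((16, 16)); img[:, 8:] = 1.0    # vertical step "boundary"
edges = sobel_edges(img, thresh=1.0)
print(edges[8, 7], edges[8, 8], edges[8, 0])  # True True False
```

Output like this is purely local: the detector marks intensity changes pixel by pixel, with none of the global organization that distinguishes a boundary map from a set of edges.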
Nonetheless, most work on boundary detection and segmentation has been of a general nature, separated from specific problems, and regarded as a "preprocessing" step toward further analyses such as extracting three-dimensional shape attributes (see §2.3.4), object recognition, and full-scale scene interpretation. In the view of some researchers, including the authors, this modular approach to image analysis is highly suspect, and generic segmentation is overemphasized.

Algorithms abound for "edge detection," which refers to the problem of locating the individual changes in intensity independently of the overall scene geometry. Most of these algorithms are heuristic, partly because it is

difficult to state these problems in precise mathematical terms. Traditional edge detectors [17, 20] are deterministic procedures based on (discrete) differential operators. Boundaries are then regarded as well-organized subsets of edges, and the construction of boundary maps is usually considered as a second phase, in which the detected edges are "cleaned," "smoothed," and otherwise massaged into structures we associate with real boundaries. A variational method proposed in [22] for boundary detection has led to interesting mathematical problems, but its practical utility is uncertain. Statistical methods for segmentation in remote sensing are very prevalent [9, 25] and often successful. However, until recently most techniques employed non-spatial methods, such as linear discriminant analysis. Statistical methods that are truly spatial are currently gaining popularity, and an example involving texture segmentation is presented in §2.4.2.

2.3.3 Tomographic Reconstruction

Tomography is an imaging technology widely used in medical diagnosis (and also in industrial inspection and other areas). The two basic types are transmission and emission tomography. The most commonly used form of transmission tomography is the "CAT scan," whereby a radiation source rotates about the patient's body and bombards it with X-rays or other atomic particles. Those particles that pass through the body are counted, and an image (or series of images) is formed from the combined counts; fewer counts correspond to regions of higher attenuation, which may, for example, indicate the presence of tumors.

In emission tomography, a pharmaceutical product is combined with a radioactive isotope and directed to a location in the patient's body, usually by injection or inhalation. The pharmaceutical is selected so that its concentration at the target location is proportional to some organic function of interest, for example, metabolic activity or local blood flow.
The objective is then to reconstruct the (internal) two-dimensional or three-dimensional isotope concentration based on counting the number of released photons that escape attenuation and are registered by arrays of detectors placed around the body. For instance, in positron emission tomography (PET), the isotope emits a positron, which, upon colliding with a nearby electron, produces two photons propagating in opposite directions.

From here on the focus is on single photon emission computed tomography (SPECT), in which the isotope releases a single photon each time a radioactive decay occurs. Arrays of collimators are placed around the area

of interest and capture photons that escape attenuation and whose trajectories carry them down the collimator. The problem is then to reconstruct the isotope concentration from the photon counts, and is similar to the inverse problems mentioned in §2.3.1. The dominant physical effect is photon attenuation, by which photons are annihilated and their energy absorbed by body matter. Other significant effects are photon scattering and background radiation, as well as those effects induced by the sensors. Attenuation is accurately described by a single function μ whose values are known for bone, muscle, and so on, and for various photon energies. The incorporation of scattering and other effects is more subtle [13].

We formalize the SPECT reconstruction problem as follows [12, 27]. Let x(u) be the isotope density defined on some domain S₀, which is usually taken as a two-dimensional region corresponding to a cross section of the body. The detectors D_j, j = 1, …, m, are arranged on a linear array at equally spaced intervals, and the array is positioned at equally spaced angles θ_k, k = 1, …, n, for the same time duration at each angle. Let T = {(D_j, θ_k) : j = 1, …, m; k = 1, …, n}. The total number of observed photons is an array y = {y_t : t ∈ T}. Assuming that the photons generated in region S₀ are governed by a spatially nonhomogeneous Poisson process with mean x(u) at the point u ∈ S₀, the observation y is a realization from another nonhomogeneous Poisson process, Y = {Y_t : t ∈ T}, with mean

    E(Y) = Ax,    (2.3)

where the linear operator A incorporates attenuation (via the attenuated Radon transform) and other physical effects [13]. Hence, the conditional probability distribution of Y given x is

    P(Y = y | x) = Π_{t∈T} [(Ax)_t^{y_t} / y_t!] exp[−(Ax)_t].    (2.4)

For computational purposes, the region S₀ is discretized into a grid S of pixels. Then x = {x_i : i ∈ S} represents a piecewise constant approximation of x(u), and the operator A becomes a matrix A = (a_{t,i}), t ∈ T, i ∈ S.

The oldest, and still current, method used for reconstructing x from y is back-projection, which essentially inverts (2.3), and is not very accurate. A more recent method [27] is that of maximum likelihood (ML), i.e., maximize P(y | x) with respect to x, which is implemented with the expectation-maximization (EM) algorithm. However, it has been widely realized that the ML

estimator is too "rough." Various procedures for smoothing this estimator have been suggested (see the references in [9]), including penalized ML and the method of sieves. Another alternative, within the Bayesian framework, is described in §2.4.2.

2.3.4 Three-Dimensional Shape Reconstruction

The problem of estimating or reconstructing properties of three-dimensional surfaces, such as orientation, height, Gaussian curvature, and Euler characteristics, from digital images is also widely studied in computer vision. In a common paradigm, especially in robot vision, these features are regarded as indispensable steps toward object recognition and other goals; the extraction of geometric features is followed by the imposition of relational structures among the features (using "grammars," symbolic graphs, and so on), and in turn by matching these data structures against stored models of real objects and spatial relationships among them. Again, the paradigm consists of distinct steps, and the extraction of geometric features is itself preceded by "pre-processing," which encompasses noise removal and other aspects of restoration, edge and boundary detection, and perhaps segmentation.

The main sources of information ("cues") for three-dimensional shape reconstruction are the intensity changes themselves ("shape-from-shading"), pairs of images corresponding to multiple views ("stereopsis") or wavelengths, motion sequences ("shape-from-motion"), texture analysis ("shape-from-texture"), and range data. Stereopsis is thought to be the most important process by which human beings obtain depth information, but has played a lesser role in automated systems due to computational demands and lack of accuracy. In contrast, shading has played a larger role in machine vision than in human vision, and range data is becoming an important source of information for automated systems but is unavailable to humans.
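Returning briefly to the emission tomography problem of §2.3.3: the ML estimate maximizing (2.4) can be computed with the multiplicative EM iteration (due to Shepp and Vardi in the emission setting). A toy sketch, in which a random nonnegative matrix merely stands in for the true system matrix A encoding the attenuated projection geometry:

```python
import numpy as np

rng = np.random.default_rng(2)

npix, ndet = 20, 40
# Stand-in for the system matrix A = (a_{t,i}): sparse, nonnegative, random
A = rng.random((ndet, npix)) * (rng.random((ndet, npix)) < 0.3) + 1e-6
x_true = 10 * rng.random(npix)                 # synthetic isotope concentration
y = rng.poisson(A @ x_true)                    # photon counts, Y_t ~ Poisson((Ax)_t)

def loglik(x):
    ax = A @ x
    return float(np.sum(y * np.log(ax) - ax))  # Poisson log-likelihood, up to a constant

x = np.ones(npix)                              # positive starting point
ll_start = loglik(x)
sens = A.sum(axis=0)                           # "sensitivity" A^T 1
for _ in range(200):
    x *= (A.T @ (y / (A @ x))) / sens          # EM update: keeps x >= 0

print(loglik(x) > ll_start)                    # True: EM never decreases the likelihood
```

The multiplicative form makes the positivity constraint automatic, but nothing in the update smooths the estimate, which is why the raw ML reconstruction comes out "rough."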
The mathematical problems encountered in three-dimensional shape reconstruction are similar to those in §§2.3.1-2.3.3. We conclude §2.3 with two examples in more detail.

Stereopsis

Stereo vision uses two cameras (corresponding to our two eyes) and a single light source. The relative difference, or so-called disparity, in the positions of an object in the two images is useful information for extracting surface orientation and relative distances between objects. A physical point in the viewed

pixel lattice by S^P = {i = (i₁, i₂) : 1 ≤ i₁, i₂ ≤ N}, the gray-level intensity process by X^P = {X^P_i : i ∈ S^P}, and the observation process by Y = {Y_i : i ∈ S^P}. Then

    Y_i = Φ(β[(K X^P)_i], η_i), i ∈ S^P.    (2.12)

We assume that K, β, Φ, and the law of η are known.

The basic idea proposed in [10] is to use MRF models to account for the spatial coherence among the intensity levels of nearby pixels, as well as for the existence of discontinuities between intensity surfaces. To this end, in addition to the intensity process X^P, a second "edge" process X^E is introduced. The process X^E accounts for discontinuities, and is indexed by the dual lattice S^E of S^P; S^E consists of all nearest-neighbor pairs <i, j> from S^P, and an element ("site") t ∈ S^E corresponds to a putative edge between the corresponding pixel sites. Thus, X^E = {X^E_t : t ∈ S^E} = {X^E_{<i,j>} : i, j ∈ S^P}, X^E_t ∈ {0, 1}, where X^E_t = 1 (resp. 0) indicates presence (resp. absence) of an edge at t ∈ S^E. The process X^E is neither part of the data nor the target of estimation; rather, it is an auxiliary process designed to bring exogenous information into the model, and it is coupled to X^P in such a manner that in the likely states of the joint probability distribution of X = (X^P, X^E), the intensity function is locally smooth with possibly sharp transitions, and the locations of the edges satisfy our a priori expectations about the behavior of boundaries.

The process X = (X^P, X^E) is indexed by S = S^P ∪ S^E, and is chosen to be a MRF with energy function of the form

    U(x) = U(x^P, x^E) = U₁(x^P, x^E) + U₂(x^E),    (2.13)

where U₁ reflects our expectations about interactions between intensities and edges (where edges "belong"), while U₂ reflects boundary organization. Both terms are constructed from "local energies" corresponding to a neighborhood system N = {N_s : s ∈ S}. The simplest neighborhood system ("nearest neighbors") is shown in Figure 2.1, where dots denote pixels and crosses edge sites.
The energy U₁(x^P, x^E) is defined so that the low-energy states will have x^E_{<i,j>} = 1 (resp. 0) when |x^P_i − x^P_j| is large (resp. small). More specifically,

    U₁(x^P, x^E) = θ₁ Σ_{<i,j>} φ(x^P_i − x^P_j)(1 − x^E_{<i,j>}),    (2.14)

with θ₁ > 0, φ(0) = −1, and φ even and nondecreasing on [0, ∞). Note that when x^E_{<i,j>} = 1, the bond ("interaction") between pixels i and j is broken;

FIGURE 2.1  The nearest-neighbor system: dots denote pixel sites (N_i, i ∈ S^P, |N_i| = 8) and crosses denote edge sites (N_t, t ∈ S^E, |N_t| = 8).

Also, the properties of φ ensure that when x^P_i = x^P_j, x^E_{<i,j>} = 0 is a lower energy state than x^E_{<i,j>} = 1, and, in the absence of boundaries, there is cooperation between nearby pixel intensities. Note also that when φ(x^P_i − x^P_j) = 0, we have no preference about the state of an edge at site <i, j>. A specific φ that has been used in restoration problems [9] is

    φ(ξ) = −[1 − (ξ/δ)²] / [1 + (ξ/δ)²],    (2.15)

where δ is a scaling constant.

The energy U₂(x^E) reflects our prior expectations about boundaries: most pixels are not at boundary regions; boundaries are usually persistent (no isolated or abandoned segments); boundary intersections (junctions), sharp turns, and "small structures" are relatively unlikely. For specific choices of U₂(x^E), we refer to [9] (and references cited there). Some of the above generic properties of boundaries may also be captured with penalty functions (see (2.7)). For problems, such as image restoration and shape-from-shading, where the edge process X^E is auxiliary, using "soft constraints" such as U₂(x^E) is satisfactory, whereas for problems in which the boundaries are of central interest (e.g., texture segmentation; see the next subsection), penalty functions are often more appropriate.

The energy function (2.13) determines the prior π(X). Assuming that the noise process {η_i} is independent of X = (X^P, X^E), the transformation (2.12) determines the degradation model P(Y | X) in a straightforward manner. The posterior distribution π(X | Y) is then computed from the joint distribution P(Y | X)π(X). For example, suppose that η_i is Gaussian with mean μ and variance σ², and that b → Φ(a, b) is invertible, say,

η_i = F(y_i, [(Kx^P)_i]) for some F(u, v). Then, assuming that F(u, v) is strictly increasing in u and that F_1(u, v) = ∂F(u, v)/∂u exists, the posterior energy (see (2.8)) is given by [9]

    U(x | y) = U(x) + (1/2σ²) Σ_{i∈S^P} [F(y_i, [(Kx^P)_i]) − μ]² − Σ_{i∈S^P} log F_1(y_i, [(Kx^P)_i]).   (2.16)

Boundary Detection

We summarize the procedure developed in [11] for detecting textural boundaries. The method also applies to the problem of locating boundaries representing sudden changes in depth and surface orientation. The procedure involves two processes: the intensity process X^P = {X^P_i : i ∈ S^P}, as in the first subsection of 2.4.2, and the boundary process X^B = {X^B_t : t ∈ S^B}, which is indexed by a regular lattice S^B with sites interspersed among the pixels of S^P. Again, X^B_t ∈ {0, 1}, with X^B_t = 1 (resp. 0) indicating presence (resp. absence) of a boundary at t ∈ S^B.

The prior distribution for X = (X^P, X^B) is chosen to be of the form (2.7) with U(x) = U(x^P, x^B) and V(x) = V(x^B). The intensity-boundary term U(x^P, x^B) is chosen to promote placement of boundaries between regions in the image that demonstrate distinct spatial patterns. In [11] it was chosen to be of the form

    U(x^P, x^B) = Σ_{<t,s>} Δ_{t,s}(x^P)(1 − x^B_t x^B_s),  t, s ∈ S^B,   (2.17)

where <t,s> denotes nearest neighbors in S^B. The function Δ_{t,s} is critical; it is a measure of disparity between the gray-level values in two blocks of pixels adjacent to <t,s>; large disparities (Δ ≫ 0) encourage the presence of an active boundary (i.e., x^B_t = x^B_s = 1), while small disparities (Δ ≪ 0) discourage the presence of a boundary (i.e., x^B_t x^B_s = 0). Specific choices of Δ_{t,s} in [11] are constructed in terms of the Kolmogorov-Smirnov distance applied to either the raw data ("first-order" statistics) or to transformed data corresponding to higher-order statistics (e.g., window means, range, variance, and "directional residuals"). As a function of x^B, (2.17) is similar to "spin glass" models in statistical mechanics.
The penalty function V(xB) is chosen to inhibit unwanted configurations such as blind endings of boundaries, redundant boundaries, sharp turns, and other forbidden patterns (see [11] for details).
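To make the role of the disparity Δ_{t,s} concrete, the following is a minimal sketch of the prior energy (2.17) on a one-dimensional signal — our own illustration, not code from [11]. The disparity is taken as the Kolmogorov-Smirnov distance between adjacent windows, shifted by a calibration constant `c` that is an assumption here; in one dimension there is a single boundary site per gap, so the factor (1 − x^B_t x^B_s) becomes (1 − x^B_k).

```python
import numpy as np

def ks_distance(a, b):
    # Two-sample Kolmogorov-Smirnov distance between the empirical CDFs
    # of the gray levels in two pixel blocks.
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    grid = np.concatenate([a, b])
    Fa = np.searchsorted(a, grid, side="right") / a.size
    Fb = np.searchsorted(b, grid, side="right") / b.size
    return float(np.abs(Fa - Fb).max())

def prior_energy(x, xb, w=8, c=0.5):
    """1-D sketch of (2.17): xb[k] in {0,1} is a boundary variable between
    windows k and k+1 of width w.  Delta is the KS distance between the two
    adjacent windows minus c, so it is positive for dissimilar windows
    (boundary encouraged) and negative for similar ones (boundary
    discouraged); c is an assumed calibration, not the one used in [11]."""
    U = 0.0
    n = len(x) // w
    for k in range(n - 1):
        left, right = x[k * w:(k + 1) * w], x[(k + 1) * w:(k + 2) * w]
        delta = ks_distance(left, right) - c
        U += delta * (1 - xb[k])   # an active boundary "pays" nothing
    return U
```

A configuration that places its single active boundary at a genuine change of distribution has lower energy than the empty configuration, which is the behavior (2.17) is designed to produce.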

Assuming that there are no degradations that preclude direct observation of x^P, the data are y = x^P. The degradation model P(Y|X) is then singular (the point mass at y = x^P), and the posterior energy is U(x^B | y = x^P) = U(y = x^P, x^B). The MAP estimate is equivalent to global optimization with constraints. Plates 2.1 and 2.2 (from [11]) show two experiments. Plate 2.1 is a synthetic aperture radar (SAR) image of ice floes in the ocean: (a) original image, 512 × 512; (b) sixteen "snapshots" from sixty sweeps of stochastic relaxation with constraints. Plate 2.2 is a collage composed of nine Brodatz textures: leather, grass, and pigskin (top row); raffia, wool, and straw (middle row); and water, wood, and sand (bottom row). Two of the textures, leather and water, are repeated in the two circles: (a) original, 384 × 384; (b) detected boundaries obtained by deterministic (left) and stochastic (right) algorithms.

Single Photon Emission Tomography

The digitized isotope intensity (see 2.3.3) is thought of as a realization of a spatial process X = {X_i : i ∈ S}. The idea of [12, 13] is to use a Gibbs prior to reflect the common observation that neighboring locations of the isotope surface typically have similar concentration levels, whereas sharp changes in concentration may occur, for instance, across an arterial wall or across a boundary between two tissue types. In contrast to the procedure in restoration, sharp changes are not represented explicitly by an edge or boundary process; instead, the intensity model is designed to allow such changes. Specifically (cf. [12]),

    U(x) = β Σ_{<i,j>} φ(x_i − x_j) + (β/√2) Σ_{[i,j]} φ(x_i − x_j),   (2.18)

where <i,j> denotes a nearest-neighbor bond, [i,j] represents a nearest-neighbor diagonal bond, and φ is given by (2.15). The degradation model is given by (2.4), and the posterior energy is then

    U(x | y) = U(x) + Σ_i [(Ax)_i − y_i log (Ax)_i].
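The posterior energy above can be written down directly. The following is a minimal numerical sketch of our own, not the implementation of [12, 13]: a one-dimensional intensity, nearest-neighbor bonds only (no diagonal term), φ taken in the form (2.15), and illustrative shapes and parameter values throughout.

```python
import numpy as np

def phi(u, delta=1.0):
    # Potential of the form (2.15): even, phi(0) = -1, nondecreasing in |u|.
    return -1.0 / (1.0 + (u / delta) ** 2)

def posterior_energy(x, y, A, beta=1.0, delta=1.0):
    """U(x|y) = U(x) + sum_i [(Ax)_i - y_i log (Ax)_i] for Poisson counts y
    with projection matrix A; 1-D sketch with nearest-neighbor bonds only."""
    Ax = A @ x
    data_term = float(np.sum(Ax - y * np.log(Ax)))          # Poisson term
    prior = beta * float(np.sum(phi(np.diff(x), delta)))    # Gibbs prior
    return prior + data_term
```

Note that the data term couples every x_i appearing in a common row of A, which is the source of the nonlocality discussed next.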
Although U(x) has a local structure, the graph for U(x|y) is highly nonlocal due to A. The choice of the "smoothing" parameter β is critical. For β = 0, the MAP estimator is just the ML estimator and, hence, typically too rough. For

large β, the MAP estimator is too faithful to the prior and, hence, typically too smooth. The value of β is estimated in [12, 13] via the EM algorithm or the method of moments. The parameter δ in φ is also important, but its statistical estimation from the data appears to be difficult. Fortunately, reconstructions are not sensitive to moderate changes in δ, and empirical values based on information about the range of intensities work well [12].

Plates 2.3, 2.4, and 2.5 show three experiments from [13] with real (hospital) data, using the ICE algorithm. For comparison, the reconstruction with the filtered back projection (FBP) method is also shown. In all three cases β is estimated by the ML method. Plate 2.3 shows a slice of a human skull across the eyes: (a) FBP; (b) ICE with β = 2.7. Note that in (b) one can distinguish details such as the nose bone, eyes, and brain region; also, the skull border is sharp. Plate 2.4 displays a SPECT reconstruction of a simulated phantom. The model used in this experiment was developed by the Nuclear Medicine Department of the University of Massachusetts Medical Center, in Worcester. This is a comprehensive model that captures the effects of photon scattering, photon attenuation, camera geometry, and quantum noise: (a) original phantom; (b) FBP reconstruction; (c) ICE reconstruction with β = 1. Plate 2.5 is a human liver/spleen scan: (a) FBP; (b) ICE with β = 3. The value β = 3 is the ML estimate; (c) and (d) are ICE reconstructions with β = 0 and β = 20, respectively; they demonstrate the significance of the parameter β.

Shape-from-Shading

We focus on the estimation of surface orientation. For simplicity, we assume that S and ρ are known, that the reflectance map is spatially homogeneous and known, and that V is constant throughout the image (orthographic projection). However, the procedure presented below can be modified [28] to estimate also S and ρ.
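For orientation, the reflectance map of a matte surface can be written down explicitly. The Lambertian form below is the standard model (see [17]), used here only as an illustrative sketch of our own — the chapter keeps R general — together with the sum-of-squares residual of the noise-free constraint Y_i = R(N_i).

```python
import numpy as np

def lambertian_R(N, S, rho=1.0):
    """Lambertian reflectance map: intensity rho * max(0, N . S) for unit
    surface normals N (one per row) and a unit light-source direction S,
    with albedo rho."""
    return rho * np.clip(np.atleast_2d(N) @ S, 0.0, None)

def shading_residual(Y, N, S, rho=1.0):
    # Sum-of-squares residual of the noise-free constraint Y_i = R(N_i);
    # zero exactly when the normals are consistent with the observed image.
    return float(np.sum((Y - lambertian_R(N, S, rho)) ** 2))
```

A normal pointing at the source yields the full albedo; a normal perpendicular to it (or facing away, via the clip) yields zero intensity.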
There are three basic processes: the true (undegraded) intensity process X^P = {X^P_i : i ∈ S}, the shape process N = {N_i : i ∈ S}, where N_i is the unit normal at the surface "point" corresponding to pixel i, and the observation process Y = {Y_i : i ∈ S}. Here, S is a discretization of the domain of the surface function z (see 2.3.4). The shape process N (the target of estimation) is related to X^P via the discrete version of (2.5), and X^P is related to Y via (2.12). X^P is an intermediate process and will play no direct role here. In the absence of degradation we have Y_i = R(N_i), which is a deterministic constraint on N. The shape-from-shading problem usually refers (see

[17]) to this case (i.e., observable X^P), and we refer to [17, 18] for various approaches to the problem, including deterministic regularization techniques. The procedure below was developed in [28] and applies to both undegraded and degraded data. The basic idea is to use Gibbs distributions to articulate general properties of shapes: surfaces are locally smooth, but orientation may exhibit jumps because of changes in depth or the presence of surface discontinuities. As in restoration, the process N is coupled to an edge process X^E = {X^E_t : t ∈ S^E}, where X^E and S^E are as in the first subsection of 2.4.2. The coupled process X = (N, X^E) is a MRF with an energy function U(x) = U_1(N, x^E) + U_2(x^E) (compare with (2.13)), where U_2(x^E) is chosen as in the first subsection of 2.4.2, and

    U_1(N, x^E) = θ_1 Σ_{<i,j>} (1 − x^E_{<i,j>}) φ(|N_i − N_j|) + θ_2 Σ_{[i,j]} (1 − H_{[i,j]}) φ(|N_i − N_j|),   (2.19)

with θ_1, θ_2 > 0, and <i,j>, [i,j] as in (2.18); H_{[i,j]} = 1 if any of the products x^E_{t_1} x^E_{t_2}, x^E_{t_2} x^E_{t_3}, x^E_{t_3} x^E_{t_4}, x^E_{t_4} x^E_{t_1} equals 1, and zero otherwise; here i, j, t_1, t_2, t_3, t_4 are as in Figure 2.2.

FIGURE 2.2  Sites i, j and the four surrounding edge sites t_1, t_2, t_3, t_4.

The function φ was chosen in [28] to be φ(|N_i − N_j|) = −1 + |N_i − N_j|²/2 = −N_i · N_j. Because of the constraint |N_i| = 1, the prior resulting from this choice of φ is non-Gaussian even if θ_2 = 0 and x^E ≡ 0. In fact, model (2.19) has worked well in some cases [28] even without the edge process (i.e., X^E ≡ 0). In the absence of degradation, the distribution P(Y|X) is degenerate and amounts to the constraint V_1(N) = Σ_{i∈S} [Y_i − R(N_i)]² = 0. In the presence of degradation, P(Y|X) is computed as in the first subsection of 2.4.2. In both cases, there is an extra deterministic constraint V_2(N) = 0

corresponding to the integrability condition. In the former case, the shape-from-shading problem reduces to minimizing U(N, x^E) under the constraint V_1(N) + V_2(N) = 0; in the latter case, to minimizing the posterior energy U(N, x^E) under the constraint V_2(N) = 0. The procedure does not require boundary conditions. However, in the absence of degradation and with a single image, the quality of reconstruction is better [28] if one assumes correct boundary conditions along the image border. In the presence of noise, the results are satisfactory if one uses two images (photostereo) obtained with a single camera but two light sources of different origin.

Plate 2.6 shows an experiment from [28] with an egg imaged under uncontrolled illumination. The surface of the egg was assumed to be matte, and the algorithm estimated, in addition to N, the albedo ρ and an effective light source direction S. The reconstruction used a combination of constrained annealing and ICM: (a) original image, 64 × 64; (b) reconstruction; (c) reconstructed egg illuminated from the x-direction; and (d) reconstructed egg illuminated from the y-direction.

Deformable Templates

In this subsection we briefly describe a powerful and elegant methodology introduced by Ulf Grenander for pattern synthesis and the analysis of biological shapes. It provides a promising geometric/Bayesian paradigm for medical and industrial applications. The procedure [6] is based on global shape models designed to incorporate prior (biological) knowledge about the variability and global properties of shapes, and quantitative information about the variability of the geometric object. The shape model has three basic components: (a) a "geometry" consisting of a space G of generators, a connector graph σ, a "regularity" relation R,
and a transformation group S: G → G; (b) a template (or templates) describing the overall architecture of the shape; and (c) a group-valued stochastic (typically Markov) process, which articulates the statistical variations of the shape. The choices of the template, transformation group, and stochastic process control the desired global and local geometric properties of shapes. The choice of (a) is application-specific. We refer to [6] for the general framework, experiments, and references. Here we outline the method for the special case of two-dimensional (planar) shapes (as in the HAND experiment [6]). Assuming that all the relevant information is contained in the boundary, and approximating the boundary

by a polygon with, say, n edges, the components of (a) are as follows: g_j ∈ G, j = 0, 1, ..., n−1, are polygonal edges; σ is a cyclic graph consisting of the n nodes; R may, for example, be the condition that the polygon is closed and simply connected (other regularity conditions are often desirable); S may be chosen to be the general linear group GL(2), or the Euclidean group (i.e., rotations O(2) and translations), or the group US(2) of uniform dilations (scale), or US(2) × O(2). The configuration space of interest is

    C(R) = {c = (g_0, ..., g_{n−1}) : g_j ∈ G, j = 0, ..., n−1, c satisfies R},

i.e., the set of boundaries of closed, simply connected polygons. The interior of a polygon defines a pure image I. A template is a specific configuration c^(0) = (g_0^(0), ..., g_{n−1}^(0)) ∈ C(R).
Suppose that we observe a degraded version I^D of a true pure image I, which is our target of estimation. The degradation mechanism defines the degradation model P(I^D | I) (or, equivalently, P(I^D | c)), as in the first subsection of 2.4.2. Then the posterior distribution P(c | I^D) is computed as before, and contains all the relevant information about the unknown image I (equivalently, c). The computational burden of estimating I is demanding, but feasible. Asymptotic arguments and other specific properties have been exploited in [6] to reduce the computations. One important aspect of the approach is that it can often be combined with dynamic programming to speed up the processing considerably.

Bibliography

[1] Aarts, E., and J. Korst, Simulated Annealing and Boltzmann Machines, John Wiley and Sons, 1989.

[2] Almeida, M., and B. Gidas, A variational method for estimating the parameters of MRF from complete or incomplete data, preprint, Brown University, 1989.

[3] Besag, J., Spatial interaction and the statistical analysis of lattice systems (with discussion), J. R. Stat. Soc. B 36 (1974), 192-236.

[4] Besag, J., On the statistical analysis of dirty pictures (with discussion), J. R. Stat. Soc. B 48 (1986), 259-302.

[5] Besag, J., Towards Bayesian image analysis, J. Appl. Stat. 16 (1989), 395-407.

[6] Chow, Y., U. Grenander, and D. Keenan, HANDS: A Pattern Theoretic Study of Biological Shapes, to be published by Springer-Verlag, 1990.

[7] Comets, F., and B. Gidas, Parameter estimation for Gibbs distributions from partially observed data, submitted to Ann. Appl. Probab. (1989).

[8] Frieden, B. R., Restoring with maximum likelihood and maximum entropy, J. Opt. Soc. Amer. 62 (1972), 511-518.

[9] Geman, D., Random Fields and Inverse Problems in Imaging, to appear in Lecture Notes in Mathematics, Springer-Verlag, 1990.

[10] Geman, S., and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Machine Intell. 6 (1984), 721-741.

[11] Geman, D., S. Geman, C. Graffigne, and P. Dong, Boundary detection by constrained optimization, IEEE Trans. Pattern Anal. Machine Intell. 12 (1990), 609-628.

[12] Geman, S., and D. McClure, Statistical methods for tomographic image reconstruction, in Proceedings of the 46th Session of the Int. Stat. Institute, Bulletin Int. Stat. Inst. 52 (1985).

[13] Geman, S., D. McClure, K. Manbeck, and J. Mertus, Comprehensive statistical model for single photon emission computed tomography, preprint, Brown University, 1990.

[14] Gidas, B., A renormalization group approach to image processing problems, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989), 164-180.

[15] Grenander, U., Tutorial in Pattern Theory, Technical Report, Brown University, 1983.

[16] Gull, S. F., and J. Skilling, Maximum entropy methods in image processing, IEE Proc. 131 (1984), 646-659.

[17] Horn, B. K. P., Robot Vision, MIT Press, 1986.

[18] Horn, B. K. P., Height and Gradient from Shading, MIT A.I. Memo No. 1105.

[19] Jaynes, E. T., On the rationale of maximum entropy methods, Proc. IEEE 70 (1982), 939-952.

[20] Marr, D., Vision, W. H. Freeman, 1982.

[21] Marroquin, J. L., S. Mitter, and T. Poggio, Probabilistic solution of ill-posed problems in computational vision, J. Am. Stat. Assoc. 82 (1987), 76-89.

[22] Mumford, D., and J. Shah, Boundary detection by minimizing functionals, talk presented at the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, 1985.

[23] Poggio, T., V. Torre, and C. Koch, Computational vision and regularization theory, Nature 317 (1985), 314-319.

[24] Ripley, B. D., Statistical Inference for Spatial Processes, Cambridge University Press, 1988.

[25] Ripley, B. D., Statistics, images, and pattern recognition, Can. J. Stat. 14 (1986), 83-111.

[26] Serra, J., Image Analysis and Mathematical Morphology, Academic Press, New York, 1982.

[27] Shepp, L. A., and Y. Vardi, Maximum likelihood reconstruction in positron emission tomography, IEEE Trans. Medical Imaging 1 (1982), 113-122.

[28] Torreao, J., A Bayesian Approach to Three-Dimensional Shape Estimation for Robot Vision, Ph.D. thesis, Brown University, 1989.