INCORPORATING INVARIANTS IN MAHALANOBIS DISTANCE BASED CLASSIFIERS: APPLICATION TO FACE RECOGNITION

Andrew M. Fraser
Portland State University and Los Alamos National Laboratory

Nicolas W. Hengartner, Kevin R. Vixie, and Brendt E. Wohlberg
Los Alamos National Laboratory, Los Alamos, NM 87545 USA

Abstract: We present a technique for combining prior knowledge about transformations that should be ignored with a covariance matrix estimated from training data to make an improved Mahalanobis distance classifier. Modern classification problems often involve objects represented by high-dimensional vectors or images (for example, sampled speech or human faces). The complex statistical structure of these representations is often difficult to infer from the relatively limited training data sets that are available in practice. Thus, we wish to efficiently utilize any available a priori information, such as transformations of the representations with respect to which the associated objects are known to retain the same classification (for example, spatial shifts of an image of a handwritten digit do not alter the identity of the digit). These transformations, which are often relatively simple in the space of the underlying objects, are usually non-linear in the space of the object representation, making their inclusion within the framework of a standard statistical classifier difficult. Motivated by prior work of Simard et al., we have constructed a new classifier which combines statistical information from training data and linear approximations to known invariance transformations. When tested on a face recognition task, performance was found to exceed by a significant margin that of the best algorithm in a reference software distribution.

I. INTRODUCTION

The task of identifying objects and features from image data is central in many active research fields.
In this paper we address the inherent problem that a single object may give rise to many possible images, depending on factors such as the lighting conditions, the pose of the object, and its location and orientation relative to the camera. Classification should be invariant with respect to changes in such parameters, but recent empirical studies [1] have shown that the variation in the images produced from these sources for a single object is often of the same order of magnitude as the variation between different objects.

Inspired by the work of Simard et al. [2], [3], we think of each object as generating a low dimensional manifold in image space through a group of transformations corresponding to changes in position, orientation, lighting, etc. If the functional form of the transformation group is known, we can in principle calculate the entire manifold associated with a given object from a single image of it. Classification based on the entire manifold, instead of a single point, leads to procedures that are invariant to transformations from that group. The procedures we describe here approximate such a classification of equivalence classes of images. They are quite general, and we expect them to be useful in the many contexts outside of face recognition and image processing where the problem of transformations to which classification should be invariant occurs. For example, they provide a framework for classifying near field sonar signals by incorporating Doppler effects in an invariant manner. Although the procedures are general, in the remainder of the paper we will use the terms faces or objects and image classification for concreteness.

Of course, there are difficulties. Since the manifolds are highly nonlinear, finding the manifold to which a new point belongs is computationally expensive. For noisy data, the computational problem is further compounded by the uncertainty in the assigned manifold.
To address these problems, we use tangents to the manifolds at selected points in image space. Using first and second derivatives of the transformations, our procedures provide substantial improvements to current image classification methods.

II. COMBINING WITHIN CLASS COVARIANCES AND LINEAR APPROXIMATIONS TO INVARIANCES

Here we outline our approach. For a more detailed development, see [4]. We start with the standard Mahalanobis distance classifier

    \hat{k} = \arg\min_k (Y - \mu_k)^T C_w^{-1} (Y - \mu_k),

where C_w is the within class covariance for all of the classes, \mu_k is the mean for class k, and Y is the image to be classified. We incorporate the known invariances while retaining this classifier structure by augmenting the within class covariance C_w to obtain class specific covariances C_k for each class k. We design the augmentations to allow excursions in directions tangent to the manifold generated by the transformations to which the classifier should be invariant. We have sketched a geometrical view of our approach in Fig. 1.

Denote the transformations with respect to which invariance is desired by \psi(Y, \theta), where Y and \theta are the image and transform parameters respectively. The second order Taylor series for the transformation is

    \psi(Y, \theta) = Y + V\theta + \tfrac{1}{2}\,[\theta^T H_d\, \theta]_d + R,

where R is the remainder, V is the matrix of first derivatives (the tangent), and H is the tensor of second derivatives (the curvature).
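As a concrete illustration (not code from the paper), here is a minimal numpy sketch of the standard Mahalanobis distance classifier described above; the class means and pooled covariance are invented toy values:

```python
import numpy as np

def mahalanobis_classify(Y, means, C_w):
    """Assign Y to the class k minimizing (Y - mu_k)^T C_w^{-1} (Y - mu_k)."""
    C_inv = np.linalg.inv(C_w)
    dists = [(Y - mu) @ C_inv @ (Y - mu) for mu in means]
    return int(np.argmin(dists))

# Toy example: two well-separated classes in 2-D.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
C_w = np.array([[1.0, 0.3],
                [0.3, 1.0]])  # pooled within-class covariance
print(mahalanobis_classify(np.array([4.8, 5.1]), means, C_w))  # -> 1
```

In practice C_w would be estimated by pooling the within-class scatter of the training images; the augmented version replaces C_w with a class-specific C_k as developed below in the paper.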
Fig. 1. A geometrical view of classification with augmented covariance matrices: the dots represent the centers \mu_k about which approximations are made, the curves represent the true invariant manifolds, the straight lines represent tangents to the manifolds, and the ellipses represent the pooled within class covariance C_w estimated from the data.

A new observation Y is assigned to a class using

    \hat{k} = \arg\min_k (Y - \mu_k)^T C_k^{-1} (Y - \mu_k).

The novel aspect is our calculation of C_k, where \alpha is a parameter corresponding to a Lagrange multiplier and C_{\theta,k} is a function of the tangent and curvature of the manifold (from the first and second derivatives respectively), with weighting of directions according to relevance estimated by diagonalizing C_w. We define

    C_k = C_w + \alpha V_k C_{\theta,k} V_k^T,    (1)

where C_{\theta,k} is a \dim(\Theta) \times \dim(\Theta) matrix. We require that C_{\theta,k} be non-negative definite. Consequently C_k is also non-negative definite. When C_k^{-1} is used as a metric, the effect of the term \alpha V_k C_{\theta,k} V_k^T is to discount displacement components in the subspace spanned by V_k, and the degree of the discount is controlled by C_{\theta,k}.

We developed [4] our treatment of C_{\theta,k} by thinking of \theta as having a Gaussian distribution and calculating expected values with respect to its distribution. Here we present some of that treatment, minimizing the probabilistic interpretation. Roughly, C_{\theta,k} characterizes the costs of excursions of \theta. We choose C_{\theta,k} to balance the conflicting goals:

Big: We want to allow \theta to be large so that we can classify images with large displacements in the invariant directions.

Small: We want \theta to be small so that the truncated Taylor series will be a good approximation.

We search for a resolution of these conflicting goals in terms of a norm on \theta and the covariance C_{\theta,k}. For the remainder of this section let us consider a single individual k and drop the extra subscript, i.e., we will denote the covariance of \theta for this individual by C_\theta.
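The discounting effect of Eq. (1) can be sketched numerically (an illustrative toy example, not the paper's experiment): with the augmented metric, a displacement along the tangent direction V_k contributes far less to the Mahalanobis distance than the same displacement does under C_w alone.

```python
import numpy as np

def augmented_covariance(C_w, V_k, C_theta_k, alpha):
    """Eq. (1): C_k = C_w + alpha * V_k C_{theta,k} V_k^T."""
    return C_w + alpha * V_k @ C_theta_k @ V_k.T

def mahalanobis(Y, mu, C):
    d = Y - mu
    return float(d @ np.linalg.inv(C) @ d)

# Toy 2-D example in which the invariant direction is the x axis.
C_w = np.eye(2)                   # pooled within-class covariance
V_k = np.array([[1.0], [0.0]])    # tangent to the invariant manifold at mu_k
C_theta = np.array([[1.0]])       # dim(Theta) x dim(Theta), non-negative definite
C_k = augmented_covariance(C_w, V_k, C_theta, alpha=100.0)

mu = np.zeros(2)
Y = np.array([3.0, 0.0])          # displaced purely along the invariant direction
# The augmented metric discounts the displacement along V_k:
print(mahalanobis(Y, mu, C_w))    # 9.0 under C_w
print(mahalanobis(Y, mu, C_k))    # much smaller under C_k
```

Increasing \alpha (or the entries of C_theta) increases the discount, which is exactly the "Big" goal; keeping them moderate protects the Taylor approximation, the "Small" goal.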
If, for a particular image component d, the Hessian H_d has both a positive eigenvalue \lambda_1 (with eigenvector e_1) and a negative eigenvalue \lambda_2 (with eigenvector e_2), then the quadratic term \theta^T H_d \theta is zero along a direction e_0 which is a linear combination of the corresponding eigenvectors, i.e.

    e_0 \propto \sqrt{-\lambda_2}\, e_1 + \sqrt{\lambda_1}\, e_2.

We suspect that higher order terms will contribute significant errors for excursions along such directions, so we eliminate the canceling effect by replacing H_d with its positive square root (H_d^2)^{1/2} = |H_d|, i.e. if an eigenvalue \lambda of H_d is negative, we replace it with -\lambda. This suggests the following mean root square norm:

    \|\theta\|_{mrs} = \sqrt{\frac{1}{\dim(Y)} \sum_d \theta^T |H_d|\, \theta}.    (2)

Consider the following objection to the norm in Eqn. (2). If there is an image component d which is unimportant for recognition and for which H_d is large, e.g. a sharp boundary in the background, then requiring \|\theta\|_{mrs} to be small might prevent parameter excursions that would only disrupt the background. To address this objection, we use the eigenvalues of the pooled within class covariance matrix C_w to quantify the importance of the components. If there is a large within class variance in the direction of component d, we will not curtail particular parameter excursions just because they cause errors in component d.

We develop our formula for C_\theta in terms of the eigendecomposition of C_w as follows. Break the \dim(Y) \times \dim(\Theta) \times \dim(\Theta) tensor H into components

    H_d, \quad d = 1, \ldots, \dim(Y).    (3)

Then for each component, define the \dim(\Theta) \times \dim(\Theta) matrix

    G_d = \frac{1}{\sigma_d^2} |H_d|,    (4)

where \sigma_d^2 is the eigenvalue of C_w associated with component d, and take the average to get

    \bar{G} = \frac{1}{\dim(Y)} \sum_d G_d.    (5)

Define the norm \|\theta\|_{\bar{G}} = \sqrt{\theta^T \bar{G}\, \theta}.
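The eigenvalue replacement step above can be sketched in a few lines of numpy (an illustration under toy inputs, not the paper's code): |H_d| is obtained from the eigendecomposition of the symmetric Hessian, and the mean root square norm of Eq. (2) averages the resulting quadratic forms over image components.

```python
import numpy as np

def matrix_abs(H):
    """Replace each negative eigenvalue lambda of symmetric H with -lambda,
    giving |H|, the positive square root of H^2."""
    lam, E = np.linalg.eigh(H)
    return E @ np.diag(np.abs(lam)) @ E.T

def mrs_norm(theta, H_list):
    """Mean root square norm of Eq. (2): sqrt of the mean over image
    components d of theta^T |H_d| theta."""
    return float(np.sqrt(np.mean([theta @ matrix_abs(H) @ theta
                                  for H in H_list])))

H = np.array([[1.0, 0.0],
              [0.0, -4.0]])        # indefinite Hessian for one component
theta = np.array([0.0, 1.0])
print(mrs_norm(theta, [H]))        # sqrt(theta^T |H| theta) = 2.0
```

Without the absolute value, the indefinite H above would assign this theta a negative quadratic term, letting cancellation hide excursions for which the truncated Taylor series is unreliable.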