As illustrated here in Figures 12-3 and 12-4, male and female values of life expectancy at age 50 for a given time period have a strong positive correlation across states and counties of the United States. An efficient means of characterizing the two-dimensional distribution of male-female values is to draw ellipses that contain most or all of the data points. A simple method for creating such ellipses in a different application was described by Coale and Treadway (1986). Here, we employ an alternative approach based on principal components analysis (PCA).
In words, we begin by centering the data points around their mean values, identifying their two principal axes and projecting the points onto the new basis (i.e., computing coordinates of the data points in relation to their principal axes), and then rescaling each point using standard deviations of projected abscissa and ordinate values. This series of calculations turns the original elliptical scatterplot into a circular collection of points centered on the origin. To reduce the influence of outliers, we approximate the circular distribution while ignoring the outer 10 percent of data points; that is, we find a minimum radius r such that a circle with this radius (centered on the origin) contains 90 percent of the observed points. The centering, projecting, and scaling process is then reversed: the points on the circle with radius r are rescaled, projected back onto the original axes, and recentered, so that they are comparable to the original values of male and female life expectancy and form an ellipse that contains 90 percent of the data points.
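The steps just described can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the code used to produce the figures: the function name coverage_ellipse, its parameters, and the use of the n − 1 divisor for the sample covariance are choices made here for the example, and the demonstration data are synthetic.

```python
import math
import numpy as np

def coverage_ellipse(points, coverage=0.90, n_boundary=200):
    """Return (boundary, r) for an ellipse containing `coverage` of the points.

    points   : (n, 2) array, e.g. male and female life expectancy at age 50
    boundary : (n_boundary, 2) array of points on the ellipse, original units
    r        : radius of the corresponding circle in the rescaled coordinates
    """
    x = np.asarray(points, dtype=float)
    mu = x.mean(axis=0)
    y = x - mu                                   # center on the means

    s = np.cov(y, rowvar=False)                  # 2 x 2 sample covariance (assumed n - 1 divisor)
    lam, u = np.linalg.eigh(s)                   # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]                # reorder so the largest comes first
    lam, u = lam[order], u[:, order]

    z = (y @ u) / np.sqrt(lam)                   # project onto principal axes, then rescale

    # Smallest radius whose circle contains at least `coverage` of the points.
    radii = np.sort(np.hypot(z[:, 0], z[:, 1]))
    k = math.ceil(coverage * len(radii))
    r = radii[k - 1]

    # Reverse the transformation for points on the circle of radius r.
    theta = np.linspace(0.0, 2.0 * np.pi, n_boundary)
    circle = r * np.column_stack([np.cos(theta), np.sin(theta)])
    boundary = (circle * np.sqrt(lam)) @ u.T + mu
    return boundary, r

if __name__ == "__main__":
    # Synthetic, positively correlated values used only to exercise the function.
    rng = np.random.default_rng(0)
    demo = rng.multivariate_normal([30.0, 34.0], [[1.0, 0.8], [0.8, 1.2]], size=200)
    boundary, r = coverage_ellipse(demo)
    print(boundary.shape, r > 0)
```

Taking the radius from the sorted distances, rather than from an interpolated quantile, guarantees that the circle contains at least the requested share of the observed points.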
In formulas, let x₁ = (x₁₁, x₂₁, …, xₙ₁)ᵀ and x₂ = (x₁₂, x₂₂, …, xₙ₂)ᵀ be column vectors containing male and female values of life expectancy at age 50 by state or county for a given year, and let X = (x₁, x₂) be an n by 2 matrix. Suppose that μ₁ and μ₂ are the mean values of x₁ and x₂, and let Y = (x₁ − μ₁, x₂ − μ₂) be an n by 2 matrix whose columns contain the recentered values of male and female life expectancy. The sample covariance matrix of Y, denoted S, can be decomposed using PCA by invoking the spectral decomposition theorem (Mardia, Kent, and Bibby, 1979, pp. 213ff, 469): S = UΛUᵀ, where the columns of U = [u₁, u₂] comprise an orthonormal basis, and Λ = diag(λ₁, λ₂) is a diagonal matrix with positive elements λ₁ > λ₂ > 0. By computing Z = YUΛ^(−1/2), we project the original data points onto the span of u₁ and u₂ and simultaneously rescale