Appendix A
Factor analysis details

Factor analysis is a method of dimension reduction based on Principal Components Analysis (PCA). The starting point is a rectangular dataset with a relatively large number of correlated variables (p) and a larger number of observations (n). The original set of variables can be thought of as dimensions in a p-dimensional space, and PCA produces a new basis for that space, where each dimension in the new space is a linear combination of the original variables. The linear combinations are selected based on the variance accounted for by each dimension. The first Principal Component (PC) is the linear combination that accounts for the most total variance in all variables, followed by the PC that accounts for the most remaining variance and is orthogonal to the first PC, followed by the PC that accounts for the most remaining variance and is orthogonal to the first two PCs, and so on. Once the PCs are determined, the next step is to select a small subset of dimensions to define a new, smaller space that accounts for a large proportion of the original variance. Note that these new dimensions (PCs) are orthogonal by definition.

Factor analysis takes the reduced dimensions and rotates them in the original space to better line up with the original dimensions. The goal of the rotation is to produce a reduced but more interpretable set of dimensions. In this application, a group of variables that are all correlated with a given PC gives an indication of the "meaning" of that PC. From these groups, it is generally possible to identify the process that is being captured and to select one or two variables from the set to represent that process in the analysis of fatalities.

Factor analyses for this project were run in groups of similar variables to focus the data reduction activities around each variable grouping (rather than trying to develop dimensions across all possible variables).
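
The two steps described above (extracting orthogonal PCs ordered by variance, then rotating a retained subset toward the original variables) can be sketched as follows. This is only an illustrative sketch in Python with NumPy, not the procedure or software used for this project. The data are synthetic, the choice of the correlation matrix and of the varimax rotation criterion is an assumption (the text does not name the rotation used), and the function names `pca_from_correlation` and `varimax` are hypothetical.

```python
import numpy as np


def pca_from_correlation(X):
    """Eigenvalues (descending) and eigenvectors of the correlation matrix of X (n x p)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a p x k loading matrix (varimax criterion, assumed here)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Direction that increases the variance of the squared loadings per column.
        target = rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                     # nearest orthogonal matrix
        new_criterion = s.sum()
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Synthetic data (not project data): two independent latent processes.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
# Variables 0-3 follow the first process, variables 4-5 the second.
mixing = np.array([[1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]])
X = latent @ mixing + 0.3 * rng.normal(size=(n, 6))

eigvals, eigvecs = pca_from_correlation(X)
explained = eigvals / eigvals.sum()           # share of total variance per PC

k = 2                                         # number of retained dimensions
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
rotated, rotation = varimax(loadings)

print("share of variance per PC:", np.round(explained, 3))
print("rotated loadings:\n", np.round(rotated, 2))
```

In the rotated loadings, variables that belong to the same underlying process load mainly on the same column, which is the pattern used in this appendix to interpret a factor and to pick one or two representative variables. Because the rotation is orthogonal, the retained dimensions together account for the same variance before and after rotation.
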
The analysis is not prescriptive about which variables might be the best predictors, so we present the reasoning behind our selections in each section.

Population Variables

Population variables include overall state population for each year, as well as state population broken down by age and gender, for 19 variables in total. The factor analysis resulted in only one factor, which accounted for 97.7% of the total variation in all population variables. Thus, states vary generally in population, but do not have substantially different age distributions or gender distributions. Total population will be used to represent population in analysis where appropriate.

VMT Variables

A series of vehicle miles traveled (VMT) variables, including overall state VMT for each year as well as VMT broken down by rural and urban road types, were available for analysis. Three factors were identified, but the first one accounts for 63% of the total variance, and the second and third account for 8% and 7%, respectively. The first factor generally reflects the magnitude of VMT for each state with an emphasis on urban VMT. The second reflects rural VMT variables, and the third is specific to VMT on rural freeways and rural local roads. Thus, urban VMT and rural VMT vary somewhat differently across states, but variation by road type is common across states. Based on these results, we selected total VMT (to capture the overall magnitude as in Factor 1) and the proportion of VMT that is rural VMT (to capture rural-specific VMT as in Factors 2 and 3).

Employment/Unemployment Variables

Employment and unemployment for different age and gender groups consisted of 20 different variables. Factor analysis on these variables produced two factors that captured 68% and 15% of the variance, respectively. After rotation, Factor 1 primarily reflected average employment and unemployment (where employment had negative coefficients), with a greater weight on unemployment. Factor 2 primarily reflected employment alone, and especially employment of older portions of the population (who, if retired, would affect the employment statistics but not the unemployment statistics). Although older adults' employment varies somewhat differently from employment and unemployment across the age groups, further investigation of the relationship between employment, unemployment, and fatalities showed that unemployment among young people (ages 16-24) is more strongly associated with fatalities than unemployment in other age groups. Since unemployment in this group is correlated with other employment/unemployment statistics, it captures the overall magnitude as well as the specific variation in the younger population. Thus, we used only this statistic as the predictor representing employment and unemployment.

Expenditures

Expenditures per mile were available for the following categories: total, capital, administrative, safety, maintenance, Highway Safety Improvement Program (HSIP) apportioned, and HSIP obligated. The factor analysis of these variables returned a single factor explaining 74% of the variance. This factor reflected the overall magnitude of spending per mile, which varies by state but is fairly consistent across all categories.
Because of the safety application of this project, we used two categories (capital expenditures and total safety expenditures, including HSIP) to represent expenditures in our models.
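
The group-by-group approach used throughout this appendix can be illustrated with a short sketch that runs the variance decomposition separately for each variable grouping. The data below are synthetic and the group names are hypothetical. They are not the project's variables and do not reproduce any of the percentages reported above. The sketch only shows why a group of variables that all scale with overall state size tends to collapse onto a single dominant factor, while a group that mixes two processes does not.

```python
import numpy as np


def variance_shares(X):
    """Share of total variance per principal component of the correlation matrix of X."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return eigvals / eigvals.sum()


rng = np.random.default_rng(1)
n_states = 50
size = rng.lognormal(mean=0.0, sigma=1.0, size=n_states)   # synthetic "state size"

# Hypothetical group A: five breakdowns that are near-fixed fractions of size.
fractions = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
group_a = size[:, None] * fractions * (1 + 0.05 * rng.normal(size=(n_states, 5)))

# Hypothetical group B: three size-driven columns plus two from a separate process.
other = rng.normal(size=(n_states, 1))
group_b = np.column_stack([group_a[:, :3],
                           other + 0.3 * rng.normal(size=(n_states, 2))])

groups = {"size_driven": group_a, "mixed": group_b}
shares = {name: variance_shares(X) for name, X in groups.items()}

for name, share in shares.items():
    print(name, np.round(share, 3))
```

When the first share is close to 1, as for the size-driven group here, a single variable such as the group total is a reasonable stand-in for the whole group. When a second component carries a sizable share, a second variable is needed to represent it.
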