**Suggested Citation:**"Chapter 7. Modeling." National Academies of Sciences, Engineering, and Medicine. 2019.

*Identification of Factors Contributing to the Decline of Traffic Fatalities in the United States from 2008 to 2012*. Washington, DC: The National Academies Press. doi: 10.17226/25590.

Chapter 7. Modeling

This section describes the statistical modeling effort used to better characterize the relationship between fatalities and the potential factors that could explain the major drop in the number of fatalities after 2007. Given the nature of the data (i.e., random variables and unobserved heterogeneity), statistical models are needed to reduce the noise associated with traffic fatalities. The data collected and described in Chapter 5 were used for the various analyses.

The modeling approach was patterned after Elvik (2014), who reviewed a number of statistical methods for looking at predictors of fatalities over time in a group of countries. The three primary methods he discussed were negative binomial models of fatality counts and two forms of models of year-over-year change. Two of these methods were used for the current study: a Poisson-gamma count model (equivalent to a negative binomial model) and the log-change regression model of year-over-year change.

The negative binomial model uses raw fatality counts and incorporates VMT as exposure. This means that coefficients of predictors can be interpreted as influencing fatalities per VMT, or a fatality rate in each state and year. Thus, though different factors can influence risk or exposure, in this model they are interpretable primarily as influencing risk rather than exposure.

There are two key sources of variation in the dataset of raw counts. First, differences between states can be thought of as generally more stable differences in environmental, population, cultural, economic, and traffic-safety conditions. Second, changes over time within states are more transient. Some factors change very slowly and steadily (e.g., new-vehicle fleet penetration, belt use rates), while others are more volatile (e.g., economic factors) and can change significantly over short periods of time.
Because these factors may operate differently on travel and risk, two negative binomial regression models were developed. One uses a state fixed effect to remove the stable differences among states and focus on changes over time (model controlling for state, referred to below as MCS). The other leaves out this fixed effect, allowing differences between states to be captured by the measured predictors (model not controlling for state, referred to as MNCS in the analyses below). When the effect of a predictor is different in these two models, it can indicate that differences in that variable between states have a different effect than change over time (i.e., the effect is relative). When the effect of the predictor is the same in these two models, it can indicate that the predictor has a general effect on risk that transcends local experience. The difference in these mechanisms can be informative. The dataset for the change model is a set of differences from one year to the next within a state, expressed as a percentage of the previous year. Thus, all state-to-state variation is removed, as is the magnitude of each variable within a state. The remaining information is simply the percent change from year to year within a state. Here, predictors can influence exposure and/or risk.
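The construction of the change dataset described above can be sketched in a few lines. This is a minimal illustration only: the function name, the year-to-value dictionary layout, and the numbers are assumptions made for the example and are not taken from the study data.

```python
def year_over_year_change(series):
    """Percent change from each year to the next for ONE state.

    `series` maps year -> value (a hypothetical layout for this sketch).
    The first year has no predecessor, so it produces no change value.
    """
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        changes[curr] = 100.0 * (series[curr] - series[prev]) / series[prev]
    return changes


# Illustrative numbers only: a small state and a state ten times larger
# with the same relative trend.
small_state = {2007: 1000, 2008: 900, 2009: 810}
large_state = {2007: 10000, 2008: 9000, 2009: 8100}

print(year_over_year_change(small_state))
print(year_over_year_change(large_state))
```

Both hypothetical states yield the same sequence of changes, which is the sense in which the change dataset discards the magnitude of each variable and the stable differences between states.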

Before any modeling can begin, it is useful to address the high level of correlation among the variables assembled for this project. This is done using factor analysis, a data-reduction technique that helps identify patterns of correlation among many variables (as opposed to two at a time). The final predictor set chosen by this method captures the array of different factors of interest but with substantially less multicollinearity, which can create problems in the interpretation of regression models.

The analyses were separated into two parts. In the first part, an exploratory analysis of the data was conducted to identify trends and factors that seem to influence fatalities at the national level. This part also included a factor analysis, which was used to rank the magnitude of the correlation between independent and dependent variables. Using the results obtained from the exploratory analysis of the data and the factor analysis, the second part focused on the two more advanced statistical analyses. By looking at the problem from different angles, a clearer picture emerges of the most significant influences on traffic fatalities in the US over this time frame.

The following sections provide details on the choice of predictors, the modeling approaches, and the results. Section 7.1 describes the results of the exploratory and factor analyses. Section 7.2 summarizes the results of the modeling effort for the two approaches.

7.1 Factor analysis to identify parameters for the models

A significant challenge of working with this type of dataset is that many of the variables in the dataset are correlated. While modeling can be done with some collinearity among predictors, it tends to fail when those relationships are very strong. For example, population and VMT within a state are very closely related (r=0.98), so it is not feasible to include both as predictors.
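A simple pairwise screen illustrates the kind of collinearity check implied by the population and VMT example. This is a sketch rather than the factor analysis used in the study: the 0.9 threshold, the function names, and the five made-up data points are all assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.9):
    """List predictor pairs whose |r| meets the (assumed) threshold."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up values for five hypothetical states (not study data).
data = {
    "population": [1.0, 2.0, 4.0, 8.0, 16.0],
    "vmt": [11.0, 19.0, 42.0, 79.0, 161.0],
    "pump_price": [3.1, 2.9, 3.3, 3.0, 3.2],
}

flagged = collinear_pairs(data)
print(flagged)
```

In this toy data only the population and VMT pair is flagged, so one of the two would be dropped before fitting a regression.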
A related issue is that when predictors are correlated, it is not possible to statistically assign responsibility for the effect of the shared variance on the outcome. In other words, the analysis itself cannot point to whether VMT or population is responsible for differences in fatalities. Statistically, the two are nearly indistinguishable. Instead, a sensible mechanistic argument must be made. In this case, the larger the population, the more people there are to drive, thus producing greater VMT. However, it is the VMT itself that exposes those people to the risk of being fatally injured in a crash. Hypothetically, if there were an entire state that had no roads or cars, it could have any size population and there would be no fatalities (in the traffic context). In contrast, if a state had 100 people who each drove 30,000 miles per year, they would have the same total exposure (3 million vehicle miles) and therefore the same expected number of fatalities (all other things being equal) as a state with 300 people who each drive 10,000 miles a year. Note, however, that because population and VMT are so closely related in the real world, these types of inferences must be made by argument rather than through statistical results. This kind of argument will be made throughout the discussion of results, as analysis of these observational data cannot, in itself, address the causality of relationships among variables.

To address the collinearity problem and understand the collective relationship among these variables, a series of factor analyses was conducted. Factor analysis is a method of data reduction that produces a smaller number of dimensions, each of which is a linear combination of the original variables. For this purpose, it is best to think of factor analysis as a way of identifying a subset of the original variables that should be used as predictors of fatality. Factor analysis helps us identify patterns of covariation and select a good predictor subset. (Note that factor analysis as applied here does not evaluate the relationship between any of the predictors and fatalities; that is done in a later step.) Factor analyses were done in groups of similar variables: expenditures, economic measures, population, and VMT. Details of the factor analyses are provided in Appendix A.

Based on the results of the factor analysis as well as some univariate exploration of the relationship between each predictor and fatalities, the following variables were used as the predictor set for modeling. General size of population is represented by total VMT. This is correlated with GDP, urban road miles, total population, law enforcement safety expenditures, and total capital expenditures. "Ruralness" is represented by the proportion of VMT that is rural. This is also related to total rural road miles. The variables used to represent economic factors were unemployment for the 16-24 age group, unemployment for everyone else, and median income. Across this timespan, these are related to all other economic factors, including employment for different age groups and unemployment for different age groups. State expenditures are represented by safety spending (safety plus HSIP) and capital spending. Occupant protection is represented by belt-use rates, strength of belt laws, strength of motorcycle laws, and the proportion of vehicles on the road that are model year 1991 or later. Alcohol-related causation is represented by total beer consumption and the strength of DUI laws. Finally, pump price was included as a unique predictor that did not fall into other categories. These variables and the variables they are most correlated with are summarized in Table 7-1.
Table 7-1 Summary of predictors and correlations

| Variable Group (High-Level Concept) | Variable(s) Chosen for Analysis | Other Correlated* Predictors |
|---|---|---|
| Overall size (of state) | Total VMT | GDP, capital expenditures, … |
| Ruralness | Rural VMT prop. | Total VMT, capital & safety expenditures, median household income, beer consumption per capita |
| Economy | GDP/capita, median household income, unemployment 16 to 24 | Rural VMT prop., capital & safety expenditures |
| Occupant Protection | Belt use, belt laws, motorcycle helmet laws, post-1991 model year penetration | Unemployment 16 to 24, pump price |
| State Roadway Expenditures | Capital expenditures, safety expenditures | Rural VMT prop., GDP/capita, median household income |
| Alcohol | Beer consumption per capita, DUI laws | Rural VMT prop. |
| Other | Pump price | Post-1991 model penetration |

*Correlation coefficients > ±0.3

7.2 Regression models

This section describes a series of regression models that were fit. The first was a model to identify the factors that predicted VMT. Next is a set of negative binomial regression models to identify the factors affecting the risk of traffic fatalities. Finally, a log-change model is developed and discussed. This model relates year-over-year changes in predictor variables to changes in the number of traffic fatalities. Appendix B provides descriptive statistics on the variables used in the models.

7.2.1 Regression of factors on vehicle miles traveled

During the period from 2007 to 2012, changes in VMT coincided with the change in fatalities. It is therefore logical to first identify the factors that influenced VMT, especially the factors that reflected economic performance. Separate linear regression models were developed for total VMT, urban VMT, and rural VMT.

Prior to developing the regression models, a correlation analysis was performed to identify the factors that were correlated with VMT. Table 7-2 presents the results of the correlation analysis. A value of +1 means that the variable was perfectly positively correlated with VMT. A value of -1 means that the variable was perfectly negatively correlated with VMT. A value of zero means that the variable had no linear relationship to VMT. The correlation coefficients suggest that population was the most influential factor in determining vehicle miles traveled, followed by the unemployment rate.

Table 7-2 Correlation of VMT with different variables

| | Total Population | Unemployment of age 16 to 24 | Unemployment of age 25 and above | Pump price | GDP per capita | Median income |
|---|---|---|---|---|---|---|
| Total VMT | 0.98 | 0.19 | 0.22 | -0.04 | 0.07 | -0.02 |
| Urban VMT | 0.98 | 0.17 | 0.23 | 0.00 | 0.12 | 0.07 |
| Rural VMT | 0.81 | 0.19 | 0.15 | -0.14 | -0.12 | -0.27 |

Population was most strongly correlated with VMT, measured both as total VMT and urban VMT.
It is interesting to note that the correlation between rural VMT and total population was strong, but noticeably weaker than for total or urban VMT (0.81 versus 0.98). States with larger populations almost invariably had larger total VMT and urban VMT. However, the association with rural VMT was less strong, likely because some states with lower populations were also more rural. Note also that there were weak associations between unemployment for the two age groups considered and the different measures of VMT. There was some tendency for states with higher rates of unemployment to also have higher amounts of VMT. Big states in terms of population (and therefore VMT) also may have had higher rates of unemployment. The correlation of the different measures of VMT with pump price was almost non-existent, though there was a weak, negative association with rural VMT; similarly, there was a weak-to-moderate negative correlation between rural VMT and median income. States with more rural VMT tended to have lower median incomes and lower GDP per capita.

Total VMT

A linear regression model was developed with total VMT in millions as the dependent variable and population and unemployment rate as independent variables. Other variables were not considered because the correlation coefficients shown in Table 7-2 were almost zero. Given the strong correlation between the unemployment rates for the two age groups, each one was included independently and the one that gave the best fit to the data was chosen for inclusion in the model. In the total VMT model, the unemployment rate of age 25 and above provided the better fit. The variables with corresponding p-values less than 0.1 can be considered statistically significant (at the significance level α = 0.1). The R² statistic for the total VMT model was 0.97, which shows that the model was able to predict the VMT almost perfectly. The negative coefficient of the unemployment rate in Table 7-3 shows that as the unemployment rate increased, total VMT decreased. The primary factors determining total VMT were population and the state of the economy, as reflected by the unemployment rate.

Table 7-3 Parameter estimates for the total VMT model

| Variable | Estimate | Standard error | P-value |
|---|---|---|---|
| Intercept | 7075.39 | 1210.30 | <.0001 |
| Total population | 0.00898 | 0.0001 | <.0001 |
| Unemployment rate of age 25 and above | -425.43 | 235.21 | 0.071 |
| R² statistic | 0.97 | | |

Figure 7-1 shows the comparison of the total VMT predicted by the model with the reported VMT from 2001 to 2012. Each of the circles in the graph represents one observation, i.e., one state in one year.

Figure 7-1 Predicted total VMT versus reported total VMT, 2001-2012
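To make the fitted total VMT equation concrete, the sketch below plugs the Table 7-3 estimates into the linear form. The coefficients are copied from the table; the input units (population in persons, unemployment in percentage points) and the example state are assumptions, since the text does not state the units explicitly.

```python
# Coefficients copied from Table 7-3 (total VMT model, VMT in millions).
INTERCEPT = 7075.39
B_POPULATION = 0.00898       # assumed: per person
B_UNEMP_25_PLUS = -425.43    # assumed: per percentage point of unemployment


def predict_total_vmt(population, unemployment_25_plus_pct):
    """Predicted total VMT (millions) from the fitted linear model."""
    return (INTERCEPT
            + B_POPULATION * population
            + B_UNEMP_25_PLUS * unemployment_25_plus_pct)


# Hypothetical state: 10 million residents, unemployment rising from 5% to 6%.
base = predict_total_vmt(10_000_000, 5.0)
higher = predict_total_vmt(10_000_000, 6.0)

print(round(base, 2))           # predicted VMT in millions at 5% unemployment
print(round(higher - base, 2))  # change associated with one more point
```

Under these assumptions, a one-point rise in unemployment lowers predicted VMT by the magnitude of the unemployment coefficient, holding population fixed.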

Urban VMT

To capture the effect of the economic variables by area type, a linear regression model was developed with urban VMT in millions as the dependent variable and population, unemployment rate, and GDP per capita as independent variables. Other variables were not considered because the correlation coefficients shown in Table 7-2 were almost zero. In the urban VMT model, the unemployment rate of age 16 to 24 provided the better fit. The R² statistic for the urban VMT model was 0.97, which shows that the model was able to predict the VMT almost perfectly. The negative coefficient of the unemployment rate in Table 7-4 shows that as the unemployment rate increased, the urban VMT decreased. Although statistically insignificant at the 10% level, the positive coefficient for GDP per capita shows that states with higher GDP per capita tended to have more urban travel than states with lower GDP per capita. As in the case of total VMT, the primary determinants of urban VMT were total population and the state of the economy, as measured by the unemployment rate and GDP per capita.

Table 7-4 Parameter estimates for the urban VMT model

| Variable | Estimate | Standard error | P-value |
|---|---|---|---|
| Intercept | -2588.33 | 2609.50 | 0.322 |
| Total population | 0.00705 | 0.0001 | <.0001 |
| Unemployment rate of age 16 to 24 | -179.40 | 100.70 | 0.075 |
| GDP per capita in 2013 dollars | 0.0140 | 0.0328 | 0.670 |
| R² statistic | 0.97 | | |

Figure 7-2 shows the comparison of the urban VMT predicted by the model with the reported urban VMT from 2001 to 2012.

Figure 7-2 Predicted urban VMT versus reported urban VMT, 2001-2012

Rural VMT

More factors were found to contribute to the amount of rural VMT, and the model was less strong. The correlation coefficients in Table 7-2 showed that almost all variables had some correlation with the amount of rural VMT. A linear regression model was developed with rural VMT in millions as the dependent variable and population, unemployment rate, pump price (total gasoline price, including tax, in 2013 dollars), GDP per capita, and median income as independent variables. In the rural VMT model, the unemployment rate of age 25 and above provided the better fit. The R² statistic for the rural VMT model was 0.77, which shows that, although the model predicted rural VMT reasonably well, it left about a quarter of the variation in rural VMT unexplained. The negative coefficients for the unemployment rate and pump price in Table 7-5 implied that as these variables increased, rural VMT decreased. The negative coefficients for GDP per capita and median income showed that states with lower GDP per capita and median income tended to have more rural travel than other states. This may imply that states with more constrained economies and lower median incomes tended to be more rural.

Table 7-5 Parameter estimates for the rural VMT model

| Variable | Estimate | Standard error | P-value |
|---|---|---|---|
| Intercept | 54822 | 2661.66 | <.0001 |
| Total population | 0.00201 | 0.0001 | <.0001 |
| Unemployment rate of age 25 and above | -674.35 | 174.33 | <.0001 |
| Pump price | -3541.08 | 522.25 | <.0001 |
| GDP per capita in 2013 dollars | -0.140 | 0.039 | 0.0004 |
| Median income | -0.458 | 0.053 | <.0001 |
| R² statistic | 0.77 | | |

Figure 7-3 shows the comparison of the rural VMT predicted by the model with the reported rural VMT from 2001 to 2012.

Figure 7-3 Predicted rural VMT versus reported rural VMT, 2001-2012

7.2.2 Regression of factors on fatalities

The models estimated in this part of the work were based on count data regression methodologies (Cameron and Trivedi 1998; Hilbe 2014). The most basic count data models are the Poisson and Poisson-gamma (also known as the negative binomial, or NB). Both models belong to the family of generalized linear models. For the Poisson model to be appropriate for a given dataset, the mean has to equal the variance. However, in practice, it has been found that count data often exhibit over-dispersion, meaning that the variance is larger than the mean (Lord, Washington et al. 2005). On rare occasions, the data or modeling output may show characteristics of under-dispersion (Oh, Washington et al. 2006; Lord, Geedipally et al. 2010).

To overcome the problem related to over-dispersion, the Poisson-gamma model has been proposed as a viable alternative to the Poisson model (Hilbe 2011). The Poisson-gamma model has become very popular because it has a closed-form equation (i.e., the mathematical manipulations can be done by hand; for example, it can be shown that the Poisson-gamma mixture leads to the NB distribution), and the mathematics to manipulate the relationship between the mean and the variance is relatively simple (Lord and Mannering 2010). Furthermore, most statistical software packages have incorporated an NB function that simplifies the analysis of count data.

The Poisson-gamma model in highway safety applications has been shown to have the following probabilistic structure: the number of crashes at the i-th entity (state) and t-th time period, $Y_{it}$, when conditional on its mean, $\theta_{it}$, is assumed to be Poisson distributed and independent over all entities and time periods as follows (Miaou and Lord 2003):

$$Y_{it} \mid \theta_{it} \sim \mathrm{Po}(\theta_{it}), \qquad i = 1, 2, \ldots, I \ \text{ and } \ t = 1, 2, \ldots, T \qquad \text{(Eq. 2)}$$

The mean of the Poisson is structured as:

$$\theta_{it} = \mu_{it} \exp(\varepsilon_{it}) \qquad \text{(Eq. 3)}$$

Where,

$\mu_{it}$ = a function of the covariates (X) (e.g., $\mu_{it} = \exp(\beta_0 + \beta_1 X_{it1} + \beta_2 X_{it2} + \cdots + \beta_p X_{itp})$, where p is the number of covariates).

$\beta$ = a vector of unknown coefficients.

$\varepsilon_{it}$ = the model error independent of all the covariates.

It is usually assumed that $\exp(\varepsilon_{it})$ is independent and gamma distributed with a mean equal to 1 and a variance $1/\phi$ for all i and t (here $\phi$ is the inverse dispersion parameter, with $\phi > 0$). With this characteristic, it can be shown that $Y_{it}$, conditional on $\mu_{it}$ and $\phi$, is distributed as a Poisson-gamma random variable with a mean $\mu_{it}$ and a variance $\mu_{it}(1 + \mu_{it}/\phi)$, respectively.

In this modeling work, each year for every state was considered a distinct observation. As discussed by Lord and Persaud (2000), analyzing time-series or panel data in this manner can create temporal or serial correlation. Random effects models and those estimated using generalized estimating equations (GEE) can be used for handling serial correlation (Lord and Persaud 2000). However, after further investigation, it was determined that the serial correlation had a minimal impact on the modeling results. Hence, to simplify the modeling effort, the models were estimated as generalized linear models (GLMs).

Two types of regression models were estimated: 1) models not controlling for state (MNCS) (i.e., no state fixed effect in the model), and 2) models controlling for state (MCS) (i.e., state is considered as a fixed effect in the model). Further, MCS models were developed using two different types of exposure: one with vehicle miles traveled (VMT) as exposure, and the other with population as exposure. For the MCS models, each state has its own intercept.
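The Poisson-gamma structure described earlier in this section (a Poisson count whose mean is multiplied by a gamma term with mean 1 and variance 1/φ) can be checked by simulation. The sketch below is illustrative only: the values of the mean and of the inverse dispersion parameter are arbitrary assumptions, and the simple Poisson sampler is a stand-in for whatever routine a statistical package would use.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_poisson_gamma(mu, phi, rng):
    """One draw: gamma multiplier (mean 1, variance 1/phi), then Poisson."""
    multiplier = rng.gammavariate(phi, 1.0 / phi)  # shape=phi, scale=1/phi
    return sample_poisson(mu * multiplier, rng)


def simulate(mu, phi, n, seed=0):
    """Return the sample mean and sample variance of n simulated counts."""
    rng = random.Random(seed)
    draws = [sample_poisson_gamma(mu, phi, rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var


mu, phi = 5.0, 2.0  # arbitrary illustrative values
mean, var = simulate(mu, phi, n=20000)
theoretical_var = mu * (1 + mu / phi)

print(mean, var, theoretical_var)
```

The simulated variance should land near μ(1 + μ/φ) and well above the mean, which is the over-dispersion that a plain Poisson model cannot represent.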
For the MNCS model type, the model tries to establish the general relationship between the number of fatalities and various risk factors without taking into account differences between states that were not captured in the predictor variables. This model captured less of the total variation, but may be able to better explain certain variables that have a direct influence on fatalities on a national scale, independent of each state.

The MCS model type removes overall differences in fatality rates between states and focuses on accounting for how factors influence each state's fatality rate over time relative to the state's base rate. To accomplish this, each state had its own intercept, which captures the effect of variables that were unobserved as well as some that were observed. If the state-specific term was significantly different between states, then there is variation between states that is not captured by the variables in the model. When model predictors differ between MNCS and MCS models, this indicates that the effect of a variable over time may be different than its effect between states. The latter is a more stable, long-term pattern, whereas the former is shorter-term and generally has a relative effect for residents of a given state.

The functional form used for the MNCS is the following:

$$\mu = \mathrm{VMT} \times \exp\Big(\beta_0 + \sum_{i} \beta_i x_i\Big) \qquad \text{(Eq. 4)}$$

Where,

$\mu$ = the estimated number of fatalities per year (for each state);

VMT = the number of vehicle miles traveled in millions (for each state);

$x_i$ = variable i; and,

$\beta_0, \beta_i$ = estimated coefficients.

The functional form used for the MCS model is the following:

Model 1:

$$\mu = \mathrm{VMT} \times \exp\Big(\beta_0 + \gamma_s + \sum_{i} \beta_i x_i\Big) \qquad \text{(Eq. 5)}$$

Model 2:

$$\mu = \mathrm{POP} \times \exp\Big(\beta_0 + \gamma_s + \sum_{i} \beta_i x_i\Big) \qquad \text{(Eq. 6)}$$

Where,

$\gamma_s$ = state-specific parameter for state s; and,

POP = population in millions (for each state).

As shown above, two measures of exposure were used. In addition, the exposure variables were used as an offset in the models. What this means in practice is that the dependent variable in the models, the variable that the models were attempting to predict, was a rate, either fatalities per VMT or fatalities per population, depending on whether VMT or population was used as the offset.

The two measures of exposure were the VMT for each state and the population of the state. Both have advantages and disadvantages. VMT is a more direct measure of the exposure of vehicles traveling on the road network to crashes. On the other hand, the VMT values themselves are estimates and, like many estimates, may be prone to errors or could be unreliable in some cases. The state population variable is usually better estimated, because it is based on decennial censuses with intercensal surveys, but it may not represent the number of vehicles traveling on the network: population is not immediately affected by economic conditions, nor does it reflect differences in the amount of travel due to the physical size of states, population density, and other factors that may influence the amount of travel, as shown in the models of VMT in Section 7.2.1.
Noland and Sun preferred using population as the measure of exposure because they believed that the population estimates were generally more accurate than VMT estimates (Noland and Sun 2014). However, the current study also estimated a model with VMT as exposure.
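The role of exposure as an offset can be illustrated with a short sketch: the expected count is the exposure multiplied by an exponentiated linear predictor, so the model effectively describes a rate. The coefficients and predictor values below are hypothetical placeholders chosen for illustration; they are not estimates from this study.

```python
import math


def expected_fatalities(exposure, intercept, coefs, x):
    """Expected count = exposure * exp(b0 + sum(b_i * x_i)).

    Equivalent to a log-link model with log(exposure) entering as an
    offset, i.e. with its coefficient fixed at 1.
    """
    linear = intercept + sum(b * v for b, v in zip(coefs, x))
    return exposure * math.exp(linear)


# Hypothetical coefficients and predictor values (not from the study).
b0 = -4.5
coefs = [0.3, -0.02]
x = [0.4, 12.0]

mu_small = expected_fatalities(20_000, b0, coefs, x)  # exposure: 20,000
mu_large = expected_fatalities(40_000, b0, coefs, x)  # exposure: 40,000

rate_small = mu_small / 20_000
rate_large = mu_large / 40_000
print(mu_small, mu_large, rate_small, rate_large)
```

Doubling the exposure doubles the expected count while the rate per unit of exposure stays the same, which is why the predictors in such a model are read as acting on the fatality rate.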

All models consider the following variables in each state:

1. Rural VMT as a percentage of total VMT.
2. Capital expenditures spent per highway mile, excluding capital safety expenditures, in the prior year (2013 dollars).
3. Safety expenditures (capital expenditures related to safety, law enforcement, education, and HSIP obligations) per highway mile spent in the prior year (in 2013 dollars).
4. State Gross Domestic Product per capita.
5. Percent unemployment for 16-24 year olds.
6. Total gasoline price (in 2013 dollars).
7. Beer consumption per capita.
8. DUI law rating.
9. Motorcycle helmet law rating.
10. Safety belt law rating.
11. Safety belt usage rates.
12. Median income (2013 dollars).
13. Percent of the vehicle fleet with a model year of 1991 or newer.

Note that these variables represent different cells in the Haddon Matrix. Capital and safety spending fall into the Environment domain and affect both the pre-crash level (e.g., spending on infrastructure to reduce crash occurrence) and the crash level (e.g., spending on infrastructure features that reduce the severity of crashes). Elements of the safety expenditure variable also fall into the Human domain at the pre-crash level, insofar as law enforcement and educational programs affect who drives and how they drive. State GDP per capita, percent unemployed, median income, fuel prices, and beer consumption are all located in the pre-crash level of the Human domain, since the presumed mechanisms by which they are linked to safety are to affect decisions about whether to drive and how riskily to drive. Belt use, state belt rating, and motorcycle helmet rating fall into the Human domain at the crash level, insofar as they reflect and constrain the choices individuals make to protect themselves in crashes. Similarly, the variable that captures fleet penetration of crashworthiness features covers the pre-crash and crash levels of the Vehicle domain of the Haddon Matrix.
Post-crash levels of the Haddon Matrix could not be incorporated, as it was not possible to obtain comprehensive data series on the factors represented there, such as advances in crash notification and post-crash medical care.

The goodness-of-fit (GOF) of the models was assessed using the Akaike information criterion (AIC) and the error-based Mean Absolute Deviation (MAD) and Mean Squared Prediction Error (MSPE). Additional information about how the GOF criteria work can be found in Lord and Park (2013). In general, the model with smaller AIC, MAD, and MSPE values is considered superior to other models.

The modeling results are summarized further below. In this section, the general characteristics of the results are discussed, including coefficient estimates, standard errors of the estimates, and p-values. P-values show the level of significance, in this case, the probability of obtaining the observed or more extreme results if the true value was zero, given the sample data. In observational studies, a p-value of 0.1 or below is often used as the threshold of statistical significance, meaning only a 10% (or less) chance of the obtained results if the true effect was zero, given the sample data. High p-values do not mean the parameter had no effect, just that the data were not sufficient to exclude the possibility of no effect at some reasonable level of confidence. With more data, a non-significant parameter may become significant. "Non-significant" parameters were left in the models to measure the effect of all factors of interest, regardless of statistical significance. The detailed effects of each variable are discussed in Section 8.1.

MNCS Model

Table 7-6 below shows the modeling results for the MNCS model. As noted above, VMT was used as the exposure. For this model, three variables (DUI rating, belt rating, and motorcycle helmet rating) were found to be not statistically significant, even at the 10% level.
Table 7-6 Parameter estimates for the MNCS model, VMT offset

| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | 11.7179 | 0.4082 | **<.0001** | |
| Rural VMT proportion | 0.2228 | 0.0667 | **0.0008** | 1.250 |
| Capital spending (in $1000) | 0.0009 | 0.0003 | **0.0007** | 1.001 |
| Safety spending (in $1000) | -0.0033 | 0.0013 | **0.0124** | 0.997 |
| GDP per capita (in $10,000) | 0.024 | 0.0106 | **0.0241** | 1.024 |
| Unemployment for age 16 to 24 (%) | -0.0132 | 0.0025 | **<.0001** | 0.987 |
| Pump price ($ per gallon) | -0.0475 | 0.0258 | **0.0651** | 0.954 |
| Beer (gallons) | 0.2769 | 0.0425 | **<.0001** | 1.319 |
| DUI rating | -0.0032 | 0.0026 | 0.22 | 0.997 |
| Belt rating | 0.0008 | 0.007 | 0.9061 | 1.001 |
| Motorcycle helmet rating | 0.0065 | 0.0073 | 0.3732 | 1.007 |
| Median income (in $10,000) | -0.2149 | 0.0148 | **<.0001** | 0.807 |
| Post1991 (% of vehicles manufactured after 1991 in the fleet) | -0.0138 | 0.0047 | **0.0033** | 0.986 |
| Dispersion parameter | 0.0277 | 0.0019 | -- | -- |
| AIC* | 6537 | | | |
| MAD* | 93.07 | | | |
| MSPE* | 21185.56 | | | |

*Smaller values are preferred. Bold font denotes p-values <= 0.1.
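The "Exponentiated parameter" column in Table 7-6 can be reproduced by exponentiating each estimate, and the reported p-values are close to what a two-sided Wald z-test would give. The sketch below uses a few estimate and standard-error pairs copied from the table; treating the tests as normal-approximation Wald tests is an assumption, since the report does not name the test used, so small differences from the published p-values are expected.

```python
import math

# (estimate, standard error) pairs copied from Table 7-6.
mncs = {
    "Rural VMT proportion": (0.2228, 0.0667),
    "Beer (gallons)": (0.2769, 0.0425),
    "Median income (in $10,000)": (-0.2149, 0.0148),
    "Belt rating": (0.0008, 0.007),
}


def rate_ratio(estimate):
    """Multiplicative change in the fatality rate per one-unit increase."""
    return math.exp(estimate)


def wald_p_value(estimate, std_error):
    """Two-sided p-value under a normal approximation (assumed test)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


for name, (est, se) in mncs.items():
    print(f"{name}: ratio={rate_ratio(est):.3f}, p~{wald_p_value(est, se):.4f}")
```

Read this way, the median income estimate corresponds to a rate ratio of about 0.807, i.e., roughly a 19% lower fatality rate per additional $10,000 of median income in the between-state model, all else equal.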

MCS Models

Table 7-7 below shows the modeling results for the MCS model with VMT used as the exposure. For this model, four variables (capital spending, safety spending, pump price, and belt rating) were found to be not statistically significant, even at the 0.1 level.

Table 7-7 Parameter estimates for the MCS model with VMT as exposure

| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | 10.6995 | 0.302 | **<.0001** | |
| Rural VMT proportion | -0.1916 | 0.0972 | **0.0486** | 0.826 |
| Capital spending (in $1000 per mile) | -0.0002 | 0.0002 | 0.2255 | 1.000 |
| Safety spending (in $1000 per mile) | -0.0006 | 0.001 | 0.5176 | 0.999 |
| GDP per capita (in $10,000) | 0.046 | 0.0117 | **<.0001** | 1.047 |
| Unemployment for age 16 to 24 (%) | -0.0118 | 0.0015 | **<.0001** | 0.988 |
| Pump price ($ per gallon) | 0.0065 | 0.0125 | 0.6034 | 1.007 |
| Beer per capita (gallons) | 0.4022 | 0.0747 | **<.0001** | 1.495 |
| DUI rating | -0.0074 | 0.003 | **0.0119** | 0.993 |
| Belt rating | -0.0058 | 0.0073 | 0.4274 | 0.994 |
| Motorcycle Helmet rating | -0.0347 | 0.0156 | **0.0261** | 0.966 |
| Median Income (in $10,000) | 0.0375 | 0.0188 | **0.0459** | 1.038 |
| Post1991 (% of vehicles manufactured after 1991 in the fleet) | -0.0177 | 0.0026 | **<.0001** | 0.982 |
| Dispersion Parameter | 0.0025 | 0.0003 | -- | -- |
| AIC* | 5643 | | | |
| MAD* | 35.80 | | | |
| MSPE* | 3165.32 | | | |

\* Smaller values are preferred. Bold font denotes p-values <= 0.1.

Note: State fixed effect parameters are presented in Appendix C.

Table 7-8 below shows the modeling results for the MCS model with population as the exposure. For this model, three variables (rural VMT proportion, safety spending, and pump price) were found to be not statistically significant, even at the 0.1 level.

Table 7-8 Parameter estimates for the MCS model with population as exposure

| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | 5.3846 | 0.3306 | **<.0001** | |
| Rural VMT proportion | -0.0249 | 0.1068 | 0.8157 | 0.975 |
| Capital spending (in $1000 per mile) | -0.0006 | 0.0002 | **0.0043** | 0.999 |
| Safety spending (in $1000 per mile) | -0.0015 | 0.0011 | 0.156 | 0.999 |
| GDP per capita (in $10,000) | 0.1815 | 0.0124 | **<.0001** | 1.199 |
| Unemployment for age 16 to 24 (%) | -0.0122 | 0.0016 | **<.0001** | 0.988 |
| Pump price ($ per gallon) | 0.0039 | 0.0137 | 0.7759 | 1.004 |
| Beer per capita (gallons) | 0.2376 | 0.0813 | **0.0035** | 1.268 |
| DUI rating | -0.0154 | 0.0032 | **<.0001** | 0.985 |
| Belt rating | -0.0254 | 0.008 | **0.0015** | 0.975 |
| Motorcycle Helmet rating | -0.0466 | 0.0174 | **0.0075** | 0.954 |
| Median Income (in $10,000) | -0.0393 | 0.0205 | **0.0553** | 0.961 |
| Post1991 (% of vehicles manufactured after 1991 in the fleet) | -0.0078 | 0.0028 | **0.006** | 0.992 |
| Dispersion Parameter | 0.0034 | 0.0004 | -- | -- |
| AIC* | 5785 | | | |
| MAD* | 38.16 | | | |
| MSPE* | 3673.70 | | | |

\* Smaller values are preferred. Bold font denotes p-values <= 0.1.

Model Comparisons

The GOF statistics presented in Tables 7-6 to 7-8 show that the MCS model with VMT as exposure provides the best fit to the data. Figure 7-4 depicts how each model compared to the actual number of fatalities observed across the United States from 2007 to 2012. The model values were derived by applying the parameter estimates to each state's data and summing the state-level predictions to a national total. The highest level of fidelity was provided by the MCS model with VMT as the exposure, which tracked fatalities quite closely. The MCS model with population as the exposure also tracked fatalities well, but did not reflect the continued decrease in fatalities in 2011 or the rise in fatalities in 2012. Given its underperformance relative to the MCS model using VMT, the population model will not be discussed in greater detail.
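The goodness-of-fit comparison across Tables 7-6 to 7-8 can be sketched as follows. This is a minimal illustration, not the report's procedure. It assumes the usual definitions of MAD (mean absolute deviation) and MSPE (mean squared prediction error). The helper names `mad`, `mspe`, and `best_by_stat` are hypothetical, and the short `observed`/`predicted` lists are made-up numbers used only to show the two formulas.

```python
# Goodness-of-fit statistics as reported in Tables 7-6 to 7-8.
gof = {
    "MNCS, VMT exposure": {"AIC": 6537, "MAD": 93.07, "MSPE": 21185.56},
    "MCS, VMT exposure": {"AIC": 5643, "MAD": 35.80, "MSPE": 3165.32},
    "MCS, population exposure": {"AIC": 5785, "MAD": 38.16, "MSPE": 3673.70},
}

def mad(observed, predicted):
    """Mean absolute deviation between observed and predicted counts (assumed definition)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mspe(observed, predicted):
    """Mean squared prediction error (assumed definition)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# For each statistic, smaller is better, so take the model with the minimum value.
best_by_stat = {
    stat: min(gof, key=lambda model: gof[model][stat])
    for stat in ("AIC", "MAD", "MSPE")
}
print(best_by_stat)

# Hypothetical counts, used only to illustrate the two formulas.
observed = [100, 200, 300]
predicted = [110, 190, 330]
print(mad(observed, predicted), mspe(observed, predicted))
```

On the reported values, all three statistics select the MCS model with VMT as the exposure, which agrees with the comparison in the text.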
Although population estimates may be generally more accurate than VMT estimates, as argued by Noland and Sun (2014), the results here show that VMT provides a better fit to the data. The MNCS model reflected the trend of crashes, but it is clear that the combined effects of the parameters did not capture all the variation in fatalities over the focus period. The model generally underpredicted the number of traffic fatalities, indicating that other influential factors were not accounted for in the model. However, this model was still valuable because it may be able to better explain certain variables that had a direct influence on fatalities on a national scale, independent of each state.

Figure 7-4 Model predictions versus actual fatalities, 2007-2012

7.2.3 Model of annual change in factor levels on the annual change in fatalities

An alternative approach suggested by Elvik (2015) was also used. This approach, referred to as the "change model", translates the data into percent change from year to year within each state. Thus, the data begin with 2002, for which each independent and dependent variable is represented by its percent change compared to the previous year (Eq. 7).

Eq. 7

Where: $x$ = any original variable in the model; $z$ = the transformed change variable; $x_t$ = the value of $x$ in a given year; $x_{t-1}$ = the value in the prior year.

Taking the log converts this to a linear model, which is mathematically more straightforward to deal with (Eq. 8).

Eq. 8

Where: $y_t$ = number of traffic fatalities in year $t$; $y_{t-1}$ = number of traffic fatalities in year $t-1$; $z_t$ = transformed change variable; $\alpha$, $\beta$ = estimated coefficients.

When exponentiated, the coefficients ($\beta$s) in this model can be interpreted as multipliers on how the rate of change in a predictor influences the rate of change in fatalities (Eq. 9).

Eq. 9

A key quality of the change model is that it removes overall differences between states on all variables. Large states may have larger numbers of fatalities, larger expenditures, and larger numbers of miles driven, but change in those states is proportional. Small states do, however, produce more volatile change values because of the smaller samples.

The predictors in this model were slightly different from those in the count models, but they capture the same basic sources of variance and represent the same dimensions of the Haddon matrix. The predictors were selected to include: change in total VMT, change in the proportion of rural VMT out of all VMT, change in pump price, change in GDP per capita, change in median income, change in unemployment for 16- to 24-year-olds, change in capital spending per mile (lagged one year), change in safety spending per mile (lagged one year), change in belt use rate, change in DUI law rating, change in motorcycle helmet law rating, change in beer consumption, change in wine consumption, and change in the proportion of vehicles on the road with model year newer than 1991.

Multiple linear regression was run using SAS 9.4 PROC GLM to predict change in fatalities based on change in the predictors listed above. Parameter estimates are shown in Table 7-9, along with standard errors of the estimates and p-values.
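One way to write the change model defined in Eqs. 7 to 9 is sketched below. This is a hedged sketch under stated assumptions, not necessarily the report's exact notation. It assumes that each change variable is the ratio of the current to the prior year's value (so that the logarithm is defined for positive variables), that the log is applied to both the fatality ratio and the change variables, and that the predictors are indexed by a hypothetical subscript $i$.

```latex
\begin{align}
z_{i,t} &= \frac{x_{i,t}}{x_{i,t-1}}
  && \text{(assumed ratio form of the change variable)} \\
\ln\!\left(\frac{y_t}{y_{t-1}}\right) &= \alpha + \sum_i \beta_i \ln z_{i,t}
  && \text{(linear model after taking logs)} \\
\frac{y_t}{y_{t-1}} &= e^{\alpha} \prod_i z_{i,t}^{\beta_i}
  && \text{(multiplicative form after exponentiating)}
\end{align}
```

Under this form, a coefficient of 0.540 on the VMT change variable would mean that a 1% change in VMT is associated with roughly a 0.54% change in fatalities, holding the other change variables constant.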

Table 7-9 Parameter estimates for change model

| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | -0.011 | 0.008 | 0.1719 | |
| Total VMT chng | 0.540 | 0.188 | **0.0042** | 1.716 |
| Prop Rural VMT chng | 0.022 | 0.061 | 0.7171 | 1.022 |
| Pump price chng | -0.024 | 0.040 | 0.5597 | 0.976 |
| GDP per cap chng | 0.128 | 0.067 | **0.0552** | 1.137 |
| Median Income chng | 0.505 | 0.155 | **0.0012** | 1.657 |
| 16-24 Unemp chng | -0.138 | 0.026 | **<.0001** | 0.871 |
| Cap spend/mile (lag) chng | -0.008 | 0.022 | 0.7084 | 0.992 |
| Safety spend/mile (lag) chng | 0.011 | 0.014 | 0.4071 | 1.011 |
| Belt use rate chng | -0.051 | 0.122 | 0.6748 | 0.950 |
| DUI law rating chng | -0.181 | 0.093 | **0.0521** | 0.834 |
| Motorcycle helmet law rating chng | -0.013 | 0.100 | 0.8986 | 0.987 |
| Beer consumption chng | 0.170 | 0.141 | 0.2304 | 1.185 |
| Wine consumption chng | -0.029 | 0.087 | 0.7385 | 0.971 |
| MY>1991 chng | 0.057 | 0.550 | 0.9179 | 1.059 |
| R-square | 0.168 | | | |
| Adj. R-square | 0.144 | | | |

Bold font denotes p-values <= 0.1.

Model diagnostics indicated that the assumptions of linear regression were generally met. However, the overall R-square was relatively low, with only 16.8% of the total variance accounted for. This indicates that additional processes beyond those included influenced the specific change observed, although these processes are not likely to have been captured in any measures available to us.

To interpret the coefficients, the effect of change in each predictor was observed over the period of interest: 2007-2011. Holding all other variables constant, the parameter estimates were used to calculate the percent change in fatalities that was associated with each predictor individually across the range of change seen in that time period. The results are in Table 7-10.

Table 7-10 Effects of change-model predictors for 2007-2011

| Variable | Parameter | 2007 Mean | 2011 Mean | Percent change in predictor 2007-2011 | Percent change in predicted fatalities 2007-2011 |
|---|---|---|---|---|---|
| Intercept | -0.011 | | | | |
| Total VMT | 0.540 | 3,031,124 | 2,962,740 | -2.3% | -1.2% |
| Proportion rural VMT | 0.022 | 0.33 | 0.32 | -1.6% | -0.1% |
| Pump price change | -0.024 | 3.11 | 3.20 | 2.6% | -0.1% |
| GDP per cap change | 0.128 | 59,687 | 54,519 | -7.5% | -1.2% |
| Median Income change | 0.505 | 56,081 | 53,621 | -4.3% | -2.2% |
| 16-24 Unemp change | -0.138 | 10.59 | 16.69 | 55.7% | -6.1% |
| Capital spend/mile (lag) change | -0.008 | 73.69 | 81.27 | 7.9% | -0.1% |
| Safety spend/mile (lag) change | 0.011 | 13.61 | 14.68 | 9.3% | 0.1% |
| Belt use rate change | -0.051 | 85.77 | 88.10 | 2.4% | -0.1% |
| DUI law rating change | -0.181 | 19.77 | 20.50 | 4.0% | -0.7% |
| Motorcycle helmet law rating change | -0.013 | 2.91 | 2.91 | 0.0% | 0.0% |
| Beer consumption change | 0.170 | 1.21 | 1.15 | -3.5% | -0.7% |
| Wine consumption change | -0.029 | 0.37 | 0.39 | 5.0% | -0.1% |
| MY>1991 change | 0.057 | 95.80 | 97.11 | 1.4% | 0.1% |
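The last column of Table 7-10 can be approximated from the parameter and the two means. The sketch below rests on an assumption inferred from the table values rather than on a formula stated in the text: that the predicted change in fatalities is the ratio of the 2011 mean to the 2007 mean, raised to the power of the coefficient, minus one. The function name `predicted_change_pct` and the variable `effects` are hypothetical. Only a subset of rows is shown, and rows whose means are heavily rounded in the table (for example, beer consumption) may differ in the last digit.

```python
# (predictor, coefficient, 2007 mean, 2011 mean) copied from Table 7-10.
rows = [
    ("Total VMT", 0.540, 3_031_124, 2_962_740),
    ("GDP per capita", 0.128, 59_687, 54_519),
    ("Median income", 0.505, 56_081, 53_621),
    ("16-24 unemployment", -0.138, 10.59, 16.69),
    ("DUI law rating", -0.181, 19.77, 20.50),
]

def predicted_change_pct(beta, x_start, x_end):
    """Percent change in predicted fatalities, assuming a log-log (power) form:
    (x_end / x_start) ** beta - 1, with all other predictors held constant."""
    return ((x_end / x_start) ** beta - 1.0) * 100.0

# Round to one decimal place, matching the precision of the table.
effects = {
    name: round(predicted_change_pct(beta, x0, x1), 1)
    for name, beta, x0, x1 in rows
}

for name, pct in effects.items():
    print(f"{name:20s} {pct:+.1f}%")
```

Under this assumption, the rise in 16-24 unemployment from 10.59 to 16.69 is the largest single contributor among the rows shown, at about -6.1%.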