CHAPTER 7

Modeling

This chapter describes the statistical modeling used to characterize the relationship between fatalities and the factors that could explain the major drop in fatalities after 2007. Given the nature of the data (i.e., random variables and unobserved heterogeneity), statistical models are needed to separate key patterns from the random noise associated with traffic fatalities. The data collected and described in Chapter 5 were used for the analyses.

The modeling approach was patterned after Elvik (2013), who reviewed a number of statistical methods for examining predictors of fatalities over time in a group of countries. The three main methods he discussed were negative binomial models of fatality counts and two forms of models of year-over-year change. Two of these methods were used for the current study: a Poisson-gamma count model (equivalent to a negative binomial model) and the log-change regression model of year-over-year change.

The negative binomial model uses raw fatality counts and incorporates vehicle miles traveled (VMT) as exposure. This means that the coefficients of predictors can be interpreted as influencing fatalities per VMT, that is, the fatality rate in each state and year. Thus, although different factors can influence risk or exposure, in this model they are interpretable primarily as influencing risk rather than exposure.

Two key sources of variation exist in the data set of raw counts. First, differences between states can be thought of as generally stable differences in environmental, population, cultural, economic, and traffic safety conditions. Second, changes over time within states are more transient. Some factors change very slowly and steadily (e.g., new-vehicle fleet penetration, belt-use rates), while others (e.g., economic factors) are more volatile and can change significantly over short periods of time.

Because these factors may operate differently on travel and risk, two negative binomial regression models were developed. One uses a state fixed effect to remove the stable differences among states and focus on changes over time (model controlling for state, or MCS). The other leaves out this fixed effect, allowing differences between states to be captured by the measured predictors (model not controlling for state, or MNCS). When the effect of a predictor differs between these two models, it can indicate that differences in that variable between states have a different effect than change over time (i.e., the effect is relative). When the effect of the predictor is the same in the two models, it can indicate that the predictor has a general effect on risk that transcends local experience. The difference in these mechanisms can be informative.

The data set for the change model is a set of differences from one year to the next within a state, expressed as a percentage of the previous year. Thus, all state-to-state variation is removed, as is the magnitude of each variable within a state. The remaining information is simply the percent change from year to year within a state. Here, predictors can influence exposure, risk, or both.
Before any modeling can begin, it is useful to address the high level of correlation among the variables assembled for this project. This is done using factor analysis, a data-reduction technique that helps identify patterns of correlation among many variables at once (as opposed to two at a time). The final predictor set chosen by this method captures the array of factors of interest but with substantially less of the multicollinearity that can create problems in interpreting regression models.

The analyses were separated into two parts. In the first part, an exploratory analysis of the data was conducted to identify trends and factors that seem to influence fatalities at the national level. This part also included a factor analysis, which was used to rank the magnitude of the correlation between independent and dependent variables. Using the results of the exploratory analysis and the factor analysis, the second part focused on the two more advanced statistical analyses.

By looking at the problem from different angles, a clearer picture emerges of the most significant influences on traffic fatalities in the United States over this time frame.

The following sections provide details on the choice of predictors, the modeling approaches, and the results. Section 7.1 describes the results of the exploratory and factor analyses. Section 7.2 summarizes the results of the modeling effort for the two approaches.

7.1 Factor Analysis to Identify Parameters for the Models

A significant challenge of working with this type of data set is that many of its variables are correlated. While modeling can be done with some collinearity among predictors, it tends to fail when those relationships are very strong. For example, population and VMT within a state are very closely related (r = 0.98), so it is not feasible to include both as predictors.
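To give a sense of how severe a correlation of r = 0.98 is, the short sketch below computes the variance inflation factor (VIF) for a regression with two predictors. The VIF is a standard collinearity diagnostic, not a quantity reported in this chapter; it is used here only as an illustration, and the function name is hypothetical.

```python
import math


def variance_inflation_factor(r: float) -> float:
    """VIF of either predictor in a two-predictor regression.

    With only two predictors, the R-squared from regressing one on the
    other is r**2, so VIF = 1 / (1 - r**2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Correlation between population and VMT quoted in the text.
vif_population_vmt = variance_inflation_factor(0.98)

# The standard error of each coefficient is inflated by sqrt(VIF)
# relative to the case of uncorrelated predictors.
se_inflation = math.sqrt(vif_population_vmt)

print(f"VIF = {vif_population_vmt:.1f}, standard errors inflated {se_inflation:.1f}x")
```

Under this illustration, including both population and VMT would inflate the coefficient variances roughly 25-fold, which is consistent with the decision to keep only one of the two.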
A related issue is that when predictors are correlated, it is not possible to assign responsibility statistically for the effect of their shared variance on the outcome. In other words, the analysis itself cannot determine whether VMT or population is responsible for differences in fatalities. Statistically, the two are essentially the same thing. Instead, a sensible mechanistic argument must be made. In this case, the larger the population, the more people there are to drive, thus producing greater VMT. However, it is the VMT itself that exposes those people to the risk of being fatally injured in a crash. Hypothetically, if an entire state had no roads or cars, it could have a population of any size and there would be no traffic fatalities. In contrast, a state with 100 people who each drive 30,000 miles per year would have the same exposure (3 million vehicle miles) and therefore the same expected number of fatalities, all other things being equal, as a state with 300 people who each drive 10,000 miles per year. Note, however, that because population and VMT are so closely related in the real world, these types of inferences must be made by argument rather than through statistical results. This kind of argument will be made throughout the discussion of results, as analysis of these observational data cannot, in itself, address the causality of relationships among variables.

To address the collinearity problem and understand the collective relationship among these variables, a series of factor analyses was conducted. Factor analysis is a method of data reduction that produces a smaller number of dimensions, each of which is a linear combination of the original variables. For this purpose, it is best to think of factor analysis as a way of identifying a subset of the original variables that should be used as predictors of fatality. Factor analysis helps identify patterns of covariation and select a good predictor subset.
(Note that factor analysis as applied here does not evaluate the relationship between any of the predictors and fatalities; that is done in a later step.) Factor analyses were done in groups of similar variables: expenditures, economic measures, population, and VMT. Details of the factor analyses are provided in Appendix A.

Identification of Factors Contributing to the Decline of Traffic Fatalities in the United States from 2008 to 2012

Based on the results of the factor analysis, as well as some univariate exploration of the relationship between each predictor and fatalities, the following variables were used as the predictor set for modeling. General size of population is represented by total VMT, which is correlated with GDP, urban road miles, total population, law enforcement safety expenditures, and total capital expenditures. "Ruralness" is represented by the proportion of VMT that is rural, which is also related to total rural road miles. The variables used to represent economic factors were unemployment for the 16–24 age group, unemployment for everyone else, and median income. Across the model's time span, these are related to all other economic factors, including employment and unemployment for different age groups. State expenditures are represented by safety spending (safety plus Highway Safety Improvement Program [HSIP] funds) and capital spending. Occupant protection is represented by belt-use rates, strength of belt laws, strength of motorcycle helmet laws, and the proportion of vehicles on the road that are model year 1991 or later. Alcohol-related causation is represented by total beer consumption and the strength of DUI laws. Finally, pump price was included as a unique predictor that did not fall into the other categories. These variables and the variables with which they are most correlated are summarized in Table 7-1.

7.2 Regression Models

This section describes a series of regression models that were fit. The first was a model to identify the factors that predicted VMT. Next is a set of negative binomial regression models to identify the factors affecting the risk of traffic fatalities.
Finally, a log-change model is developed and discussed. This model relates year-over-year changes in predictor variables to changes in the number of traffic fatalities. Appendix B provides descriptive statistics on the variables used in the models.

Table 7-1. Summary of predictors and correlations.

| Variable group (high-level concept) | Variable(s) chosen for analysis | Other correlated* predictors |
|---|---|---|
| Overall size (of state) | Total VMT | GDP, capital expenditures, … |
| Ruralness | Rural VMT proportion | Total VMT, capital and safety expenditures, median household income, beer consumption per capita |
| Economy | GDP/capita, median household income, unemployment 16 to 24 | Rural VMT proportion, capital and safety expenditures |
| Occupant protection | Belt use, belt laws, motorcycle helmet laws, post-1991 model year penetration | Unemployment 16 to 24, pump price |
| State roadway expenditures | Capital expenditures, safety expenditures | Rural VMT proportion, GDP/capita, median household income |
| Alcohol | Beer consumption per capita, DUI laws | Rural VMT proportion |
| Other | Pump price | Post-1991 model penetration |

*Correlation coefficients greater than 0.3 in absolute value.
7.2.1 Regression of Factors on VMT

During the period from 2007 to 2012, changes in VMT coincided with the change in fatalities. It is therefore logical to first identify the factors that influenced VMT, especially those that reflected economic performance. Separate linear regression models were developed for total VMT, urban VMT, and rural VMT.

Prior to developing the regression models, a correlation analysis was performed to identify the factors that were correlated with VMT. Table 7-2 presents the results of the correlation analysis. A value of +1 means the variable was perfectly positively correlated with VMT. A value of -1 means the variable was perfectly negatively correlated with VMT. A value of 0 means the variable had no linear relationship to VMT.

The correlation coefficients suggest that population was the most influential factor in determining VMT, followed by the unemployment rate. Population was most strongly correlated with VMT, measured both as total VMT and as urban VMT (r = 0.98 in each case). The correlation between rural VMT and total population was also strong (r = 0.81) but noticeably weaker than for total or urban VMT. States with larger populations almost invariably had larger total VMT and urban VMT. However, the association with rural VMT was less strong, likely because some states with lower populations were also more rural.

Note also that there were weak positive associations between unemployment for the two age groups considered and the different measures of VMT. There was some tendency for states with higher rates of unemployment to also have higher amounts of VMT. States that are large in terms of population (and therefore VMT) also may have had higher rates of unemployment.

The correlation of the different measures of VMT with pump price was almost nonexistent, though there was a weak negative association with rural VMT; similarly, there was a weak-to-moderate negative correlation between rural VMT and median income.
States with more rural VMT tended to have lower median incomes and lower GDP per capita.

7.2.1.1 Total VMT

A linear regression model was developed with total VMT in millions as the dependent variable and population and unemployment rate as independent variables. Other variables were not considered because their correlation coefficients in Table 7-2 were almost 0. Given the strong correlation between the unemployment rates for the two age groups, each one was included independently, and the one that gave the best fit to the data was chosen for inclusion in the model. In the total VMT model, the unemployment rate for age 25 and above provided a better fit. Variables with p-values less than 0.1 can be considered statistically significant (at the significance level α = 0.1).

The R² statistic for the total VMT model was 0.97, indicating that the model explained about 97% of the variation in total VMT. The negative coefficient of the unemployment rate in Table 7-3 shows that as the unemployment rate increased, total VMT decreased. The primary factors determining total VMT were population and the state of the economy, as reflected by the unemployment rate.

Table 7-2. Correlation of VMT with different variables.

| | Total population | Unemployment of age 16 to 24 | Unemployment of age 25 and above | Pump price | GDP per capita | Median income |
|---|---|---|---|---|---|---|
| Total VMT | 0.98 | 0.19 | 0.22 | -0.04 | 0.07 | -0.02 |
| Urban VMT | 0.98 | 0.17 | 0.23 | 0.00 | 0.12 | 0.07 |
| Rural VMT | 0.81 | 0.19 | 0.15 | -0.14 | -0.12 | -0.27 |
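The total VMT model is an ordinary linear equation, so its estimates can be applied directly. The sketch below uses the coefficients reported in Table 7-3; the input state is hypothetical, and the units of the predictors (population as a count of persons, unemployment in percentage points) are assumptions, since the chapter states only that VMT is measured in millions.

```python
# Coefficients transcribed from Table 7-3 (total VMT model).
INTERCEPT = 7075.39
BETA_POPULATION = 0.00898            # assumed: million VMT per resident
BETA_UNEMPLOYMENT_25_PLUS = -425.43  # assumed: million VMT per percentage point


def predict_total_vmt(population: float, unemployment_25_plus: float) -> float:
    """Predicted total VMT (millions) from the fitted linear model."""
    return (
        INTERCEPT
        + BETA_POPULATION * population
        + BETA_UNEMPLOYMENT_25_PLUS * unemployment_25_plus
    )


# Hypothetical state of 5 million residents, before and after a
# 3-percentage-point rise in unemployment.
baseline = predict_total_vmt(5_000_000, 5.0)
recession = predict_total_vmt(5_000_000, 8.0)

print(f"baseline:  {baseline:,.2f} million VMT")
print(f"recession: {recession:,.2f} million VMT")
print(f"change:    {recession - baseline:,.2f} million VMT")
```

Under these assumptions, a three-point rise in unemployment lowers predicted VMT by about 1,276 million miles, a small share of the total, which matches the text's observation that population dominates the model.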
Figure 7-1 shows the comparison of the total VMT predicted by the model with the reported VMT from 2001 to 2012. Each of the circles in the graph represents one observation, i.e., one state in one year.

Table 7-3. Parameter estimates for the total VMT model.

| Variable | Estimate | Standard error | P-value |
|---|---|---|---|
| Intercept | 7075.39 | 1210.30 | <.0001 |
| Total population | 0.00898 | 0.0001 | <.0001 |
| Unemployment rate of age 25 and above | -425.43 | 235.21 | 0.071 |
| R² statistic | 0.97 | | |

Figure 7-1. Predicted total VMT versus reported total VMT, 2001–2012.

7.2.1.2 Urban VMT

To capture the effect of the economic variables by area type, a linear regression model was developed with urban VMT in millions as the dependent variable and population, unemployment rate, and GDP per capita as independent variables. Other variables were not considered because their correlation coefficients in Table 7-2 were almost 0. In the urban VMT model, the unemployment rate for age 16 to 24 provided a better fit.

The R² statistic for the urban VMT model was 0.97, indicating that the model explained about 97% of the variation in urban VMT. The negative coefficient of the unemployment rate in Table 7-4 shows that as the unemployment rate increased, urban VMT decreased. The coefficient for GDP per capita was not statistically significant at the 10% level, but its positive sign suggests that states with higher GDP per capita tended to have more urban travel than states with lower GDP per capita. As with total VMT, the primary determinants of urban VMT were total population and the state of the economy, as measured by the unemployment rate and GDP per capita.

Table 7-4. Parameter estimates for the urban VMT model.

| Variable | Estimate | Standard error | P-value |
|---|---|---|---|
| Intercept | -2588.33 | 2609.50 | 0.322 |
| Total population | 0.00705 | 0.0001 | <.0001 |
| Unemployment rate of age 16 to 24 | -179.40 | 100.70 | 0.075 |
| GDP per capita in 2013 dollars | 0.0140 | 0.0328 | 0.670 |
| R² statistic | 0.97 | | |

Figure 7-2 shows the comparison of the urban VMT predicted by the model with the reported urban VMT from 2001 to 2012.

Figure 7-2. Predicted urban VMT versus reported urban VMT, 2001–2012.

7.2.1.3 Rural VMT

More factors were found to contribute to the amount of rural VMT, and the model was less strong. The correlation coefficients in Table 7-2 showed that almost all variables had some correlation with the amount of rural VMT. A linear regression model was developed with rural VMT in millions as the dependent variable and population, unemployment rate, pump price (total gasoline price, including tax, in 2013 dollars), GDP per capita, and median income as independent variables. In the rural VMT model, the unemployment rate for age 25 and above provided a better fit.

The R² statistic for the rural VMT model was 0.77, which shows that the model explained about three-quarters of the variation in rural VMT but left a substantial share unexplained. The negative coefficients for the unemployment rate and pump price in Table 7-5 imply that as these variables increased, rural VMT decreased. The negative coefficients for GDP per capita and median income show that states with lower GDP per capita and lower median income tended to have more rural travel than other states. This may imply that states with more constrained economies and lower median incomes tended to be more rural.

Table 7-5. Parameter estimates for the rural VMT model.

| Variable | Estimate | Standard error | P-value |
|---|---|---|---|
| Intercept | 54822 | 2661.66 | <.0001 |
| Total population | 0.00201 | 0.0001 | <.0001 |
| Unemployment rate of age 25 and above | -674.35 | 174.33 | <.0001 |
| Pump price | -3541.08 | 522.25 | <.0001 |
| GDP per capita in 2013 dollars | -0.140 | 0.039 | 0.0004 |
| Median income | -0.458 | 0.053 | <.0001 |
| R² statistic | 0.77 | | |

Figure 7-3 shows the comparison of the rural VMT predicted by the model with the reported rural VMT from 2001 to 2012.

Figure 7-3. Predicted rural VMT versus reported rural VMT, 2001–2012.

7.2.2 Regression of Factors on Fatalities

The models estimated in this part of the work were based on count data regression methodologies (Cameron and Trivedi 1998; Hilbe 2014). The most basic count data models are the Poisson and the Poisson-gamma (also known as the negative binomial, or NB). Both models belong to the family of generalized linear models. For the Poisson model to be appropriate for a given data set, the mean has to equal the variance. In practice, however, count data often exhibit overdispersion, meaning that the variance is larger than the mean (Lord, Washington et al. 2005). On rare occasions, the data or modeling output may show characteristics of underdispersion (Oh, Washington et al. 2006; Lord, Geedipally et al. 2010).

To overcome the problem of overdispersion, the Poisson-gamma model has been proposed as a viable alternative to the Poisson model (Hilbe 2011). The Poisson-gamma model has become very popular because it has a closed-form equation (i.e., the mathematical manipulations can be done by hand; for example, it can be shown that the Poisson-gamma mixture leads to the NB distribution), and the mathematics to manipulate the relationship between the mean and the variance are relatively simple (Lord and Mannering 2010). Furthermore, most statistical software packages have incorporated an NB function that simplifies the analysis of count data.

The Poisson-gamma model in highway safety applications has been shown to have the following probabilistic structure: the number of crashes at the i-th entity (state) and t-th time period, $Y_{it}$, when conditional on its mean, $\theta_{it}$, is assumed to be Poisson distributed and independent over all entities and time periods as follows (Miaou and Lord 2003):

$$Y_{it} \mid \theta_{it} \sim \mathrm{Po}(\theta_{it}), \qquad i = 1, 2, \ldots, I \ \text{ and } \ t = 1, 2, \ldots, T \qquad \text{(Eq. 2)}$$
The mean of the Poisson is structured as:

$$\theta_{it} = \mu_{it} \exp(e_{it}) \qquad \text{(Eq. 3)}$$

Where
$\mu_{it}$ = a function of the covariates (X) (e.g., $\mu_{it} = \exp(\beta_0 + \beta_1 X_{it1} + \beta_2 X_{it2} + \ldots + \beta_p X_{itp})$, where p is the number of covariates);
$\beta$ = a vector of unknown coefficients; and
$e_{it}$ = the model error, independent of all the covariates.

It is usually assumed that $\exp(e_{it})$ is independent and gamma distributed with a mean equal to 1 and a variance of $1/\phi$ for all i and t (here $\phi$ is the inverse dispersion parameter, with $\phi > 0$). With this characteristic, it can be shown that $Y_{it}$, conditional on $\mu_{it}$ and $\phi$, is distributed as a Poisson-gamma random variable with mean $\mu_{it}$ and variance $\mu_{it}(1 + \mu_{it}/\phi)$.

In this modeling work, each year for every state was considered a distinct observation. As discussed by Lord and Persaud (2000), analyzing time-series or panel data in this manner can create temporal or serial correlation. Random effects models and models estimated using generalized estimating equations can be used to handle serial correlation (Lord and Persaud 2000). However, after further investigation it was determined that the serial correlation had a minimal impact on the modeling results. To simplify the modeling effort, the models were estimated as generalized linear models (GLMs).

Two types of regression models were estimated: (1) models not controlling for state (MNCS) (i.e., no state fixed effect in the model), and (2) models controlling for state (MCS) (i.e., state is considered as a fixed effect in the model). Further, MCS models were developed using two different types of exposure: one with VMT as exposure and the other with population as exposure. For the MCS models, each state has its own intercept. The MNCS model tries to establish the general relationship between the number of fatalities and various risk factors without taking into account differences between states that were not captured in the predictor variables.
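As a numerical check on the Poisson-gamma mean and variance stated above, the sketch below evaluates the negative binomial probability mass function that results from the Poisson-gamma mixture and sums it directly. The parameter values (μ = 20, φ = 5) are hypothetical and chosen only for illustration; the function names are not from the source.

```python
import math


def nb_pmf(y: int, mu: float, phi: float) -> float:
    """P(Y = y) for a Poisson-gamma (NB) count with mean mu and
    inverse dispersion parameter phi, computed on the log scale."""
    log_p = (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )
    return math.exp(log_p)


def nb_moments(mu: float, phi: float, y_max: int = 2000) -> tuple[float, float]:
    """Mean and variance obtained by summing the pmf up to y_max."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, variance


mu, phi = 20.0, 5.0  # hypothetical values
mean, variance = nb_moments(mu, phi)

print(f"mean from pmf:      {mean:.4f}  (mu = {mu})")
print(f"variance from pmf:  {variance:.4f}")
print(f"mu * (1 + mu/phi):  {mu * (1 + mu / phi):.4f}")
print(f"Poisson variance:   {mu:.4f}")
```

For these values the variance is five times the mean, whereas a Poisson model would force the two to be equal. As φ grows large, the variance approaches the mean and the model approaches the Poisson.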
The MNCS model captured less of the total variation, but may better explain certain variables that have a direct influence on fatalities on a national scale, independent of each state.

The MCS model type removes overall differences in fatality rates between states and focuses on accounting for how factors influence each state's fatality rate over time relative to the state's base rate. To accomplish this, each state had its own intercept, which captures the effect of variables that were unobserved as well as some that were observed. If the state-specific term differs significantly between states, then there is variation between states that is not captured by the variables in the model.

When the estimated effect of a predictor differs between the MNCS and MCS models, this indicates that the effect of the variable over time may be different from its effect between states. The latter is a more stable, long-term pattern, whereas the former is shorter-term and generally has a relative effect for residents of a given state.

The functional form used for the MNCS is the following:

$$\mu = \mathrm{VMT} \times e^{\beta_0 + \sum_i \beta_i X_i} \qquad \text{(Eq. 4)}$$

Where
$\mu$ = the estimated number of fatalities per year (for each state);
VMT = the number of vehicle-miles traveled in millions (for each state);
$X_i$ = variable i; and
$\beta_0, \beta_i$ = estimated coefficients.
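The functional form in Eq. 4 has two practical consequences that a short sketch can make concrete: expected fatalities scale in proportion to exposure, and a one-unit increase in a predictor multiplies the expected count by exp(β). The intercept and covariate values below are hypothetical; the two slope coefficients are taken from Table 7-6 (median income and beer consumption) purely for illustration.

```python
import math


def expected_fatalities(vmt_millions, beta0, betas, x):
    """Eq. 4 form: exposure multiplied by exp(linear predictor)."""
    linear_predictor = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return vmt_millions * math.exp(linear_predictor)


beta0 = -4.0                # hypothetical intercept, for illustration only
betas = [-0.2149, 0.2769]   # median income ($10,000), beer (gallons), Table 7-6
x = [5.0, 1.2]              # hypothetical covariate values

base = expected_fatalities(50_000, beta0, betas, x)
double_vmt = expected_fatalities(100_000, beta0, betas, x)
higher_income = expected_fatalities(50_000, beta0, betas, [6.0, 1.2])

# Doubling exposure doubles the expected count; the rate per VMT is unchanged.
print(f"ratio after doubling VMT:       {double_vmt / base:.3f}")

# Raising median income by one unit ($10,000) multiplies the expected
# count by exp(-0.2149), the "exponentiated parameter" for that variable.
print(f"ratio after +1 unit of income:  {higher_income / base:.3f}")
```

The second ratio does not depend on the intercept or on the other covariate values, which is why the exponentiated parameters can be read as multiplicative effects on the fatality rate.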
The functional form used for the MCS model is the following:

$$\text{Model 1:} \quad \mu = \mathrm{VMT} \times e^{(\beta_0 + \gamma_s) + \sum_i \beta_i X_i} \qquad \text{(Eq. 5)}$$

$$\text{Model 2:} \quad \mu = \mathrm{POP} \times e^{(\beta_0 + \gamma_s) + \sum_i \beta_i X_i} \qquad \text{(Eq. 6)}$$

Where
$\gamma_s$ = state-specific parameter for state s; and
POP = population in millions (for each state).

As shown above, two measures of exposure were used. In addition, the exposure variables were used as an offset in the models. What this means in practice is that the dependent variable in the models, the variable that the models were attempting to predict, was a rate: either fatalities per VMT or fatalities per population, depending on whether VMT or population was used as the offset.

The two measures of exposure were the VMT for each state and the population of the state. Both have advantages and disadvantages. VMT is a more direct measure of the exposure to crashes of vehicles traveling on the road network. On the other hand, the VMT values are themselves estimates and, like many estimates, may be prone to errors or could be unreliable in some cases. The state population variable is usually better estimated, because it is based on decennial censuses with intercensal surveys, but it may not represent the number of vehicles traveling on the network. Population is not immediately affected by economic conditions or by differences in the amount of travel due to the physical size of states, population density, and other factors that may influence the amount of travel, as shown in the VMT models in Section 7.2.1. Noland and Sun (2014) preferred using population as the measure of exposure because they believed that population estimates were generally more accurate than VMT estimates. However, the current study also estimated a model with VMT as exposure.

All models consider the following variables in each state:

1. Rural VMT as a percentage of total VMT;
2. Capital expenditures spent per highway mile, excluding capital safety expenditures, in the prior year (2013 dollars);
3. Safety expenditures (capital expenditures related to safety, law enforcement, education, and HSIP obligations) per highway mile spent in the prior year (in 2013 dollars);
4. State GDP per capita;
5. Percent unemployment for 16–24 year olds;
6. Total gasoline price (in 2013 dollars);
7. Beer consumption per capita;
8. DUI law rating;
9. Motorcycle helmet law rating;
10. Safety-belt law rating;
11. Safety-belt usage rates;
12. Median income (2013 dollars); and
13. Percent of the vehicle fleet with a model year of 1991 or newer.

These variables represent different cells in the Haddon Matrix. Capital and safety spending fall into the Environment domain and affect both the pre-crash level (e.g., spending on infrastructure to reduce crash occurrence) and the crash level (e.g., spending on infrastructure features that reduce the severity of crashes). Elements of the safety expenditure variable also fall into the Human domain at the pre-crash level, insofar as law enforcement and educational programs affect who drives and how they drive. State GDP per capita, percent unemployed, median income, fuel prices, and beer consumption are all located in the pre-crash level of the Human domain, since they are presumed to affect safety through decisions about whether to drive and how riskily to drive. Belt use, state belt rating, and motorcycle helmet rating fall into the Human domain at the crash level, insofar as they reflect and constrain the choices individuals make to protect themselves in crashes. Similarly, the variable for fleet penetration of crashworthiness features captures the pre-crash and crash levels of the Vehicle domain of the Haddon Matrix. Post-crash levels of the Haddon Matrix could not be incorporated, as it was not possible to obtain comprehensive data series on the factors represented there, such as advances in crash notification and post-crash medical care.

The goodness-of-fit (GOF) of the models was assessed using the Akaike information criterion (AIC) and two error-based measures, the mean absolute deviation (MAD) and the mean squared prediction error (MSPE). Additional information about how the GOF criteria work can be found in Lord and Park (2013). In general, the model with smaller AIC, MAD, and MSPE values is considered superior to other models.

The modeling results are summarized further later in this chapter. In this section, the general characteristics of the results are discussed, including coefficient estimates, standard errors of the estimates, and p-values. P-values show the level of significance; in this case, the p-value is the probability of obtaining the observed or more extreme results if the true coefficient were 0, given the sample data. In observational studies, a p-value of 0.1 or below is often used as the threshold of statistical significance, meaning only a 10% (or smaller) chance of the obtained results if the true effect were 0, given the sample data.
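To illustrate how a p-value of this kind follows from a coefficient estimate and its standard error, the sketch below computes a two-sided p-value using a normal (Wald) approximation. The chapter does not state which test produced the reported p-values, so the normal approximation is an assumption; the estimates and standard errors are transcribed from Table 7-6.

```python
import math


def wald_p_value(estimate: float, standard_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, assuming the ratio
    estimate / standard_error is approximately standard normal."""
    z = estimate / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


# (estimate, standard error) pairs from Table 7-6 (MNCS model).
p_dui_rating = wald_p_value(-0.0032, 0.0026)
p_rural_share = wald_p_value(0.2228, 0.0667)

threshold = 0.1  # significance level used in the chapter
for name, p in [("DUI rating", p_dui_rating), ("Rural VMT proportion", p_rural_share)]:
    verdict = "significant" if p <= threshold else "not significant"
    print(f"{name}: p = {p:.4f} ({verdict} at the 0.1 level)")
```

Under this assumption the two results fall close to the values reported in Table 7-6 for the same variables, with the DUI rating above the 0.1 threshold and the rural VMT proportion well below it.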
High p-values do not mean the parameter had no effect, just that the data were not sufficient to rule out the possibility that the parameter had no effect at some reasonable level of confidence. With more data, a non-significant parameter may become significant. "Non-significant" parameters were left in the models to measure the effect of all factors of interest, regardless of statistical significance. The detailed effects of each variable are discussed in Section 8.1.

7.2.2.1 MNCS Model

Table 7-6 shows the modeling results for the MNCS model. As noted above, VMT was used as the exposure. For this model, three variables (DUI rating, belt rating, and motorcycle helmet rating) were found to be not statistically significant, even at the 0.1 level.

7.2.2.2 MCS Models

Table 7-7 shows the modeling results for the MCS model with VMT used as the exposure. For this model, four variables (capital spending, safety spending, pump price, and belt rating) were found to be not statistically significant, even at the 0.1 level.

Table 7-8 shows the modeling results for the MCS model with population as the exposure. For this model, three variables (rural VMT proportion, safety spending, and pump price) were found to be not statistically significant, even at the 0.1 level.

7.2.2.3 Model Comparisons

The GOF statistics presented in Table 7-6 through Table 7-8 show that the MCS model with VMT as exposure provides the best fit to the data. Figure 7-4 depicts how each model compared to the actual number of fatalities observed across the United States from 2007 to 2012. The model values were derived by applying the estimated parameters to each state's data and summing the state predictions to a national total. The highest level of fidelity was provided by the MCS model with VMT as the exposure, which tracked fatalities quite closely.
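The three GOF measures used in this comparison can be sketched in a few lines. The definitions below are the commonly used ones (mean absolute error, mean squared error, and AIC = 2k - 2 ln L); the chapter defers the exact definitions to Lord and Park (2013), so treat these forms as assumptions. All data values are hypothetical.

```python
def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_prediction_error(observed, predicted):
    """Average squared difference between observed and predicted counts."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def akaike_information_criterion(log_likelihood, n_parameters):
    """AIC = 2k - 2 ln(L); penalizes models that use more parameters."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical observed fatalities for four state-years and the
# predictions of two hypothetical competing models.
observed = [100, 250, 400, 50]
model_a = [110, 240, 390, 55]
model_b = [130, 200, 450, 80]

for name, predicted in [("A", model_a), ("B", model_b)]:
    mad = mean_absolute_deviation(observed, predicted)
    mspe = mean_squared_prediction_error(observed, predicted)
    print(f"model {name}: MAD = {mad:.2f}, MSPE = {mspe:.2f}")

# Hypothetical log-likelihood and parameter count.
print(f"AIC = {akaike_information_criterion(-2800.0, 15):.1f}")
```

Because MSPE squares the errors, it penalizes a few large misses more heavily than MAD does, so the two measures can rank models differently when errors are concentrated in a handful of observations.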
The MCS model with population as the exposure also tracked fatalities well, but did not reflect the continued decrease in fatalities in 2011 or the rise in fatalities in 2012. Given the underperformance when compared to the MCS
model using VMT, the population model will not be discussed in greater detail. Although population estimates may be generally more accurate than VMT estimates, as argued by Noland and Sun (2014), the results here show that VMT provides a better fit to the data.

The MNCS model reflected the trend of crashes, but it is clear that the combined effects of the parameters did not capture all the variation in fatalities over the focus period. The model generally under-predicted the number of traffic fatalities, indicating that other influential factors were not accounted for in the model. However, this model is still valuable because it may be able to better explain certain variables that had a direct influence on fatalities on a national scale, independent of each state.

| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | 11.7179 | 0.4082 | <.0001 | |
| Rural VMT proportion | 0.2228 | 0.0667 | 0.0008 | 1.250 |
| Capital spending (in $1,000) | 0.0009 | 0.0003 | 0.0007 | 1.001 |
| Safety spending (in $1,000) | -0.0033 | 0.0013 | 0.0124 | 0.997 |
| GDP per capita (in $10,000) | 0.024 | 0.0106 | 0.0241 | 1.024 |
| Unemployment for age 16 to 24 (%) | -0.0132 | 0.0025 | <.0001 | 0.987 |
| Pump price ($ per gallon) | -0.0475 | 0.0258 | 0.0651 | 0.954 |
| Beer (gallons) | 0.2769 | 0.0425 | <.0001 | 1.319 |
| DUI rating | -0.0032 | 0.0026 | 0.22 | 0.997 |
| Belt rating | 0.0008 | 0.007 | 0.9061 | 1.001 |
| Motorcycle helmet rating | 0.0065 | 0.0073 | 0.3732 | 1.007 |
| Median income (in $10,000) | -0.2149 | 0.0148 | <.0001 | 0.807 |
| Post-1991 (% of vehicles manufactured after 1991 in the fleet) | -0.0138 | 0.0047 | 0.0033 | 0.986 |
| Dispersion parameter | 0.0277 | 0.0019 | -- | -- |
| AIC* | 6537 | | | |
| MAD* | 93.07 | | | |
| MSPE* | 21185.56 | | | |

* Smaller values are preferred. Bold font denotes p-values <=0.1.

Table 7-6. Parameter estimates for the MNCS model, VMT offset.

| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | 10.6995 | 0.302 | <.0001 | |
| Rural VMT proportion | -0.1916 | 0.0972 | 0.0486 | 0.826 |
| Capital spending (in $1,000 per mile) | -0.0002 | 0.0002 | 0.2255 | 1.000 |
| Safety spending (in $1,000 per mile) | -0.0006 | 0.001 | 0.5176 | 0.999 |
| GDP per capita (in $10,000) | 0.046 | 0.0117 | <.0001 | 1.047 |
| Unemployment for age 16 to 24 (%) | -0.0118 | 0.0015 | <.0001 | 0.988 |
| Pump price ($ per gallon) | 0.0065 | 0.0125 | 0.6034 | 1.007 |
| Beer per capita (gallons) | 0.4022 | 0.0747 | <.0001 | 1.495 |
| DUI rating | -0.0074 | 0.003 | 0.0119 | 0.993 |
| Belt rating | -0.0058 | 0.0073 | 0.4274 | 0.994 |
| Motorcycle helmet rating | -0.0347 | 0.0156 | 0.0261 | 0.966 |
| Median income (in $10,000) | 0.0375 | 0.0188 | 0.0459 | 1.038 |
| Post-1991 (% of vehicles manufactured after 1991 in the fleet) | -0.0177 | 0.0026 | <.0001 | 0.982 |
| Dispersion parameter | 0.0025 | 0.0003 | -- | -- |
| AIC* | 5643 | | | |
| MAD* | 35.80 | | | |
| MSPE* | 3165.32 | | | |

* Smaller values are preferred. Bold font denotes p-values <=0.1.
Note: State fixed effect parameters are presented in Appendix C.

Table 7-7. Parameter estimates for the MCS model with VMT as exposure.

7.2.3 Model of Annual Change in Factor Levels on the Annual Change in Fatalities

An alternative approach suggested by Elvik (2013) was also used. This approach, referred to as the "change model," translates the data into percent change from year to year within each state. The data begin with 2002, for which each independent and dependent variable is represented by its percent change compared to the previous year (Eq. 7):

\[ z_t = \frac{x_t}{x_{t-1}} \qquad \text{(Eq. 7)} \]

Where x = any original variable in the model; \( z_t \) = the transformed change variable; \( x_t \) = the value of x in a given year; and \( x_{t-1} \) = the value in the prior year.

Taking logs converts this to a linear model, which is mathematically more straightforward (Eq. 8):

\[ \ln(y_t) - \ln(y_{t-1}) = \beta_0 + \sum_{j=1}^{k} \beta_j \ln(z_t) = \beta_0 + \sum_{j=1}^{k} \beta_j \left( \ln(x_t) - \ln(x_{t-1}) \right) \qquad \text{(Eq. 8)} \]

Where \( y_t \) = number of traffic fatalities in year t; \( y_{t-1} \) = number of traffic fatalities in year t-1; \( z_t \) = transformed change variable; and \( \beta_0, \beta_j \) = estimated coefficients.
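As a minimal sketch of Eqs. 7 and 8, the Python snippet below converts a yearly series into year-over-year ratios and log changes. The function names and the fatality counts are hypothetical and are not data from the study. The only point is that the log of the ratio in Eq. 7 equals the difference of logs on the right-hand side of Eq. 8.

```python
import math

def change_ratio(series):
    """Eq. 7: ratio of each year's value to the prior year's value."""
    return [curr / prev for prev, curr in zip(series, series[1:])]

def log_change(series):
    """Eq. 8 terms: ln(x_t) - ln(x_{t-1}) for each consecutive pair of years."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]

# Hypothetical yearly fatality counts for one state (illustrative only).
fatalities = [1000.0, 950.0, 900.0, 945.0]

ratios = change_ratio(fatalities)
log_changes = log_change(fatalities)

for z, dz in zip(ratios, log_changes):
    # The log of the ratio and the difference of logs agree.
    print(f"ratio = {z:.4f}, ln(ratio) = {math.log(z):.4f}, log change = {dz:.4f}")
```

Working with log changes keeps the model linear in the coefficients, so ordinary regression tools can be applied to the transformed data.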
| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | 5.3846 | 0.3306 | <.0001 | |
| Rural VMT proportion | -0.0249 | 0.1068 | 0.8157 | 0.975 |
| Capital spending (in $1,000 per mile) | -0.0006 | 0.0002 | 0.0043 | 0.999 |
| Safety spending (in $1,000 per mile) | -0.0015 | 0.0011 | 0.156 | 0.999 |
| GDP per capita (in $10,000) | 0.1815 | 0.0124 | <.0001 | 1.199 |
| Unemployment for age 16 to 24 (%) | -0.0122 | 0.0016 | <.0001 | 0.988 |
| Pump price ($ per gallon) | 0.0039 | 0.0137 | 0.7759 | 1.004 |
| Beer per capita (gallons) | 0.2376 | 0.0813 | 0.0035 | 1.268 |
| DUI rating | -0.0154 | 0.0032 | <.0001 | 0.985 |
| Belt rating | -0.0254 | 0.008 | 0.0015 | 0.975 |
| Motorcycle helmet rating | -0.0466 | 0.0174 | 0.0075 | 0.954 |
| Median income (in $10,000) | -0.0393 | 0.0205 | 0.0553 | 0.961 |
| Post-1991 (% of vehicles manufactured after 1991 in the fleet) | -0.0078 | 0.0028 | 0.006 | 0.992 |
| Dispersion parameter | 0.0034 | 0.0004 | -- | -- |
| AIC* | 5785 | | | |
| MAD* | 38.16 | | | |
| MSPE* | 3673.70 | | | |

* Smaller values are preferred. Bold font denotes p-values <=0.1.

Table 7-8. Parameter estimates for the MCS model with population as exposure.

Figure 7-4. Model predictions versus actual fatalities, 2007–2012.
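The "Exponentiated parameter" column in the count-model tables is the exponential of the "Estimate" column. Assuming the usual log-link interpretation of a negative binomial model, exp(β) is the multiplicative change in expected fatalities for a one-unit increase in the predictor, with the other predictors held fixed. The sketch below recomputes that column for a few rows of Table 7-8. The dictionary and helper function names are illustrative and are not part of the study.

```python
import math

# Estimates copied from Table 7-8 (MCS model, population as exposure).
estimates = {
    "GDP per capita (in $10,000)": 0.1815,
    "Beer per capita (gallons)": 0.2376,
    "Belt rating": -0.0254,
    "Motorcycle helmet rating": -0.0466,
}

def exponentiate(beta):
    """Multiplicative effect on expected fatalities per one-unit increase."""
    return math.exp(beta)

for name, beta in estimates.items():
    mult = exponentiate(beta)
    pct = (mult - 1.0) * 100.0
    print(f"{name}: exp({beta}) = {mult:.3f} ({pct:+.1f}% per unit)")
```

Values below 1 correspond to negative estimates (fewer expected fatalities as the predictor rises), and values above 1 to positive estimates.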
When exponentiated, the coefficients (βs) in this model can be interpreted as multipliers on how the rate of change in a predictor influences the rate of change in fatalities (Eq. 9):

\[ \frac{y_t}{y_{t-1}} = e^{\beta_0} \prod_{j=1}^{k} \left( \frac{x_t}{x_{t-1}} \right)^{\beta_j} \qquad \text{(Eq. 9)} \]

A key quality of the change model is that it removes overall differences between states on all variables. Large states may have larger numbers of fatalities, larger expenditures, and larger numbers of miles driven, but change in those states is proportional. Small states do, however, produce more volatile change values because of the smaller samples.

The predictors in this model were slightly different from those in the count models, but they capture the same basic sources of variance and represent the same dimensions of the Haddon Matrix. The predictors were selected to include change in total VMT, change in proportion of rural VMT out of all VMT, change in pump price, change in GDP per capita, change in median income, change in unemployment for 16- to 24-year-olds, change in capital spending per mile (lagged one year), change in safety spending per mile (lagged one year), change in belt-use rate, change in DUI law rating, change in motorcycle helmet law rating, change in beer consumption, change in wine consumption, and change in the proportion of vehicles on the road with model year newer than 1991.

Multiple linear regression was run using SAS 9.4 PROC GLM to predict change in fatalities based on change in the predictors. Parameter estimates are shown in Table 7-9, along with standard errors of the estimates and p-values.
| Variable | Estimate | Standard error | P-value | Exponentiated parameter |
|---|---|---|---|---|
| Intercept | -0.011 | 0.008 | 0.1719 | |
| Total VMT chng | 0.540 | 0.188 | 0.0042 | 1.716 |
| Proportion rural VMT chng | 0.022 | 0.061 | 0.7171 | 1.022 |
| Pump price chng | -0.024 | 0.040 | 0.5597 | 0.976 |
| GDP per cap chng | 0.128 | 0.067 | 0.0552 | 1.137 |
| Median income chng | 0.505 | 0.155 | 0.0012 | 1.657 |
| 16-24 Unemp chng | -0.138 | 0.026 | <.0001 | 0.871 |
| Cap spend/mile (lag) chng | -0.008 | 0.022 | 0.7084 | 0.992 |
| Safety spend/mile (lag) chng | 0.011 | 0.014 | 0.4071 | 1.011 |
| Belt-use rate chng | -0.051 | 0.122 | 0.6748 | 0.950 |
| DUI law rating chng | -0.181 | 0.093 | 0.0521 | 0.834 |
| Motorcycle helmet law rating chng | -0.013 | 0.100 | 0.8986 | 0.987 |
| Beer consumption chng | 0.170 | 0.141 | 0.2304 | 1.185 |
| Wine consumption chng | -0.029 | 0.087 | 0.7385 | 0.971 |
| MY>1991 chng | 0.057 | 0.550 | 0.9179 | 1.059 |
| R² | 0.168 | | | |
| Adj. R² | 0.144 | | | |

Bold font denotes p-values <=0.1.

Table 7-9. Parameter estimates for change model.
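The study fit the change model with SAS PROC GLM. As a rough illustration of the same idea in Python, the sketch below fits ordinary least squares with a single predictor to log changes. The data are synthetic, generated from assumed coefficients (`true_b0`, `true_b1`) rather than taken from the study, so the fit simply recovers those values. The helper `fit_simple_ols` is a hypothetical name, and a real analysis would include the full set of predictors.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Assumed coefficients for the synthetic example (not estimates from the study).
true_b0, true_b1 = -0.01, 0.5

# Hypothetical year-over-year ratios of one predictor (e.g., total VMT).
x_ratios = [0.97, 0.99, 1.00, 1.02, 1.04]

# Multiplicative form: fatality ratio = exp(b0) * (predictor ratio) ** b1.
y_ratios = [math.exp(true_b0) * r ** true_b1 for r in x_ratios]

# Log-linear form: regress the log change in fatalities on the log change in the predictor.
log_x = [math.log(r) for r in x_ratios]
log_y = [math.log(r) for r in y_ratios]
b0, b1 = fit_simple_ols(log_x, log_y)

print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

Because the synthetic data contain no noise, the fitted line passes through every point. With real state-level changes the fit is far looser, as the reported R² of 0.168 indicates.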
Model diagnostics indicated that the assumptions of linear regression were generally met. However, overall R² was relatively low, with only 16.8% of the total variance accounted for. This indicates that additional processes beyond those included influenced the specific change observed. However, these processes are not likely to have been captured in any measures available to the researchers.

To interpret the coefficients, the effect of change in each predictor was observed over the period of interest: 2007–2011. Holding all other variables constant, the parameter estimates were used to calculate the percent change in fatalities associated with each predictor individually across the range of change seen in that time period. The results are in Table 7-10.

| Variable | Parameter | 2007 Mean | 2011 Mean | Percent change in predictor 2007–2011 | Percent change in predicted fatalities 2007–2011 |
|---|---|---|---|---|---|
| Intercept | -0.011 | | | | |
| Total VMT | 0.540 | 3,031,124 | 2,962,740 | -2.3% | -1.2% |
| Proportion rural VMT | 0.022 | 0.33 | 0.32 | -1.6% | -0.1% |
| Pump price change | -0.024 | 3.11 | 3.20 | 2.6% | -0.1% |
| GDP per cap change | 0.128 | 59,687 | 54,519 | -7.5% | -1.2% |
| Median income change | 0.505 | 56,081 | 53,621 | -4.3% | -2.2% |
| 16-24 Unemp change | -0.138 | 10.59 | 16.69 | 55.7% | -6.1% |
| Capital spend/mile (lag) change | -0.008 | 73.69 | 81.27 | 7.9% | -0.1% |
| Safety spend/mile (lag) change | 0.011 | 13.61 | 14.68 | 9.3% | 0.1% |
| Belt-use rate change | -0.051 | 85.77 | 88.10 | 2.4% | -0.1% |
| DUI law rating change | -0.181 | 19.77 | 20.50 | 4.0% | -0.7% |
| Motorcycle helmet law rating change | -0.013 | 2.91 | 2.91 | 0.0% | 0.0% |
| Beer consumption change | 0.170 | 1.21 | 1.15 | -3.5% | -0.7% |
| Wine consumption change | -0.029 | 0.37 | 0.39 | 5.0% | -0.1% |
| MY>1991 change | 0.057 | 95.80 | 97.11 | 1.4% | 0.1% |

Table 7-10. Effects of change-model predictors for 2007–2011.
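The last column of Table 7-10 can be approximated from the parameter estimates and the 2007 and 2011 means. The sketch below assumes the calculation follows the multiplicative form of the change model with the intercept and all other predictors held fixed, that is, the ratio of the 2011 mean to the 2007 mean raised to the power of the parameter. This is an assumption about how the table was produced, not a documented procedure. For the four rows shown, it matches the tabulated values to within rounding. The function name is hypothetical.

```python
# (parameter, 2007 mean, 2011 mean, tabulated % change in predicted fatalities)
rows = {
    "Total VMT": (0.540, 3_031_124, 2_962_740, -1.2),
    "GDP per capita": (0.128, 59_687, 54_519, -1.2),
    "Median income": (0.505, 56_081, 53_621, -2.2),
    "Unemployment, ages 16-24": (-0.138, 10.59, 16.69, -6.1),
}

def predicted_fatality_change(beta, mean_start, mean_end):
    """Percent change in fatalities implied by one predictor, others held fixed."""
    return ((mean_end / mean_start) ** beta - 1.0) * 100.0

for name, (beta, m2007, m2011, tabulated) in rows.items():
    pct = predicted_fatality_change(beta, m2007, m2011)
    print(f"{name}: computed {pct:.1f}%, Table 7-10 reports {tabulated:.1f}%")
```

Rows whose means are reported to only two decimals (for example, beer consumption) may not reproduce as closely from the rounded means.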